Evaluating two-sample tests for differences in survival in the presence of long-term survivors

Subodh Selukar Co-Author
 
Yu Bi First Author
 
Yu Bi Presenting Author
 
Tuesday, Aug 5: 11:20 AM - 11:35 AM
1927 
Contributed Papers 
Music City Center 
Time-to-event data with long-term survivors (L-TS), subjects who never experience the event, occur in diverse fields (e.g., cancer, credit default risk, recidivism). Conventional two-sample tests (e.g., the log-rank test [LR]) do not account for L-TS; several alternatives exist, but they have not been comprehensively compared. We compared seven methods via simulation: LR, three weighted log-rank tests (WLR), two adaptive tests (a two-stage test and the Yang-Prentice test [YP]), and a correctly specified parametric model. We assessed the impact of sample size and follow-up time on type I error and power across varying effect sizes. When one or both groups lack L-TS, LR, WLR, and YP typically have the highest power, although their ranking varies. When both groups have L-TS, these tests have non-monotonic power as a function of follow-up time, whereas parametric models have monotonically increasing power and the highest power at the longest follow-up time. These patterns are consistent across sample sizes. We attribute the non-monotonicity to deviations from proportional hazards whose magnitude depends on follow-up time. These findings have implications for study planning in the presence of L-TS, as naïve use of the conventional LR can have counterintuitive properties.
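
The kind of simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each group follows a mixture cure model with survival function S(t) = p + (1 - p) exp(-rate * t), where p is the L-TS fraction; the only censoring is administrative, at the end of follow-up; and only the unweighted LR is implemented. The function names, cure fractions, rates, and follow-up times are hypothetical choices made for the example.

```python
import math
import random


def simulate_cure_group(n, cure_prob, rate, follow_up, rng):
    """Draw (time, event) pairs from a mixture cure model (assumed exponential latency).

    Long-term survivors never have the event, so they are censored at follow_up.
    """
    data = []
    for _ in range(n):
        if rng.random() < cure_prob:
            data.append((follow_up, 0))            # long-term survivor
        else:
            t = rng.expovariate(rate)              # event time of a susceptible subject
            data.append((t, 1) if t <= follow_up else (follow_up, 0))
    return data


def logrank_z(group0, group1):
    """Standardized log-rank statistic; approximately N(0, 1) under the null."""
    pooled = sorted([(t, e, 0) for t, e in group0] + [(t, e, 1) for t, e in group1])
    at_risk = [len(group0), len(group1)]
    obs_minus_exp, var = 0.0, 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        events, leaving = [0, 0], [0, 0]
        j = i
        while j < len(pooled) and pooled[j][0] == t:   # all subjects tied at time t
            _, e, g = pooled[j]
            events[g] += e
            leaving[g] += 1
            j += 1
        n = at_risk[0] + at_risk[1]
        d = events[0] + events[1]
        if d > 0 and n > 1:
            p1 = at_risk[1] / n                        # share of risk set in group 1
            obs_minus_exp += events[1] - d * p1        # observed minus expected events
            var += d * p1 * (1 - p1) * (n - d) / (n - 1)
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
        i = j
    return obs_minus_exp / math.sqrt(var) if var > 0 else 0.0


def rejection_rate(n, cure0, cure1, rate0, rate1, follow_up, reps=200, seed=1):
    """Monte Carlo estimate of how often the two-sided 5% log-rank test rejects."""
    rng = random.Random(seed)
    z_crit = 1.959964                                  # 97.5th percentile of N(0, 1)
    rejections = 0
    for _ in range(reps):
        g0 = simulate_cure_group(n, cure0, rate0, follow_up, rng)
        g1 = simulate_cure_group(n, cure1, rate1, follow_up, rng)
        if abs(logrank_z(g0, g1)) > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical scenario: both groups contain L-TS and differ in cure fraction and rate.
    for follow_up in (0.5, 1.0, 2.0, 4.0, 8.0):
        power = rejection_rate(100, 0.3, 0.4, 1.0, 0.8, follow_up)
        print(f"follow-up {follow_up:>4}: estimated power {power:.2f}")
```

Setting the two cure fractions and the two rates equal turns the same loop into an estimate of type I error. Adding weights to the observed-minus-expected and variance terms would give a WLR variant; the adaptive and parametric methods are outside the scope of this sketch.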

Keywords

Survival analysis

Long-term survivors

Log-rank test

Mixture cure model

Adaptive tests 

Main Sponsor

Biometrics Section