A simulation study evaluating the predictive performance of Cox proportional hazards model and machine learning methods for time-to-event data

Jenny Häggström Co-Author
Umeå University
 
Marie Eriksson Co-Author
Umeå University
 
Josline Adhiambo Otieno First Author
Umeå University
 
Josline Adhiambo Otieno Presenting Author
Umeå University
 
Sunday, Aug 3: 4:20 PM - 4:35 PM
1103 
Contributed Papers 
Music City Center 

Description

Many data-driven risk prediction models have been developed for analysing time-to-event data. However, choosing the most suitable model for accurate predictions in a specific medical application remains a challenge. Simulation enables effective comparison of models on equal-sized datasets. This study provided a comprehensive evaluation of the survival prediction performance of random survival forests, eXtreme Gradient Boosting (XGBoost), deep neural networks (DeepSurv), and the Cox proportional hazards (PH) model, using both simulated and real datasets. We assessed model performance using the concordance index (C-index) and the Integrated Brier Score. The evaluation was performed under varying sample sizes, censoring proportions, and numbers of added noise variables, and in the presence of different types of model misspecification. The predictive performance of all models improved with larger sample sizes but declined with higher censoring proportions and with an increasing number of noise variables. Tree-based models demonstrated promising predictive performance compared with the Cox PH model and DeepSurv in the presence of misspecification and a large number of noise variables. The Cox PH model performed well with larger sample sizes and fewer noise variables, and when the model was correctly specified or had only minor misspecification.
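To make the evaluation setup concrete, the following is a minimal sketch of one simulation replicate and of a C-index computation. It is not the authors' code: the exponential data-generating model, the single covariate, the parameter values, and the function names `simulate_survival` and `harrell_c_index` are all assumptions chosen for illustration. The C-index shown is Harrell's pairwise estimator, which may differ from the variant used in the study.

```python
import math
import random


def simulate_survival(n, beta=1.0, censor_rate=0.5, seed=0):
    """Simulate (time, event, x) triples under an exponential PH model.

    Assumed setup (illustrative only): one standard-normal covariate x,
    event times with hazard exp(beta * x), and independent exponential
    censoring times with rate `censor_rate`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        event_time = rng.expovariate(math.exp(beta * x))
        censor_time = rng.expovariate(censor_rate)
        time = min(event_time, censor_time)
        event = 1 if event_time <= censor_time else 0
        data.append((time, event, x))
    return data


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that are ordered correctly.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when
    subject i also has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    sample = simulate_survival(n=200, beta=1.0, censor_rate=0.5, seed=0)
    times = [t for t, _, _ in sample]
    events = [e for _, e, _ in sample]
    # Use the true linear predictor as the risk score (beta = 1, so it is x).
    scores = [x for _, _, x in sample]
    print("censoring proportion:", 1 - sum(events) / len(events))
    print("C-index:", harrell_c_index(times, events, scores))
```

Varying `n` and `censor_rate` across replicates mirrors the sample-size and censoring scenarios described above. Noise variables and misspecification would require a richer generator, and the Integrated Brier Score additionally needs predicted survival curves, so neither is sketched here.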

Keywords

Simulation

Machine Learning

Survival analysis

Prediction modelling

Stroke 

Main Sponsor

Section on Statistical Learning and Data Science