Machine Learning Approaches to Identify Neonates at Risk for Post-Discharge Mortality in Dar es Sala

Chris Rees Co-Author
Emory University School of Medicine; Children’s Healthcare of Atlanta
 
Rodrick Kisenge Co-Author
Muhimbili University of Health and Allied Sciences
 
Evance Godfrey Co-Author
Muhimbili University of Health and Allied Sciences
 
Readon Ideh Co-Author
John F. Kennedy Medical Center
 
Julia Kamara Co-Author
John F. Kennedy Medical Center
 
Ye-Jeung Coleman-Nekar Co-Author
John F. Kennedy Medical Center
 
Abraham Samma Co-Author
Muhimbili University of Health and Allied Sciences
 
Hussein Manji Co-Author
Muhimbili University of Health and Allied Sciences; The Aga Khan Health Services
 
Christopher Sudfeld Co-Author
Harvard T.H. Chan School of Public Health
 
Michelle Niescierenko Co-Author
Boston Children’s Hospital; Harvard Medical School
 
Claudia Morris Co-Author
Emory University School of Medicine; Children’s Healthcare of Atlanta
 
Todd Florin Co-Author
Ann & Robert H. Lurie Children's Hospital of Chicago
 
Christopher Duggan Co-Author
Harvard T.H. Chan School of Public Health; Boston Children’s Hospital
 
Karim Manji Co-Author
Muhimbili University of Health and Allied Sciences
 
Rishikesan Kamaleswaran Co-Author
Department of Biostatistics and Bioinformatics, Duke University
 
Adrianna Westbrook First Author
Emory University
 
Adrianna Westbrook Presenting Author
Emory University
 
Wednesday, Aug 6: 9:35 AM - 9:40 AM
2311 
Contributed Speed 
Music City Center 

Description

Machine learning (ML) can increase discriminatory value in risk assessment tools compared to traditional regression. We explored the performance of ML models, compared to a previously derived logistic regression model (area under the curve [AUC]=0.77, 10 variables), for predicting all-cause mortality within 60 days post-discharge among neonates from two national referral hospitals in sub-Saharan Africa.
In a prospective cohort of 2,294 neonates (3% mortality rate), data were randomly split (80% training, 20% testing). We addressed class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) and selected variables via minimum Redundancy Maximum Relevance (mRMR). We trained random forest, XGBoost, histogram-based gradient boosting, support vector machine (SVM), and neural network models, optimizing hyperparameters via 5-fold cross-validation.
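As a rough illustration of the split and oversampling steps described above, the following standard-library sketch shuffles rows into an 80/20 split and generates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours, which is the core idea of SMOTE. The function names (`split_80_20`, `smote`), the neighbour count `k=5`, and the choice to oversample only the training portion are assumptions made for illustration; the abstract does not state these details, and this is not the authors' code.

```python
import math
import random


def split_80_20(rows, labels, seed=0):
    """Randomly split rows and labels into 80% training and 20% testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 4 // 5  # integer arithmetic avoids float rounding
    train, test = idx[:cut], idx[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from minority-class rows (SMOTE-style).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In a workflow like this, `smote` would typically be applied to the minority rows of the training portion only (an assumption here), so that the held-out test rows remain untouched observed data.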
Histogram-based gradient boosting, random forest, and XGBoost each achieved an AUC of 0.99 with six variables. The neural network (AUC=0.97) required eight variables, and the SVM (AUC=0.89) required 17 and was computationally heavy. ML models outperformed logistic regression (p<0.001). Selecting parsimonious, high-accuracy, low-cost models is key for feasible clinical implementation.
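For readers less familiar with the AUC values reported above, the following minimal sketch computes AUC from its pairwise definition: the proportion of (death, survivor) pairs in which the death received the higher predicted risk, with ties counted as one half. The function name `auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of predicted risks.

    labels: 1 for the event (e.g. death), 0 otherwise.
    scores: predicted risk for each observation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example with made-up values: two non-events and two events.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise form is quadratic in the number of observations, which is acceptable for a small illustration; library implementations generally use a rank-based computation instead.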

Keywords

Machine learning

Prediction modeling

Logistic regression

Model performance

Risk prediction 

Main Sponsor

Section on Statistics in Epidemiology