
Machine Learning Approaches to Identify Neonates at Risk for Post-Discharge Mortality in Dar es Salaam

Presented During: SPEED 7: Biostatistics and Applied Statistics, Part 1

Chris Rees Co-Author
Emory University School of Medicine; Children’s Healthcare of Atlanta

Rodrick Kisenge Co-Author
Muhimbili University of Health and Allied Sciences

Evance Godfrey Co-Author
Muhimbili University of Health and Allied Sciences

Readon Ideh Co-Author
John F. Kennedy Medical Center

Julia Kamara Co-Author
John F. Kennedy Medical Center

Ye-Jeung Coleman-Nekar Co-Author
John F. Kennedy Medical Center

Abraham Samma Co-Author
Muhimbili University of Health and Allied Sciences

Hussein Manji Co-Author
Muhimbili University of Health and Allied Sciences; The Aga Khan Health Services

Christopher Sudfeld Co-Author
Harvard T.H. Chan School of Public Health

Michelle Niescierenko Co-Author
Boston Children’s Hospital; Harvard Medical School

Claudia Morris Co-Author
Emory University School of Medicine; Children’s Healthcare of Atlanta

Todd Florin Co-Author
Ann & Robert H. Lurie Children's Hospital of Chicago

Christopher Duggan Co-Author
Harvard T.H. Chan School of Public Health; Boston Children’s Hospital

Karim Manji Co-Author
Muhimbili University of Health and Allied Sciences

Rishikesan Kamaleswaran Co-Author
Department of Biostatistics and Bioinformatics, Duke University

Adrianna Westbrook First Author
Emory University

Adrianna Westbrook Presenting Author
Emory University

Wednesday, Aug 6: 9:35 AM - 9:40 AM
2311
Contributed Speed

Music City Center

Presentation

Description

Machine learning (ML) can improve the discriminative performance of risk assessment tools relative to traditional regression. We compared several ML models against a previously derived logistic regression model (area under the receiver operating characteristic curve [AUC]=0.77, 10 variables) for predicting all-cause mortality within 60 days of discharge among neonates from two national referral hospitals in sub-Saharan Africa.
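Because every model in this abstract is compared by AUC, it may help to recall what the number means: the AUC equals the probability that a randomly chosen neonate who died receives a higher predicted risk than a randomly chosen neonate who survived (the Mann-Whitney interpretation), with ties counted as one half. The function below is a minimal, hypothetical sketch of that definition in plain Python; it is not the authors' code. Its O(m·n) pairwise loop is acceptable for a cohort of a few thousand.

```python
def auc_mann_whitney(scores, labels):
    """Hypothetical helper: AUC as P(score of a positive > score of a negative).

    labels: 1 = died within 60 days, 0 = survived (assumed coding).
    Ties between a positive and a negative contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: positives score 0.9 and 0.6; 0.6 is outranked by one negative (0.8).
example_auc = auc_mann_whitney([0.9, 0.8, 0.3, 0.2, 0.6], [1, 0, 0, 0, 1])
```

In practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.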
In a prospective cohort of 2,294 neonates (3% mortality), data were randomly split into training (80%) and testing (20%) sets. We addressed class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) and selected variables using minimum Redundancy Maximum Relevance (mRMR). We trained random forest, XGBoost, histogram-based gradient boosting, support vector machine (SVM), and neural network models, tuning hyperparameters with 5-fold cross-validation.
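The abstract does not give implementation details for SMOTE or mRMR, so the following is only a minimal NumPy sketch of the two ideas under stated assumptions. SMOTE is shown in its basic form: each synthetic case is interpolated between a minority-class row and one of its k nearest minority neighbours (Euclidean distance, k=5 assumed). For mRMR, the original formulation scores features with mutual information; this sketch swaps in a simpler correlation-based variant (relevance = |Pearson r| with the outcome, redundancy = mean |Pearson r| with already-selected features), which is an assumption, not the authors' choice. All function names, the toy features `x0` to `x3`, and the ordering (select features, then oversample, both on the training split only so synthetic cases never reach the held-out 20%) are hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Basic SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority rows only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def mrmr_select(X, y, n_select):
    """Greedy mRMR, difference form, with correlation in place of mutual
    information (assumption). Assumes >= 2 features and no constant columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        remaining = [j for j in range(p) if j not in selected]
        # Score = relevance minus mean redundancy with features already chosen.
        score = [relevance[j] - feat_corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(score))])
    return selected


# Hypothetical toy data (not study data): 2,000 rows, ~20% positive outcome.
rng = np.random.default_rng(0)
y = (rng.random(2000) < 0.2).astype(int)
x0 = y + 0.5 * rng.standard_normal(2000)    # informative
x1 = x0 + 0.05 * rng.standard_normal(2000)  # near-duplicate of x0 (redundant)
x2 = y + 1.0 * rng.standard_normal(2000)    # weaker but non-redundant
x3 = rng.standard_normal(2000)              # pure noise
X = np.column_stack([x0, x1, x2, x3])

train = rng.permutation(len(y))[:1600]      # hypothetical 80% training split
X_tr, y_tr = X[train], y[train]
chosen = mrmr_select(X_tr, y_tr, n_select=2)  # skips the redundant copy
minority = X_tr[y_tr == 1][:, chosen]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())  # enough to balance classes
synthetic = smote(minority, n_new=n_new, k=5, seed=1)
```

For real analyses, a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn, placed inside a cross-validation pipeline, is generally preferable to a hand-rolled version.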
Histogram-based gradient boosting, random forest, and XGBoost each achieved an AUC of 0.99 using six variables. The neural network (AUC=0.97) required eight variables, and the SVM (AUC=0.89) required 17 and was computationally expensive. ML models outperformed logistic regression (p<0.001). Selecting parsimonious, high-accuracy, low-cost models is key to feasible clinical implementation.
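The abstract reports p<0.001 for the ML versus logistic regression comparison but does not name the test. When two models are scored on the same test neonates, their AUCs are correlated, and DeLong's nonparametric test is one common way to compare them; the sketch below assumes that test. It is a plain-Python illustration with hypothetical names, not the authors' analysis: it computes each case's placement values, estimates the variance of the AUC difference, and returns a two-sided p-value from the normal approximation.

```python
import math


def _structural_components(pos, neg):
    """DeLong placement values: V10 for each positive, V01 for each negative."""
    def psi(a, b):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if a > b else (0.5 if a == b else 0.0)
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _sample_var(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, z, p_value). labels: 1 = event, 0 = no event.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    v10a, v01a = _structural_components([scores_a[i] for i in pos],
                                        [scores_a[i] for i in neg])
    v10b, v01b = _structural_components([scores_b[i] for i in pos],
                                        [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Var(AUC_a - AUC_b) = Var(V10a - V10b) / m + Var(V01a - V01b) / n,
    # i.e. the contrast [1, -1] applied to DeLong's S10/m + S01/n covariance.
    var = (_sample_var([a - b for a, b in zip(v10a, v10b)]) / m
           + _sample_var([a - b for a, b in zip(v01a, v01b)]) / n)
    if var <= 0.0:
        raise ValueError("zero variance: the z statistic is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value


# Hypothetical toy scores for two models on the same seven cases.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4, 0.1]
model_b = [0.6, 0.3, 0.8, 0.5, 0.2, 0.7, 0.1]
auc_a, auc_b, z, p = delong_test(model_a, model_b, labels)  # auc_a = 1.0, auc_b = 0.75
```

With only seven toy cases the difference is not significant; the same function applies unchanged to a real held-out test set.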

Keywords

Machine learning

Prediction modeling

Logistic regression

Model performance

Risk prediction

Main Sponsor

Section on Statistics in Epidemiology