Advanced Multiple Imputation for Missing Sociodemographic and Laboratory Characterization Data in Infectious Disease Surveillance
Soyoun Park
Speaker
Centers for Disease Control and Prevention
Yunmi Chung
Co-Author
Centers for Disease Control and Prevention
Stephen Mugel
Co-Author
Centers for Disease Control and Prevention
Melissa Arvay
Co-Author
Centers for Disease Control and Prevention
Monday, Aug 3: 2:35 PM - 2:50 PM
2590
Contributed Papers
Thomas M. Menino Convention & Exhibition Center
CDC's Active Bacterial Core surveillance (ABCs) monitors invasive bacterial diseases in a population of approximately 45.9 million people across 10 U.S. sites. A challenge in ABCs is missing sociodemographic data (e.g., race) and laboratory characterization data (e.g., bacterial subtypes). When data are not missing at random, stratified disease estimates can be biased. We therefore developed a multi-step multiple imputation approach that uses random forest models to capture complex predictor interactions, mitigate multicollinearity, and account for the hierarchical structure of laboratory characteristics. The approach leverages sociodemographic and clinical characteristics to improve imputation under non-random missingness. We also propose a decision tree-based framework to characterize the complex missing data mechanisms inherent in ABCs, conducted simulations, and assessed multiple metrics of imputation accuracy. Results demonstrate improved precision and validity of imputed demographic data, leading to more reliable estimates; the race misclassification rate was only 2.4%, despite approximately 20% missingness. This framework may be broadly applicable to public health surveillance systems with non-random missing data.
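The general shape of a multi-step imputation that respects a laboratory hierarchy, followed by a masking simulation to estimate misclassification, can be sketched as below. This is a minimal illustration and not the authors' implementation: it swaps the random forest models for draws from empirical conditional distributions so that it runs with the Python standard library alone, and every variable name, category, hierarchy, and probability in it is hypothetical. The toy missingness depends only on an observed covariate (site), which is a simpler setting than the mechanisms the abstract describes, so the rate it prints is not comparable to the reported 2.4%.

```python
import random
from collections import Counter, defaultdict

# Hypothetical hierarchy: each subtype belongs to exactly one parent group.
HIERARCHY = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1"]}


def draw_from(counts, rng):
    """Draw one category with probability proportional to its observed count."""
    cats = sorted(counts)
    return rng.choices(cats, weights=[counts[c] for c in cats], k=1)[0]


def impute_once(records, target, predictors, rng):
    """Fill missing `target` values (None) with draws from the empirical
    distribution of `target` among observed records that share the same
    predictor values; fall back to the marginal distribution otherwise.
    (Stand-in for a random forest model; an assumption of this sketch.)"""
    cond, marginal = defaultdict(Counter), Counter()
    for r in records:
        if r[target] is not None:
            cond[tuple(r[p] for p in predictors)][r[target]] += 1
            marginal[r[target]] += 1
    out = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r[target] is None:
            counts = cond.get(tuple(r[p] for p in predictors)) or marginal
            r[target] = draw_from(counts, rng)
        out.append(r)
    return out


def simulate(n, rng):
    """Complete toy data: race depends on site, lab group depends on age."""
    race_w = {"s1": [8, 1, 1], "s2": [1, 8, 1], "s3": [1, 1, 8]}
    group_w = {"<18": [6, 3, 1], "18-64": [3, 4, 3], "65+": [1, 3, 6]}
    rows = []
    for _ in range(n):
        site = rng.choice(["s1", "s2", "s3"])
        age = rng.choice(["<18", "18-64", "65+"])
        race = rng.choices(["r1", "r2", "r3"], weights=race_w[site])[0]
        group = rng.choices(["A", "B", "C"], weights=group_w[age])[0]
        subtype = rng.choice(HIERARCHY[group])
        rows.append({"site": site, "age": age, "race": race,
                     "group": group, "subtype": subtype})
    return rows


def mask(rows, rng):
    """Mask race with a site-dependent probability (about 20% overall) and
    mask both lab levels together for about 15% of cases."""
    p_race = {"s1": 0.10, "s2": 0.20, "s3": 0.30}
    out = []
    for r in rows:
        r = dict(r)
        if rng.random() < p_race[r["site"]]:
            r["race"] = None
        if rng.random() < 0.15:
            r["group"] = None
            r["subtype"] = None
        out.append(r)
    return out


def multi_step_impute(rows, rng):
    """Step 1: race. Step 2: parent lab group, using the imputed race.
    Step 3: subtype conditional on group, so the hierarchy is preserved."""
    step1 = impute_once(rows, "race", ["site", "age"], rng)
    step2 = impute_once(step1, "group", ["age", "race"], rng)
    return impute_once(step2, "subtype", ["group"], rng)


rng = random.Random(2590)
truth = simulate(3000, rng)
observed = mask(truth, rng)

M = 5  # number of imputed data sets
imputations = [multi_step_impute(observed, rng) for _ in range(M)]

# Misclassification rate among records whose race was masked, per imputation.
masked_idx = [i for i, r in enumerate(observed) if r["race"] is None]
rates = [
    sum(imp[i]["race"] != truth[i]["race"] for i in masked_idx) / len(masked_idx)
    for imp in imputations
]
mean_rate = sum(rates) / M
print(f"race masked: {len(masked_idx) / len(truth):.1%}")
print(f"mean misclassification among masked records: {mean_rate:.1%}")
```

Imputing the subtype only after its parent group, and conditioning on that group, is what keeps every imputed record consistent with the hierarchy. In a fuller version each `impute_once` call would be replaced by a fitted classifier, and estimates from the `M` data sets would be pooled rather than averaged informally.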
Multiple imputation
Missing not at random (MNAR)
Hierarchical data structure
Infectious disease surveillance
Main Sponsor
Section on Statistics in Epidemiology