Clustering-Informed Shared-Structure Variational Autoencoder for Missing Data Imputation
Mithat Gonen
Co-Author
Memorial Sloan Kettering Cancer Center
Yuan Chen
Co-Author
Memorial Sloan Kettering Cancer Center
Sunday, Aug 3: 3:05 PM - 3:20 PM
0943
Contributed Papers
Music City Center
Despite advances in managing healthcare data, missing data in electronic health records (EHR) and patient-reported health data remain a challenge and compromise the usability of these data in healthcare analytics. Conventional imputation methods are limited by difficulty capturing complex non-linear relationships, long computation times, and a restricted ability to handle different types of missing data mechanisms. To address these limitations, we propose the clustering-informed shared-structure variational autoencoder (CISS-VAE), which builds on generative Bayesian neural networks. The model can capture complex associations and accommodate various missing data mechanisms, including missing not at random (MNAR). We also develop iterative learning algorithms that further improve imputation accuracy while preventing overfitting. Comprehensive simulations show that our model achieves higher imputation accuracy than traditional and contemporary methods. We apply the method to EHR data from early-stage breast cancer patients at Memorial Sloan Kettering Cancer Center, aiming to mitigate the impact of missing data and enhance health monitoring and analyses.
Missing Data Imputation
Variational Autoencoder
Missing Not at Random
Electronic Health Records
Main Sponsor
Section on Statistics in Epidemiology