Missing data imputation via truncated Gaussian factor analysis with application to metabolomics data
Monday, Aug 4: 11:45 AM - 11:50 AM
2248
Contributed Speed
Music City Center
In metabolomics, the study of small molecules in biological samples, data are often acquired via mass spectrometry, resulting in high-dimensional, highly correlated datasets with frequent missing values. Two kinds of missingness are common: values missing at random (MAR), due to acquisition or processing errors, and values missing not at random (MNAR), often caused by abundances falling below detection thresholds. Imputation is thus a critical component of downstream analysis. We propose a novel Truncated Gaussian Infinite Factor Analysis (TGIFA) model to address these challenges. By incorporating truncated Gaussian assumptions, TGIFA respects the physical constraints of the data, while the use of an infinite latent factor framework eliminates the need to pre-specify the number of factors. Our Bayesian inference approach jointly models MAR and MNAR mechanisms and, via a computationally efficient exchange algorithm, provides posterior uncertainty quantification for both imputed values and missingness types. We evaluate TGIFA through extensive simulation studies and apply it to a urinary metabolomics dataset, where it yields sensible and interpretable imputations with associated uncertainty estimates.
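To make the truncated Gaussian idea concrete, the sketch below shows one simplified way a missing abundance could be imputed under each missingness type. It is an illustration only, not the TGIFA model: the abstract does not give the model's details, so there is no factor structure, no exchange algorithm, and no posterior inference here. The function names `sample_truncated_normal` and `impute_missing`, the fixed `mu` and `sigma`, the detection limit `lod`, and the MNAR probability `p_mnar` are all hypothetical choices made for the example. A value labelled MNAR is drawn from a Gaussian truncated to [0, lod]; a value labelled MAR is drawn from the same Gaussian truncated only at zero.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mu, sigma, lower, upper, rng):
    """Draw one value from N(mu, sigma^2) restricted to [lower, upper]
    using inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    if p_hi <= p_lo:
        # Interval has numerically zero mass: return the bound nearest mu.
        return lower if mu < lower else upper
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf requires a probability strictly between 0 and 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = dist.inv_cdf(u)
    # Guard against floating-point drift just outside the interval.
    return min(max(x, lower), upper)


def impute_missing(values, mu, sigma, lod, p_mnar, seed=0):
    """Toy imputation (assumed setup, not TGIFA): each missing entry (None)
    is labelled MNAR with probability p_mnar, otherwise MAR, and filled
    with a truncated Gaussian draw that respects the matching constraint."""
    rng = random.Random(seed)
    imputed, labels = [], []
    for v in values:
        if v is not None:
            imputed.append(v)
            labels.append("observed")
        elif rng.random() < p_mnar:
            # MNAR: the true value is assumed to lie below the detection limit.
            imputed.append(sample_truncated_normal(mu, sigma, 0.0, lod, rng))
            labels.append("MNAR")
        else:
            # MAR: only the non-negativity constraint applies.
            imputed.append(
                sample_truncated_normal(mu, sigma, 0.0, float("inf"), rng)
            )
            labels.append("MAR")
    return imputed, labels


if __name__ == "__main__":
    # Hypothetical abundances for one metabolite; None marks a missing value.
    observed = [5.2, None, 4.8, None, 6.1, None]
    filled, kinds = impute_missing(
        observed, mu=5.0, sigma=1.5, lod=2.0, p_mnar=0.6, seed=1
    )
    for value, kind in zip(filled, kinds):
        print(f"{kind:>8}: {value:.3f}")
```

Repeating the draw with different seeds gives a spread of plausible values for each missing entry, which is a crude stand-in for the posterior uncertainty that the abstract says the full Bayesian model provides.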
Missing data
Metabolomics
Imputation
Infinite factor model
Mass spectrometry data
Main Sponsor
Section on Physical and Engineering Sciences