Application of machine learning methods in the imputation of heterogeneous co-missing data
Monday, Aug 4: 10:35 AM - 10:50 AM
2241
Contributed Papers
Music City Center
Standard imputation methods may not handle heterogeneous co-missing data well, such as the lung function measures from spirometry tests in population-based studies. This work reviews and evaluates statistical and machine learning imputation methods for estimating the prevalence of impaired lung function, as in chronic obstructive pulmonary disease (COPD), using data from public surveys on aging. Unsupervised learning (clustering) methods improve multiple imputation (MI). The k-prototype method outperforms DBSCAN because it handles categorical data more effectively. Direct imputation from the predicted values of random forests and artificial neural networks is unsatisfactory. Combined with MI, k-prototype clustering appears to be the most suitable method for imputing missing spirometry values. Even when the imputation functions differ from those used in the simulation, the k-prototype method can improve the estimates from the MI methods.
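The idea of clustering before multiple imputation can be illustrated with a minimal sketch. It is not the authors' implementation: the simplified k-prototypes loop, the within-cluster donor draw used as a stand-in for the imputation model, the toy records, the function names, and the FEV1/FVC < 0.7 cut-off for impairment are all assumptions made for illustration.

```python
import random
from collections import Counter

def dissimilarity(num_a, cat_a, num_b, cat_b, gamma):
    """k-prototypes-style distance: squared Euclidean distance on the numeric
    part plus gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches

def k_prototypes(nums, cats, k, gamma=1.0, n_iter=20, seed=0):
    """Return one cluster label per row (simplified assign/update loop)."""
    rng = random.Random(seed)
    start = rng.sample(range(len(nums)), k)
    protos = [(list(nums[i]), list(cats[i])) for i in start]
    labels = [0] * len(nums)
    for _ in range(n_iter):
        # Assignment step: nearest prototype under the mixed dissimilarity.
        labels = [
            min(range(k), key=lambda c: dissimilarity(n, ca, protos[c][0], protos[c][1], gamma))
            for n, ca in zip(nums, cats)
        ]
        # Update step: numeric means and categorical modes per cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            mean = [sum(nums[i][j] for i in members) / len(members) for j in range(len(nums[0]))]
            mode = [Counter(cats[i][j] for i in members).most_common(1)[0][0] for j in range(len(cats[0]))]
            protos[c] = (mean, mode)
    return labels

def impute_within_clusters(outcomes, labels, m=5, seed=0):
    """Create m completed copies of `outcomes`. A row whose outcome block is None
    receives the whole block from a random observed donor in the same cluster,
    so values that are missing together are filled together."""
    rng = random.Random(seed)
    observed = [o for o in outcomes if o is not None]
    completed = []
    for _ in range(m):
        filled = []
        for o, lab in zip(outcomes, labels):
            if o is None:
                donors = [d for d, l in zip(outcomes, labels) if l == lab and d is not None]
                o = rng.choice(donors or observed)  # fall back to all observed rows
            filled.append(o)
        completed.append(filled)
    return completed

def pooled_prevalence(completed, is_impaired):
    """Average the per-dataset prevalence, which is the MI point estimate."""
    estimates = [sum(is_impaired(o) for o in data) / len(data) for data in completed]
    return sum(estimates) / len(estimates)

# Toy records (hypothetical): a standardized numeric covariate, two categorical
# covariates, and an outcome block (FEV1, FVC) in litres that is missing as a unit.
nums = [(0.1,), (0.2,), (0.0,), (1.9,), (2.1,), (2.0,)]
cats = [("F", "never"), ("F", "never"), ("F", "never"),
        ("M", "smoker"), ("M", "smoker"), ("M", "smoker")]
outcomes = [(3.1, 3.9), None, (3.0, 3.8), (1.5, 2.9), None, (1.6, 3.0)]

labels = k_prototypes(nums, cats, k=2)
completed = impute_within_clusters(outcomes, labels, m=5)
prevalence = pooled_prevalence(completed, lambda o: o[0] / o[1] < 0.7)
print(labels, prevalence)
```

In this toy data the two clusters separate cleanly, so each missing spirometry block is drawn from donors with a similar covariate profile. A real analysis would replace the donor draw with a proper imputation model and pool the variances as well as the point estimates.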
co-missing
heterogeneous
multiple imputations
machine learning
Main Sponsor
Section on Statistical Learning and Data Science