34: The Use of Multiple Imputation for Missing Data in A Health-Related Study

Bin Ge First Author
University of Missouri-Columbia
 
Bin Ge Presenting Author
University of Missouri-Columbia
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
1135 
Contributed Posters 
Music City Center 
Multiple imputation of missing data has been an active area of statistical research since before the big data era. In this project, we study the use of multiple imputation on a health-related data set with eight identified variables whose missing data rates range from 0 to 16%. We conducted multiple imputations (simple random) on this data set.
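As an illustration only, the sketch below shows one possible reading of "simple random" imputation: each missing entry is replaced by a value drawn at random from the observed entries of the same variable, and the process is repeated m times to give m completed copies. The abstract does not spell out the procedure, so this interpretation, the function names, and the example values are all assumptions.

```python
import random

def simple_random_impute(values, rng):
    """Fill each None with a value drawn at random from the observed entries.

    Assumption: this is one plausible reading of "simple random" imputation,
    not necessarily the procedure used in the study.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    return [rng.choice(observed) if v is None else v for v in values]

def multiply_impute(values, m, seed=0):
    """Return m completed copies of a single variable."""
    rng = random.Random(seed)
    return [simple_random_impute(values, rng) for _ in range(m)]

# Hypothetical variable with two missing entries (None marks a missing value).
incomplete = [118.0, None, 131.5, 124.0, None, 109.5, 140.0, 127.0]
completed_sets = multiply_impute(incomplete, m=5)
```

Each completed copy would then be analyzed separately, and the m sets of results combined into a single estimate.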
Furthermore, to investigate the use of multiple imputation under a variety of missing data structures and missing data rates, we generated incomplete data sets from the complete data set obtained from the health-related data. The generated incomplete data sets were analyzed with logistic regression, using multiple imputation to handle the missing data. The regression results for those incomplete data sets were compared with those obtained from the analysis of the complete data set. Our results suggest that estimates obtained using five imputations are similar to those obtained using 100 imputations in the logistic regression analysis. Our results indicate that missing data have a substantial influence on coefficients, odds ratios, and p-values in logistic regression analysis, especially when the missing rate is high. In such cases, even with multiple imputati
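The abstract does not state how the logistic regression results were combined across imputations. A common choice is Rubin's rules, sketched below under that assumption: the pooled coefficient is the mean of the per-imputation coefficients, and its variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name and all numbers are hypothetical and are not results from the study.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputations with Rubin's rules.

    estimates: per-imputation coefficient estimates (log odds ratios here)
    variances: per-imputation squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    q_bar = mean(estimates)              # pooled point estimate
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical coefficients and standard errors from m = 5 imputed data sets.
coefs = [0.40, 0.50, 0.45, 0.55, 0.60]
std_errs = [0.20, 0.21, 0.19, 0.20, 0.22]

log_or, pooled_se = pool_rubin(coefs, [s ** 2 for s in std_errs])
odds_ratio = math.exp(log_or)            # odds ratio implied by the pooled coefficient
```

Running the same pooling with 5 and with 100 imputed data sets would be one way to make the kind of comparison the abstract describes.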

Keywords

Missing data

multiple imputation

simulation study

logistic regression

Health-related study 

Main Sponsor

Section on Statistical Computing