Monday, Aug 4: 2:00 PM - 3:50 PM
0171
Invited Paper Session
Music City Center
Room: CC-104E
Machine Learning
Propensity Score
Survey Data
Applied
Yes
Main Sponsor
Social Statistics Section
Co Sponsors
Section on Statistical Learning and Data Science
Survey Research Methods Section
Presentations
Unit nonresponse is a frequent issue in sample surveys, and naive estimates that do not account for nonrespondents can result in biased outcomes. Common nonresponse adjustment techniques, such as logistic regression and tree-based methods, rely on specific model assumptions that may not hold true, particularly when dealing with highly non-linear and high-dimensional nonresponse mechanisms. In contrast, deep neural network methods have demonstrated effectiveness in managing such complexities. In this paper, we propose the application of deep neural networks for nonresponse adjustment in complex survey data. We compare our approach with established methods, including logistic regression, generalized additive models, and tree-based techniques, through both simulation studies and real-world applications. Our results highlight the advantages of deep neural networks in improving the accuracy of nonresponse adjustments.
Keywords
Machine learning
Deep learning
Nonresponse
Survey Data
Declining response rates and data collection interruptions are resulting in missing data complexity that traditional missing data techniques used in Census Bureau survey processing may not flexibly capture. At the same time, the availability and linkability of administrative records and third-party data have improved, allowing for more informative response propensity models. We present a study of inverse probability weighting (IPW) to adjust for unit nonresponse using traditional statistical models (non-ML) and machine learning (ML) algorithms adapted for complex survey data. We share various measures for model comparisons and for visualizing geographically differentiated results. This work presents a case study of the value and advantage of ML and non-ML model-based IPW nonresponse adjustment using auxiliary sources with multiple years of American Community Survey data.
Keywords
missing data
nonresponse
survey data
boosting
mapping visualizations
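As a rough illustration of the inverse probability weighting idea mentioned in the abstract above, the sketch below divides each respondent's design weight by an estimated response propensity. It is not the presenters' method: the propensity here is simply the design-weighted response rate within an adjustment cell (a classical weighting-class adjustment), standing in for the ML and non-ML propensity models the abstract refers to. The function name `ipw_adjust`, the cell labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def ipw_adjust(design_weights, responded, cells):
    """Return nonresponse-adjusted weights (0.0 for nonrespondents).

    Assumption: the response propensity of a unit is estimated by the
    design-weighted response rate of its adjustment cell.
    """
    total = defaultdict(float)  # design-weighted count of all sampled units per cell
    resp = defaultdict(float)   # design-weighted count of respondents per cell
    for w, r, c in zip(design_weights, responded, cells):
        total[c] += w
        if r:
            resp[c] += w

    adjusted = []
    for w, r, c in zip(design_weights, responded, cells):
        if not r:
            adjusted.append(0.0)  # nonrespondents drop out of the estimator
            continue
        propensity = resp[c] / total[c]  # positive, because this unit responded
        adjusted.append(w / propensity)  # inverse probability weighting step
    return adjusted


# Hypothetical toy sample: two adjustment cells with different response rates.
design_weights = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
responded = [True, False, True, True, True, False]
cells = ["a", "a", "a", "a", "b", "b"]

adjusted = ipw_adjust(design_weights, responded, cells)
```

With cell-based propensities, the adjusted respondent weights in each cell sum to that cell's full design-weighted total, so the nonrespondents' share is redistributed to respondents in the same cell. A model-based variant would replace the cell response rate with a fitted propensity for each unit.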
In recent years, there has been significant interest in machine learning in national statistical offices. Thanks to their flexibility, these methods may prove useful at the nonresponse treatment stage.
After an introduction to statistical learning procedures, we will discuss some of the advantages and challenges associated with their use for the treatment of unit nonresponse. We will discuss the relationship between precise predictions and precise estimation. The results of an extensive simulation study will be presented to illustrate these points. Finally, the problem of selecting or aggregating several statistical learning procedures will be discussed.
Keywords
Survey sampling
Nonresponse
Propensity score estimation
Aggregation