Recent Innovations in Analyzing Complex Survey Data with Missing Values
Monday, Aug 4: 8:30 AM - 10:20 AM
0167
Invited Paper Session
Music City Center
Room: CC-102B
Calibration
Missing data
Survey data
Applied
Yes
Main Sponsor
Survey Research Methods Section
Co-Sponsors
Government Statistics Section
International Association of Survey Statisticians
Presentations
National statistical offices are increasingly using machine learning (ML) to improve survey estimates. ML methods can handle high-dimensional data and capture complex relationships, which can improve the accuracy of survey estimates. In this presentation, we discuss a double/debiased ML framework for handling item nonresponse that ensures valid statistical inference under ML-based imputation. We also present theoretical and simulation results that illustrate the framework's effectiveness across different scenarios.
Keywords
Imputation
Item nonresponse
Machine learning
Variance estimation
Doubly robust estimator
Calibrated imputation
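To make the idea in the abstract above concrete, the following is a minimal sketch of a cross-fitted, doubly robust estimator of a population mean under item nonresponse. It is not the presenters' method: it assumes equal sampling weights (no survey design), a missing at random response mechanism, and uses ordinary least squares and logistic regression as simple stand-ins for ML learners. All function names (`fit_linear`, `fit_logistic`, `cross_fitted_dr_mean`) are hypothetical.

```python
import numpy as np


def _design(X):
    """Add an intercept column to the covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Outcome model (stand-in for an ML learner): ordinary least squares."""
    beta, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return lambda X_new: _design(X_new) @ beta


def fit_logistic(X, r, n_iter=50, ridge=1e-6):
    """Response model (stand-in for an ML learner): logistic regression
    fitted by Newton's method with a tiny ridge term for stability."""
    Xd = _design(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (r - p) - ridge * beta
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return lambda X_new: 1.0 / (1.0 + np.exp(-(_design(X_new) @ beta)))


def cross_fitted_dr_mean(X, y, r, n_folds=5, seed=0):
    """Cross-fitted doubly robust estimate of the mean of y.

    X : (n, p) covariates observed for every unit
    y : (n,) outcome; entries where r == 0 are ignored (may be NaN)
    r : (n,) response indicator, 1 = observed, 0 = missing
    Returns (estimate, standard error), assuming equal sampling weights.
    """
    n = len(r)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds
    m_hat = np.empty(n)  # predicted outcome for each unit
    p_hat = np.empty(n)  # predicted response probability for each unit
    for k in range(n_folds):
        test = folds == k
        train = ~test
        resp = train & (r == 1)
        # Nuisance models are fitted on the other folds only (cross-fitting).
        m_hat[test] = fit_linear(X[resp], y[resp])(X[test])
        p_hat[test] = fit_logistic(X[train], r[train])(X[test])
    p_hat = np.clip(p_hat, 0.01, 1.0)  # guard against extreme inverse weights
    y_filled = np.where(r == 1, y, 0.0)  # missing values never enter the score
    # Imputed value plus an inverse-probability-weighted residual correction.
    psi = m_hat + r * (y_filled - m_hat) / p_hat
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)
```

Calling `cross_fitted_dr_mean(X, y, r)` returns a point estimate and a standard error based on the empirical variance of the score `psi`. In practice the two fitting functions would be replaced by flexible learners, and survey weights would need to enter both the nuisance fits and the final average.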
A challenge with combining probability and nonprobability surveys is that the mechanism of selection into the nonprobability sample may depend on the variable of interest. Many existing methods to combine probability and nonprobability samples ignore this type of nonignorable selection. We postulate a quantile regression model that naturally supplies an instrumental variable. This enables us to propose a method to combine probability and nonprobability samples that accounts for the possibility of nonignorable selection.
Most treatment effect estimation studies assume missing at random (MAR), meaning that treatment selection probability depends only on observed covariates. However, in real-world applications, selection probability often depends on potential outcomes, leading to biased treatment effect estimates if not properly addressed. Instrumental variables (IVs) offer a potential solution to unobserved confounding, but identifying valid IVs and verifying their assumptions is challenging. This talk proposes an iterative estimation-solving algorithm that bypasses the IV assumption and imputes potential outcomes using semiparametric quantile regression under a missing not at random (MNAR) framework. We discuss model identification under MNAR and establish the theoretical properties of the proposed estimator, including its convergence and large-sample behavior. Simulation studies and a real-data application further validate the effectiveness of our approach.
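For reference, the contrast between the two missingness assumptions named in this abstract can be written as follows. The notation is an assumption introduced here, not taken from the talk: T is the treatment indicator, X the observed covariates, and Y(0), Y(1) the potential outcomes.

```latex
% MAR: treatment selection depends only on the observed covariates
\[
  P\{T = 1 \mid X, Y(0), Y(1)\} = P(T = 1 \mid X)
\]
% MNAR: given X, selection may still depend on the potential outcomes
\[
  P\{T = 1 \mid X, Y(0), Y(1)\} \neq P(T = 1 \mid X) \quad \text{in general}
\]
```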