Recent Innovations in Analyzing Complex Survey Data with Missing Values

Darcy Steeg Morris, Chair
U.S. Census Bureau
 
Jae-Kwang Kim, Discussant
Iowa State University
 
Sixia Chen, Organizer
 
Monday, Aug 4: 8:30 AM - 10:20 AM
0167 
Invited Paper Session 
Music City Center 
Room: CC-102B 

Keywords

Calibration

Missing data

Survey data 

Applied

Yes

Main Sponsor

Survey Research Methods Section

Co-Sponsors

Government Statistics Section
International Association of Survey Statisticians

Presentations

Statistical Inference for a Finite Population Mean with Machine Learning-Based Imputation for Missing Survey Data

National statistical offices are increasingly using machine learning (ML) to improve survey estimates. ML methods can handle high-dimensional data and capture complex relationships, which can improve the accuracy of those estimates. In this presentation, we discuss a double/debiased ML framework for handling item nonresponse while ensuring valid statistical inference with ML-based imputation. We also present theoretical and simulation results that illustrate the framework's effectiveness across different scenarios.
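As a rough illustration of the general idea behind double/debiased ML for item nonresponse, the sketch below computes a cross-fitted, doubly robust estimate of a mean on simulated data. It is not the speaker's estimator: the function names, the toy data-generating model, the equal design weights, and the two simple learners (a least-squares line for the outcome model and a two-bin response rate for the propensity model, standing in for ML learners) are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Least-squares line; a stand-in for any ML outcome learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_binned_propensity(xs, rs, cut, floor=0.05):
    """Response rate below/above `cut`; a stand-in for an ML propensity learner."""
    lo = [r for x, r in zip(xs, rs) if x < cut]
    hi = [r for x, r in zip(xs, rs) if x >= cut]
    p_lo = max(sum(lo) / len(lo), floor)  # floor keeps inverse weights bounded
    p_hi = max(sum(hi) / len(hi), floor)
    return lambda x: p_lo if x < cut else p_hi


def cross_fitted_dr_mean(x, y, r, w, n_folds=2, seed=0):
    """Weighted doubly robust mean; nuisance fits never see the unit they score."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        resp = [i for i in train if r[i] == 1]
        m_hat = fit_line([x[i] for i in resp], [y[i] for i in resp])
        cut = sorted(x[i] for i in train)[len(train) // 2]  # training-fold median
        p_hat = fit_binned_propensity(
            [x[i] for i in train], [r[i] for i in train], cut
        )
        for i in held_out:
            pred = m_hat(x[i])
            # Residual correction applies only where y was observed.
            resid = (y[i] - pred) / p_hat(x[i]) if r[i] == 1 else 0.0
            total += w[i] * (pred + resid)
    return total / sum(w)


def simulate(n=2000, seed=1):
    """Toy data: y = 2 + 3x + noise, with response less likely for large x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y_full = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]
    r = [1 if rng.random() < (0.9 if xi < 0.5 else 0.5) else 0 for xi in x]
    y = [yi if ri == 1 else None for yi, ri in zip(y_full, r)]
    w = [1.0] * n  # equal design weights, as under simple random sampling
    return x, y, r, w


x, y, r, w = simulate()
estimate = cross_fitted_dr_mean(x, y, r, w)
print(round(estimate, 2))  # the model mean of y in this toy setup is 3.5
```

Each unit's prediction and response probability come from models fitted on the other fold, which is the cross-fitting step that double/debiased ML relies on. In practice the two stand-in learners would be replaced by flexible ML methods, and the weights by the survey's design weights. Variance estimation, a central topic of the talk, is not covered here.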

Keywords

Imputation

Item nonresponse

Machine learning

Variance estimation

Doubly robust estimator

Calibrated imputation 

Speaker

David Haziza, University of Ottawa

A Quantile Regression Approach to Combining Probability and Nonprobability Surveys Under Nonignorable Selection

A challenge with combining probability and nonprobability surveys is that the mechanism of selection into the nonprobability sample may depend on the variable of interest. Many existing methods to combine probability and nonprobability samples ignore this type of nonignorable selection. We postulate a quantile regression model that naturally supplies an instrumental variable. This enables us to propose a method to combine probability and nonprobability samples that accounts for the possibility of nonignorable selection. 

Speaker

Emily Berg

Estimating Marginal Treatment Effects Using Quantile Regression Imputation Under a Non-Ignorable Selection Assumption

Most treatment effect estimation studies assume missing at random (MAR), meaning that treatment selection probability depends only on observed covariates. However, in real-world applications, selection probability often depends on potential outcomes, leading to biased treatment effect estimates if not properly addressed. Instrumental variables (IVs) offer a potential solution to unobserved confounding, but identifying valid IVs and verifying their assumptions is challenging. This talk proposes an iterative estimation-solving algorithm that bypasses the IV assumption and imputes potential outcomes using semiparametric quantile regression under a missing not at random (MNAR) framework. We discuss model identification under MNAR and establish the theoretical properties of the proposed estimator, including its convergence and large-sample behavior. Simulation studies and a real-data application further validate the effectiveness of our approach. 

Speaker

Cindy Yu, Iowa State University