Boosted Pseudo-Weighting for Nonprobability Samples to Improve Population Inference

Yan Li, Speaker
University of Maryland, College Park
 
Sunday, Aug 3: 5:05 PM - 5:25 PM
Topic-Contributed Paper Session 
Music City Center 
Nonprobability samples have rapidly emerged to address time-sensitive priority topics in various fields. While these data are timely, they are prone to selection bias. To mitigate this bias, a wide body of literature in survey research has explored propensity-score (PS) adjustment methods that enhance the population representativeness of nonprobability samples, using probability-based survey samples as external references. A recent advancement, the 2-step PS-based pseudo-weighting adjustment method (2PS; Li 2024), has been shown to outperform other recent methods in terms of mean squared error. However, the effectiveness of these methods in reducing bias depends critically on the ability of the underlying propensity model to accurately reflect the true (self-)selection process, which is challenging with parametric regression. In this study, we propose a set of pseudo-weight construction methods, 2PS-ML, which use both machine learning (ML) methods (to estimate PSs) and 2PS (to construct pseudo-weights based on the ML-estimated PSs), offering greater flexibility than logistic regression-based methods. We compare the proposed 2PS-ML pseudo-weights, based on gradient boosting, with existing methods, including 2PS. The proposed methods are evaluated numerically via simulation studies and empirically using the naïve unweighted National Health and Nutrition Examination Survey III sample, with the 1997 National Health Interview Survey as the reference, to estimate various health outcomes.
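
The general idea of estimating propensities with a boosted model and turning them into pseudo-weights can be illustrated with a minimal sketch. This is not the 2PS or 2PS-ML construction, whose details the abstract does not give. It assumes a simpler inverse-odds pseudo-weight, a hand-rolled gradient-boosted stump model in place of a production ML library, and a made-up toy data set. All names (fit_stump, boosted_log_odds, x_np, x_ref, d_ref) are hypothetical.

```python
import math

def fit_stump(x, r, w):
    """Weighted least-squares stump on residuals r.
    Assumes x has at least two distinct values. Returns (threshold, left, right)."""
    xs = sorted(set(x))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        sides = ([i for i in range(len(x)) if x[i] <= t],
                 [i for i in range(len(x)) if x[i] > t])
        sse, means = 0.0, []
        for idx in sides:
            wsum = sum(w[i] for i in idx)
            mean = sum(w[i] * r[i] for i in idx) / wsum
            sse += sum(w[i] * (r[i] - mean) ** 2 for i in idx)
            means.append(mean)
        if best is None or sse < best[0]:
            best = (sse, t, means[0], means[1])
    return best[1:]

def boosted_log_odds(x, z, w, rounds=500, lr=1.0):
    """Gradient boosting for weighted logistic loss with stump base learners.
    z = 1 for nonprobability units, 0 for reference-survey units."""
    f0 = math.log(sum(wi * zi for wi, zi in zip(w, z)) /
                  sum(wi * (1 - zi) for wi, zi in zip(w, z)))
    F = [f0] * len(x)
    for _ in range(rounds):
        # Negative gradient of the logistic loss: z - p
        r = [zi - 1.0 / (1.0 + math.exp(-fi)) for zi, fi in zip(z, F)]
        t, left, right = fit_stump(x, r, w)
        F = [fi + lr * (left if xi <= t else right) for fi, xi in zip(F, x)]
    return F

# Toy data (illustrative assumption): the nonprobability sample
# over-represents x = 1; the reference survey represents 100 people.
x_np = [0, 0] + [1] * 8          # nonprobability sample covariate
x_ref = [0, 0, 1, 1]             # reference survey covariate
d_ref = [25.0] * 4               # reference survey design weights

x = x_np + x_ref
z = [1] * len(x_np) + [0] * len(x_ref)
w = [1.0] * len(x_np) + d_ref    # reference units carry their survey weights

F = boosted_log_odds(x, z, w)
# Inverse-odds pseudo-weight (1 - p) / p = exp(-F) for each nonprobability unit
pseudo_w = [math.exp(-f) for f in F[:len(x_np)]]

naive_mean = sum(x_np) / len(x_np)
weighted_mean = sum(pw * xi for pw, xi in zip(pseudo_w, x_np)) / sum(pseudo_w)
print(naive_mean, round(weighted_mean, 3), round(sum(pseudo_w), 1))
```

In this toy setting the unweighted mean of x is 0.8, while the pseudo-weighted mean moves toward the reference survey's weighted mean of 0.5, and the pseudo-weights sum to roughly the reference population size. A real application would use many covariates and a tuned boosting library, and would replace the inverse-odds step with the 2PS weight construction.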