A Unified Framework for Semiparametrically Efficient Semi-supervised Learning
Zichun Xu
Speaker
University of Washington, Department of Biostatistics
Tuesday, Aug 5: 10:35 AM - 10:50 AM
Topic-Contributed Paper Session
Music City Center
We consider statistical inference in a semi-supervised setting where we have access to both a labeled dataset and an unlabeled dataset. We ask: under what circumstances, and by how much, can incorporating the unlabeled dataset improve upon inference based on the labeled data alone? To answer this question, we investigate semi-supervised learning through the lens of semiparametric efficiency theory. We characterize the efficiency lower bound in the semi-supervised setting for an arbitrary inferential problem, and show that incorporating unlabeled data can potentially improve efficiency if the parameter is not well-specified. We then propose two types of semi-supervised estimators: a safe estimator that imposes minimal assumptions, is simple to compute, and is guaranteed to be at least as efficient as the initial supervised estimator; and an efficient estimator that, under stronger assumptions, achieves the semiparametric efficiency bound. Our findings unify existing semiparametric efficiency results for particular special cases and extend them to a much more general class of problems. Moreover, we show that our estimators can flexibly incorporate predicted outcomes from "black-box" machine learning models, thereby achieving the same goal as prediction-powered inference (PPI) but with superior theoretical guarantees. We also provide a complete account of the theoretical basis for the existing set of PPI methods. Finally, we apply the framework to derive and analyze efficient semi-supervised estimators in several settings, including M-estimation, U-statistics, and average treatment effect estimation, and demonstrate the performance of the proposed estimators in simulations.
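To make the PPI comparison concrete, here is a minimal sketch of the existing PPI estimator for the simplest case, estimating a population mean E[Y]. It is not the authors' safe or efficient estimator. It assumes a fixed pre-trained predictor, labeled and unlabeled covariates drawn from the same distribution, and it uses a power-tuned weight `lam` (as in the PPI++ line of work). Setting `lam = 0` recovers the labeled-only mean, `lam = 1` recovers classical PPI, and the plug-in `lam` minimizes the estimated asymptotic variance, so the estimate is asymptotically no worse than the labeled-only mean. The function names `ppi_mean` and `optimal_lambda` are hypothetical.

```python
import numpy as np


def ppi_mean(y_lab, f_lab, f_unlab, lam=1.0):
    """PPI-style estimate of E[Y] (hypothetical helper).

    theta(lam) = mean(y_lab) + lam * (mean(f_unlab) - mean(f_lab)).
    lam = 1 gives classical PPI: mean(f_unlab) + mean(y_lab - f_lab).
    lam = 0 gives the labeled-only sample mean.
    """
    return np.mean(y_lab) + lam * (np.mean(f_unlab) - np.mean(f_lab))


def optimal_lambda(y_lab, f_lab, f_unlab):
    """Plug-in weight minimizing the asymptotic variance of theta(lam).

    Var(theta) ~ [Var(Y) - 2 lam Cov(Y, f) + lam^2 Var(f)] / n + lam^2 Var(f) / N,
    minimized at lam* = Cov(Y, f) / ((1 + n / N) * Var(f)).
    """
    n, N = len(y_lab), len(f_unlab)
    cov_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]
    # Pool predictions from both samples to estimate Var(f).
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    return cov_yf / ((1.0 + n / N) * var_f)


# Usage on simulated data; `predict` is a hypothetical stand-in for a black-box model.
rng = np.random.default_rng(0)
n, N = 200, 5000
x_lab, x_unlab = rng.normal(size=n), rng.normal(size=N)
y_lab = 2.0 * x_lab + rng.normal(size=n)


def predict(x):
    return 1.8 * x  # deliberately imperfect predictor


f_lab, f_unlab = predict(x_lab), predict(x_unlab)
lam_hat = optimal_lambda(y_lab, f_lab, f_unlab)
estimate = ppi_mean(y_lab, f_lab, f_unlab, lam_hat)
```

The single weight `lam` is the design choice that makes the sketch "safe" in a limited sense: when predictions are uninformative, the estimated `lam` shrinks toward 0 and the estimator falls back toward the supervised mean.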
Semi-supervised learning
Influence function
Nonparametric regression
Prediction-powered inference
Black-box machine learning model