Surrogate-powered Regularized Estimation: Semi-Supervised Modeling with Multi-Wave Sampling
Yong Chen
Co-Author
University of Pennsylvania, Perelman School of Medicine
Jianmin Chen
First Author
University of Pennsylvania, Perelman School of Medicine
Jianmin Chen
Presenting Author
University of Pennsylvania, Perelman School of Medicine
Tuesday, Aug 5: 12:05 PM - 12:20 PM
2269
Contributed Papers
Music City Center
Surrogate-powered modeling is an emerging approach in semi-supervised learning that improves statistical efficiency by combining large-scale unlabeled data with a small labeled dataset through multiple surrogate outcomes. This framework is particularly useful for risk modeling with electronic health records (EHR), where gold-standard outcomes are scarce because chart review is costly, while algorithm-generated surrogates are widely available. Key challenges include effectively combining labeled and unlabeled data with multiple surrogates and designing efficient sampling rules for chart reviews. To address these challenges, we propose a multi-wave sampling strategy that adaptively approximates the optimal sampling rule, and we introduce a novel semi-supervised estimator that uses first-order bias correction and sparse regularization to reduce estimation error. The estimator is asymptotically normal and unbiased, and it improves statistical efficiency. Extensive numerical studies demonstrate its effectiveness in reducing mean-squared error.
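The general idea of borrowing strength from a surrogate can be illustrated with a much simpler problem than the one in the abstract. The sketch below is not the proposed estimator: it has no multi-wave sampling, no sparse regularization, and no regression model. It assumes a single surrogate, a simple random labeled sample, and a target that is just the outcome mean, and it uses a standard control-variate adjustment. The function names, sample sizes, and noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def surrogate_adjusted_mean(y_lab, s_lab, s_all):
    """Control-variate estimate of E[Y].

    y_lab: gold-standard outcomes on the labeled subset
    s_lab: surrogate values on the same labeled subset
    s_all: surrogate values on the full cohort (labeled + unlabeled)
    """
    y_lab = np.asarray(y_lab, dtype=float)
    s_lab = np.asarray(s_lab, dtype=float)
    s_all = np.asarray(s_all, dtype=float)

    s_centered = s_lab - s_lab.mean()
    ss = np.dot(s_centered, s_centered)
    # Least-squares slope of Y on S in the labeled sample;
    # fall back to no adjustment if the surrogate is constant.
    beta = np.dot(s_centered, y_lab - y_lab.mean()) / ss if ss > 0 else 0.0

    # Shift the labeled mean by how far the labeled surrogate mean
    # sits from the full-cohort surrogate mean.
    return y_lab.mean() - beta * (s_lab.mean() - s_all.mean())


def compare_mse(reps=500, n_all=10_000, n_lab=100, p=0.3, noise_sd=0.5, seed=0):
    """Monte Carlo MSE of the labeled-only mean vs. the adjusted mean.

    Assumed toy setup: Y ~ Bernoulli(p), surrogate S = Y + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    err_naive = np.empty(reps)
    err_adj = np.empty(reps)
    for r in range(reps):
        y_all = rng.binomial(1, p, size=n_all).astype(float)
        s_all = y_all + rng.normal(0.0, noise_sd, size=n_all)
        # Rows are i.i.d., so the first n_lab rows form a simple random sample.
        y_lab, s_lab = y_all[:n_lab], s_all[:n_lab]
        err_naive[r] = y_lab.mean() - p
        err_adj[r] = surrogate_adjusted_mean(y_lab, s_lab, s_all) - p
    return float(np.mean(err_naive**2)), float(np.mean(err_adj**2))


if __name__ == "__main__":
    mse_naive, mse_adj = compare_mse()
    print(f"labeled-only MSE: {mse_naive:.6f}")
    print(f"surrogate-adjusted MSE: {mse_adj:.6f}")
```

In this toy setting the adjustment helps to the extent that the surrogate is correlated with the gold-standard outcome; a surrogate unrelated to the outcome gives a slope near zero and leaves the labeled-only mean essentially unchanged.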
EHR data
semi-supervised learning
surrogate regression
bias-reduction
Main Sponsor
Section on Statistics in Epidemiology