A Maximin Optimal Approach for Sampling Designs in Two-phase Studies
Sunday, Aug 3: 4:05 PM - 4:25 PM
Topic-Contributed Paper Session
Music City Center
Data collection costs can vary widely across variables in data science tasks. Two-phase designs are often employed to save data collection costs. In two-phase studies, inexpensive variables are collected for all subjects in the first phase, and expensive variables are measured for a subset of subjects in the second phase based on a predetermined sampling rule. The estimation efficiency under two-phase designs relies heavily on the sampling rule. Existing literature primarily focuses on designing sampling rules for estimating a scalar parameter in some parametric models or specific estimating problems. However, real-world scenarios are usually model-unknown and involve two-phase designs for model-free estimation of a scalar or multi-dimensional parameter. In this talk, we present a maximin criterion to design an optimal sampling rule based on semiparametric efficiency bounds. The proposed method is model-free and applicable to general estimating problems. The resulting sampling rule can minimize the semiparametric efficiency bound when the parameter is scalar and improve the bound for every component when the parameter is multi-dimensional. Simulation studies demonstrate that the proposed designs reduce the variance of the resulting estimator in various settings. The implementation of the proposed design is illustrated in a real data analysis.
Cost-effective sampling
Efficient influence function
Incomplete data
Semiparametric efficiency
Subsample
You have unsaved changes.