Selection Bias Correction for Imbalanced Samples

An-Chiao Liu, Speaker
Utrecht University
 
Thursday, Aug 7: 8:30 AM - 10:20 AM
Topic-Contributed Paper Session 
Music City Center 
Selection bias correction is often applied in a two-sample setup. That is, alongside the nonprobability sample of interest, a probability sample that shares some auxiliary variables with it is used to construct correction weights for the nonprobability sample. This setup makes it possible to compute weighted estimates of population parameters from the nonprobability sample. Because nonprobability samples are usually easy to collect, the nonprobability sample is often large and the probability sample small. This imbalance between the two samples can make it difficult to model the propensity of units to be included in the nonprobability sample. This presentation discusses some common solutions for imbalanced samples from the machine learning literature, namely undersampling, the Synthetic Minority Oversampling Technique (SMOTE), and a mixture of both. A selection bias correction framework is then adapted to incorporate these imbalance solutions.
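As a rough illustration of the rebalancing techniques named above, the following Python sketch undersamples a large nonprobability sample and enlarges a small probability sample with SMOTE-style synthetic rows. It is a minimal sketch under stated assumptions, not the presenter's implementation: the function names (undersample, smote, rebalance), the toy data, and the chosen sizes are all hypothetical, the auxiliary variables are assumed to be numeric, and survey design weights are ignored. How the correction weights are adjusted after rebalancing is not specified in the abstract and is not shown here.

```python
import math
import random


def undersample(majority, target_size, rng):
    """Randomly keep `target_size` majority rows, without replacement."""
    return rng.sample(majority, target_size)


def smote(minority, n_new, k, rng):
    """Create `n_new` synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority rows."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base`, excluding `base` itself
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def rebalance(nonprob, prob, keep_nonprob, n_synthetic, k=5, seed=0):
    """Mixture of both: shrink the nonprobability sample (assumed majority)
    and enlarge the probability sample (assumed minority)."""
    rng = random.Random(seed)
    reduced = undersample(nonprob, keep_nonprob, rng)
    enlarged = prob + smote(prob, n_synthetic, k, rng)
    return reduced, enlarged


# Toy data (hypothetical): two numeric auxiliary variables per unit.
data_rng = random.Random(1)
nonprob = [(data_rng.gauss(1, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
prob = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]

reduced, enlarged = rebalance(nonprob, prob, keep_nonprob=200, n_synthetic=150)
print(len(reduced), len(enlarged))  # 200 200
```

After rebalancing, the two samples would be stacked and a propensity model fitted to the combined data. Setting n_synthetic to 0 gives pure undersampling, and setting keep_nonprob to the full nonprobability sample size gives pure SMOTE.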

Keywords

TBD