40: Optimal Sampling under Class Imbalance: A Kernel-Based IPW Estimator for Efficient Classification
Tuesday, Aug 5: 10:30 AM - 12:20 PM
2667
Contributed Posters
Music City Center
Various studies have been conducted to design classification models in situations where human error is present or where the population distribution is not precisely known. However, research explicitly addressing imbalanced data is still in its early stages. In this context, we propose a novel optimal sampling method that enhances classification performance without requiring additional data collection or sacrificing the desirable distributional properties of the classification model. Among optimal sampling methods, the Inverse Probability Weighted (IPW) estimator is used to subsample the more informative instances from the dataset. In particular, under imbalanced data settings, the amount of available information is tied more closely to the number of positive instances than to the total sample size. Therefore, all positive instances are retained, and the negative instances are substantially reduced through a non-uniform sampling strategy, thereby improving estimation efficiency. This study derives the asymptotic distribution of the IPW estimator combined with a kernel-based method and shows that the proposed estimator is both unbiased and consistent. Furthermore, through extensive simulation studies and an application to a real dataset, we demonstrate that the proposed method remains effective under imbalanced data and unspecified model settings. The results confirm that the proposed estimator achieves superior efficiency compared to existing methods.
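The sketch below illustrates the general idea described in the abstract (keep all positives, subsample negatives non-uniformly, and reweight by inverse inclusion probabilities); it is not the authors' kernel-based estimator. The pilot-model-based sampling probabilities, the 5x-positives negative subsample size, and the use of scikit-learn's LogisticRegression with sample weights as a stand-in for the weighted estimating equations are all illustrative assumptions.

```python
# Minimal sketch of IPW-style subsampling under class imbalance
# (assumptions noted above; not the paper's implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, heavily imbalanced data: positives are rare.
n, p = 50_000, 5
X = rng.normal(size=(n, p))
beta = np.array([1.5, -1.0, 0.5, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 4.0))))

pos = np.flatnonzero(y == 1)
neg = np.flatnonzero(y == 0)

# Pilot fit on a small uniform subsample to obtain rough "informativeness"
# scores for the negatives (an illustrative choice, not the paper's rule).
pilot_idx = rng.choice(n, size=2_000, replace=False)
pilot = LogisticRegression(max_iter=1_000).fit(X[pilot_idx], y[pilot_idx])
scores = pilot.predict_proba(X[neg])[:, 1]

# Non-uniform inclusion probabilities for negatives: negatives the pilot
# model finds harder to separate are sampled more often. The target
# negative subsample size (5x the number of positives) is arbitrary.
m = 5 * len(pos)
pi_neg = np.minimum(1.0, m * scores / scores.sum())
keep = rng.random(len(neg)) < pi_neg  # Poisson sampling of negatives

# Final subsample: every positive plus the sampled negatives,
# weighted by the inverse of their inclusion probabilities (IPW).
idx = np.concatenate([pos, neg[keep]])
weights = np.concatenate([np.ones(len(pos)), 1.0 / pi_neg[keep]])

ipw_fit = LogisticRegression(max_iter=1_000).fit(
    X[idx], y[idx], sample_weight=weights
)
print("IPW-weighted coefficient estimates:", ipw_fit.coef_.round(2))
```

Because all positives are kept and only negatives are thinned, the weighted fit uses far fewer observations than the full data while the inverse-probability weights keep the estimating equations unbiased for the full-data target, which is the efficiency mechanism the abstract describes.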
Active learning
Optimal sampling
Imbalanced data
Label noise
Binary classification
Semi-supervised learning
Main Sponsor
Section on Statistical Learning and Data Science