Toward Well-Calibrated Risk Estimation with Biased Training Data
Conference: Women in Statistics and Data Science 2025
11/13/2025: 11:45 AM - 1:15 PM EST
Panel
The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models fitted with and without the candidate predictors. Such a comparison is most meaningful when the estimated risk is unbiased in the target population. Often, data on the standard predictors in the base model are richly available from the target population, but data on the candidate predictors are available only from nonrepresentative convenience samples. If the base model is naively updated using the study data, without accounting for the discrepancy between the distribution underlying the study data and that of the target population, the resulting risk estimates and the evaluation of the candidate predictors are biased. We propose a semiparametric method for model fitting that enables unbiased assessment of model improvement without requiring a representative sample from the target population, thereby overcoming a major practical bottleneck. I will discuss how a data analysis project inspired this methodological effort, leading to a novel approach tailored to practical needs, and describe how the method underpinned a recently well-scored scientific grant proposal, demonstrating how novel statistical methodology can drive and enable innovative scientific endeavors.
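The bias described above can be illustrated with a toy simulation (this is a hypothetical sketch of the general problem, not the speaker's proposed semiparametric method): a logistic risk model is fitted naively on a convenience sample that over-samples cases, and its average predicted risk in the target population is compared with the true event rate. All model coefficients and sampling rates below are made-up illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# Hypothetical target population: standard predictor x, candidate predictor z
n = 200_000
x = rng.normal(0, 1, n)
z = rng.normal(0, 1, n)
p = sigmoid(-2.0 + 0.8 * x + 0.5 * z)   # assumed true risk model
y = rng.binomial(1, p)

# Nonrepresentative convenience sample: cases heavily over-sampled
keep = rng.random(n) < np.where(y == 1, 0.9, 0.1)
xs, zs, ys = x[keep], z[keep], y[keep]

def fit_logistic(X, y, iters=25):
    """Naive logistic regression on the study data via Newton iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - mu)
        H = (X * (mu * (1 - mu))[:, None]).T @ X
        beta += np.linalg.solve(H, grad)
    return beta

Xs = np.column_stack([np.ones(len(ys)), xs, zs])
beta_naive = fit_logistic(Xs, ys)

# Calibration-in-the-large in the target population: the naive model's
# mean predicted risk greatly exceeds the true event rate
Xt = np.column_stack([np.ones(n), x, z])
print("target event rate:    ", y.mean())
print("naive mean predicted: ", sigmoid(Xt @ beta_naive).mean())
```

Because the case fraction in the study sample far exceeds that in the target population, the naively fitted intercept is inflated, so the model systematically over-predicts risk in the target population; this is the calibration failure that the proposed method is designed to avoid.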
Speaker
Jinbo Chen, University of Pennsylvania