11/13/2025: 11:45 AM - 1:15 PM EST
Panel
Statistical innovation is often judged by novelty, but what if we instead celebrated our methods' meaning? This session highlights the real-world motivations, challenges, and impact that drive methodological development. Rather than diving deep into equations, four statisticians share the stories behind their work: why the methods were needed, what gap they fill, and how they contribute to science, policy, or public health.
This session is for anyone who believes that statistical methods are not just intellectual exercises, but tools to understand and improve the world. Talks will explore bias in risk prediction, the responsible use of predictions in research, scalable validation of electronic health records, and optimizing cancer screening programs. Each project is rooted in a real-world setting, like healthcare, clinical research, or data science practice, where thoughtful methods can lead to meaningful improvements.
Sponsored by the Caucus for Women in Statistics and Data Science, this session brings together researchers in our field to emphasize the "why" behind their statistical methods development and to celebrate the breadth of our applications.
Statistical Applications
Electronic Health Records
Risk Prediction
Measurement Error
Cancer Screening
Machine Learning
Organizer
Lucy D'Agostino McGowan, Wake Forest University
Target Audience
Mid-Level
Tracks
Knowledge
Women in Statistics and Data Science 2025
Presentations
From applications in structural biology to the analysis of electronic health record data, predictions from machine learning models increasingly complement costly gold-standard data in scientific inquiry. While "using predictions as data" enables scientific studies to scale in an unprecedented manner, appropriately accounting for inaccuracies in the predictions is critical to achieving trustworthy conclusions from downstream statistical inference.
In this talk, I will explore the methodological and practical impacts of using predictions as data across various applications. I will introduce our recently proposed method for bias correction and draw connections with modern methods and classical statistical approaches dating back to the 1960s. I will also discuss ethical challenges of using predictions as data, underscoring the need for careful and thoughtful adoption of this practice in scientific research.
Speaker
Jesse Gronsbell
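To make the core idea concrete, here is a minimal, self-contained sketch of one well-known correction of this kind, in the spirit of prediction-powered inference: a small gold-standard sample estimates the average prediction error (a "rectifier"), which then debiases an estimate computed from a large pool of predictions. All data are simulated, and this generic estimator is an illustration only, not necessarily the speaker's proposed method.

```python
# Sketch: correcting the bias of "predictions as data" for a population
# mean, in the spirit of prediction-powered inference. Simulated data;
# illustrative only, not the speaker's specific method.
import numpy as np

rng = np.random.default_rng(0)

# Small labeled set (gold-standard y and predictions both observed)
# and a large unlabeled set (predictions only).
n_lab, n_unlab = 200, 20_000
y_lab = rng.normal(loc=1.0, scale=1.0, size=n_lab)
pred_lab = y_lab + rng.normal(loc=0.3, scale=0.5, size=n_lab)  # biased model
y_unlab = rng.normal(loc=1.0, scale=1.0, size=n_unlab)         # never observed
pred_unlab = y_unlab + rng.normal(loc=0.3, scale=0.5, size=n_unlab)

# Naive estimate treats predictions as data and inherits the model's bias.
theta_naive = pred_unlab.mean()

# Corrected estimate: prediction mean plus a "rectifier" (mean prediction
# error) estimated from the labeled subset.
rectifier = (y_lab - pred_lab).mean()
theta_corrected = theta_naive + rectifier

# Standard error reflects sampling variability from both data sources.
se = np.sqrt(pred_unlab.var(ddof=1) / n_unlab
             + (y_lab - pred_lab).var(ddof=1) / n_lab)
print(f"naive: {theta_naive:.2f}, corrected: {theta_corrected:.2f} +/- {1.96 * se:.2f}")
```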
Data from electronic health records (EHRs) present a huge opportunity to operationalize a standardized whole-person health score in the learning health system and identify at-risk patients on a large scale, but they are prone to missingness and errors. Ignoring these data quality issues can lead to biased statistical results and incorrect clinical decisions. Validating EHR data (e.g., through chart reviews) can provide better-quality data, but realistically only a subset of patients' data can be validated, and most protocols do not recover missing data. Using a representative sample of 1,000 patients from the EHR of a large learning health system (100 of whom could be validated), we propose methods to design, conduct, and analyze statistically efficient and robust studies of the allostatic load index (ALI, a whole-person health score) and healthcare utilization. Targeted validation with an enriched protocol allowed us to ensure the quality and improve the completeness of the EHR data. Findings from our validation study were incorporated into statistical models, which indicated that worse whole-person health was associated with higher odds of engaging with the healthcare system, adjusting for age.
Speaker
Sarah Lotspeich, Wake Forest University
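As a hypothetical illustration of the two-phase design described above, the sketch below simulates an error-prone EHR score for 1,000 patients, validates a targeted subsample of roughly 100 (oversampling extreme scores), and fits an inverse-probability-weighted logistic regression of utilization on the validated score and age. Every element here (variable names, the error model, the IPW estimator) is an assumption for illustration, not the authors' exact protocol or estimator.

```python
# Hypothetical two-phase validation analysis: Phase I has error-prone EHR
# scores on 1,000 patients; Phase II chart-reviews ~100 of them. A weighted
# logistic regression then uses only validated records. Illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Phase I: error-prone EHR data on 1,000 patients.
n = 1_000
age = rng.uniform(40, 80, n)
ali = rng.beta(2, 5, n)                                # true (unobserved) score
ali_ehr = np.clip(ali + rng.normal(0, 0.1, n), 0, 1)   # measurement error
prob = 1 / (1 + np.exp(-(-2 + 3 * ali + 0.02 * (age - 60))))
util = rng.binomial(1, prob)                           # any healthcare utilization

# Phase II: targeted validation, oversampling extreme error-prone scores
# (roughly 100 expected validations).
pi = np.where((ali_ehr < 0.1) | (ali_ehr > 0.5), 0.25, 0.05)
validated = rng.random(n) < pi                         # chart review recovers true ALI

# Inverse-probability-weighted analysis of the validated subsample.
# (Point estimates only; a sandwich variance is needed for valid SEs.)
X = sm.add_constant(np.column_stack([ali[validated], age[validated]]))
fit = sm.GLM(util[validated], X, family=sm.families.Binomial(),
             freq_weights=1 / pi[validated]).fit()
print(fit.params)  # intercept, ALI, age (log-odds scale)
```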
The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with and without the candidate predictors. Such comparisons are most meaningful when the estimated risk is unbiased in the target population. Often, data for the standard predictors in the base model are richly available from the target population, but data for the candidate predictors are available only from nonrepresentative convenience samples. If the base model is naively updated using the study data, without recognizing the discrepancy between the underlying distribution of the study data and that of the target population, the resulting risk estimates and the evaluation of the candidate predictors are biased. We propose a semiparametric method for model fitting that enables unbiased assessment of model improvement without requiring a representative sample from the target population, thereby overcoming a major bottleneck in practice. I will discuss how a data analysis project inspired this methodological effort, leading to a novel approach tailored to practical needs. I will also describe how this method underpinned a recent, well-scored scientific grant proposal, demonstrating how novel statistical methodology can drive and enable innovative scientific endeavors.
Speaker
Jinbo Chen, University of Pennsylvania
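The toy simulation below illustrates the core problem: a candidate predictor z is measured only in a convenience sample that oversamples part of the population, so the apparent performance gain from adding z differs between the study sample and the target population. All names and numbers are invented for illustration; the semiparametric correction from the talk is not implemented here.

```python
# Toy illustration: evaluating a candidate predictor z in a
# nonrepresentative convenience sample misstates its added value in the
# target population. Simulated data; the talk's semiparametric fix is
# not implemented here.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Target population: risk depends on a standard predictor x and a
# candidate predictor z.
N = 50_000
x, z = rng.normal(size=N), rng.normal(size=N)
risk = 1 / (1 + np.exp(-(-1.5 + 0.8 * x + 0.8 * z)))
y = rng.binomial(1, risk)

# z is measured only in a convenience sample that oversamples high-x people.
sel = rng.random(N) < 1 / (1 + np.exp(-(2 * x - 2)))

def auc_gain(where):
    """AUC gain from adding z, with models fit on the convenience sample
    but evaluated on the rows selected by the boolean mask `where`."""
    aucs = []
    for cols in ([x], [x, z]):
        Xs = sm.add_constant(np.column_stack([c[sel] for c in cols]))
        model = sm.GLM(y[sel], Xs, family=sm.families.Binomial()).fit()
        Xe = sm.add_constant(np.column_stack([c[where] for c in cols]))
        aucs.append(roc_auc_score(y[where], model.predict(Xe)))
    return aucs[1] - aucs[0]

print(f"AUC gain from z, convenience sample: {auc_gain(sel):.3f}")
print(f"AUC gain from z, target population:  {auc_gain(np.ones(N, bool)):.3f}")
```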