S3: Speed Session 3

Conference: Women in Statistics and Data Science 2025
11/13/2025: 11:45 AM - 1:15 PM EST
Speed 

Presentations

01. A Bayesian likely responder approach for the analysis of randomized controlled trials

An important goal of precision medicine is to personalize medical treatment by identifying individuals who are most likely to benefit from a specific treatment. The Likely Responder (LR) framework, which identifies a subpopulation in which the treatment response is expected to exceed a clinical threshold, plays a role in this effort. However, the LR framework and, more generally, data-driven subgroup analyses often fail to account for the uncertainty in model-based, data-driven subgroup estimation. We propose a simple two-stage approach that integrates subgroup identification with subsequent subgroup-specific inference on treatment effects, propagating model estimation uncertainty from the first stage into subgroup-specific treatment effect estimation in the second stage via the first-stage Bayesian posterior distributions.
We evaluate our method through simulations, demonstrating that the proposed Bayesian two-stage model produces better-calibrated confidence intervals than naive approaches built on state-of-the-art machine learning models. We apply our method to an international COVID-19 treatment trial, which shows substantial variation in treatment effects across data-driven subgroups.
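
As a rough numerical illustration of the two-stage idea (a sketch, not the authors' implementation), the snippet below simulates a trial, stands in a set of posterior draws for the stage-1 subgroup rule, and propagates them into the stage-2 subgroup effect estimate; all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trial: biomarker x, randomized arm z, outcome y;
# treatment benefits only patients with x > 0.
n = 400
x = rng.normal(size=n)
z = rng.integers(0, 2, size=n)
y = 0.5 * x + z * (x > 0) + rng.normal(size=n)

# Stage 1 (stand-in): posterior draws of the biomarker threshold that
# defines the likely-responder subgroup.  A real analysis would obtain
# these from a Bayesian model; here we perturb a point estimate.
thresholds = rng.normal(loc=0.0, scale=0.1, size=2000)

# Stage 2: for each posterior draw, re-form the subgroup and estimate its
# treatment effect, so stage-1 uncertainty flows into the interval.
effects = []
for t in thresholds:
    s = x > t
    effects.append(y[s & (z == 1)].mean() - y[s & (z == 0)].mean())

lo, hi = np.percentile(effects, [2.5, 97.5])
print(f"subgroup effect: {np.mean(effects):.2f} (interval {lo:.2f}, {hi:.2f})")
```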

Presenting Author

Annan Deng, New York University School of Medicine

First Author

Annan Deng, New York University School of Medicine

02. Bayesian Tree Model for Binary and Categorical Data under Informative Sampling

Tree models are highly effective for analyzing survey data because they can manage numerous variables and the complex interactions often present in such datasets. Unlike their frequentist counterparts, Bayesian tree-based models naturally provide a measure of uncertainty for the estimates they produce. However, until recently, Bayesian design-consistent tree models that handle binary and categorical response data collected under complex sample designs were not available. While several Bayesian tree modeling approaches have been developed for independent data, tree-based algorithms that account for the informative sample design of survey data remain lacking. Leveraging the flexibility of the Bayesian framework, we extend the current research on Bayesian tree algorithms and develop tree-based models that effectively handle binary and categorical responses from survey data under informative sampling. We demonstrate the proposed models in a simulated setup and on data from the Consumer Expenditure Survey and the American Community Survey.
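
One common device for design-consistent Bayesian modeling under informative sampling, shown below purely as an illustration (the authors' construction may differ), is a survey-weighted pseudo-likelihood in which each unit's contribution is exponentiated by its sampling weight.

```python
import numpy as np

def pseudo_log_likelihood(y, p_hat, weights):
    """Survey-weighted (pseudo-) log-likelihood for binary responses:
    each unit's Bernoulli contribution is raised to its sampling weight,
    a standard device for design consistency under informative sampling."""
    eps = 1e-12
    ll = y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps)
    return np.sum(weights * ll)

# Toy use: leaf-level fitted probabilities from a tree, unequal weights
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.8, 0.3, 0.7, 0.6, 0.2])   # tree-based fitted probabilities
w = np.array([1.0, 2.5, 1.0, 4.0, 2.5])       # inverse-inclusion-probability weights
print(pseudo_log_likelihood(y, p_hat, w))
```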

Presenting Author

Diya Bhaduri, University of Missouri-Columbia

First Author

Diya Bhaduri, University of Missouri-Columbia

CoAuthor(s)

Scott Holan, University of Missouri/U.S. Census Bureau
Daniell Toth, U.S. Bureau of Labor Statistics

03. Multi-state time-to-event modeling of Huntington Disease stage progression

Huntington's disease is a genetic neurodegenerative disorder characterized by progressive motor, cognitive, and behavioral impairments and caused by an expanded number of CAG repeats in the HTT gene. The Huntington's Disease Integrated Staging System (HD-ISS) classifies Huntington disease progression into four discrete stages using biological, clinical, and functional assessments, with stage criteria varying by age. Impairment due to Huntington disease is irreversible, so understanding and anticipating progression is critical to patient prognosis. While the HD-ISS captures patients' current disease stage, it does not compute the time to the next stage, information crucial for clinical management, patient knowledge, and future planning. To address this clinical need, we developed a statistical model to estimate the time to progression between HD-ISS stages using the PREDICT-HD study data. We employ a multi-state framework of accelerated failure time survival regression models with various distributions to analyze time-to-stage transition data. Across all stages, we incorporate genetic information and biological sex as covariates and estimate the mean time patients spend in each Huntington's disease stage. To select the final model, performance is evaluated using the area under the receiver operating characteristic curve to assess discrimination between those who will and will not transition from one stage to the next. Our selected model provides individualized estimates of stage progression timing to support clinical decision-making and patient care. 
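
For readers unfamiliar with the building blocks, the sketch below fits one stage-to-stage transition with accelerated failure time models from the Python lifelines package and compares candidate distributions by AIC; the data and column names are hypothetical, and a full multi-state analysis would fit one such model per transition (with AUC-based selection, as the abstract describes, on the PREDICT-HD data).

```python
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter

rng = np.random.default_rng(1)
n = 300
cag = rng.integers(38, 50, size=n)          # CAG repeat length (assumed covariate)
sex = rng.integers(0, 2, size=n)            # biological sex
t = rng.weibull(1.5, size=n) * np.exp(3.0 - 0.05 * cag)  # faster progression with more repeats
c = rng.uniform(1, 8, size=n)               # censoring times
df = pd.DataFrame({"time": np.minimum(t, c), "event": (t <= c).astype(int),
                   "cag": cag, "sex": sex})

# One stage-to-stage transition: compare candidate AFT distributions.
for Fitter in (WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter):
    fit = Fitter().fit(df, duration_col="time", event_col="event")
    print(Fitter.__name__, round(fit.AIC_, 1))

# Individualized prediction: median remaining time in the current stage.
best = WeibullAFTFitter().fit(df, duration_col="time", event_col="event")
print(best.predict_median(df.iloc[[0]]))
```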

Presenting Author

Madhuri Raman

First Author

Madhuri Raman

CoAuthor(s)

Jesus Vazquez
Sophia Cross, University of North Carolina at Chapel Hill
Yajie He
Aditya Krishnan, University of North Carolina at Chapel Hill
Dewei Lin, University of North Carolina at Chapel Hill
Sarah Lotspeich, Wake Forest University
Tanya Garcia, University of North Carolina at Chapel Hill

Withdrawn - 04. Using Generative AI to Simulate Realistic Millimeter-Wave Channel Measurements

Moving into the millimeter-wave (mmWave) wireless spectrum (30 – 300 GHz) is a critical next step for Wi-Fi, mobile devices, and many other applications that currently use sub-6 GHz bands, which are congested and do not offer the necessary data rates for future technology. Devices operating in the mmWave band, however, require advanced channel discovery (identifying paths between transmitter and receiver) and beam-steering capabilities in order to overcome the inherent limitations of these higher frequencies. Measuring mmWave channels in a lab is time-consuming, especially for time-varying channels, making it difficult to produce enough data to reliably evaluate different beam-steering algorithms or antenna designs. In this work, we explore the use of generative AI to produce synthetic channel data that could then be used in place of measured channel data. We discuss the challenges of creating realistic, dynamic mmWave channels by starting with simplistic numerical simulations to create a wide range of channel types and to add in extraneous channels that should be avoided by the beam-steering algorithm. We finish by training a generative AI algorithm on our simple simulated data, and evaluate whether this approach can produce unlimited new synthetic data with realistic properties. While the initial simulations naturally limit the dimension of the problem (perhaps to unrealistically small values), future work will improve on this and other features. 

Presenting Author

Lucas Koepke, National Institute of Standards and Technology

First Author

Lucas Koepke, National Institute of Standards and Technology

05. Overcoming Censoring in Predicting Huntington Disease Progression: A Comparative Modeling Study

Huntington disease (HD) is a genetically inherited neurodegenerative disease with progressively worsening symptoms including cognitive, psychological, and motor impairments. Accurately modeling time to HD diagnosis is essential for clinical trial design and patient treatment planning. Several statistical models have been proposed to model time to diagnosis, including Langbehn's model, the CAG-Age Product (CAP) model, the Prognostic Index Normed (PIN) model, and the Multivariate Risk Score (MRS) model. Because they differ in methodology, assumptions, and predictive accuracy, these models may yield conflicting predictions. These conflicts then create confusion for both patients and clinicians. We evaluate the theoretical foundations and empirical performance of the Langbehn, CAP, PIN, and MRS models via external validation using data from a new HD observational study, chosen for its independence from any model development, with the goal of informing model selection for future clinical trials. We begin with an assessment of each model's structure and assumptions to create a qualitative comparison of the models. We will consider practical factors such as covariate availability and model interpretability for clinical decision-making. Then, the models' empirical performance is evaluated and compared using this new HD observational study data. Performance metrics evaluating discrimination and calibration include Harrell's concordance statistic, the Brier score, calibration plots, and receiver operating characteristic curves. Although several models are available for predicting time to diagnosis, few studies have systematically compared their performance. This paper identifies methodological differences between these models and compares the models' empirical performance in terms of discrimination and calibration. Our findings aim to support and motivate the development and selection of robust models for use in clinical trial design and optimization in HD research. 
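
As a minimal sketch of the discrimination metrics named above (with simulated risk scores standing in for the four models' predictions), Harrell's C and a simple landmark Brier score can be computed as follows; a full analysis would reweight the Brier score by inverse probability of censoring (IPCW) rather than restricting to subjects with known status.

```python
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(2)
n = 200
risk = rng.normal(size=n)                 # model risk score (higher = earlier diagnosis)
t = rng.exponential(np.exp(-risk))        # time to HD diagnosis
e = rng.uniform(size=n) < 0.8             # event indicator (some censoring)

# Harrell's C: higher predicted risk should pair with shorter observed times.
print("Harrell's C:", round(concordance_index(t, -risk, event_observed=e), 3))

# Landmark Brier score at time tau, restricted to subjects whose status
# at tau is known.
tau = 1.0
known = (t > tau) | e
y = ((t <= tau) & e).astype(float)[known]          # diagnosed by tau
p = (1 - np.exp(-np.exp(risk) * tau))[known]       # predicted P(diagnosis by tau)
print("Brier score:", round(np.mean((p - y) ** 2), 3))
```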

Presenting Author

Abigail Foes

First Author

Abigail Foes

CoAuthor(s)

Kyle Grosser, University of North Carolina
Stellen Li, University of North Carolina at Chapel Hill
Vraj Parikh, University of North Carolina at Chapel Hill
Tanya Garcia, University of North Carolina at Chapel Hill
Sarah Lotspeich, Wake Forest University

Withdrawn - 06. Modeling Longitudinal Microbiome Data with Application to a Melanoma Clinical Trial Evaluating Immune Checkpoint Inhibitor Treatment Response (NCT05102773)

The microbiome has increasingly been shown to play a fundamental role in human health. As microbiome research expands and microbial communities are better characterized, there is growing interest in leveraging these complex data to reveal mechanistic insights and identify predictive biomarkers of disease. However, the high-dimensional and often zero-inflated nature of the data presents a unique statistical challenge – particularly in the context of longitudinal studies. Here, utilizing data from a prospective, longitudinal study of gut microbiome samples from patients with metastatic melanoma treated with immune checkpoint inhibitors (ICIs), we assess the efficacy of several zero-inflated mixed models for prediction of ICI response (RECIST at 12 weeks) and toxicity (≥ CTCAE grade 1). Gut microbiome samples from NCT05102773 (n=41, averaging 2.1 samples/patient) were processed and analyzed by metagenomic sequencing. Zero-inflated mixed models were fit on a subset of microbes using the NBZIMM R package, which supports modeling features tailored to microbiome data, including zero inflation. Both Gaussian models (with relative abundance data) and negative binomial models (with estimated raw counts) were evaluated. Model variations included the addition of a library size offset and correlation structure specification, including AR(1), CAR(1), and ARMA. Model metrics such as MSE and AIC were calculated and used to evaluate model fit. The zero-inflated Gaussian mixed model with an independent correlation structure consistently had the lowest MSE and lowest AIC of the models tested, indicating its utility as a model for longitudinal microbiome samples, balancing accuracy and parsimony. Further work is needed to assess the model's validity and utility across longitudinal microbiome datasets.
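
NBZIMM is an R package; as a simplified Python analogue only, the sketch below fits a zero-inflated negative binomial model with a library-size offset using statsmodels, omitting the random effects and correlation structures of the mixed models described above; all data are simulated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(3)
n = 150
x = rng.normal(size=n)                          # e.g., a response-group covariate
libsize = rng.integers(5_000, 50_000, size=n)   # sequencing library size

# Simulated zero-inflated counts for a single microbe
mu = np.exp(1.0 + 0.8 * x) * libsize / libsize.mean()
counts = rng.negative_binomial(2, 2 / (2 + mu))
counts[rng.uniform(size=n) < 0.3] = 0           # excess structural zeros

model = ZeroInflatedNegativeBinomialP(
    counts, sm.add_constant(x),
    exog_infl=np.ones((n, 1)),                  # intercept-only zero-inflation part
    offset=np.log(libsize),                     # library-size offset
    p=2,
)
res = model.fit(method="bfgs", maxiter=500, disp=False)
print("AIC:", round(res.aic, 1))
```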

Presenting Author

Caroline Dravillas, The Ohio State University Wexner Medical Center

First Author

Caroline Dravillas, The Ohio State University Wexner Medical Center

CoAuthor(s)

Nyelia Williams, The Ohio State University Wexner Medical Center
Marium Husain, Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center
Rebecca Hoyd, The Ohio State University Wexner Medical Center
Kari Kendra, Division of Medical Oncology, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center
Christin E. Burd, Department of Molecular Genetics, The Ohio State University
Daniel Spakowicz, The Ohio State University Wexner Medical Center

07. Fast, Scalable Spatiotemporal Modeling of Task-fMRI Data

Accurate modeling of the blood-oxygen-level-dependent (BOLD) signal in task-based fMRI is crucial for interpreting neural activation, yet it remains challenging due to spatiotemporal variability in the hemodynamic response function (HRF). Traditional general linear models (GLMs) assume a fixed HRF across brain regions, limiting their ability to capture localized variations in amplitude, latency, and shape. While spatial smoothing and Bayesian methods improve estimation by borrowing strength across voxels, they often introduce bias, computational overhead, or depend on strict assumptions. We propose SPLASH (Spline-based Processing for Localized Adaptive Spatial Hemodynamics), a novel spatiotemporal modeling framework that integrates adaptive spatial smoothing with efficient temporal modeling of the HRF. SPLASH embeds HRF estimation within a unified spline-based regression model that flexibly adapts to both the spatial domain and local signal characteristics. By preserving cortical topology and allowing region-specific adaptiveness, SPLASH improves the accuracy of HRF amplitude estimation and enhances statistical power in detecting task-related activation. Through simulations and application to Human Connectome Project data, we demonstrate that SPLASH outperforms conventional approaches in estimation accuracy, robustness, and computational efficiency, offering a scalable solution for high-dimensional fMRI analyses. 
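
SPLASH itself is the authors' method; as generic background only, the sketch below illustrates spline-based HRF estimation for a single voxel: a B-spline basis over post-stimulus time is convolved with the stimulus train and fit by least squares. The design, noise level, and all parameters are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline

TR, n_scans = 1.0, 200
stim = np.zeros(n_scans)
stim[np.arange(10, 190, 20)] = 1.0              # hypothetical task onsets

# Cubic B-spline basis over 0-24 s of post-stimulus time
knots = np.concatenate(([0.0] * 3, np.linspace(0, 24, 8), [24.0] * 3))
lags = np.arange(0, 25, TR)
B = np.column_stack([
    BSpline.basis_element(knots[i:i + 5], extrapolate=False)(lags)
    for i in range(len(knots) - 4)
])
B = np.nan_to_num(B)

# Design matrix: stimulus train convolved with each basis function
X = np.column_stack([np.convolve(stim, B[:, j])[:n_scans] for j in range(B.shape[1])])

# Simulated voxel: gamma-shaped HRF plus noise; fit HRF weights by least squares
true_hrf = lags ** 5 * np.exp(-lags) / 120.0
y = np.convolve(stim, true_hrf)[:n_scans] + 0.1 * np.random.default_rng(4).normal(size=n_scans)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
hrf_hat = B @ beta                              # estimated, flexibly shaped HRF
print("estimated peak lag (s):", lags[np.argmax(hrf_hat)])
```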

Presenting Author

Jungin Choi, Johns Hopkins University

First Author

Jungin Choi, Johns Hopkins University

CoAuthor(s)

Martin Lindquist, Johns Hopkins University
Abhirup Datta

08. Performance comparisons of power penalized regression against forward stepwise, lasso, and relaxed lasso

Best subset, forward stepwise, and the lasso are established methods for variable selection. The paper "Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons" by Hastie, Tibshirani, and Tibshirani (Statistical Science, 35(4), 579-592, November 2020) presents extensive simulation studies comparing these methods and concludes that the relaxed lasso is the overall winner. Recently, Griffin (2023) proposed improved pathwise coordinate descent algorithms for power penalized regression, which generalize the lasso to ℓq penalties with 0 < q ≤ 1. We compare the performance of power penalized regression against forward stepwise, the lasso, and the relaxed lasso.
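
As a sketch of part of this comparison (not the paper's simulation design, and not the power-penalty algorithm itself), the snippet below contrasts the lasso with the simplest relaxed-lasso variant, an OLS refit on the lasso's selected support, on simulated sparse data.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)
n, p, s = 100, 50, 5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 2.0                                  # sparse truth
y = X @ beta + rng.normal(size=n)

# Lasso: selection and shrinkage happen together
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Relaxed lasso (gamma = 0 variant): OLS refit on the selected support,
# undoing the lasso's shrinkage on the retained variables
ols = LinearRegression().fit(X[:, support], y)

print("selected:", support)
print("lasso coefficients:  ", np.round(lasso.coef_[support], 2))
print("relaxed coefficients:", np.round(ols.coef_, 2))
```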

Presenting Author

Ning Duan, University of Massachusetts Amherst

First Author

Ning Duan, University of Massachusetts Amherst

CoAuthor(s)

Maryclare Griffin, University of Massachusetts Amherst
Qian Zhao, University of Massachusetts

09. Independent Samples T-Test vs. Mann-Whitney U Test: Strengths & Weaknesses

Often, biomedical scientists analyze right-skewed outcomes with small to moderate sample sizes using nonparametric methods. Alternative approaches, such as transformations, reduce the interpretability of outcomes but may boost power in certain settings. This study compares the strengths and weaknesses of the two-sample t-test and the Mann-Whitney U test, specifically for outcomes with right-skewed distributions. First, simulations are used to compare the power of each test while also examining each test's interpretability and appropriateness for real-world data. Second, real-data examples demonstrate the differences in results and interpretability between the methods.
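
A minimal version of such a power simulation, assuming lognormal (right-skewed) outcomes with a location shift between groups, might look like this:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(6)
n, reps, alpha = 25, 2000, 0.05

hits_t = hits_u = 0
for _ in range(reps):
    g1 = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # right-skewed control group
    g2 = rng.lognormal(mean=0.5, sigma=1.0, size=n)   # shifted treatment group
    hits_t += ttest_ind(g1, g2).pvalue < alpha
    hits_u += mannwhitneyu(g1, g2).pvalue < alpha

print(f"t-test power:       {hits_t / reps:.2f}")
print(f"Mann-Whitney power: {hits_u / reps:.2f}")
```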

Presenting Author

Abigail Leonhard

First Author

Abigail Leonhard

10. Building Trust in AI: The Crucial Role of Human-Centered Machine Learning in Safety-Critical Domains

The integration of machine learning (ML) models into safety-critical domains such as aviation and healthcare presents significant opportunities but also poses serious challenges. ML/AI models deployed in safety-critical settings are known to improve performance and decision-making, but they can also pose unacceptable dangers in various applications.

First, the presentation focuses on the need for interpretable ML models, emphasizing reliability, safety, and trustworthiness.

Second, the presentation walks through successes of human-in-the-loop interpretable machine learning models and their role in bridging the gap between domain experts and ML. To achieve model transparency, it covers techniques such as feature importance analysis, visualization tools, and model-agnostic methods; LIME and SHAP are examples of these techniques. The talk also discusses tactics for integrating human feedback loops and overcoming scalability challenges.
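
As a small illustration of the SHAP technique mentioned above (on a stand-in public dataset, not a safety-critical application), tree-model attributions can be computed and summarized as follows:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Stand-in tabular task; a safety-critical deployment would use domain data
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# SHAP attributions: per-prediction feature contributions a domain
# expert can inspect, plus a global summary of feature importance
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:200])
shap.summary_plot(shap_values, X.iloc[:200])
```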

Overall, the objective of the talk is to deliver a thorough understanding of the impact and real-time application of human-in-the-loop interpretable ML models in safety-critical environments.

Presenting Author

Akshata Moharir

First Author

Akshata Moharir

11. Integrating Eye-Tracking into Education: A Case Study in Student Presentations

Eye-tracking technologies have become increasingly valuable in educational research, offering insight into students' attention, cognitive processing, and behavioral patterns. This poster presents a practical overview of how eye-tracking can be integrated into educational settings, with attention to workflow design, implementation challenges, and analytical strategies. We illustrate this approach through a case study in a university-level presentation course. During students' final presentations, we recorded their eye movements using wearable eye-tracking glasses. These data are being analyzed alongside personality profiles (Big Five Inventory) and academic performance indicators, including instructor-assigned presentation scores and students' self-reported sense of success. Our study focuses on key gaze metrics, such as fixation duration and visual attention patterns, to explore how individual differences may shape communication style and presentation outcomes. We also gathered student reflections on their experience with the technology, such as how comfortable they felt, whether it affected their confidence or performance, and their willingness to use such tools again. This poster outlines the research process from equipment setup and calibration to ethical protocols, consent, and data preprocessing. We highlight strategies for interpreting gaze-based indicators and share insights into maintaining ecological validity while ensuring methodological control. Finally, we reflect on how eye-tracking data can complement qualitative and self-reported measures to provide a more nuanced understanding of student behavior. Our goal is to offer a practical framework for researchers and educators interested in using eye-tracking to support pedagogical innovation and educational insight.
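
As a minimal illustration of the kind of gaze-metric aggregation described above (with a hypothetical fixation-event export; real tracker exports differ), per-student fixation metrics and attention shares can be computed as follows:

```python
import pandas as pd

# Hypothetical fixation-event export from the eye-tracking glasses
fix = pd.DataFrame({
    "student": ["A", "A", "A", "B", "B"],
    "aoi": ["audience", "slides", "audience", "slides", "notes"],
    "duration_ms": [320, 540, 210, 800, 150],
})

# Per-student fixation-duration summaries
print(fix.groupby("student")["duration_ms"].agg(["mean", "sum", "count"]))

# Share of gaze time per area of interest (a simple attention pattern)
share = fix.pivot_table(index="student", columns="aoi",
                        values="duration_ms", aggfunc="sum", fill_value=0)
print(share.div(share.sum(axis=1), axis=0))
```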

Presenting Author

Samyukta Vakkalanka, Georgetown University

First Author

Samyukta Vakkalanka, Georgetown University

12. Development of Autonomous Navigation and Strategic Decision-Making in Robotic Soccer

Robotic soccer is an ideal testbed for integrating autonomous navigation, deep learning, and adaptive motion control in dynamic environments. This research aims to develop an advanced robotic soccer system based on the JetHexa hexapod robot, powered by the NVIDIA Jetson Nano B01 and operating on ROS. JetHexa leverages a suite of cutting-edge technologies, including mainstream deep learning frameworks (You Only Look Once (YOLO) model training, MediaPipe development, and TensorRT acceleration) alongside a 3D depth camera and a Lidar sensor, to deliver high-precision 2D mapping, Real-Time Appearance-Based Mapping (RTAB) 3D mapping navigation, multi-point navigation, TEB path planning, and dynamic obstacle avoidance. The research is structured around three specific aims: 1) Autonomous Navigation Enhancement: to develop robust SLAM algorithms that exploit the combined data from the 3D depth camera and Lidar sensor to produce high-resolution field maps, thereby ensuring precise localization, reliable multi-point navigation, and effective dynamic obstacle avoidance in a rapidly changing soccer environment; 2) Deep Learning-Based Strategic Decision-Making: to integrate and optimize deep learning models that enable real-time detection and classification of critical game elements, such as the soccer ball, goals, and opposing players, using frameworks like YOLO and MediaPipe accelerated by TensorRT, thus facilitating intelligent and context-aware decision-making during gameplay; and 3) Adaptive Motion Control and Kinematics: to implement advanced inverse kinematics algorithms that support dynamic gait switching between tripod and ripple patterns and enable adaptive motion control, including specialized maneuvers like moonwalking, through the fine-tuning of parameters such as pitch, roll, direction, speed, height, and stride to maintain optimal stability and maneuverability across variable soccer field terrains.

Presenting Author

Ariana Mondiri, Creighton University

First Author

Ariana Mondiri, Creighton University

CoAuthor

Steven Fernandes, Creighton University

Withdrawn - 14. Approximate Bayesian classifier for high-dimensional data

The Bayes classifier provides a way of performing probabilistic classification using a posterior distribution of the outcome given the predictors. When there are many predictors, however, the need to estimate a high-dimensional covariance matrix incurs prohibitive computational cost and limits its applicability. The naive Bayes classifier is a popular alternative for handling high-dimensional data, since its assumption of conditional independence of the features given the class eases the burden associated with the high-dimensional covariance matrix. Despite its computational efficiency, the naive Bayes classifier performs poorly when the features are correlated, a case commonly observed in real-world data. To address this issue, we propose a new Bayesian classifier called the approximate Bayesian classifier. Our method is based on the Vecchia approximation, which has played a crucial role in dimension reduction in recent Bayesian spatial modeling. To adapt the Vecchia approach for spatial modeling into a classification framework, we define the concept of neighborhood, which lies at the core of the Vecchia approximation, using the relationship between the coefficients and the correlations under the normality assumption. The performance of our proposed method is investigated via numerical studies.
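
As generic background on the Vecchia idea (a sketch of the approximation itself, not the proposed classifier), the snippet below approximates a zero-mean Gaussian log-density by conditioning each coordinate on only its m most correlated predecessors:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def vecchia_logpdf(x, cov, m=3):
    """Vecchia approximation to a zero-mean Gaussian log-density: condition
    each coordinate on only its m most-correlated predecessors."""
    corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    ll = norm.logpdf(x[0], scale=np.sqrt(cov[0, 0]))
    for i in range(1, len(x)):
        nbrs = np.argsort(-np.abs(corr[i, :i]))[:m]        # neighborhood of i
        w = np.linalg.solve(cov[np.ix_(nbrs, nbrs)], cov[i, nbrs])
        mu = w @ x[nbrs]                                   # conditional mean
        var = cov[i, i] - w @ cov[i, nbrs]                 # conditional variance
        ll += norm.logpdf(x[i], loc=mu, scale=np.sqrt(var))
    return ll

rng = np.random.default_rng(7)
A = rng.normal(size=(6, 6))
cov = A @ A.T + 6 * np.eye(6)
x = rng.normal(size=6)
print(vecchia_logpdf(x, cov, m=3))                 # approximate
print(multivariate_normal(cov=cov).logpdf(x))      # exact, for comparison
```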

Presenting Author

Jieun Lee, Kyungpook National University, South Korea

First Author

Jieun Lee, Kyungpook National University, South Korea

Withdrawn - 15. Introducing the Migration Archetype Model (MAM): A Data-Driven Framework for Disaster-Induced Migration Analysis in the U.S. Virgin Islands

The U.S. Virgin Islands (USVI), a U.S. territory in the Caribbean, experienced a significant 20% population decline between 2010 and 2020. Much of this shift was driven by outmigration following three major disruptive events: the 2012 closure of the HOVENSA oil refinery (economic disaster), the 2017 Category 5 Hurricanes Irma and Maria (natural disaster), and the COVID-19 pandemic beginning in 2020 (biological disaster). In response to these events and their layered impact on labor, family, and cultural structures, we developed the Migration Archetype Model (MAM), a multidisciplinary framework for analyzing disaster-induced migration in isolated, non-American Community Survey jurisdictions like the USVI.
MAM draws on decennial census data, labor studies, and microdata from the American Community Survey (ACS) to characterize migration flows by demographic traits, household composition, educational attainment, and employment sector. The model uses a symbolic structure grounded in human anatomy (head, torso, seat, and feet) to frame the impacts of migration across key dimensions such as brain drain, labor force disruption, family fragmentation, and sustainability.
This poster introduces MAM to a national research audience and presents its application across three distinct periods of migration. We show how ACS microdata, particularly Place of Birth and Year of Entry, can be effectively leveraged to analyze intra-national migration in territories excluded from the ACS sampling frame. Results demonstrate distinct migration archetypes by gender, age, race, and occupational background, with varied implications across disaster types. Finally, we discuss how MAM can inform public policy, economic recovery planning, and statistical modeling of migration trends in disaster-prone or geographically isolated regions. 

Presenting Author

Ayishih Bellew

First Author

Ayishih Bellew

CoAuthor

Lawanda Cummings, University of the Virgin Islands Eastern Caribbean Center