S1: Speed Session 1

Conference: Women in Statistics and Data Science 2025
11/12/2025: 3:00 PM - 4:30 PM EST
Speed 

Presentations

01. Fake Review Detection on Amazon using Deep Neural Network

In the era of e-commerce dominance, an increase in fake reviews on online shopping platforms compromises the integrity of consumer feedback systems. This study focuses on Amazon, a leading e-commerce platform in the United States, where fake reviews have become a significant concern. Given the limited availability of authentic datasets for analysis, we propose a novel methodology to differentiate between genuine and fraudulent reviews across verified and non-verified purchases. Our approach utilizes the bootstrap distribution of cosine similarity values, providing a robust statistical foundation for review classification. We present a comprehensive framework integrating Convolutional Neural Networks with word embedding and emotion-mining techniques through Natural Language Processing. Our method demonstrates exceptional performance, achieving an accuracy rate of over 96% in distinguishing fake reviews from user reviews. This research aims to foster trust in online marketplaces and protect consumers from misleading information by providing a powerful tool for fake review detection. 
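The abstract does not spell out how the bootstrap distribution of cosine similarity values is built. The following is a minimal Python sketch under stated assumptions: the toy word-count vectors, the single reference vector, and the percentile interval on the mean similarity are all hypothetical and do not come from the study.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bootstrap_means(values, n_resamples=2000, seed=0):
    """Sorted means of resamples drawn with replacement from `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return sorted(means)


# Hypothetical toy data: one reference vector and five review vectors.
reference = [3, 0, 1, 2]
reviews = [[2, 0, 1, 2], [3, 1, 0, 2], [0, 4, 0, 1], [3, 0, 2, 2], [1, 0, 1, 1]]

similarities = [cosine_similarity(reference, r) for r in reviews]
means = bootstrap_means(similarities)

# 95% percentile interval for the mean similarity (an assumed summary).
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"95% bootstrap interval for mean similarity: [{lower:.3f}, {upper:.3f}]")
```

A review whose similarity falls far outside such an interval could then be flagged for closer inspection. The actual decision rule used by the authors is not stated in the abstract.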

Presenting Author

J.M. Thilini Jayasinghe, University of Dayton

First Author

J.M. Thilini Jayasinghe, University of Dayton

CoAuthor

Sachith Dassanayaka, Wittenberg University

02. Association between telemedicine or in-person follow-up visits and return hospitalization

When a patient is discharged from the emergency department (ED), prompt follow-up care is crucial and often advised to ensure recovery and prevent return to the hospital, especially for patients with acute exacerbation of chronic illness. Along with in-person visits, telemedicine is used to deliver outpatient care, especially since its widespread adoption during COVID-19. The aim of the study is to determine whether telemedicine or in-person follow-up visits are associated with differences in return hospitalization, using multivariable logistic regression and a time-to-event model. The time of exposure, that is, of the telemedicine or in-person visit, is handled differently in the two models: in the multivariable logistic regression, time is measured discretely by the week in which the follow-up visit occurred, and in the time-to-event model, the exposure is treated as time-varying.

Presenting Author

Angira Mondal, University of Pennsylvania

First Author

Austin S. Kilaru, University of Pennsylvania

CoAuthor(s)

Angira Mondal, University of Pennsylvania
Sophia Jesteen, University of Pennsylvania
Dane Isenberg, University of Pennsylvania
Hashem Zikry, University of California, Los Angeles
Zachary F. Meisel, University of Pennsylvania

Withdrawn - 03. The Impact of Neighborhood Food Insecurity on Type-2 Diabetes Prevalence Amid Measurement Error

Disparities in healthy eating relate to disparities in well-being, which leads to disproportionate rates of diseases like type-2 diabetes in communities that face more challenges in accessing nutritious food. These challenges can be driven by individual- and neighborhood-level factors, like a person's distance from home to the nearest grocery store or the socioeconomic status of their community, respectively. Quantifying these disparities is key to developing targeted interventions, but the currently available methods and data have limitations that we are working to resolve. Namely, available data on disease rates are usually aggregate, which smooths over details about the individuals and communities within them. Further, aggregate disease data often comprise small area estimates, which carry additional uncertainty. In this project, we investigate the relationship between patients' food environment and the risk of diabetes using individual-level data from electronic health records at a large academic medical center. While this project used various health disparities methods and measures, this presentation will focus on quantifying whether patients with more food insecure households in their neighborhood face a higher burden of prevalent type-2 diabetes. Still, we face measurement error in the food environment variable (food insecurity), since it is collected using inaccurate distance calculations and survey data. Finally, we discuss the impact of using error-prone food environment measures to detect health disparities in these data.

Presenting Author

Darcy Green, University of Chicago

First Author

Cassandra Hung, Wake Forest University

CoAuthor(s)

Darcy Green, University of Chicago
Sarah Lotspeich, Wake Forest University

04. ComBat-Predict Improves Generalizability of Traditional and Normative Cortical Thickness Modeling to a New Site

Neuroimaging is vital for the screening of atypical brain development and the diagnosis of neurodegenerative diseases at an early stage. To collect the large samples necessary to model lifespan brain development, research consortiums aggregate images acquired across multiple study sites. Previous studies have demonstrated that this multi-site study design can lead to site-related bias, necessitating harmonization of these "site effects". However, current methodologies are unable to generalize to new sites outside the original harmonized sample, limiting translation to new sites or clinical practice. Here, we propose ComBat-Predict (CB-Predict), an extension of the ComBat method for site effect adjustment that applies to data from a new site with smaller sample sizes and unknown site effects. In data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), our proposed method mitigates bias in predicting cortical thickness measures when generalizing the model to new data. Furthermore, we demonstrate that our proposed harmonization method can reduce site-related variance in centile scores estimated using data from the Lifespan Brain Chart Consortium (LBCC). Altogether, our results demonstrate that CB-Predict effectively harmonizes new sites and thereby enables effective translation of neuroimaging models to additional samples.

Presenting Author

Yao Xin, Medical University of South Carolina

First Author

Yao Xin, Medical University of South Carolina

CoAuthor(s)

Andrew Chen
Aaron Alexander-Bloch, University of Pennsylvania

05. The Role of Peer Networks and Demographic Factors in Adolescent Binge Drinking: A Cross-Sectional Study of Southern California High Schools

Background: While adolescent binge drinking has been on the decline, it remains a public health concern with adverse mental, physical, social, and academic outcomes. Given peers' crucial role in shaping behavior, it is important to examine demographic factors and peer network influence on binge drinking. Methods: Cross-sectional data were collected from 11 high schools in Southern California (N = 2,769). Friendship networks were mapped for each school, and network exposure variables were created based on the proportion of friends who engaged in binge drinking in the past 30 days. Logistic regression models were used to assess the relationships between gender, sexual identity, age, Hispanic ethnicity, friendship nominations, and exposure to friends who were binge drinking in the past 30 days on the likelihood of an individual engaging in binge drinking in the past 30 days. Results: Males were significantly less likely to participate in binge drinking compared to females (AOR = 0.56, p < 0.01), while Hispanic/Latinx students were 2.33 (p < 0.001) times more likely to engage in binge drinking compared to non-Hispanic ones, controlling for social network and demographic covariates. Adolescents identifying as sexual minorities showed a marginally significantly higher likelihood of binge drinking compared to heterosexual peers (AOR = 1.41, p < 0.10). Network exposure to binge drinking among friends was significantly associated with individual binge drinking in the past 30 days. Adolescents with all friends who binge drink were 3.13 (p < 0.05) times more likely to engage in binge drinking in the past 30 days compared to those with no friends who binge drink. Conclusions: These results highlight the importance of considering demographic factors and peer networks in understanding adolescent binge drinking behaviors. Interventions should address peer influence and demographic vulnerabilities, particularly among females, Hispanic/Latinx students, and sexual minorities. 
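The network exposure variable described in the Methods (the proportion of a student's nominated friends who binge drank in the past 30 days) can be illustrated with a short Python sketch. The student identifiers, the toy nominations, and the choice to return `None` for students with no nominated friends are assumptions made for illustration, not details from the study.

```python
def network_exposure(friends, binge):
    """Proportion of each student's nominated friends who binge drink.

    friends: dict mapping a student to the list of friends they nominated.
    binge:   dict mapping a student to True/False for past-30-day binge drinking.
    Students with no nominated friends in the data get None (an assumption).
    """
    exposure = {}
    for student, nominated in friends.items():
        known = [f for f in nominated if f in binge]
        if known:
            exposure[student] = sum(binge[f] for f in known) / len(known)
        else:
            exposure[student] = None
    return exposure


# Hypothetical toy network of four students.
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b", "d"], "d": []}
binge = {"a": True, "b": False, "c": False, "d": True}

exposure = network_exposure(friends, binge)
for student, value in exposure.items():
    print(student, value)
```

An exposure of 1.0 corresponds to the "all friends binge drink" case in the Results, and 0.0 to the "no friends binge drink" reference group.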

Presenting Author

Kristina Miljkovic, University of Southern California

First Author

Kristina Miljkovic, University of Southern California

CoAuthor(s)

Sarah Piombo, Department of Medical Oncology, Dana-Farber Cancer Institute/Harvard Cancer Center
Jessica Barrington-Trimis, University of Southern California
Thomas Valente, University of Southern California

06. Improved Inventory Distributions for Los Alamos National Laboratory's Area G Performance Assessment and Composite Analysis Modeling

Numerical models are essential in many fields, including radioactive waste management, but they often involve significant uncertainty. Practitioners may address this by (i) building increasingly complex models or (ii) accepting and characterizing uncertainty. Inventory modeling at Los Alamos National Laboratory's Material Disposal Area G (Area G) illustrates how the second approach can enhance analyses and reduce costs.

Area G is a low-level radioactive waste disposal facility comprising pits and shafts. It is subject to U.S. DOE Order 435.1, which requires performance and composite analyses (PA/CA). Current modeling uses triangular distributions, defined by a minimum, mode, and maximum, to represent uncertainty in radionuclide inventories. Typically, the mode equals the inventory estimate, with multipliers of 0.1 and 10 for the minimum and maximum, respectively. This strongly right-skewed distribution results in a median over three times the input estimate, potentially inflating projected dose.
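The claim about the median can be checked with the closed-form median of a triangular distribution. The sketch below assumes a unit inventory estimate in arbitrary units (a hypothetical value); only the 0.1 and 10 multipliers come from the abstract.

```python
import math


def triangular_median(low, mode, high):
    """Closed-form median of a triangular distribution on [low, high]."""
    span = high - low
    # Probability mass lying to the left of the mode.
    left_mass = (mode - low) / span
    if left_mass >= 0.5:
        return low + math.sqrt(span * (mode - low) / 2.0)
    return high - math.sqrt(span * (high - mode) / 2.0)


estimate = 1.0  # hypothetical inventory estimate, arbitrary units
median = triangular_median(0.1 * estimate, estimate, 10.0 * estimate)
ratio = median / estimate
print(f"median / estimate = {ratio:.2f}")  # prints "median / estimate = 3.33"
```

Because only about 9% of the probability mass lies below the mode, the median sits well to the right of the input estimate, which is consistent with the "over three times" figure quoted above.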

To address this, gamma and normal distributions were implemented to better align the modeled median with the input estimate while preserving the high-end uncertainty. These distributions offer more accurate central tendencies without underrepresenting upper bounds.

Initial PA/CA model results using gamma/normal distributions show reduced total radiological dose compared to the triangular case: roughly two- to threefold for shafts and up to fourfold for pits. Lower projected doses could justify a thinner cover at site closure, reducing the need for truckloads of material and associated construction. Thus, revising the inventory modeling approach may lead to significant taxpayer savings while maintaining analytical integrity.

Presenting Author

Emmie Boettcher, Neptune and Company, Inc.

First Author

Emmie Boettcher, Neptune and Company, Inc.

07. Impacts of Migration on Coastal Adaptation Costs and Damages

Climate change is increasing coastal hazards around the world, including storm surge and sea level rise. Coastal adaptation models are useful tools for assessing the economic effects of these hazards on coastal regions. Migration is an important factor to consider in this context, as it is one way people protect themselves against sea level rise. However, models' ability to represent human migration in response to these environmental hazards remains limited, despite the growing importance of migration as a strategy to manage the risks posed by coastal flooding. To address this, we developed a model for county-level migration, using U.S. Census data from 2016 to 2020, in MimiCIAM, a coastal impact and adaptation model, for the state of Florida. Additionally, we implement a fixed or flexible adaptation strategy. Our goal is to examine the uncertainties and sensitivities associated with migration-based adaptation strategies. Here, we show that migration is a critical feature of coastal adaptation efforts over the next century, enabling significant change in overall adaptation costs and damages. Previous studies have often viewed migration as an external factor or considered it a retreat. Our findings show that integrating migration into adaptation models can lead to a reduction in adaptation costs of up to $713 million for the entire coast of Florida. Moreover, while people prefer to migrate to larger cities like Miami and Hollywood, the results indicate that there are greater benefits to migrating to inland areas. These findings underscore the need to integrate dynamic human behavior into climate impact models, particularly in areas where coastal hazards are most severe. By advancing migration modeling within the climate impact framework, our research contributes to more accurate projections of future population distributions and resource needs.

Presenting Author

Carolina Estevez Loza, Rochester Institute of Technology

First Author

Carolina Estevez Loza, Rochester Institute of Technology

CoAuthor

Tony E. Wong, Rochester Institute of Technology

08. Empowering Environmental Education Through a Data Visualization Tool

Environmental health disparities in urban areas like Chicago present both a challenge and an opportunity for science education. In response, a growing effort has emerged to engage K–12 teachers and undergraduate students in hands-on environmental monitoring and data-driven learning. This project aims to build capacity for teaching air quality science in classrooms using low-cost mobile sensors such as AirBeam3, which measure fine particulate matter (PM2.5) in real time.

Students and teachers have used AirBeam3 sensors to study the relationships between air quality and public health by collecting and analyzing PM2.5 data in their own communities. To support the interpretation of sensor data, often output in complex CSV formats, we developed an interactive data visualization dashboard that translates raw data into accessible, engaging visuals suitable for middle school, high school, and undergraduate learners. This tool enhances both environmental science and data literacy education by making abstract concepts locally relevant and actionable. Interactive design was informed by feedback from educators, undergraduates, community members, and environmental professionals. The result is a replicable model for using local data to drive inquiry, critical thinking, and civic engagement in the classroom.

Presenting Author

Nora Lee Reinhardt, Loyola University Chicago

First Author

Nora Lee Reinhardt, Loyola University Chicago

CoAuthor(s)

Mena Whalen, Loyola University Chicago
Ping Jing, Loyola University Chicago School of Environmental Sustainability

Withdrawn - 09. Neural Network-Based High-Dimensional Survival Analysis with Measurement Error

Although measurement error (ME) in the survival model has been addressed in some studies, attempts to account for ME in high-dimensional data for survival models are limited. We propose a neural network-based corrected score (NNCS) approach to simultaneously correct for biases caused by ME in both functional and scalar covariates. The NNCS approach approximates the conditional expectation of the latent true measures based on repeated observed measures. This approximation is incorporated into a corrected Cox score function, yielding estimators that are both consistent and asymptotically normal. The NNCS approach is flexible and data-driven, does not require strict parametric assumptions on the variables, and is adaptive to various survival models. Furthermore, it can capture complex nonlinear relationships between the latent true measures and observed measures. Simulation studies demonstrate that NNCS estimators consistently exhibit smaller bias across various scenarios. The proposed approach is applied to examine how device-based physical activity and self-reported sugar intake relate to overall death risk among U.S. adults. 

Presenting Author

Yuanyuan Luan

First Author

Yuanyuan Luan

CoAuthor(s)

Carmen Tekwe, Indiana University
Caihong Qin, Indiana University
Lan Xue, Oregon State University
Roger Zoh, Indiana University

10. Navigating Difficult Conversations in Healthcare: The Use of AI Chatbots in Training Undergraduate and Graduate Nursing Students

Nursing students at the undergraduate and graduate levels benefit from opportunities to practice having difficult conversations with patients and families. Artificial Intelligence (AI) applications provide a unique opportunity for learners to gain experience and hone this skill prior to human interaction. The purpose of this research is to develop a tiered experiential approach to preparing nursing students for difficult conversations prior to interacting with real-life standardized patients. Our team aims to adapt, pilot, and evaluate an AI-based communication tool for student engagement in practicing difficult conversations. Approximately 80 undergraduate and graduate nursing students will first practice difficult conversations via an AI chatbot before engaging with a standardized patient (appropriately leveled for students). Rubrics will be utilized for course outcomes evaluation and learners will complete a post-activity survey to determine student perceptions regarding the usability and efficacy of the AI-based communication tool. 

Presenting Author

Samantha Phillips, Creighton University

First Author

Samantha Phillips, Creighton University

CoAuthor(s)

Lindsay Iverson, Creighton University
Tamara Oliver, Creighton University
Rachel Malander, Creighton University
Steven Fernandes, Creighton University

11. Health Inequities from Climate Change-related Extreme Weather Events among Adults in California

Background: Extreme weather linked to climate change is increasing in frequency and severity, with cascading health effects disproportionately affecting communities of color and low-income communities. This study examines the associations between extreme weather and physical and mental health among adults and whether this relationship varies by socioeconomic status, race, and ethnicity.
Methods: From the 2023 California Health Interview Survey, we analyzed data on 14,319 adults who reported experiencing any extreme weather event (heat waves, wildfires, or wildfire smoke) in the past two years. Self-reported health outcomes included current asthma status, mental or physical health harm attributable to extreme weather, and psychological distress (Kessler Psychological Distress Scale [moderate: 5-12; severe: ≥13]). Inverse probability weighting adjusted for age, income, race, ethnicity, and smoking. Firth logistic regression models assessed health effects of extreme weather, with analyses stratified by race, ethnicity, and income.
Results: Health harm from extreme weather varied by event and sociodemographic group. For example, multiracial individuals most frequently reported mental health harm from wildfire smoke (20%, p<0.001). Compared to unexposed adults, wildfire exposure was associated with higher odds of asthma (adjusted odds ratio [aOR]: 1.3, 95% confidence interval [CI]: 1.2–1.4). Heat waves were linked to severe psychological distress (aOR: 4.9, 95% CI: 4.5–5.4) and physical health harm (aOR: 7.4, 95% CI: 5.8–9.6). In stratified analyses, wildfire exposure had the strongest association with asthma among low-income adults (aOR: 1.58, 95% CI: 1.18-2.11).
Conclusions: Extreme weather adversely impacts physical and mental health, with disproportionate effects across marginalized socioeconomic, racial, and ethnic groups. Climate adaptation strategies must address the compounding burdens of structural inequities within communities of color and poverty. 

Presenting Author

Kathy Hoang, University of California, Los Angeles

First Author

Kathy Hoang, University of California, Los Angeles

12. Legal Counsel's Impact on Eviction Case Outcomes in Dallas County

Evictions yield persistent and severe health, social, educational, employment, and financial consequences. Unlike criminal or civil court proceedings, evictions in Texas are ruled over by a Justice Court, with fewer tenant protections like court recordings or a right to counsel. Many "right to counsel" (RTC) campaigns for eviction defense in major US cities have used pilot programs as evidence for the positive effect that universal access to counsel could have. The pilot program in Dallas, Texas, has seen positive effects from representing tenants facing eviction in Dallas County. Using records of representation supplied by the Dallas Eviction Advocacy Center (DEAC) along with Dallas County Court docket data, the authors found evidence that DEAC representation is associated with 3.81 times higher odds of receiving an Appealed case status and 8.29 times higher odds of a Dismissed case status compared to an unfavorable Judgment case status, relative to no representation.

Presenting Author

Alexandra Thibeaux, Southern Methodist University

First Author

Alexandra Thibeaux, Southern Methodist University

CoAuthor(s)

Banu Pullaiahnaidu, Southern Methodist University
Bethel Kumsa, Southern Methodist University

13. Advancing multilevel Bayesian network with efficient Bayesian inference

Bayesian networks (BN) provide a powerful framework for modeling complex dependencies and reasoning under uncertainty across diverse applications. A multilevel Bayesian network (MBN) combines BNs with multilevel modeling, facilitating their application to datasets involving correlated observations. However, the robustness of multilevel models is highly affected by the number of clusters and cluster size. Thus, the reproducibility of MBN is questionable in small sample scenarios. Bayesian methods facilitate the integration of prior knowledge, thereby robustifying inference for small sample sizes. Bayesian inference is often performed using Markov chain Monte Carlo methods, which are known to be computationally intensive. Therefore, this study aims to use the integrated nested Laplace approximation (INLA) to efficiently compute the local network scores during structure learning and subsequent parameter learning. In addition, the study investigates prior sensitivity in structure and parameter learning of MBN and compares the results with MBN based on maximum likelihood estimation (MLE). The study uses a simulation study considering data with different numbers of clusters (20, 30, 50), with five individuals per cluster in each scenario. For each scenario, we considered the usual log-Gamma and Penalized Complexity (PC) priors on the precision parameters in the single random effect case, and Wishart and LKJ priors for cases with more than one random effect. Results show that the structure and parameters of MBN are sensitive to the prior settings and that MBN with a log-Gamma prior on the precision parameter of each local network performs better than MBN fitted with MLE.

Presenting Author

Bezalem Eshetu Yirdaw, University of South Africa

First Author

Bezalem Eshetu Yirdaw, University of South Africa

CoAuthor(s)

Legesse Kassa Debusho, University of South Africa
Håvard Rue, King Abdullah University of Science and Technology
Janet Van Niekerk, King Abdullah University of Science and Technology and University of Pretoria