Contributed Poster Presentations: Section on Statistics in Epidemiology

Shirin Golchi Chair
McGill University
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
4121 
Contributed Posters 
Music City Center 
Room: CC-Hall B 

Main Sponsor

Section on Statistics in Epidemiology

Presentations

07: A Biological Additivity and Interaction Model to Bridge Epidemiology and Toxicology

Modeling the effects from mixtures of exposures is of interest to both epidemiology and toxicology. Due to differences in data and settings, mutually exclusive methods have been developed across the two fields. We take the opportunity to develop a new methodology that borrows advantages from both fields and allows for knowledge to flow across domains. We develop a technique called BAI-LVM that accounts for biological additivity in a mixture response as modeled in toxicology. We show how a straightforward statistical model does not account for biological additivity and how various models from epidemiology relate to each other regarding biological assumptions. Our method produces latent individual dose response curves, providing an easy way to inject prior knowledge from toxicology. The HAND model for biological additivity is implemented given a consensus that it is biologically most plausible. Simulation studies demonstrate the performance across different scenarios and an application to epidemiological data is provided. 

Keywords

biological additivity

epidemiology

toxicology

Bayesian

dose response 

Co-Author(s)

Shanshan Zhao, NIEHS/NIH
Alexander Keil, National Cancer Institute, Division of Cancer Epidemiology and Genetics
Zhen Chen, NICHD/NIH
Paul Albert, National Cancer Institute

First Author

Daniel Zilber, NIEHS

Presenting Author

Daniel Zilber, NIEHS

08: A Hierarchical Bayesian Model for Mixture Exposure and Count Data in Environmental Health

Modeling count-based health outcomes in environmental research presents challenges like correlated exposures, non-linear interactions, and spatiotemporal dependencies. We propose a hierarchical Bayesian model that incorporates the negative binomial distribution via data augmentation to address these complexities. This framework integrates variable selection, effect estimation, and hotspot detection to improve inference in exposure-outcome relationships.
We evaluated the model through simulations across eight scenarios, varying exposure correlation, interaction effects, and dependence. Each scenario included 100 datasets of 504 observations across 21 spatial units and 24 time points. The model's utility was further demonstrated using real-world air pollution data.
The proposed approach consistently identified influential exposures, estimated effects, and detected hotspot areas, particularly with appropriate spatiotemporal dependencies. By leveraging the negative binomial distribution, it accounted for data dispersion without additional adjustments. This model provides a robust, unified framework for analyzing count outcomes in environmental health research and policy-making. 

Keywords

Count Data

Mixture Exposure

Bayesian Kernel Machine Regression

Air Pollution

Environmental Health 

Co-Author(s)

Boubakari Ibrahimou, Florida International University
Zoran Bursac, Florida International University

First Author

Ning Sun

Presenting Author

Ning Sun

09: Analysis of Factors Influencing Maternal Parenting Anxiety in Japan Using a Proportional Odds Model

Postpartum depression is a mental disorder experienced by approximately one in seven women within the first year after childbirth. It is considered to have a complex causal relationship over time, making it difficult to evaluate in a classic cohort study that assesses cause and identifies onset at a single point in time. Our study focused on the psychological factors contributing to postpartum depression, specifically examining factors related to parenting anxiety. Our study used Japanese birth cohort data that followed 1701 pregnant women recruited between 2003 and 2005. To identify relevant factors, we used ANOVA for continuous variables and chi-square tests or Fisher's exact tests for categorical variables. A proportional odds model was then applied, with variable selection performed using a stepwise method. The results showed that having a playmate for the child and someone to consult or assist with childcare were significantly associated with reduced parenting anxiety at all time points. Given that these factors remained significant over time, the study highlights the importance of continuous, rather than one-time, interventions to support mothers. 
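For readers unfamiliar with the proportional odds model named in this abstract, the following is a minimal sketch of how category probabilities arise from cumulative logits. It is not the authors' analysis. The function name, the cutpoints, and the sign convention logit P(Y <= k) = cutpoint_k - eta are assumptions chosen for illustration.

```python
import math

def proportional_odds_probs(cutpoints, eta):
    """Category probabilities for an ordinal outcome.

    Assumed convention: logit P(Y <= k) = cutpoints[k] - eta, where eta is
    the linear predictor shared by all categories (the proportional odds
    assumption). Cutpoints must be increasing.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# Hypothetical three-level anxiety score (low, medium, high).
print(proportional_odds_probs([-1.0, 1.0], eta=0.0))
print(proportional_odds_probs([-1.0, 1.0], eta=1.0))  # mass shifts upward
```

Under this convention a larger linear predictor moves probability toward the higher categories, and a single coefficient per covariate applies to every cutpoint.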

Keywords

Postpartum depression

Parenting anxiety

Proportional odds model

Stepwise method

Maternal mental health

Childcare support 

Co-Author

Ayano Takeuchi, Keio University

First Author

Shota Sonoda

Presenting Author

Shota Sonoda

10: Applications of Psychometric Models: Analyzing smell as an early predictor of Parkinson's disease

As populations age, finding new reliable ways to identify Parkinson's disease is increasingly vital. In addition, identifying biomarkers for early progression of Parkinson's disease will help accelerate the clinical evaluation of the efficacy of new interventions. We investigate the University of Pennsylvania Smell Identification Test (UPSIT) as a diagnostic tool for early stages of Parkinson's disease. To do this, we utilize psychometric models, specifically the Rasch and 3 parameter logistic models, to investigate both if this test can identify Parkinson's disease in early stages of development, and also to gain insights into the psychometric properties of the specific questions on the UPSIT, such as which questions perform well and which questions perform poorly. 
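The two psychometric models named in this abstract can be sketched by their item response functions. This is a minimal illustration and not the authors' code; the function names and the example item parameters are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability of a correct response for a person with
    ability theta on an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def three_pl_prob(theta, a, b, c):
    """3PL model: adds discrimination a and a lower asymptote c
    (the chance of answering correctly by guessing)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical four-choice item, so guessing alone succeeds about 25% of the time.
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(rasch_prob(theta, 0.0), 3),
          round(three_pl_prob(theta, 1.5, 0.0, 0.25), 3))
```

Items with a low fitted discrimination or a high guessing floor separate respondents poorly, which is one way a question can "perform poorly" in the sense the abstract describes.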

Keywords

Parkinson's Disease

Rasch Model

3PL Model 

Co-Author(s)

David James, Novartis
Joel Greenhouse, Carnegie Mellon University

First Author

Alexander Brick

Presenting Author

Alexander Brick

11: Balanced Subgroup Discovery Via Matching, Decision Trees, and Randomization Inference

We recently developed a method for data-driven heterogeneous treatment effect subgroup discovery that combines matching and decision trees (mTree). Matching leads to balance within discovered subgroups, overcoming the limitation that insufficient balance in subgroups may lead to findings that cannot be replicated. Decision trees are popular in medicine because they are an effective decision-making technique providing high classification accuracy with a simple representation of gathered knowledge, i.e., they are not “Black Boxes”.

Our previous work did not propose an approach for statistical inference within subgroups. In the typical superpopulation inference framework, re-using data for hypothesis generation and hypothesis testing creates type I error issues. To overcome this challenge, we adopt the randomization inference framework wherein the goal is to make inferences about treatment effects in the sample alone. In this work we extend the mTree method to accommodate time-to-event outcomes and develop new randomization inference estimators of within-subgroup additive and multiplicative treatment effects. The methods are applied to a systolic blood pressure intervention trial. 
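The randomization inference idea can be sketched with an exact test of the sharp null hypothesis of no treatment effect in a small sample. This sketch assumes complete randomization with a fixed number treated and a difference-in-means statistic; it does not reproduce the matched design or the estimators of the abstract, and all names and data are hypothetical.

```python
import itertools

def mean_difference(outcomes, treated_idx):
    """Difference in mean outcome, treated minus control."""
    treated = [outcomes[i] for i in treated_idx]
    control = [y for i, y in enumerate(outcomes) if i not in treated_idx]
    return sum(treated) / len(treated) - sum(control) / len(control)

def randomization_test(outcomes, treated_idx):
    """Exact two-sided p-value under the sharp null of no effect.

    Under the sharp null every unit's outcome is fixed, so the statistic can
    be recomputed for every assignment that treats the same number of units.
    """
    observed = mean_difference(outcomes, set(treated_idx))
    n, m = len(outcomes), len(treated_idx)
    draws = [mean_difference(outcomes, set(c))
             for c in itertools.combinations(range(n), m)]
    extreme = sum(1 for d in draws if abs(d) >= abs(observed) - 1e-12)
    return observed, extreme / len(draws)

# Hypothetical subgroup of six units, the first three treated.
print(randomization_test([5, 6, 7, 1, 2, 3], [0, 1, 2]))
```

Because the reference distribution comes from the assignment mechanism rather than from a sampling model, the inference concerns the units in hand, which matches the sample-only goal described above.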

Keywords

Heterogeneous treatment effects

Subgroup discovery

Matching

Decision trees

Randomization inference 

First Author

Joseph Rigdon, Wake Forest School of Medicine

Presenting Author

Joseph Rigdon, Wake Forest School of Medicine

12: Bayesian Spatial Scan Statistics for Under-Reported Data

The spatial scan statistic is a cornerstone for disease monitoring and outbreak detection. Common implementations typically assume that case counts are correctly observed. However, underreporting frequently occurs in real-world settings, leading to biased estimates and compromising the accuracy of disease surveillance efforts. This study proposes a novel Bayesian spatial scan statistic to address the challenges posed by under-reported case counts in outbreak detection. By accounting for misreporting, the proposed framework enhances the accuracy and robustness of disease cluster identification. Comparisons with existing methods and applications to COVID-19 data demonstrate its superior ability to provide reliable inferences despite reporting limitations. 

Keywords

Spatial Clustering

Underreporting

Bayesian Statistics

Spatial scan statistics

COVID-19 

Co-Author

Joon Jin Song, Baylor University

First Author

Nathen Byford, Baylor University

Presenting Author

Nathen Byford, Baylor University

13: Calculating Correlation in Observational Studies when only One Variable Contains Repeated Measures

Correlations of variables with multiple measurements often arise in human observational studies. In such studies, variables of interest may include household level measurements as well as participant level measurements. Correlations between participant data and household data are difficult to calculate due to the imbalance of available samples across variables.
While several approaches have been developed to estimate correlation when both variables contain repeated measures, less work has explored the scenario where only one variable contains repeated measures and not both. This work evaluates several correlation approaches for this scenario to compare the approaches developed for repeated measures data. Using simulated data, comparisons are made across the following correlation approaches: Pearson's, subject level averaging, regression models, and mixed effects models with compound symmetry covariance matrix. Several simulated scenarios are considered, including varying the underlying true correlation values and different noise levels. Mean Squared Error (MSE), confidence interval width and coverage probabilities are used to assess the methods. 

Keywords

correlation

repeated measures

observational data 

Co-Author(s)

Carrie Fleming, Corteva
Alexa Neumann, Corteva
Elizabeth Sweeney, Corteva
Yushan Gu, Corteva

First Author

Xiaoyi Sopko

Presenting Author

Xiaoyi Sopko

14: Cause-Specific Mortality Long-Term Projections over Country, Sex, and Age

Reliable and actionable mortality forecasts are crucial for reducing premature deaths and improving healthy life expectancy at national and global levels. While significant progress has been made over the past two decades, the United Nations Population Division projects a slowdown in mortality gains, posing a challenge to ongoing public health efforts. A critical gap in cause-specific mortality forecasts by sex and age at country-specific and global levels hinders the ability of organizations like the WHO to effectively redirect resources and accelerate progress. In this talk, we address the challenges of projecting cause-specific mortality over a long forecast horizon, including data quality, regional heterogeneity, the complexity of competing risks, and preventing unreasonably extreme trends over the long term. We will extend the singular value decomposition (SVD) based Lee-Carter model into a Bayesian setting to provide country-, age-, and gender-specific projections with uncertainty bounds. The improved mortality estimation and forecasts support WHO's efforts to accelerate progress in healthy life expectancy and prevent premature mortality worldwide. 

Keywords

Mortality Forecasting

Bayesian Hierarchical Modeling

Time Series

Multiple Populations 

Co-Author(s)

Le Bao, Penn State University
Zehang Li, UCSC

First Author

Ryan Halstater

Presenting Author

Ryan Halstater

15: Clustering County-Level COVID-19 Death Trends Using INAR Models

This project applies the integer-valued autoregressive (INAR) clustering method by Roick et al. (2021) to analyze daily COVID-19 deaths in US counties. Unlike national and state data, county-level death counts are often low (0–5 daily deaths), making traditional time series clustering methods unreliable. We test whether INAR-based mixtures, designed for autocorrelated integer-valued data, can better group counties with similar mortality patterns. Using CDC data, we cluster counties based on daily death trajectories. We then compare clusters to demographic factors (e.g., vaccination rates, population density), identifying shared trends. For example, rural counties with low healthcare access form distinct clusters compared to urban areas. Preliminary results suggest that INAR models outperform distance-based methods (e.g., DTW) for low-count data. This approach highlights the importance of tailored statistical methods for discrete health data, common in disease tracking and local policy evaluations. The poster will present visualizations of cluster patterns, model diagnostics, and insights into how INAR methods can address challenges in analyzing sparse county-level data. 
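To make the INAR idea concrete, here is a minimal simulation of a single INAR(1) series with binomial thinning and Poisson innovations. It illustrates the data-generating process only, not the mixture clustering method of Roick et al. (2021); the function name and parameter values are assumptions.

```python
import math
import random

def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t, where 'o' is binomial thinning
    (each of yesterday's counts survives with probability alpha) and
    e_t ~ Poisson(lam) are new arrivals."""
    rng = random.Random(seed)

    def poisson(mean):
        # Knuth's multiplication method; adequate for small means.
        limit, k, p = math.exp(-mean), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    # Start from the stationary distribution, Poisson(lam / (1 - alpha)).
    x = poisson(lam / (1.0 - alpha))
    series = []
    for _ in range(n):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + poisson(lam)
        series.append(x)
    return series

# Hypothetical low-count series resembling daily deaths in a small county.
print(simulate_inar1(14, alpha=0.5, lam=1.0, seed=1))
```

The series stays on the non-negative integers by construction, which is why this family suits counts in the 0-5 range better than Gaussian time series models.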

Keywords

Time series

INAR models

COVID-19 mortality

county-level clustering

low-count data 

Co-Author(s)

Aaron Rivera, California State University Fullerton
Matheus Bartolo Guerrero, California State University Fullerton

First Author

Elijah Amirianfar

Presenting Author

Elijah Amirianfar

16: Estimating effects of LMTPs on rates of change in health outcomes with repeated measures data

Longitudinal modified treatment policies (LMTPs) quantify the effects of interventions that depend on the natural value of exposure, generalizing policy-relevant quantities, such as "stochastic" and "shift" interventions. The current LMTP estimation approach yields effects on outcomes measured at the end of a study; however, repeated measures data often contains time-varying outcomes measured at each visit and interest may lie in estimating effects on the rate of change in these outcomes over time. For example, one may wish to quantify the effect of an LMTP on the rate of progression of a disease. We extend the LMTP approach to estimate the effect on change in a time-varying outcome over time and propose a hypothesis testing framework to formally test whether the outcome trajectory under an LMTP differs from the natural outcome trajectory. Repeated measures data also frequently has unique data complications that must be considered, such as irregular visit times, where the visit timing varies among individuals from some pre-specified time. We propose an extension to our work that permits effect estimation and hypothesis testing for an LMTP in a setting with irregular visit times. 

Keywords

Causal inference

Longitudinal data

Modified treatment policies

Rates of change

Nonparametric 

Co-Author

Daniel Malinsky

First Author

Anja Shahu

Presenting Author

Anja Shahu

17: Estimation of Disease Incidence from Cross-Sectional Data: The CESE Method

Given the wide availability of multi-wave cross-sectional studies, methods that potentially strengthen causal inference are attractive. We propose the Cross-sectional Enrichment by Sample Extrapolation (CESE) method, which matches observations from serial cross-sectional data to find proxies for longitudinal trajectories. Outcome events from later waves can be paired with exposure data from earlier waves to estimate exposure-outcome associations. A major strength is that CESE does not require longitudinal data. In this abstract, we will describe the statistical assumptions required and demonstrate an application of CESE to estimate incident substance use disorder (SUD). Using a cross-sectional survey (n=117,590), we match individuals across two calendar years; n=10,444 participated in both years. Using a combination of Mahalanobis distance and greedy matching, CESE matched 24.48% of returning participants to themselves. Among chronic pain patients, an estimated 5.5% had incident SUD, similar to an estimate of iatrogenic opioid abuse among pain patients (4.7%). The CESE method is a potential tool for using multi-wave cross-sectional data to estimate population-level incidence. 
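The matching step mentioned in this abstract, Mahalanobis distance combined with greedy matching, can be sketched as follows. This is a simplified illustration limited to two covariates with a pooled covariance matrix, not the CESE implementation; the function name and the toy records are hypothetical.

```python
def mahalanobis_greedy_match(wave1, wave2):
    """Greedy 1:1 matching of wave1 rows to wave2 rows on two covariates.

    Distances use the covariance matrix pooled over both waves. Pairs are
    taken in order of increasing distance, skipping rows already matched.
    """
    pooled = wave1 + wave2
    n = len(pooled)
    mx = sum(p[0] for p in pooled) / n
    my = sum(p[1] for p in pooled) / n
    sxx = sum((p[0] - mx) ** 2 for p in pooled) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pooled) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pooled) / (n - 1)
    det = sxx * syy - sxy ** 2

    def dist2(a, b):
        # Squared Mahalanobis distance via the closed-form 2x2 inverse.
        dx, dy = a[0] - b[0], a[1] - b[1]
        return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    pairs = sorted((dist2(a, b), i, j)
                   for i, a in enumerate(wave1)
                   for j, b in enumerate(wave2))
    used1, used2, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in used1 and j not in used2:
            matches.append((i, j))
            used1.add(i)
            used2.add(j)
    return matches

# Hypothetical (age, score) records; wave 2 is one year later, in shuffled order.
wave1 = [(30, 1.0), (45, 3.0), (60, 2.0)]
wave2 = [(61, 2.1), (31, 1.1), (46, 2.9)]
print(sorted(mahalanobis_greedy_match(wave1, wave2)))
```

Greedy matching is fast but not globally optimal, since an early pairing can block a better overall assignment.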

Keywords

Matching

Cross-sectional data collection

Population survey

Epidemiology 

Co-Author(s)

Karilynn Rockhill, Rocky Mountain Poison & Drug Center, Denver Health and Hospital Authority
Debashis Ghosh, University of Colorado, School of Public Health
Alison Abraham, University of Colorado

First Author

Joshua Black, Rocky Mountain Poison and Drug Safety

Presenting Author

Joshua Black, Rocky Mountain Poison and Drug Safety

18: Evaluating the Generalizability of Commercial Healthcare Claims Data

Large healthcare claims databases, which aggregate claims from commercial insurers, are increasingly being used to generate real-world evidence in medical research. Nearly 10,000 manuscripts have been published, and the pace of output is accelerating. Despite their widespread use, these databases have not been rigorously vetted against ground-truth data. Representation in such datasets has been found to be systematically biased along racial and socioeconomic lines. These same factors are known to be effect modifiers for a myriad of conditions and treatments in medicine, and the combination of inconsistent sampling and effect modification can give rise to external validity bias. In [Dahlen, Deng & Charu 2024], we undertook the most detailed empirical analysis of external validity bias in healthcare claims data to date, focusing on the rates of a comprehensive set of inpatient procedures, for which a unique ground-truth dataset exists. We found large variation in the extent of the bias across procedures, including 22.8% that were underestimated by more than a factor of 2. Further, we found a significant relationship between social determinants of health and the magnitude of bias. 

Keywords

external validity bias

healthcare claims databases 

Co-Author

Vivek Charu

First Author

Alex Dahlen, New York University, School of Global Public Health

Presenting Author

Yaowei Deng

19: Exploring Factors Influencing the Development of Food Allergy in Infancy

Food allergies result from the complex interaction of genetic and environmental factors over time. In this study, we use birth cohort data that target food allergy and follow children over time, and we use incidence data from two time points to quantify the impact of risk factors. We used data from the Japanese Birth Cohort Study, which followed 1550 individuals from the gestational period. That study recruited participants between 2003 and 2005 and continues follow-up to the present. We identified children who had not developed food allergy at 1 year of age but had developed it at 3 years of age. Logistic regression analysis was used to examine the risk of developing food allergy given the explanatory variables. Maternal history of allergic disease, infant eczema, atopic dermatitis, and egg elimination were associated with an increased risk of food allergy. Duvet use was associated with a reduced risk of food allergy. Among genetic factors, there was a trend toward an increased risk of developing food allergy if the mother had a history of allergic disease or had already developed other allergies. Among environmental factors, elimination of eggs increased the risk, and use of a down comforter decreased the risk. 

Keywords

food allergy

infants

genetic factors

environmental factors

infant eczema

atopic dermatitis 

Co-Author

Ayano Takeuchi, Keio University

First Author

Keita Fukai

Presenting Author

Keita Fukai

20: Integrate Meta-analysis into Specific Study for Estimating Conditional Average Treatment Effect

Randomized controlled trials are the standard method for estimating causal effects, ensuring statistical power and confidence through adequate sample sizes. However, achieving sufficient sample sizes is often challenging. This study proposes a novel method to estimate the average treatment effect (ATE) in a target population by integrating and reconstructing information from previous trials with only summary statistics of outcomes and covariates via meta-analysis. The proposed approach combines meta-analysis, transfer learning, and weighted regression. Unlike existing methods, which estimate the ATE based on the distribution of source trials, our method directly estimates the ATE for the target population. The proposed method requires only the means and variances of outcomes and covariates from the source trials and is theoretically valid under the covariate shift assumption, regardless of the distribution of covariates in the source trials. Simulations and real-data analyses demonstrate that the proposed method yields a consistent estimator and achieves higher statistical power than the estimator derived solely from the target trial. 

Keywords

conditional average treatment effect

meta-analysis

transfer learning

weighted linear regression 

Co-Author

Masahiro Kojima, The Institute of Statistical Mathematics, Japan

First Author

Keisuke Hanada, Osaka University

Presenting Author

Keisuke Hanada, Osaka University

21: Learning Collaborative to Reduce Tobacco and Cancer Disparity (LEAD) Initiative

Smoking is the leading cause of preventable death (CDC, 2024). In Missouri, disparities exist among tobacco users, including cancer patients. Smoking rates are higher in underrepresented populations, while treatment rates are lower. Health organizations and clinicians are crucial to providing tobacco treatment. More than 70% of individuals who smoke see a clinician annually and report a desire to stop. The aims of the project include defining tobacco treatment, scaling the model to rural Missouri, and reducing disparities.
Data for patients who smoke were obtained from "Informatics for Integrating Biology and the Bedside (i2b2)". For the LEAD intervention, we will evaluate the rate of counseling to quit smoking, comparing the pre- and post-intervention periods. Overall, baseline data show that cancer patients (5.1% vs 3.5% for non-cancer patients), older patients (3.1% for 65+ years vs 1.2% for under 65 years), and white patients (4.0% vs 1.7% for other races) received counseling at higher rates. We hypothesize that these disparities will be reduced post-intervention. As the project proceeds with implementation, the goal is to ensure clinicians consistently provide tobacco treatment to patients. 

Keywords

Cancer

Tobacco Treatment

Healthcare disparities

Rural Health

Patient Outcomes

Common Data Model (CDM) 

Co-Author(s)

Kevin Everett, University of Missouri Columbia
Li-Shiun Chen, Washington University School of Medicine in St. Louis
Misty Phillips, University of Missouri Columbia

First Author

Jennifer Bryant

Presenting Author

Jennifer Bryant

22: Factor Analysis Methods for Mosquito-Borne Disease Forecasting in Brazil

Vectors are living organisms, such as mosquitoes, which transmit disease between humans. Vector-borne diseases have a high global burden, particularly among the world's poorest populations. The worst effects can be mitigated with advanced warning of these climate-mediated diseases. In Brazil, the infectious diseases Dengue, Chikungunya and Zika all co-circulate. The same mosquito vectors spread the diseases, which pose significant health and mortality risks. Infodengue is a surveillance system in Brazil for these three diseases, with a granularity of spatio-temporal data rarely seen in such systems. The system does not currently explicitly predict future cases. We present a factor analysis model, a flexible data reduction technique, in the Bayesian framework for disease forecasting with extensions to this technique relevant to our problem, including spatiotemporal modelling and joint modelling of shared vectors. 

Keywords

Factor Analysis

Forecasting

Vector-Borne Disease

Integrated Nested Laplace Approximations 

Co-Author

Silvia Liverani, Queen Mary University of London

First Author

Rowan Morris

Presenting Author

Rowan Morris

23: Machine Learning Approaches for Predicting In-Hospital Mortality in Drug Overdose Patients

Machine learning (ML) algorithms are effective in predicting clinical outcomes. This study aimed to identify ML models with the best performance for predicting mortality and possibly improving patient outcomes in drug overdose care. This study included data on 1452 patients seen at Emergency Departments in East Texas (9/1/2021-12/31/2024) for overdose care. Forty features were selected for six ML models, including decision tree, gradient boosting, logistic regression, neural network, random forest, and support vector machine to predict in-hospital mortality. ML models were compared by the area under the receiver operating characteristic curve (AUC) and KS (Youden). The analysis revealed that the random forest model was the best with superior AUC and KS values. The five most crucial features in prediction across all models are the Glasgow coma scale, systolic blood pressure, BMI, age, and diastolic blood pressure at admission. The random forest model was the best-performing ML model, making it more reliable in predicting mortality with the potential to significantly impact clinical practice, underlining the importance of such research in predictive modeling in addiction medicine. 
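The two comparison metrics named in this abstract can be computed directly from predicted scores. The sketch below is a generic illustration, not the authors' pipeline; the function names and toy data are hypothetical. It takes KS as the maximum gap between the true positive and false positive rates over all thresholds, which coincides with the maximum Youden index.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def ks_statistic(points):
    """Largest TPR minus FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)

# Hypothetical outcomes (1 = died) and model scores.
pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
print(round(auc(pts), 4), round(ks_statistic(pts), 4))
```

AUC summarizes ranking quality over all thresholds, while KS reports the separation at the single best threshold, so the two can rank models differently.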

Keywords

Drug overdose death

predicting in-hospital mortality

machine learning algorithms 

Co-Author(s)

Emmanuel Elueze, Department of Graduate Medical Education, The University of Texas Tyler School of Medicine
Karan Singh, Department of Epidemiology and Biostatistics, The University of Texas Tyler School of Medicine

First Author

Tuan Le, UT Tyler School of Medicine

Presenting Author

Tuan Le, UT Tyler School of Medicine

24: Generalized Simple Graphical Rules for Assessing Selection Bias

Selection bias is a major obstacle toward valid causal inference in epidemiology. Over the past decade, several simple graphical rules based on causal diagrams have been proposed as the sufficient identification conditions for addressing selection bias and recovering causal effects. However, these simple graphical rules are usually coupled with specific identification strategies and estimators. In this article, we show two important cases of selection bias that cannot be addressed by these simple rules and their estimators: one case where selection is a descendant of a collider of the treatment and the outcome, and the other case where selection is affected by the mediator. To address selection bias in these two cases, we construct identification formulas by the g-computation and the inverse probability weighting (IPW) methods based on single-world intervention graphs (SWIGs). They are generalized to recover the average treatment effect by adjusting for post-treatment upstream causes of selection. We propose two IPW estimators and their variance estimators to recover the average treatment effect in the presence of selection bias in these two cases. 

Keywords

selection bias

causal inference

causal diagrams

SWIG 

First Author

Yichi Zhang, Yale School of Public Health

Presenting Author

Haidong Lu, Yale University

25: Machine Learning of Smoking Relapse

Machine learning models can help identify multifaceted factors influencing tobacco transitions. A random forest model is developed to predict smoking relapse, focusing on racial disparities and vaping characteristics. Data are drawn from the Population Assessment of Tobacco and Health (PATH) Study adult interview files. Former combustible cigarette smokers at baseline (Wave 5) were followed up one year later (Wave 6). Predictors (n=100) include a wide range of social demographics, psychosocial factors, health status, tobacco and substance use behaviors, and vaping characteristics. The findings reveal notable racial disparities in smoking relapse predictors, along with distinct roles of vaping characteristics across racial groups. Unique social, behavioral, and health factors are crucial for improving smoking cessation outcomes. 

Keywords

e-cigarettes

random forest

PATH study 

First Author

Hongying Dai, University of Nebraska Medical Center

Presenting Author

Hongying Dai, University of Nebraska Medical Center

26: Impacts of Immortal Time Bias on Assessing the Effects of Immune-related Adverse Events on Survival

Despite advancements in immune checkpoint inhibitors (ICIs) for cancer, ICIs can trigger immune syndromes called immune-related adverse events (irAEs), often linked with survival. However, many time-to-event studies overlook immortal time bias. We examined this bias using time-naïve, time-dependent, and landmark analyses in 3343 cancer patients at OSU CCC. Incident irAEs were defined as any gastrointestinal, pulmonary, dermatological, endocrine, or hepatobiliary AEs post-ICI infusion. Kaplan-Meier and Simon-Makuch methods calculated cumulative incidence, and Cox models evaluated survival with and without time dependence. A total of 1739 patients died, and 48.5% experienced irAEs. Median survival was 10.2 months, with a median time to first irAE of 1.4 months. Time-naïve analysis showed irAEs were associated with significantly lower mortality (P < 0.01) while time-dependent analysis showed higher mortality (P < 0.01). Cox models yielded HR 1.41 (95% CI: 1.27-1.55) with time dependence and HR 0.86 (95% CI: 0.78-0.94) without. Using different landmark values did not mitigate bias. Accounting for time dependence is needed to avoid biased interpretations of the effects of irAEs on survival. 
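The mechanism behind immortal time bias can be shown with a person-time calculation. The sketch below uses made-up toy records, not the study data, and crude rate ratios rather than Cox models; all names are hypothetical. A time-naive analysis credits the event-free time before an irAE to the irAE group, while a time-dependent analysis assigns that time to the unexposed group.

```python
def rate_ratio(patients, time_dependent):
    """Mortality rate ratio, irAE versus no irAE.

    Each patient is (irae_time or None, end_of_follow_up, died).
    """
    deaths = {"exposed": 0, "unexposed": 0}
    ptime = {"exposed": 0.0, "unexposed": 0.0}
    for irae_time, end_time, died in patients:
        if irae_time is None:
            ptime["unexposed"] += end_time
            deaths["unexposed"] += int(died)
            continue
        if time_dependent:
            # Time before the irAE is unexposed (and necessarily event-free).
            ptime["unexposed"] += irae_time
            ptime["exposed"] += end_time - irae_time
        else:
            # Time-naive: all follow-up counted as exposed from time zero.
            ptime["exposed"] += end_time
        deaths["exposed"] += int(died)
    exposed_rate = deaths["exposed"] / ptime["exposed"]
    unexposed_rate = deaths["unexposed"] / ptime["unexposed"]
    return exposed_rate / unexposed_rate

# Made-up toy cohort, times in months.
toy = [(None, 2, True), (None, 3, True), (None, 12, False),
       (4, 10, True), (6, 12, True), (5, 12, False)]
print(rate_ratio(toy, time_dependent=False))  # below 1: spuriously protective
print(rate_ratio(toy, time_dependent=True))   # above 1 once time is reassigned
```

In this toy cohort the same six records give a ratio of 0.5 when time-naive and about 1.68 when time-dependent, mirroring the direction reversal reported in the abstract.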

Keywords

Immortal time bias

irAE

ICI

Time-Dependent 

Co-Author(s)

Demond Handley, The Ohio State University
Aditi Shendre, The Ohio State University
Lang Li, Ohio State University
Mohamed Elsaid

First Author

Yesung Kweon, The Ohio State University

Presenting Author

Yesung Kweon, The Ohio State University

27: Matched Design with Re-entry and Missingness for Comparative Effectiveness in Membranous Nephropathy

Clinical trials comparing treatment effectiveness for rare diseases such as membranous nephropathy (MN) can be limited by short follow-up and small sample sizes. We demonstrate how a matched design combined with sequential re-entry and multiple imputation can be applied to observational data to generate reliable comparative effectiveness evidence while maximizing sample size. Individuals can have multiple eligible treatment initiations with this approach, and incomplete cases are retained. Propensity scores estimated with a GEE were used in 1:1 matching without replacement with hard matching on treatment history. Hazard ratios with robust confidence intervals that account for multilevel non-nested clustering were obtained in each imputed dataset and pooled. Restricted mean survival times with appropriate bootstrap confidence intervals were also pooled. An analog to per-protocol analysis censored individuals if they stopped adhering to treatment protocol and used inverse probability-of-censoring weights to address artificial censoring. Our application compared the long-term effectiveness of two immunosuppressants for MN, and results were consistent with a shorter 24-month trial. 

Keywords

comparative effectiveness

matching

sequential re-entry

multiple imputation

rare disease

robust variance estimation 

Co-Author(s)

Laura Mariani, University of Michigan
Nicholas Seewald, University of Pennsylvania
Jarcy Zee, University of Pennsylvania

First Author

Meghan Gerety, University of Pennsylvania

Presenting Author

Meghan Gerety, University of Pennsylvania

28: Improved Covariate-Constrained Randomization Strategies to Better Balance Baseline Covariates

Cluster randomized trials are often used to evaluate diverse types of interventions in which groups of individuals are randomized, and the interventions are delivered at the cluster level. These types of randomized trials do not always effectively balance cluster- and individual-level characteristics, resulting in a higher risk of bias. We implemented covariate-constrained randomization (CCR) in a longitudinal cluster-randomized de-implementation trial with over 40 hospitals enrolled to evaluate two de-implementation strategies for reducing overuse of continuous pulse oximetry monitoring in children with bronchiolitis. CCR was performed using the baseline over-monitoring rate of each hospital and two other hospital characteristics, which were strong independent predictors of outcome. The current metrics for balance in CCR only consider the mean levels of covariates between arms, ignoring the full distributions of covariates. We examine the impact on the randomization of outliers in covariates, particularly in combination with a small number of clusters. We propose several strategies, including a stratified randomization procedure, to improve the covariate balance at baseline. 
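A basic form of covariate-constrained randomization can be sketched as follows: enumerate the candidate allocations, score each by a balance metric, keep the best-balanced fraction, and draw one at random. This sketch assumes two equal arms and a metric based only on differences in arm means, the kind of metric the abstract describes as current practice. It does not implement the proposed improvements, and all names and data are hypothetical.

```python
import itertools
import random

def balance_score(covariates, arm_a):
    """Sum over covariates of the squared difference in arm means."""
    arm_b = [i for i in range(len(covariates)) if i not in arm_a]
    score = 0.0
    for j in range(len(covariates[0])):
        mean_a = sum(covariates[i][j] for i in arm_a) / len(arm_a)
        mean_b = sum(covariates[i][j] for i in arm_b) / len(arm_b)
        score += (mean_a - mean_b) ** 2
    return score

def constrained_randomization(covariates, keep_fraction=0.1, seed=0):
    """Pick one allocation at random from the best-balanced fraction."""
    n = len(covariates)
    allocations = list(itertools.combinations(range(n), n // 2))
    allocations.sort(key=lambda arm_a: balance_score(covariates, arm_a))
    n_keep = max(1, int(len(allocations) * keep_fraction))
    candidates = allocations[:n_keep]
    return random.Random(seed).choice(candidates), candidates

# Hypothetical standardized covariates for 8 clusters; the last is an outlier.
clusters = [(-0.5, 0.2), (-0.3, -0.4), (-0.2, 0.6), (0.0, -0.1),
            (0.1, 0.3), (0.2, -0.5), (0.3, 0.1), (2.5, -0.2)]
arm_a, candidates = constrained_randomization(clusters)
print(arm_a, len(candidates))
```

With few clusters and one outlier, the allocations that pass a mean-based constraint can be few and similar to one another, which is the kind of limitation the abstract examines.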

Keywords

Covariate-Constrained Randomization

Cluster Randomized Trials

Implementation Science 

Co-Author(s)

Zi Wang, Penn
Spandana Makeneni, CHOP
Courtney Wolk, Penn
Rinad Beidas, Northwestern
Christopher Bonafide, CHOP
Enrique Schisterman, University of Pennsylvania
Rui Xiao, University of Pennsylvania

First Author

Kaitian Jin, University of Pennsylvania

Presenting Author

Jennifer Faerber

29: Measurement Error Models for Mediation Analysis

Mediation pathways often involve multiple mediators, and selecting the true mediators is an essential step in addressing key scientific questions. We propose a novel adoption of the Measurement Error Model (MEM) framework in mediation analysis for mediator selection. The MEM framework enables variable selection by deliberately introducing measurement errors to predictors, identifying variables whose predictive utility is most sensitive to such perturbations. When a certain amount of measurement error is introduced into the mediation pathway and distributed across multiple mediators, optimizing the joint MEM likelihood assigns the majority of the measurement error to mediators that are not important in the mediation system while leaving important mediators less affected, effectively achieving variable selection. This approach extends naturally to path selection for identifying true mediators. We demonstrate the efficacy of the proposed method through extensive simulations across various scenarios, comparing its performance with existing approaches. 

Keywords

Mediation Analysis

Measurement Error Models 

Co-Author

Mengling Liu, New York University Grossman School of Medicine

First Author

Chen Liang, New York University

Presenting Author

Chen Liang, New York University

30: Methods for Addressing Unmeasured Confounding in Observational Studies

A key challenge in estimating the causal effect of a treatment on an outcome in observational studies is unmeasured confounding, which causes bias. Traditional techniques such as propensity score-based matching, stratification, and marginal structural models can control for measured confounding but are inadequate to deal with unmeasured confounding. Several advanced methods have been proposed to tackle unmeasured confounding, such as the instrumental variable (IV) approach, regression discontinuity design (RDD), and difference-in-differences (DID). These methods exploit assignment mechanisms that determine treatment status but are not related to any unmeasured confounding. In this presentation, we will first explore the issues arising from unmeasured confounding, then provide an overview of commonly used methods for addressing unmeasured confounding, including the assumptions, key concepts, and implementations. Finally, we will review examples of how these advanced methods are applied in clinical and healthcare research. 

Keywords

Unmeasured confounding

causal inference

instrumental variable

regression discontinuity

observational studies 

Co-Author

Julia Ma, AbbVie

First Author

Yingyi Liu, AbbVie

Presenting Author

Yingyi Liu, AbbVie

31: Modeling Count Time Series with Spatial Dependence: A COM-Poisson INGARCH Approach

We introduce a novel time series model that integrates Integer-Valued Generalized Autoregressive Conditional Heteroskedasticity (INGARCH) dynamics with a COM-Poisson distribution, incorporating a spatial modeling term to account for spatial dependence. The COM-Poisson distribution allows for overdispersion and underdispersion in count data, making it more flexible for capturing real-world phenomena. The GARCH component models the time-varying conditional variance of the process, while the spatial term accounts for the influence of neighboring data points, enabling the model to address spatial correlations. This approach provides a comprehensive framework for analyzing time series count data with both heteroskedasticity and spatial dependence, which is particularly useful in fields such as epidemiology and infectious disease. The COM-Poisson INGARCH spatial model benefits public health and health policy researchers by allowing for more accurate predictions, which can help public health officials and policymakers make evidence-based decisions and improve public health outcomes. The model's performance is evaluated through simulation studies and applied to a real-world dataset. 
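
For orientation, the two ingredients named in the abstract can be written down in their textbook forms. The COM-Poisson probability mass function below is standard; the recursion that follows is only one plausible way to attach a spatial term to an INGARCH(1,1)-type update and is an assumption made here for illustration, not the authors' specification. The symbols \omega, \alpha, \beta, \gamma and the spatial weights w_{ss'} are hypothetical.

```latex
% COM-Poisson pmf with rate \lambda > 0 and dispersion \nu \ge 0
P(Y = y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)},
\qquad
Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
\qquad y = 0, 1, 2, \dots

% Assumed INGARCH(1,1)-type recursion at location s and time t,
% with a spatial lag built from neighboring counts
\lambda_{s,t} = \omega + \alpha\, Y_{s,t-1} + \beta\, \lambda_{s,t-1}
              + \gamma \sum_{s' \neq s} w_{ss'}\, Y_{s',t-1}
```

Setting \nu = 1 recovers the Poisson distribution, \nu < 1 gives overdispersion, and \nu > 1 gives underdispersion, which is the flexibility the abstract refers to.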

Keywords

COM-Poisson

Integer-valued GARCH models

Spatial Modeling

Time Series of Counts 

Co-Author(s)

Isuru Ratnayake, Kansas University Medical Center
Prabhakar Chalise, University of Kansas Medical Center
Dinesh Pal Mudaranthakam

First Author

Stephanie Colwell, University of Kansas Medical Center

Presenting Author

Stephanie Colwell, University of Kansas Medical Center

32: Modeling the longitudinal relationship between stress and sex hormones and self-reported pain using Latent Growth Curve Analysis in Adolescents

In this study, daily diary data from the publicly available Texas Longitudinal Study of Adolescent Stress Resilience and Health dataset were used to examine the longitudinal relationship between a series of stress and sex hormones and self-reported pain (i.e., headache, back pain, stomach pain). Latent growth modeling within a generalized structural equation modeling framework was used to assess these relationships, accounting for individual differences with random intercepts and slopes and adjusting for other covariates. In the original study, a total of 975 9th-grade high school students completed a self-reported daily diary on their mental and physical health for 10 days and provided salivary samples measuring cortisol, corticosterone, cortisone, DHEA-s, testosterone, estradiol, and progesterone. Higher corticosterone (β = -0.24, p = 0.048) and testosterone (β = -0.40, p = 0.042) levels were significantly linked to a lower likelihood of back pain, while cortisol showed a trend toward a positive association with back pain (β = 0.40, p = 0.052) and significantly predicted a higher likelihood of headache (β = 0.432, p = 0.004). 

Keywords

Latent Growth Curve Analysis

Structural Equation Modeling

Longitudinal Study

Hormone-Pain Relationship

Texas Longitudinal Study 

Co-Author(s)

Zaiba Jetpuri, UT Southwestern Medical Center
Chance Strenth, UT Southwestern Medical Center

First Author

Bhaskar Thakur, UT Southwestern Medical Center

Presenting Author

Bhaskar Thakur, UT Southwestern Medical Center

33: PM2.5 Speciation Chemical Interactions and Neonatal Respiratory Distress Syndrome (RDS)

After linking de-identified Florida birth records to PM2.5 speciation chemical data, logistic regression analyses were conducted to assess associations between maternal exposure to PM2.5 speciation metals during pregnancy and the risk of neonatal respiratory distress syndrome (RDS), adjusting for various covariates. Study findings highlight the multifaceted nature of RDS risk, reaffirming known risk factors such as preterm birth, low birth weight, and maternal health conditions. Complex interactions among pollutants and maternal health factors were observed, emphasizing the importance of considering synergistic effects in risk assessment. Additionally, race and ethnicity were significant moderating factors, with nuances observed in Hispanic subgroups. Maternal demographics, pregnancy complications, and maternal PM2.5 pollutant exposure affect risk of RDS through complex interactions. Targeted interventions that reduce exposure to harmful pollutants, particularly among high-risk populations, may mitigate RDS burden. 

Keywords

respiratory distress syndrome (RDS)

PM2.5 speciation chemicals

air pollutants

particulate matter

metals

interactions 

Co-Author(s)

Boubakari Ibrahimou, Florida International University
Ning Sun

First Author

Shelbie Raposo

Presenting Author

Shelbie Raposo

34: Prediction of frailty using gut microbiota by machine learning methods

Epidemiological data have seldom been used for forecasting, as most are used for confirmatory risk assessment, but there is a growing need to predict frailty in Japan's hyper-aged society. Preventing frailty is crucial in aging societies because frailty is one of the main risk factors for loss of independence in older adults. We focused on gut microbiota, which previous studies have shown to be associated with frailty. We conducted a comprehensive exploration of the involvement of gut microbiota in the two factors of frailty for elderly Japanese subjects. Our study subjects were 798 Japanese countryside residents aged 65 years or older. Frailty was explored using L1-penalized logistic regression, backward/forward stepwise logistic regression, and Random Forest, a tree-based variable selection method. As a result, two gut microbiota associated with psychological frailty were found. Both have also been reported in previous studies to be involved in frailty. These results suggest that gut microbiota may play an important role in psychological frailty in the Japanese population. 

Keywords

Frailty

Loss of independence

Gut microbiota

Psychological frailty 

Co-Author

Ayano Takeuchi, Keio University

First Author

Takumi Irie, 一般財団法人千葉県環境財団

Presenting Author

Takumi Irie, 一般財団法人千葉県環境財団

35: Race-Ethnic Disparities in Pediatric Cancer Mortality Risk: Insights from SEER Data

Pediatric cancers vary greatly in etiology, treatment response, and mortality, with significant disparities by histology-based cancer type and race-ethnicity. Using the SEER dataset of 101,328 pediatric cancer patients (1975-2016), we employed a frailty model to estimate mortality risk disparities among racial and ethnic groups. Non-Hispanic African American (NHAA) and Hispanic patients had higher mortality risks than non-Hispanic Caucasians, with NHAA patients facing the worst outcomes. The highest disparities (aHR 1.50-2.00) were in cancers of the digestive system, liver, endocrine system, acute lymphocytic leukemia, urinary system, chronic leukemia, and Hodgkin lymphoma. Moderate disparities (aHR 1.20-1.49) were in cancers of the brain, CNS, eye and orbit, female genital system, acute myeloid leukemia (AML), other leukemias, non-Hodgkin lymphoma, and soft tissue including the heart. Low disparities (aHR 1.10-1.19) were in cancers of bones and joints, and the adrenal gland. Identifying distinct patterns and exploring subgroups in longitudinal trajectories is a key research interest. 

Keywords

Race-Ethnic Disparity

Precision Estimate

Pediatric Cancer Mortality

Frailty Model

SEER Data

Pediatric Oncology Outcomes 

Co-Author

Md Jobayer Hossain, Nemours Biomedical Research, A.I. DuPont Children's Hospital

First Author

Araf Jahin

Presenting Author

Araf Jahin

36: Spatiotemporal Trends in County-Level Prevalence of Chronic Obstructive Pulmonary Disease among U.S

Chronic obstructive pulmonary disease (COPD) is a leading cause of death in the U.S. While overall COPD prevalence remained stable nationally during 2011–2021, data to assess local-level trends are lacking. By applying CDC's PLACES methodology, we estimated county-level COPD prevalence and variance (σ²) among adults ≥18 years during 2011–2021 using annual Behavioral Risk Factor Surveillance System data, census county-level population estimates, and 5-year American Community Survey data. A Bayesian hierarchical regression model was constructed for county-level COPD prevalence over time. Prevalence was assumed to follow a normal distribution, with the mean modeled as a linear function of year for the overall trend plus county-level random slopes by year to capture the temporal trend for each county, and with variance assumed to be σ². A conditional autoregressive model was incorporated to account for the spatial dependency among counties. Results showed that 53 counties exhibited significant increasing trends in COPD estimates, 87 counties had decreasing trends, and the rest remained stable. Findings suggest the importance of monitoring trends in areas where public health interventions are needed. 
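
One way to write out the model described above is sketched here. The abstract does not say whether the conditional autoregressive (CAR) term enters through a county intercept or the slope, so the intrinsic CAR intercept u_i, the independent random slope b_i, and the hyperparameters \tau_b and \tau_u are assumptions made for illustration only. Here \hat{p}_{it} and \sigma^2_{it} denote the PLACES-based prevalence estimate and its variance for county i in year t, and n_i is the number of counties neighboring county i.

```latex
\hat{p}_{it} \sim \mathcal{N}\!\left(\mu_{it},\, \sigma^2_{it}\right),
\qquad
\mu_{it} = \beta_0 + \beta_1 t + b_i t + u_i

b_i \sim \mathcal{N}\!\left(0,\, \tau_b^2\right),
\qquad
u_i \mid u_{j \neq i} \sim \mathcal{N}\!\left(\frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\tau_u^2}{n_i}\right)
```

Under this parameterization the county-specific trend is \beta_1 + b_i, and a county would be flagged as increasing or decreasing when the posterior interval for that quantity excludes zero.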

Keywords

Behavioral Risk Factor Surveillance System

Bayesian hierarchical regression

Chronic obstructive pulmonary disease

CDC’s PLACES

conditional autoregressive model 

First Author

Yan Wang, CDC

Presenting Author

Yan Wang, CDC

37: Stochastic EM for Multistate Models of HIV Progression with Interval-Censored longitudinal data

The International Epidemiology Databases to Evaluate AIDS (IeDEA) is a global research consortium that provides extensive HIV/AIDS data worldwide. In this study, we propose multistate models (MSMs) to characterize HIV progression across clinical stages while addressing data complexities, including interval-censored and clustered event history data. To reduce computational intensity, we propose a Stochastic Expectation-Maximization (Stochastic EM) algorithm. We use simulation to evaluate the performance of the proposed methods and apply them to Central-Africa IeDEA data to evaluate the impact of the World Health Organization's 2015 Treat-All Policy. 

Keywords

Stochastic EM

multistate model

interval-censored data

random effects

Treat-All policy

IeDEA 

Co-Author

Hongbing Zhang, University of Kentucky

First Author

Babatunde Aluko, University of Kentucky

Presenting Author

Babatunde Aluko, University of Kentucky

38: Tailoring BART for Environmental Mixture Studies

Various methods have been developed to investigate complex and collective effects of environmental mixtures on human health. Tree ensemble methods are known for their stability and accuracy in identifying highly correlated and high-dimensional features in the statistical literature, but their use has not been well studied for environmental mixtures analysis.
We tailored the Bayesian Additive Regression Trees (BART) model for environmental mixtures analysis, allowing a smooth response surface and incorporating confounder adjustment for both continuous and binary outcomes. We further included component-wise and hierarchical variable selection to accommodate scientific grouping of chemicals. Additionally, we proposed to quantify the marginal contribution of each chemical in the mixture through Generalized Additive Model (GAM) approximations. A thorough investigation of the proposed approaches was conducted through simulations and a case study with the National Health and Nutrition Examination Survey (NHANES) 2001-2002 data on how persistent organic pollutants influenced leukocyte telomere length, in comparison with Bayesian Kernel Machine Regression (BKMR), one of the most popular mixtures methods.
Our simulation studies demonstrated that the modified BART produced results comparable or superior to BKMR in recovering the true exposure-response surface for both continuous and binary outcomes, especially when chemical groups were considered, with significantly reduced computational time. Both methods effectively identified relevant chemical groups under hierarchical variable selection, but modified BART better distinguished important components within groups. Our case study confirmed these findings, with similar groups identified but different within-group importances estimated. GAM plots accurately summarized individual exposure effects for both modified BART and BKMR fitted results.
We recommend the modified BART as a stable and fast response surface model for environmental mixture analysis, particularly for large sample sizes, binary outcomes, and grouped chemicals. GAM approximation is a practical tool for interpreting individual chemical effects in mixtures analysis. 

Keywords

Soft BART

BKMR

Environmental mixture analysis

GAM approximation 

Co-Author(s)

Zhen Chen, NICHD/NIH
Shanshan Zhao, NIEHS/NIH

First Author

Kaizong Ye

Presenting Author

Kaizong Ye

39: Validation of Sodium and Potassium Intake Determined from Urine Samples

Some 60 of 109 study participants provided sodium (Na) and potassium (K) intakes based on 24-hour urine collection samples, and on formulae applied to data from spot urine samples. The Normal-based, percentile, bias-corrected and the bias-corrected and accelerated methods gave confidence interval (CI) estimates for within-pair differences and correlation coefficients (CCs). All bootstrap CIs and achieved significance levels (ASLs) of a test statistic indicated no statistical significance of the mean of the within-pair differences for Na intake. The ASL but not the observed and bootstrap CIs from the truncated distribution for the formula-based estimates of K was statistically different from 0. The bootstrap and observed estimates of the correlation coefficients were statistically significant for all except the correlation of Na intakes based on the PAHO/WHO formula and 24-hour collections. For all other comparisons the bootstrap samples, the observed data and the ASL yielded different conclusions. These results indicate that, to determine validity of formula-based approaches to Na and K intake estimation using urine samples from Jamaicans, there should be use of other approac 
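
As a minimal sketch of the simplest of the interval methods named above, the percentile bootstrap for the mean of within-pair differences can be written as follows. The difference values are invented purely for illustration and are not study data; the function name, `n_boot`, and `seed` are likewise hypothetical. The bias-corrected and accelerated variants mentioned in the abstract adjust the percentiles used and are not shown.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(differences, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(differences)
    # Resample the differences with replacement and record each resample's mean.
    boot_means = sorted(mean(rng.choices(differences, k=n)) for _ in range(n_boot))
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = boot_means[round((alpha / 2) * n_boot)]
    upper = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical within-pair differences (formula-based minus 24-hour estimate).
diffs = [0.3, -0.5, 0.1, 0.8, -0.2, 0.4, -0.7, 0.2, 0.0, -0.1, 0.6, -0.3]

low, high = percentile_bootstrap_ci(diffs)
print(f"Observed mean difference: {mean(diffs):.3f}")
print(f"95% percentile bootstrap CI: ({low:.3f}, {high:.3f})")
```

If the resulting interval contains 0, the mean within-pair difference would not be declared statistically significant at the chosen level.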

Keywords

validation

bootstrap

urine samples

achieved significance levels 

Co-Author(s)

Nadia Bennett, Caribbean Institute for Health Research
Trevor Ferguson, Caribbean Institute for Health Research

First Author

Novie Younger-Coleman, Caribbean Institute for Health Research (formerly TMRI), UWI, Jamaica

Presenting Author

Novie Younger-Coleman, Caribbean Institute for Health Research (formerly TMRI), UWI, Jamaica

40: What Goes Wrong in Prediction Models if you Ignore Mortality?

Sudden death (SD) is a primary cause of death in the US. While clinical guidelines recommend treating patients of all ages at high risk of SD with an implantable cardiac defibrillator (ICD), questions have been raised regarding benefits for older patients where the force of mortality is high. To investigate this, the PIPER-ICD Study examines the end-of-life experience among older ICD patients to identify longitudinal markers predictive for treatment success. Based on the PIPER-ICD data and extensive simulations, we illustrate how predictions of marker trajectories go wrong if they ignore mortality. Crucially, standard methods fail to acknowledge the joint distribution of the marker and mortality: linear mixed models and joint models imply a "pretend reality" in which patients are considered immortal; marginal methods marginalize over death, treating mortality as a statistical nuisance instead of an outcome of inherent clinical value. We also discuss the clinical implications of this failure, specifically in relation to how health care decisions are made. 

Keywords

Mortality

Longitudinal Marker

Joint models

Marginal models

Linear mixed models

Prediction 

Co-Author(s)

Harrison Reeder
Sebastien Haneuse, Harvard T.H. Chan School of Public Health
Daniel Kramer, Beth Israel Deaconess Medical Center, Harvard Medical School

First Author

Stephanie Armbruster, Harvard University

Presenting Author

Stephanie Armbruster, Harvard University