Tuesday, Aug 5: 2:00 PM - 3:50 PM
4121
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Section on Statistics in Epidemiology
Presentations
Modeling the effects of mixtures of exposures is of interest to both epidemiology and toxicology. Due to differences in data and settings, mutually exclusive methods have been developed across the two fields. We take the opportunity to develop a new methodology that borrows advantages from both fields and allows knowledge to flow across domains. We develop a technique called BAI-LVM that accounts for biological additivity in a mixture response as modeled in toxicology. We show how a straightforward statistical model does not account for biological additivity and how various models from epidemiology relate to each other regarding biological assumptions. Our method produces latent individual dose response curves, providing an easy way to inject prior knowledge from toxicology. The HAND model for biological additivity is implemented, given a consensus that it is the most biologically plausible. Simulation studies demonstrate the performance across different scenarios, and an application to epidemiological data is provided.
Keywords
biological additivity
epidemiology
toxicology
Bayesian
dose response
Modeling count-based health outcomes in environmental research presents challenges like correlated exposures, non-linear interactions, and spatiotemporal dependencies. We propose a hierarchical Bayesian model that incorporates the negative binomial distribution via data augmentation to address these complexities. This framework integrates variable selection, effect estimation, and hotspot detection to improve inference in exposure-outcome relationships.
We evaluated the model through simulations across eight scenarios, varying exposure correlation, interaction effects, and dependence. Each scenario included 100 datasets of 504 observations across 21 spatial units and 24 time points. The model's utility was further demonstrated using real-world air pollution data.
The proposed approach consistently identified influential exposures, estimated effects, and detected hotspot areas, particularly with appropriate spatiotemporal dependencies. By leveraging the negative binomial distribution, it accounted for data dispersion without additional adjustments. This model provides a robust, unified framework for analyzing count outcomes in environmental health research and policy-making.
Keywords
Count Data
Mixture Exposure
Bayesian Kernel Machine Regression
Air Pollution
Environmental Health
Postpartum depression is a mental disorder experienced by approximately one in seven women within the first year after childbirth. It is considered to have a complex causal relationship over time, making it difficult to evaluate in a classic cohort study that assesses cause and identifies onset at a single point in time. Our study focused on the psychological factors contributing to postpartum depression, specifically examining factors related to parenting anxiety. Our study used Japanese birth cohort data that followed 1701 pregnant women recruited between 2003 and 2005. To identify relevant factors, we used ANOVA for continuous variables and chi-square tests or Fisher's exact tests for categorical variables. A proportional odds model was then applied, with variable selection performed using a stepwise method. The results showed that having a playmate for the child and having someone to consult or assist with childcare were significantly associated with reduced parenting anxiety at all time points. Given that these factors remained significant over time, the study highlights the importance of continuous, rather than one-time, interventions to support mothers.
Keywords
Postpartum depression
Parenting anxiety
Proportional odds model
Stepwise method
Maternal mental health
Childcare support
As populations age, finding new reliable ways to identify Parkinson's disease is increasingly vital. In addition, identifying biomarkers for early progression of Parkinson's disease will help accelerate the clinical evaluation of the efficacy of new interventions. We investigate the University of Pennsylvania Smell Identification Test (UPSIT) as a diagnostic tool for early stages of Parkinson's disease. To do this, we use psychometric models, specifically the Rasch and three-parameter logistic (3PL) models, to investigate whether this test can identify Parkinson's disease in early stages of development and to gain insight into the psychometric properties of the specific questions on the UPSIT, such as which questions perform well and which perform poorly.
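To make the two psychometric models concrete, the sketch below evaluates the standard 3PL item response function, which reduces to the Rasch model when the discrimination is 1 and the guessing parameter is 0. The function name and all parameter values are illustrative assumptions and are not taken from the poster.

```python
import math

def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: latent trait of the respondent
    b: item difficulty, a: item discrimination, c: guessing parameter
    With a=1 and c=0 this is the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Rasch item: respondent whose trait equals the item difficulty
rasch = irt_probability(theta=0.0, b=0.0)

# 3PL item with an illustrative guessing floor of 0.25
three_pl = irt_probability(theta=0.0, b=0.0, a=1.5, c=0.25)
```

Comparing fitted difficulty, discrimination, and guessing values across items is one way such models can flag questions that perform poorly.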
Keywords
Parkinson's Disease
Rasch Model
3PL Model
We recently developed a method for data-driven heterogeneous treatment effect subgroup discovery that combines matching and decision trees (mTree). Matching leads to balance within discovered subgroups, overcoming the limitation that insufficient balance in subgroups may lead to findings that cannot be replicated. Decision trees are popular in medicine because they are an effective decision-making technique providing high classification accuracy with a simple representation of gathered knowledge, i.e., they are not “Black Boxes”.
Our previous work did not propose an approach for statistical inference within subgroups. In the typical superpopulation inference framework, re-using data for hypothesis generation and hypothesis testing creates type 1 error issues. To overcome this challenge, we adopt the randomization inference framework, wherein the goal is to make inferences about treatment effects in the sample alone. In this work we extend the mTree method to accommodate time-to-event outcomes and develop new randomization inference estimators of within-subgroup additive and multiplicative treatment effects. The methods are applied to a systolic blood pressure intervention trial.
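As a generic illustration of randomization inference (not the mTree estimators described above), the following sketch runs an exact randomization test of the sharp null of no treatment effect, using the difference in means and enumerating every assignment with the same number of treated units. The data and function name are hypothetical.

```python
from itertools import combinations

def randomization_p_value(outcomes, treated_idx):
    """Exact two-sided randomization test for the sharp null of no effect."""
    n = len(outcomes)
    k = len(treated_idx)

    def diff_in_means(idx):
        idx = set(idx)
        treated = [outcomes[i] for i in idx]
        control = [outcomes[i] for i in range(n) if i not in idx]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = diff_in_means(treated_idx)
    # Under the sharp null, outcomes are fixed and only the labels vary.
    stats = [diff_in_means(idx) for idx in combinations(range(n), k)]
    extreme = sum(1 for s in stats if abs(s) >= abs(observed) - 1e-12)
    return observed, extreme / len(stats)

# Hypothetical outcomes; the first four units are the treated ones.
outcomes = [12, 15, 14, 16, 9, 8, 10, 7]
observed, p_value = randomization_p_value(outcomes, treated_idx=[0, 1, 2, 3])
```

Because the reference distribution comes from the assignment mechanism itself, the inference concerns the units in the sample rather than a superpopulation.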
Keywords
Heterogeneous treatment effects
Subgroup discovery
Matching
Decision trees
Randomization inference
The spatial scan statistic is a cornerstone for disease monitoring and outbreak detection. Common implementations typically assume that case counts are correctly observed. However, underreporting frequently occurs in real-world settings, leading to biased estimates and compromising the accuracy of disease surveillance efforts. This study proposes a novel Bayesian spatial scan statistic to address the challenges posed by under-reported case counts in outbreak detection. By accounting for misreporting, the proposed framework enhances the accuracy and robustness of disease cluster identification. Comparisons with existing methods and applications to COVID-19 data demonstrate its superior ability to provide reliable inferences despite reporting limitations.
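For context, the classical Poisson scan statistic that this kind of work builds on scores each candidate zone with a log-likelihood ratio. The sketch below implements that standard score, which assumes correctly observed counts and therefore does not include the underreporting adjustment proposed here. Zone names and counts are hypothetical.

```python
import math

def poisson_scan_llr(cases, expected, total_cases):
    """Log-likelihood ratio for one zone under the Poisson scan model.

    cases: observed cases inside the zone
    expected: expected cases inside the zone under the null
    total_cases: observed cases overall (also the total expected count)
    """
    if cases <= expected:
        return 0.0  # only elevated-risk zones are of interest
    inside = cases * math.log(cases / expected)
    out_obs = total_cases - cases
    out_exp = total_cases - expected
    outside = out_obs * math.log(out_obs / out_exp) if out_obs > 0 else 0.0
    return inside + outside

# Hypothetical candidate zones: (observed, expected) with 200 cases in total.
zones = {"A": (30, 20.0), "B": (12, 15.0), "C": (50, 45.0)}
scores = {name: poisson_scan_llr(c, e, 200) for name, (c, e) in zones.items()}
most_likely_cluster = max(scores, key=scores.get)
```

If some cases go unreported, the observed counts in this score are biased downward, which is the limitation the abstract targets.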
Keywords
Spatial Clustering
Underreporting
Bayesian Statistics
Spatial scan statistics
COVID-19
Correlations of variables with multiple measurements often arise in human observational studies. In such studies, variables of interest may include household level measurements as well as participant level measurements. Correlations between participant data and household data are difficult to calculate due to the imbalance of available samples across variables.
While several approaches have been developed to estimate correlation when both variables contain repeated measures, less work has explored the scenario where only one variable contains repeated measures and not both. This work evaluates several correlation approaches for this scenario to compare the approaches developed for repeated measures data. Using simulated data, comparisons are made across the following correlation approaches: Pearson's, subject level averaging, regression models, and mixed effects models with compound symmetry covariance matrix. Several simulated scenarios are considered, including varying the underlying true correlation values and different noise levels. Mean Squared Error (MSE), confidence interval width and coverage probabilities are used to assess the methods.
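Two of the listed approaches can be contrasted in a few lines: Pearson's correlation computed on every participant row (repeating the household value) versus Pearson's correlation after averaging participants within each household. The data below are made up for illustration, and the helper names are assumptions.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one measurement per household, several per participant.
household_x = {"h1": 1.0, "h2": 2.0, "h3": 3.0}
participant_y = [("h1", 1.0), ("h1", 3.0), ("h2", 2.0),
                 ("h2", 4.0), ("h2", 3.0), ("h3", 5.0)]

# Naive approach: repeat the household value on every participant row.
naive_r = pearson([household_x[h] for h, _ in participant_y],
                  [y for _, y in participant_y])

# Subject-level averaging: collapse participants to one mean per household.
by_household = defaultdict(list)
for h, y in participant_y:
    by_household[h].append(y)
ids = sorted(by_household)
averaged_r = pearson([household_x[h] for h in ids],
                     [sum(by_household[h]) / len(by_household[h]) for h in ids])
```

The two estimates differ because averaging removes within-household noise and changes the effective sample size, which is the kind of discrepancy a simulation study can quantify.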
Keywords
correlation
repeated measures
observational data
Reliable and actionable mortality forecasts are crucial for reducing premature deaths and improving healthy life expectancy at national and global levels. While significant progress has been made over the past two decades, the United Nations Population Division projects a slowdown in mortality gains, posing a challenge to ongoing public health efforts. A critical gap in cause-specific mortality forecasts by sex and age at country-specific and global levels hinders the ability of organizations like the WHO to effectively redirect resources and accelerate progress. In this talk, we address the challenges of projecting cause-specific mortality over a long forecast horizon, including data quality, regional heterogeneity, the complexity of competing risks, and preventing unreasonably extreme trends over the long term. We will extend the singular value decomposition (SVD) based Lee-Carter model into a Bayesian setting to provide country-, age-, and gender-specific projections with uncertainty bounds. The improved mortality estimation and forecasts support WHO's efforts to accelerate progress in healthy life expectancy and prevent premature mortality worldwide.
Keywords
Mortality Forecasting
Bayesian Hierarchical Modeling
Time Series
Multiple Populations
This project applies the integer-valued autoregressive (INAR) clustering method by Roick et al. (2021) to analyze daily COVID-19 deaths in US counties. Unlike national and state data, county-level death counts are often low (0–5 daily deaths), making traditional time series clustering methods unreliable. We test whether INAR-based mixtures, designed for autocorrelated integer-valued data, can better group counties with similar mortality patterns. Using CDC data, we cluster counties based on daily death trajectories. We then compare clusters to demographic factors (e.g., vaccination rates, population density), identifying shared trends. For example, rural counties with low healthcare access form distinct clusters compared to urban areas. Preliminary results suggest that INAR models outperform distance-based methods (e.g., DTW) for low-count data. This approach highlights the importance of tailored statistical methods for discrete health data, common in disease tracking and local policy evaluations. The poster will present visualizations of cluster patterns, model diagnostics, and insights into how INAR methods can address challenges in analyzing sparse county-level data.
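The building block of such models is the INAR(1) process, in which each count is a binomially thinned copy of the previous count plus new Poisson arrivals. The sketch below simulates one such series; it does not implement the mixture-based clustering itself, and the parameter values are illustrative assumptions.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_inar1(alpha, lam, n_steps, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    x = poisson_draw(rng, lam / (1.0 - alpha))  # start near the stationary mean
    series = []
    for _ in range(n_steps):
        # Binomial thinning: each of the x previous counts survives w.p. alpha.
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        x = survivors + poisson_draw(rng, lam)
        series.append(x)
    return series

series = simulate_inar1(alpha=0.5, lam=1.0, n_steps=5000, seed=1)
mean_count = sum(series) / len(series)  # stationary mean is lam / (1 - alpha)
```

The simulated values stay non-negative integers with low daily counts, which is the data regime where distance-based clustering tends to struggle.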
Keywords
Time series
INAR models
COVID-19 mortality
county-level clustering
low-count data
Longitudinal modified treatment policies (LMTPs) quantify the effects of interventions that depend on the natural value of exposure, generalizing policy-relevant quantities, such as "stochastic" and "shift" interventions. The current LMTP estimation approach yields effects on outcomes measured at the end of a study; however, repeated measures data often contains time-varying outcomes measured at each visit and interest may lie in estimating effects on the rate of change in these outcomes over time. For example, one may wish to quantify the effect of an LMTP on the rate of progression of a disease. We extend the LMTP approach to estimate the effect on change in a time-varying outcome over time and propose a hypothesis testing framework to formally test whether the outcome trajectory under an LMTP differs from the natural outcome trajectory. Repeated measures data also frequently has unique data complications that must be considered, such as irregular visit times, where the visit timing varies among individuals from some pre-specified time. We propose an extension to our work that permits effect estimation and hypothesis testing for an LMTP in a setting with irregular visit times.
Keywords
Causal inference
Longitudinal data
Modified treatment policies
Rates of change
Nonparametric
Given the wide availability of multi-wave cross-sectional studies, methods that potentially strengthen causal inference are attractive. We propose the Cross-sectional Enrichment by Sample Extrapolation (CESE) method, which matches observations from serial cross-sectional data to find proxies for longitudinal trajectories. Outcome events from later waves can be paired with exposure data from earlier waves to estimate exposure-outcome associations. A major strength is that CESE does not require longitudinal data. In this abstract, we describe the statistical assumptions required and demonstrate an application of CESE to estimate incident substance use disorder (SUD). Using a cross-sectional survey (n=117,590), we match individuals across two calendar years; n=10,444 participated in both years. Using a combination of Mahalanobis distance and greedy matching, CESE matched 24.48% of returning participants to themselves. Among chronic pain patients, an estimated 5.5% had incident SUD, similar to an estimate of iatrogenic opioid abuse among pain patients (4.7%). The CESE method is a potential tool for using multi-wave cross-sectional data to estimate population-level incidence.
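A minimal sketch of Mahalanobis-distance greedy matching across two waves is given below, assuming two numeric covariates per person and one-to-one matching without replacement. It is not the CESE implementation; the covariates, identifiers, and the choice to process pairs from the smallest distance upward are all assumptions made for illustration.

```python
def covariance_2d(points):
    """Sample variances and covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(p, q, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - q[0], p[1] - q[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def greedy_match(wave1, wave2):
    """Pair each wave-1 record with at most one wave-2 record, closest first."""
    cov = covariance_2d(list(wave1.values()) + list(wave2.values()))
    pairs = sorted((mahalanobis_sq(p, q, cov), i, j)
                   for i, p in wave1.items() for j, q in wave2.items())
    used1, used2, matches = set(), set(), {}
    for _, i, j in pairs:
        if i not in used1 and j not in used2:
            matches[i] = j
            used1.add(i)
            used2.add(j)
    return matches

# Hypothetical (age, BMI) records from two survey years.
wave1 = {"a": (30, 22.0), "b": (45, 27.5), "c": (60, 31.0)}
wave2 = {"x": (61, 30.5), "y": (31, 22.4), "z": (46, 28.0)}
matches = greedy_match(wave1, wave2)
```

When true identities are known for returning participants, the share of people matched to themselves gives a direct check on matching quality, as in the 24.48% figure reported above.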
Keywords
Matching
Cross-sectional data collection
Population survey
Epidemiology
Large healthcare claims databases, which aggregate claims from commercial insurers, are increasingly being used to generate real-world evidence in medical research. Nearly 10,000 manuscripts have been published, and the pace of output is accelerating. Despite their widespread use, these databases have not been rigorously vetted against ground-truth data. Representation in such datasets has been found to be systematically biased along racial and socioeconomic lines. These same factors are known to be effect modifiers for a myriad of conditions and treatments in medicine, and the combination of inconsistent sampling and effect modification can give rise to external validity bias. In [Dahlen Deng & Charu 2024], we undertook the most detailed empirical analysis of external validity bias in healthcare claims data to date, focusing on the rates of a comprehensive set of inpatient procedures, for which a unique ground-truth dataset exists. We found large variation in the extent of the bias across procedures, including 22.8% that were underestimated by more than a factor of 2. Further, we found a significant relationship between social determinants of health and the magnitude of bias.
Keywords
external validity bias
healthcare claims databases
Food allergies result from the complex interaction of genetic and environmental factors over time. In this study, we use birth cohort data that target food allergy and follow children over time, and we use incidence data from two time points to quantify the impact of risk factors. We used data from the Japanese Birth Cohort Study, which followed 1550 individuals from the gestational period. Recruitment took place between 2003 and 2005, and follow-up continues to the present. We identified children who had not developed food allergy at 1 year of age and who had developed food allergy at 3 years of age. Logistic regression analysis was used to examine the risk of developing food allergy given the explanatory variables. Maternal history of allergic disease, infant eczema, atopic dermatitis, and egg removal were associated with an increased risk of food allergy. Use of duvets was associated with a reduced risk of food allergy. Among genetic factors, there was a trend toward an increased risk of developing food allergy if the mother had a history of allergic disease or the child had already developed other allergies. Among environmental factors, elimination of eggs increased the risk, and use of a down comforter decreased the risk.
Keywords
food allergy
infants
genetic factors
environmental factors
infant eczema
atopic dermatitis
Randomized controlled trials are the standard method for estimating causal effects, ensuring statistical power and confidence through adequate sample sizes. However, achieving sufficient sample sizes is often challenging. This study proposes a novel method to estimate the average treatment effect (ATE) in a target population by integrating and reconstructing information from previous trials with only summary statistics of outcomes and covariates via meta-analysis. The proposed approach combines meta-analysis, transfer learning, and weighted regression. Unlike existing methods, which estimate the ATE based on the distribution of source trials, our method directly estimates the ATE for the target population. The proposed method requires only the means and variances of outcomes and covariates from the source trials and is theoretically valid under the covariate shift assumption, regardless of the distribution of covariates in the source trials. Simulations and real-data analyses demonstrate that the proposed method yields a consistent estimator and achieves higher statistical power than the estimator derived solely from the target trial.
Keywords
conditional average treatment effect
meta-analysis
transfer learning
weighted linear regression
Smoking is the leading cause of preventable death (CDC, 2024). Disparities in Missouri exist among tobacco users, including cancer patients. Smoking rates are higher in underrepresented populations, while treatment rates are lower. Health organizations and clinicians are crucial to providing tobacco treatment. More than 70% of individuals who smoke see a clinician annually and report a desire to stop. The aims of the project include defining tobacco treatment, scaling the model to rural Missouri, and reducing disparities.
Data for patients who smoke were obtained from "Informatics for Integrating Biology and the Bedside (i2b2)". The LEAD intervention will evaluate the rate of counseling to quit smoking, comparing pre- and post-intervention periods. Overall, baseline data show that cancer patients (5.1% cancer patients vs 3.5% non-cancer), older patients (3.1% 65+ years old vs 1.2% less than 65 years), and white patients (4.0% white vs 1.7% other races) received counseling at a higher rate. We hypothesize that these disparities will be reduced post-intervention. As the project proceeds with implementation, the goal is to ensure clinicians consistently provide tobacco treatment to patients.
Keywords
Cancer
Tobacco Treatment
Healthcare disparities
Rural Health
Patient Outcomes
Common Data Model (CDM)
Vectors are living organisms, such as mosquitoes, which transmit disease between humans. Vector-borne diseases have a high global burden, particularly among the world's poorest populations. The worst effects can be mitigated with advance warning of these climate-mediated diseases. In Brazil, the infectious diseases Dengue, Chikungunya and Zika all co-circulate. The same mosquito vectors spread the diseases, which pose significant health and mortality risks. Infodengue is a surveillance system in Brazil for these three diseases, with a granularity of spatio-temporal data rarely seen in such systems. The system does not currently explicitly predict future cases. We present a factor analysis model, a flexible data reduction technique, in the Bayesian framework for disease forecasting, with extensions relevant to our problem, including spatiotemporal modelling and joint modelling of shared vectors.
Keywords
Factor Analysis
Forecasting
Vector-Borne Disease
Integrated Nested Laplace Approximations
Machine learning (ML) algorithms are effective in predicting clinical outcomes. This study aimed to identify ML models with the best performance for predicting mortality and possibly improving patient outcomes in drug overdose care. This study included data on 1452 patients seen at Emergency Departments in East Texas (9/1/2021-12/31/2024) for overdose care. Forty features were selected for six ML models (decision tree, gradient boosting, logistic regression, neural network, random forest, and support vector machine) to predict in-hospital mortality. ML models were compared by the area under the receiver operating characteristic curve (AUC) and KS (Youden). The analysis revealed that the random forest model was the best, with superior AUC and KS values. The five most crucial features in prediction across all models were the Glasgow coma scale, systolic blood pressure, BMI, age, and diastolic blood pressure at admission. The random forest model was the best-performing ML model, making it more reliable in predicting mortality, with the potential to significantly impact clinical practice and underlining the importance of such research in predictive modeling in addiction medicine.
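The two comparison metrics can be computed directly from predicted scores: AUC as the area under the ROC curve, and KS (equivalently the maximum Youden index) as the largest gap between the true-positive and false-positive rates over all thresholds. The labels and scores below are invented for illustration and are unrelated to the study data.

```python
def roc_points(labels, scores):
    """ROC curve as (false-positive rate, true-positive rate) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_and_ks(labels, scores):
    """Trapezoidal AUC and the KS statistic (max TPR minus FPR)."""
    pts = roc_points(labels, scores)
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    ks = max(tpr - fpr for fpr, tpr in pts)
    return auc, ks

# Hypothetical outcomes (1 = died in hospital) and model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.2]
auc, ks = auc_and_ks(labels, scores)
```

AUC summarizes ranking quality over all thresholds, while KS reports the separation at the single best threshold, so the two can rank models differently.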
Keywords
Drug overdose death
predicting in-hospital mortality
machine learning algorithms
Co-Author(s)
Emmanuel Elueze, Department of Graduate Medical Education, The University of Texas Tyler School of Medicine
Karan Singh, Department of Epidemiology and Biostatistics, The University of Texas Tyler School of Medicine
First Author
Tuan Le, UT Tyler School of Medicine
Presenting Author
Tuan Le, UT Tyler School of Medicine
Selection bias is a major obstacle toward valid causal inference in epidemiology. Over the past decade, several simple graphical rules based on causal diagrams have been proposed as the sufficient identification conditions for addressing selection bias and recovering causal effects. However, these simple graphical rules are usually coupled with specific identification strategies and estimators. In this article, we show two important cases of selection bias that cannot be addressed by these simple rules and their estimators: one case where selection is a descendant of a collider of the treatment and the outcome, and the other case where selection is affected by the mediator. To address selection bias in these two cases, we construct identification formulas by the g-computation and the inverse probability weighting (IPW) methods based on single-world intervention graphs (SWIGs). They are generalized to recover the average treatment effect by adjusting for post-treatment upstream causes of selection. We propose two IPW estimators and their variance estimators to recover the average treatment effect in the presence of selection bias in these two cases.
Keywords
selection bias
causal inference
causal diagrams
SWIG
Machine learning models can help identify multifaceted factors influencing tobacco transitions. A random forest model is developed to predict smoking relapse, focusing on racial disparities and vaping characteristics. Data are drawn from the Population Assessment of Tobacco and Health (PATH) Study adult interview files. Former combustible cigarette smokers at baseline (Wave 5) were followed up one year later (Wave 6). Predictors (n=100) include a wide range of social demographics, psychosocial factors, health status, tobacco and substance use behaviors, and vaping characteristics. The findings reveal notable racial disparities in smoking relapse predictors, along with distinct roles of vaping characteristics across racial groups. Unique social, behavioral, and health factors are crucial for improving smoking cessation outcomes.
Keywords
e-cigarettes
random forest
PATH study
First Author
Hongying Dai, University of Nebraska Medical Center
Presenting Author
Hongying Dai, University of Nebraska Medical Center
Despite advancements in immune checkpoint inhibitors (ICIs) for cancer, ICIs can trigger immune syndromes called immune-related adverse events (irAEs), often linked with survival. However, many time-to-event studies overlook immortal time bias. We examined this bias using time-naïve, time-dependent, and landmark analyses in 3343 cancer patients at OSU CCC. Incident irAEs were defined as any gastrointestinal, pulmonary, dermatological, endocrine, or hepatobiliary AEs post-ICI infusion. Kaplan-Meier and Simon-Makuch methods calculated cumulative incidence, and Cox models evaluated survival with and without time dependence. A total of 1739 patients died, and 48.5% experienced irAEs. Median survival was 10.2 months, with a median time to first irAE of 1.4 months. Time-naïve analysis showed irAEs were associated with significantly lower mortality (P < 0.01), while time-dependent analysis showed higher mortality (P < 0.01). Cox models yielded HR 1.41 (95% CI: 1.27-1.55) with time dependence and HR 0.86 (95% CI: 0.78-0.94) without. Using different landmark values did not mitigate bias. Accounting for time dependence is needed to avoid biased interpretations of irAE effects on survival.
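The mechanism behind immortal time bias can be reproduced with a small simulation in which the adverse event has no effect on death at all. The sketch uses crude mortality rate ratios rather than Cox models, and the exponential rates, sample size, and function name are assumptions chosen only for illustration.

```python
import random

def simulate_rate_ratios(n=20000, death_rate=0.1, irae_rate=0.2, seed=0):
    """Mortality rate ratios (exposed vs unexposed) when the true effect is null."""
    rng = random.Random(seed)
    naive_pt, naive_ev = [0.0, 0.0], [0, 0]  # index 0 = unexposed, 1 = exposed
    td_pt, td_ev = [0.0, 0.0], [0, 0]
    for _ in range(n):
        t_death = rng.expovariate(death_rate)
        t_irae = rng.expovariate(irae_rate)
        ever = int(t_irae < t_death)  # event occurred before death
        # Time-naive: all follow-up is credited to the ever-exposed group.
        naive_pt[ever] += t_death
        naive_ev[ever] += 1
        # Time-dependent: follow-up before the event counts as unexposed.
        td_pt[0] += min(t_death, t_irae)
        if ever:
            td_pt[1] += t_death - t_irae
        td_ev[ever] += 1
    naive_rr = (naive_ev[1] / naive_pt[1]) / (naive_ev[0] / naive_pt[0])
    td_rr = (td_ev[1] / td_pt[1]) / (td_ev[0] / td_pt[0])
    return naive_rr, td_rr

naive_rr, td_rr = simulate_rate_ratios()
```

In this setup the time-naive ratio falls well below 1 even though the event is harmless by construction, while the time-dependent ratio stays near 1, mirroring the direction of the bias described above.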
Keywords
Immortal time bias
irAE
ICI
Time-Dependent
Clinical trials comparing treatment effectiveness for rare diseases such as membranous nephropathy (MN) can be limited by short follow-up and small sample sizes. We demonstrate how a matched design combined with sequential re-entry and multiple imputation can be applied to observational data to generate reliable comparative effectiveness evidence while maximizing sample size. Individuals can have multiple eligible treatment initiations with this approach, and incomplete cases are retained. Propensity scores estimated with a GEE were used in 1:1 matching without replacement with hard matching on treatment history. Hazard ratios with robust confidence intervals that account for multilevel non-nested clustering were obtained in each imputed dataset and pooled. Restricted mean survival times with appropriate bootstrap confidence intervals were also pooled. An analog to per-protocol analysis censored individuals if they stopped adhering to treatment protocol and used inverse probability-of-censoring weights to address artificial censoring. Our application compared the long-term effectiveness of two immunosuppressants for MN, and results were consistent with a shorter 24-month trial.
Keywords
comparative effectiveness
matching
sequential re-entry
multiple imputation
rare disease
robust variance estimation
Cluster randomized trials are often used to evaluate diverse types of interventions in which groups of individuals are randomized, and the interventions are delivered at the cluster level. These types of randomized trials do not always effectively balance cluster- and individual-level characteristics, resulting in a higher risk of bias. We implemented covariate-constrained randomization (CCR) in a longitudinal cluster-randomized de-implementation trial with over 40 hospitals enrolled to evaluate two de-implementation strategies for reducing overuse of continuous pulse oximetry monitoring in children with bronchiolitis. CCR was performed using the baseline over-monitoring rate of each hospital and two other hospital characteristics, which were strong independent predictors of outcome. The current metrics for balance in CCR only consider the mean levels of covariates between arms, ignoring the full distributions of covariates. We examine the impact of outliers in covariates, particularly in combination with a small number of clusters on the randomization. We propose several strategies, including a stratified randomization procedure, to improve the covariate balance at baseline.
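A mean-based version of covariate-constrained randomization can be sketched as follows: enumerate every allocation, score each by the squared standardized difference in covariate means between arms, keep the best-balanced fraction, and draw the final allocation at random from that set. The hospital values, the 10% cutoff, and the function name are hypothetical and are not the trial's actual settings.

```python
import random
from itertools import combinations

def constrained_randomization(covariates, n_treated, keep_fraction=0.1, seed=0):
    """Pick one allocation at random from the best-balanced candidates."""
    units = list(covariates)
    n_cov = len(next(iter(covariates.values())))
    sds = []
    for k in range(n_cov):
        vals = [covariates[u][k] for u in units]
        mean = sum(vals) / len(vals)
        sds.append((sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)) ** 0.5)

    def imbalance(treated):
        control = [u for u in units if u not in treated]
        score = 0.0
        for k in range(n_cov):
            mt = sum(covariates[u][k] for u in treated) / len(treated)
            mc = sum(covariates[u][k] for u in control) / len(control)
            score += ((mt - mc) / sds[k]) ** 2  # compares arm means only
        return score

    scored = sorted((imbalance(t), t) for t in combinations(units, n_treated))
    n_keep = max(1, int(len(scored) * keep_fraction))
    chosen = random.Random(seed).choice(scored[:n_keep])
    return chosen, scored

# Hypothetical hospitals: (baseline over-monitoring rate, number of beds).
hospitals = {"H1": (0.42, 120), "H2": (0.35, 300), "H3": (0.61, 80),
             "H4": (0.28, 450), "H5": (0.50, 200), "H6": (0.33, 150),
             "H7": (0.47, 260), "H8": (0.90, 90)}
(chosen_score, chosen_arm), all_scored = constrained_randomization(hospitals, 4)
```

Because the score uses only arm means, an outlying cluster such as H8 can be offset by several moderately different clusters in the other arm, so two allocations with equal scores may have quite different covariate distributions.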
Keywords
Covariate-Constrained Randomization
Cluster Randomized Trials
Implementation Science
Mediation pathways often involve multiple mediators, and selecting true mediators is an essential step in addressing key scientific questions. We propose a novel adoption of the Measurement Error Model (MEM) framework in mediation analysis for mediator selection. The MEM framework enables variable selection by deliberately introducing measurement errors to predictors, identifying variables whose predictive utility is most sensitive to such perturbations. When a certain amount of measurement error is introduced into the mediation pathway and distributed across multiple mediators, the optimization of the joint MEM likelihood assigns the majority of the measurement error to mediators that are not important in the mediation system while leaving important mediators less affected, effectively achieving variable selection. This approach extends naturally to path selection for identifying true mediators. We demonstrate the efficacy of the proposed method through extensive simulations across various scenarios, comparing its performance with existing approaches.
Keywords
Mediation Analysis
Measurement Error Models
Co-Author
Mengling Liu, New York University Grossman School of Medicine
First Author
Chen Liang, New York University
Presenting Author
Chen Liang, New York University
A key challenge in estimating the causal effect of a treatment on an outcome in observational studies is unmeasured confounding, which causes bias. Traditional techniques such as propensity score-based matching, stratification, and marginal structural models can control for measured confounding but are inadequate to deal with unmeasured confounding. Several advanced methods have been proposed to tackle unmeasured confounding, such as the instrumental variable (IV) approach, regression discontinuity design (RDD), and difference-in-differences (DID). These methods exploit assignment mechanisms that determine treatment status but are not related to any unmeasured confounding. In this presentation, we will first explore the issues arising from unmeasured confounding, then provide an overview of commonly used methods for addressing unmeasured confounding, including the assumptions, key concepts, and implementations. Finally, we will review examples of how these advanced methods are applied in clinical and healthcare research.
Keywords
Unmeasured confounding
causal inference
instrumental variable
regression discontinuity
observational studies
We introduce a novel time series model that integrates Integer-Valued Generalized Autoregressive Conditional Heteroskedasticity (INGARCH) dynamics with a COM-Poisson distribution, incorporating a spatial modeling term to account for spatial dependence. The COM-Poisson distribution allows for overdispersion and underdispersion in count data, making it more flexible for capturing real-world phenomena. The GARCH component models the time-varying conditional variance of the process, while the spatial term accounts for the influence of neighboring data points, enabling the model to address spatial correlations. This approach provides a comprehensive framework for analyzing time series count data with both heteroskedasticity and spatial dependence, which is particularly useful in fields such as epidemiology and infectious disease. The COM-Poisson INGARCH spatial model benefits public health and health policy researchers by allowing for more accurate predictions. This will assist public health officials and policymakers to make evidence-based decisions and improve public health outcomes. The model's performance is evaluated through simulation studies and applied to a real-world dataset.
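The flexibility of the COM-Poisson distribution comes from its dispersion parameter: the probability of a count x is proportional to lam**x / (x!)**nu, so nu = 1 recovers the Poisson, nu < 1 gives overdispersion, and nu > 1 gives underdispersion. The sketch below evaluates the distribution on a truncated support; the truncation point and parameter values are assumptions, and the INGARCH and spatial components are not included.

```python
import math

def com_poisson_pmf(lam, nu, max_count=200):
    """COM-Poisson probabilities on 0..max_count (normalized by truncation)."""
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1)
             for x in range(max_count + 1)]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_variance(pmf):
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var

poisson_like = mean_and_variance(com_poisson_pmf(lam=3.0, nu=1.0))
overdispersed = mean_and_variance(com_poisson_pmf(lam=3.0, nu=0.5))
underdispersed = mean_and_variance(com_poisson_pmf(lam=3.0, nu=2.0))
```

Comparing the mean and variance in each tuple shows how a single extra parameter moves the model between the three dispersion regimes.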
Keywords
COM-Poisson
Integer-valued GARCH models
Spatial Modeling
Time Series of Counts
In this study, daily diary data from the publicly available Texas Longitudinal Study of Adolescent Stress Resilience and Health dataset were used to examine the longitudinal relationship between a series of stress and sex hormones and self-reported pain (i.e., headache, back pain, stomach pain). Latent growth modeling within a generalized structural equation modeling framework was used to assess these relationships, accounting for individual differences with random intercepts and slopes and adjusting for other covariates. In the original study, a total of 975 9th-grade high school students completed a self-reported daily diary on their mental and physical health for 10 days alongside salivary samples measuring cortisol, corticosterone, cortisone, DHEA-s, testosterone, estradiol, and progesterone. Higher corticosterone (β = -0.24, p = 0.048) and testosterone (β = -0.40, p = 0.042) levels were significantly linked to a lower likelihood of back pain, while cortisol showed a trend toward a positive association with back pain (β = 0.40, p = 0.052) and significantly predicted a higher likelihood of headache (β = 0.432, p = 0.004).
Keywords
Latent Growth Curve Analysis
Structural Equation Modeling
Longitudinal Study
Hormone-Pain Relationship
Texas Longitudinal Study
After linking Florida de-identified birth records to PM speciation chemical data, logistic regression analyses were conducted to assess associations between maternal exposure to PM2.5 speciation metals during pregnancy and the risk of neonatal respiratory distress syndrome (RDS), adjusting for various covariates. Study findings highlight the multifaceted nature of RDS risk, reaffirming known risk factors such as preterm birth, low birth weight, and maternal health conditions. Complex interactions among pollutants and maternal health factors were observed, emphasizing the importance of considering synergistic effects in risk assessment. Additionally, race and ethnicity were significant moderating factors, with nuances observed in Hispanic subgroups. Maternal demographics, pregnancy complications, and maternal PM2.5 pollutant exposure affect the risk of RDS through complex interactions. Targeted interventions that reduce exposure to harmful pollutants, particularly among high-risk populations, may mitigate the RDS burden.
Keywords
respiratory distress syndrome (RDS)
PM2.5 speciation chemicals
air pollutants
particulate matter
metals
interactions
Epidemiological data have rarely been used for forecasting, as most are used for confirmatory risk assessment, but there is a growing need to predict frailty in Japan's hyper-aged society. Preventing frailty is crucial in aging societies because frailty is one of the main risk factors for loss of independence in older adults. We focused on gut microbiota, which previous studies have shown to be associated with frailty. We conducted a comprehensive exploration of the involvement of gut microbiota in the two factors of frailty among elderly Japanese subjects. Our study subjects were 798 Japanese countryside residents aged 65 years or older. Frailty was explored using L1-penalized logistic regression, backward/forward stepwise logistic regression, and Random Forest, a tree-based variable selection method. As a result, two members of the gut microbiota associated with psychological frailty were found. Both have also been reported in previous studies to be involved in frailty. These results suggest that gut microbiota may play an important role in psychological frailty in the Japanese population.
Keywords
Frailty
Loss of independence
Gut microbiota
Psychological frailty
Pediatric cancers vary greatly in etiology, treatment response, and mortality, with significant disparities by histology-based cancer type and race-ethnicity. Using the SEER dataset of 101,328 pediatric cancer patients (1975-2016), we employed a frailty model to estimate mortality risk disparities among racial and ethnic groups. Non-Hispanic African American (NHAA) and Hispanic patients had higher mortality risks than non-Hispanic Caucasians, with NHAA patients facing the worst outcomes. The highest disparities (aHR 1.50-2.00) were in cancers of the digestive system, liver, endocrine system, and urinary system, as well as acute lymphocytic leukemia, chronic leukemia, and Hodgkin lymphoma. Moderate disparities (aHR 1.20-1.49) were in cancers of the brain, CNS, eye and orbit, female genital system, and soft tissue including the heart, as well as acute myeloid leukemia (AML), other leukemias, and non-Hodgkin lymphoma. Low disparities (aHR 1.10-1.19) were in cancers of bones and joints, and the adrenal gland. Identifying distinct patterns and exploring subgroups in longitudinal trajectories is a key research interest.
Keywords
Race-Ethnic Disparity
Precision Estimate
Pediatric Cancer Mortality
Frailty Model
SEER Data
Pediatric Oncology Outcomes
Chronic obstructive pulmonary disease (COPD) is a leading cause of death in the U.S. While overall COPD prevalence remained stable nationally during 2011–2021, data to assess local-level trends are lacking. By applying CDC's PLACES methodology, we estimated county-level COPD prevalence and variance (σ²) among adults ≥18 years during 2011–2021 using annual Behavioral Risk Factor Surveillance System data, census county-level population estimates, and 5-year American Community Survey data. A Bayesian hierarchical regression model was constructed for county-level COPD prevalence over time. Prevalence was assumed to follow a normal distribution, with the mean modeled as a linear function of year for the overall trend plus county-level random slopes by year to capture the temporal trend for each county, and with the variance assumed to be σ². A conditional autoregressive model was incorporated to account for the spatial dependency among counties. Results showed that 53 counties exhibited significant increasing trends in COPD estimates, 87 counties had decreasing trends, and the rest remained stable. Findings suggest the importance of monitoring trends in areas where public health interventions are needed.
Keywords
Behavioral Risk Factor Surveillance System
Bayesian hierarchical regression
Chronic obstructive pulmonary disease
CDC’s PLACES
conditional autoregressive model
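The hierarchical model described in the COPD abstract above can be written out roughly as follows. This is a sketch under assumptions, not the authors' specification: the symbols $\hat{p}_{it}$ (estimated prevalence in county $i$ and year $t$), $\beta_0$, $\beta_1$, $b_i$, $u_i$, $\tau_b^2$, and $\tau_u^2$ are illustrative, the spatial term $u_i$ is assumed to enter additively, and an intrinsic conditional autoregressive prior is used as one common choice because the abstract does not say which form was fitted.

```latex
\begin{align*}
\hat{p}_{it} &\sim \mathcal{N}\!\left(\mu_{it},\ \sigma^2\right), \\
\mu_{it} &= \beta_0 + \beta_1 t + b_i t + u_i, \\
b_i &\sim \mathcal{N}\!\left(0,\ \tau_b^2\right), \\
u_i \mid u_{j \ne i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\tau_u^2}{n_i}\right).
\end{align*}
```

Here $j \sim i$ denotes counties adjacent to county $i$ and $n_i$ is the number of such neighbors. Under this sketch, a county's trend would be judged increasing or decreasing from the posterior distribution of $\beta_1 + b_i$.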
The International Epidemiology Databases to Evaluate AIDS (IeDEA) is a global research consortium that provides extensive HIV/AIDS data worldwide. In this study, we propose multistate models (MSMs) to characterize HIV progression across clinical stages while addressing data complexities, including interval-censored and clustered event history data. To reduce computational intensity, we propose a Stochastic Expectation-Maximization (Stochastic EM) algorithm. We use simulation to evaluate the performance of the proposed methods and apply them to Central-Africa IeDEA data to evaluate the impact of the World Health Organization's 2015 Treat-All Policy.
Keywords
Stochastic EM
multistate model
interval-censored data
random effects
Treat-All policy
IeDEA
Various methods have been developed to investigate complex and collective effects of environmental mixtures on human health. Tree ensemble methods are known for their stability and accuracy in identifying highly correlated and high-dimensional features in the statistical literature, but their use has not been well studied for environmental mixtures analysis.
We tailored the Bayesian Additive Regression Trees (BART) model for environmental mixtures analysis, allowing a smooth response surface and incorporating confounder adjustment, for both continuous and binary outcomes. We further included component-wise and hierarchical variable selection to accommodate scientific grouping of chemicals. Additionally, we proposed quantifying the marginal contribution of each chemical in the mixture through Generalized Additive Model (GAM) approximations. The proposed approaches were investigated thoroughly through simulations and a case study using National Health and Nutrition Examination Survey (NHANES) 2001-2002 data on how persistent organic pollutants influenced leukocyte telomere length, in comparison with Bayesian Kernel Machine Regression (BKMR), one of the most popular mixtures methods.
Our simulation studies demonstrated that the modified BART produced results comparable or superior to BKMR in recovering the true exposure-response surface for both continuous and binary outcomes, especially when chemical groups were considered, with significantly reduced computational time. Both methods effectively identified relevant chemical groups under hierarchical variable selection, but modified BART better distinguished important components within groups. Our case study confirmed these findings, with similar groups identified but different within-group importances estimated. GAM plots accurately summarized individual exposure effects for both modified BART and BKMR fitted results.
We recommend the modified BART as a stable and fast response surface model for environmental mixture analysis, particularly for large sample sizes, binary outcomes, and grouped chemicals. GAM approximation is a practical tool for interpreting individual chemical effects in mixtures analysis.
Keywords
Soft BART
BKMR
Environmental mixture analysis
GAM approximation
Some 60 of 109 study participants provided sodium (Na) and potassium (K) intakes based on 24-hour urine collection samples, and on formulae applied to data from spot urine samples. The normal-based, percentile, bias-corrected, and bias-corrected and accelerated bootstrap methods gave confidence interval (CI) estimates for within-pair differences and correlation coefficients (CCs). All bootstrap CIs and achieved significance levels (ASLs) of a test statistic indicated no statistical significance of the mean of the within-pair differences for Na intake. The ASL, but not the observed and bootstrap CIs from the truncated distribution for the formula-based estimates of K, was statistically different from 0. The bootstrap and observed estimates of the correlation coefficients were statistically significant for all except the correlation of Na intakes based on the PAHO/WHO formula and 24-hour collections. For all other comparisons, the bootstrap samples, the observed data, and the ASL yielded different conclusions. These results indicate that, to determine validity of formula-based approaches to Na and K intake estimation using urine samples from Jamaicans, there should be use of other approac
Keywords
validation
bootstrap
urine samples
achieved significance levels
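As an illustration of two of the bootstrap quantities named in the abstract above, the following Python sketch computes a percentile confidence interval and an achieved significance level (ASL) for the mean of within-pair differences. It is a simplified sketch, not the study's analysis: the data are made up, the helper names are hypothetical, the bias-corrected and accelerated variants are not implemented, and the ASL assumes resampling from the mean-centered differences to mimic the null hypothesis of zero mean difference.

```python
import random
import statistics

def percentile_ci(diffs, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each mean.
    means = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

def achieved_significance_level(diffs, n_boot=2000, seed=0):
    """Two-sided bootstrap ASL for H0: mean difference = 0.

    Assumption: the null distribution is approximated by resampling
    from the differences after subtracting their observed mean.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = statistics.fmean(diffs)
    centered = [d - observed for d in diffs]
    hits = sum(
        abs(statistics.fmean(rng.choices(centered, k=n))) >= abs(observed)
        for _ in range(n_boot)
    )
    return hits / n_boot

# Hypothetical within-pair differences (formula-based minus 24-hour estimate).
example_diffs = [0.5, -0.2, 0.1, 0.4, -0.3, 0.2, 0.0, 0.3]
ci_low, ci_high = percentile_ci(example_diffs)
asl = achieved_significance_level(example_diffs)
```

A CI that covers 0 together with a large ASL would both point to no detectable mean difference; the abstract reports cases where such criteria disagree.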
Sudden death (SD) is a primary cause of death in the US. While clinical guidelines recommend treating patients of all ages at high risk of SD with an implantable cardiac defibrillator (ICD), questions have been raised regarding benefits for older patients, in whom the force of mortality is high. To investigate this, the PIPER-ICD Study examines the end-of-life experience among older ICD patients to identify longitudinal markers predictive of treatment success. Based on the PIPER-ICD data and extensive simulations, we illustrate how predictions of marker trajectories go wrong if they ignore mortality. Crucially, standard methods fail to acknowledge the joint distribution of the marker and mortality: linear mixed models and joint models imply a "pretend reality" in which patients are considered immortal; marginal methods marginalize over death, treating mortality as a statistical nuisance instead of an outcome of inherent clinical value. We also discuss the clinical implications of this failure, specifically in relation to how health care decisions are made.
Keywords
Mortality
Longitudinal Marker
Joint models
Marginal models
Linear mixed models
Prediction