Sunday, Aug 3: 2:00 PM - 3:50 PM
4004
Contributed Papers
Music City Center
Room: CC-207C
Main Sponsor
Section on Statistics in Epidemiology
Presentations
Classical methods of causal inference typically assume that an experimental intervention influences solely the unit receiving it and does not interfere with the behavior of any other unit. However, it is becoming increasingly common for experiments to contravene this assumption. Estimating causal effects in the presence of treatment interference necessitates an understanding of the dynamics between units and their influence on others' responses. In this study, we consider estimation under the recently proposed K Nearest Neighbor Interference Model (KNNIM), which assumes that a unit's response is influenced by its treatment status and the treatments administered to its K "closest" units. We broaden the KNNIM framework to the scenario where multiple (non-identical) experiments are performed on the same set of units. We develop a novel approach that combines an infinite beta-Bernoulli process Bayesian linear model with the KNNIM framework to allow for the simultaneous discovery of the correct K and accurate estimation of treatment effects. We demonstrate the usefulness of the approach in identifying treatment interferences through simulations.
Keywords
Causal Inference
Treatment Interference
K Nearest Neighbor Interference Model (KNNIM)
Bayesian Nonparametric
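As an illustration of the KNNIM assumption in the abstract above, the following minimal sketch collects, for each unit, the treatments given to its K nearest other units. It assumes "closest" means Euclidean distance between unit coordinates, which the abstract does not specify; the function name `knn_neighbor_treatments` and its arguments are hypothetical, and this is not the authors' estimation procedure.

```python
import numpy as np


def knn_neighbor_treatments(positions, treatment, k):
    """Return an (n, k) array of the treatments of each unit's k nearest units.

    positions: (n, d) coordinates of the units (Euclidean distance is an
               assumption made for this sketch).
    treatment: (n,) array of 0/1 treatment assignments.
    Column j holds the treatment of the (j + 1)-th nearest other unit.
    """
    positions = np.asarray(positions, dtype=float)
    treatment = np.asarray(treatment)
    if not 1 <= k < len(positions):
        raise ValueError("k must be between 1 and n - 1")

    # Pairwise Euclidean distances between all units.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # A unit is never counted as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k closest other units, nearest first.
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return treatment[neighbors]
```

Stacking a unit's own treatment next to these K columns gives the inputs that, under KNNIM, are assumed to determine its response.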
Cluster randomized trials (CRTs) with multiple unstructured mediators present significant methodological challenges for causal inference due to within-cluster correlation, interference among units, and the complexity introduced by multiple mediators. Existing causal mediation methods often fall short in simultaneously addressing these complexities, particularly in disentangling mediator-specific effects under interference that are central to studying complex mechanisms. To address this gap, we propose new causal estimands for spillover mediation effects that differentiate the roles of each individual's own mediator and the spillover effects resulting from interactions among individuals within the same cluster. We establish identification results for each estimand and, to flexibly model the complex data structures inherent in CRTs, we develop a new Bayesian nonparametric prior, the Nested Dependent Dirichlet Process Mixture, designed to capture the outcome and mediator surfaces at different levels. We illustrate our new methods in an analysis of a completed CRT.
Keywords
Bayesian causal inference
Bayesian Nonparametrics
Interference
Multiple mediators
Spillover Mediation Effect
Co-Author
Fan Li, Yale School of Public Health
First Author
Yuki Ohnishi, Yale School of Public Health
Presenting Author
Yuki Ohnishi, Yale School of Public Health
Subclassification estimators are commonly used to estimate causal effects via the propensity score, offering lower variance compared to weighting methods like inverse probability weighting. Traditionally, the number of strata is set at five without data-driven selection, and even when selected from data, the resulting uncertainty is often ignored. In this study, we propose a novel Bayesian subclassification estimator that accounts for uncertainty in the number of strata rather than selecting a single optimal value. To achieve this, we employ a general Bayesian framework that does not require a likelihood function, avoiding strong assumptions about the outcome model while maintaining flexibility in causal inference. Our proposed method achieves comparable performance to non-Bayesian methods while providing more accurate uncertainty estimation. This approach ensures that uncertainties from the design phase are properly incorporated into the analysis phase, which is often overlooked in conventional methods.
Keywords
design uncertainty
general Bayes
selection of the number of strata
propensity score
reversible jump MCMC
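For context on the estimator this abstract builds on, here is a minimal sketch of the conventional (non-Bayesian) propensity score subclassification estimator with a fixed number of strata. It assumes the propensity scores are already estimated and that strata are formed at propensity score quantiles; the function name `subclassification_ate` is hypothetical, and the proposed Bayesian treatment of the number of strata is not implemented here.

```python
import numpy as np


def subclassification_ate(y, t, propensity, n_strata=5):
    """Stratify on the propensity score and average within-stratum differences.

    y: outcomes, t: 0/1 treatment indicators, propensity: estimated scores.
    Strata without both treated and control units are skipped (a simplifying
    choice made for this sketch).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    e = np.asarray(propensity, dtype=float)

    # Quantile cut points on the propensity score define the strata.
    edges = np.quantile(e, np.linspace(0.0, 1.0, n_strata + 1))
    # Using only the interior edges yields labels 0 .. n_strata - 1.
    strata = np.digitize(e, edges[1:-1], right=True)

    weighted_sum, n_used = 0.0, 0
    for s in range(n_strata):
        in_s = strata == s
        treated = in_s & (t == 1)
        control = in_s & (t == 0)
        if treated.any() and control.any():
            diff = y[treated].mean() - y[control].mean()
            weighted_sum += in_s.sum() * diff  # weight by stratum size
            n_used += in_s.sum()

    if n_used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / n_used
```

Here `n_strata` is fixed in advance, with the traditional five as the default. The abstract's proposal is to account for uncertainty in that number instead of fixing it.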
Chest radiotherapy strongly increases subsequent breast cancer (BC) risk among female Hodgkin lymphoma (HL) survivors. We aimed to build absolute BC risk prediction models incorporating detailed treatment information and in the process addressed two important challenges in building risk prediction models. First, we proposed a novel weighting approach to estimate relative risks for risk factors that were used to match controls to cases in nested case-control studies to be able to incorporate them into a risk model. Second, we devised an approach to incorporate incidence rates from the general population, accommodating the much higher incidence among cancer survivors through a calibration factor. Both approaches were shown to work well in simulations (unbiased estimates of matching factor relative risks and <10% bias in the calibration factor estimate for many simulation settings) and when building absolute breast cancer risk prediction models.
Keywords
absolute risk prediction
breast cancer
radiotherapy
Despite advancements in managing healthcare data, missing data in Electronic Health Records (EHR) and patient-reported health data remain a challenge, compromising their usability in healthcare analytics. Conventional imputation methods face limitations such as difficulties in capturing complex non-linear relationships, extended computation times, and constraints in addressing various types of missing data mechanisms. To address this, we propose the clustering-informed shared-structure variational autoencoder (CISS-VAE), building upon the powerful generative Bayesian neural networks. This model can effectively capture complex associations and accommodate various missing data mechanisms, including missing not at random (MNAR). We also develop iterative learning algorithms that further enhance missing data imputation accuracy while preventing overfitting. Comprehensive simulations demonstrate our model's superior accuracy compared to traditional and contemporary methods. We apply our method to EHR data from early-stage breast cancer patients at Memorial Sloan Kettering Cancer Center, aiming to mitigate the impact of missing data and enhance health monitoring and analyses.
Keywords
Missing Data Imputation
Variational Autoencoder
Missing Not at Random
Electronic Health Records
Reliable, and ideally smooth, age-specific all-cause mortality rate estimates are needed when estimating life expectancy. These rates, however, can be difficult to estimate in small areas because death counts become small when the population in each area is subset by age and sex. The conditional autoregressive (CAR) framework allows us to incorporate spatial dependencies in the data, which helps us produce more reliable estimates even when count data are sparse. We estimated tract-level age- and sex-specific mortality rates using a Bayesian Poisson model that adapts TOPALS (tool for projecting age patterns using linear splines), a method useful for producing smooth, age-specific rates, and includes spatial (CAR) random effects. Although smooth estimates are ideal for calculating life expectancy, this approach does come with the risk of oversmoothing rates. This study builds on recent work that developed a restricted CAR model to guard against producing overly smooth and overly precise estimated mortality rates, and extends it to the TOPALS-CAR framework for modelling age-specific rates in census tracts.
Keywords
Bayesian statistics
spatial statistics
spatial epidemiology
disease mapping
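To make the CAR idea in the abstract above concrete, the sketch below builds the precision matrix of spatial random effects under one common parameterization, the proper CAR prior with Q = tau (D - rho W). This is an assumption chosen for illustration: the abstract does not give the authors' exact specification, and the restricted CAR model it mentions is not implemented here. The function name `proper_car_precision` and the parameters `rho` and `tau` are hypothetical.

```python
import numpy as np


def proper_car_precision(adjacency, rho=0.9, tau=1.0):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR prior.

    adjacency: symmetric 0/1 matrix W with W[i, j] = 1 when areas i and j
               are neighbors and zeros on the diagonal.
    rho:       spatial dependence parameter; with |rho| < 1 and no isolated
               areas, Q is strictly diagonally dominant, hence positive definite.
    tau:       overall precision (inverse variance scale).
    """
    W = np.asarray(adjacency, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.array_equal(W, W.T):
        raise ValueError("adjacency must be symmetric")

    # D holds the number of neighbors of each area on its diagonal.
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)
```

Under this prior, the conditional mean of each area's random effect is `rho` times the average of its neighbors' effects, which is how sparse tracts borrow strength from adjacent ones.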
Geospatial analysis has provided various insights for the surveillance of the substance use disorder (SUD) population. Numerous data sources have been investigated, but the chronic challenges of delayed reporting and data scarcity remain. To overcome these challenges, we conducted a Bayesian multivariate spatiotemporal modeling analysis using real-time urine drug test results for diverse sets of drugs (e.g., fentanyl, cocaine, heroin, and methamphetamine). We use the multivariate Bayesian spatiotemporal approach to investigate the shared geospatial pattern of the substance use population. By looking at the shared components, we can investigate the co-evolving pattern of the substance use population in each county from 2013 to 2023. With this effort, we can confirm existing beliefs about polysubstance use and identify new shared patterns involving newly emerged substances. We also expect that sharing information across multiple drugs can help improve estimation for small areas. This talk will discuss the analysis results for various sets of drugs and how the map of the substance use population in Ohio changed over the 10-year period.
Keywords
opioid overdose
Bayesian spatiotemporal modeling
substance use disorder
public health surveillance
Co-Author(s)
John Myers, The Ohio State University
Charles Marks, Millennium Health
Penn Whitley, Millennium Health
Brandon Slover, The Ohio State University
Xianhui Chen, The Ohio State University
Neena Thomas, The Ohio State University
Ping Zhang, The Ohio State University
Naleef Fareed, The Ohio State University
Soledad Fernandez, The Ohio State University
First Author
Joanne Kim, The Ohio State University
Presenting Author
Joanne Kim, The Ohio State University