Bayesian Methods in Epidemiological Studies

Huifang (Ariel) Chen, Chair
AstraZeneca
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
4004 
Contributed Papers 
Music City Center 
Room: CC-207C 

Main Sponsor

Section on Statistics in Epidemiology

Presentations

A Bayesian nonparametric approach for detecting interference and estimating causal effects

Classical methods of causal inference typically assume that an experimental intervention influences solely the unit receiving it and does not interfere with the behavior of any other unit. However, it is becoming increasingly common for experiments to contravene this assumption. Estimating causal effects in the presence of treatment interference necessitates an understanding of the dynamics between units and their influence on others' responses. In this study, we consider estimation under the recently proposed K Nearest Neighbor Interference Model (KNNIM), which assumes that a unit's response is influenced by its treatment status and the treatments administered to its K "closest" units. We broaden the KNNIM framework to the scenario where multiple (non-identical) experiments are performed on the same set of units. We develop a novel approach that combines an infinite beta-Bernoulli process Bayesian linear model with the KNNIM framework to allow for the simultaneous discovery of the correct K and accurate estimation of treatment effects. We demonstrate the usefulness of the approach in identifying treatment interferences through simulations. 
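
As a rough illustration of the exposure structure described above (not the authors' implementation), the sketch below lists, for each unit, its own treatment and the treatments of its K nearest units. The abstract does not say how "closest" is measured, so the Euclidean distance, the coordinates, and the function name `knn_exposures` are all assumptions made for this example.

```python
import math


def knn_exposures(points, treatments, k):
    """Return, for each unit, (own treatment, treatments of its k nearest units).

    Assumption: "closest" means smallest Euclidean distance between the
    hypothetical coordinates in `points`; ties are broken by unit index.
    """
    n = len(points)
    exposures = []
    for i in range(n):
        # Rank every other unit by distance to unit i (nearest first).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        neighbor_treatments = tuple(treatments[j] for j in others[:k])
        exposures.append((treatments[i], neighbor_treatments))
    return exposures


# Hypothetical toy data: four units on a line, alternating treatment.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
treatments = [1, 0, 1, 0]
exposures = knn_exposures(points, treatments, k=2)
```

Under a model of this kind, a unit's response would be a function of this pair only; choosing K too small omits relevant neighbors, and choosing it too large adds irrelevant ones, which is why discovering K matters.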

Keywords

Causal Inference

Treatment Interference

K Nearest Neighbor Interference Model (KNNIM)

Bayesian Nonparametric 

Co-Author

Michael Higgins, Kansas State University

First Author

Weiqiang Zhi, Kansas State University

Presenting Author

Weiqiang Zhi, Kansas State University

A Bayesian nonparametric approach to causal mediation analysis in CRTs with multiple mediators

Cluster randomized trials (CRTs) with multiple unstructured mediators present significant methodological challenges for causal inference due to within-cluster correlation, interference among units, and the complexity introduced by multiple mediators. Existing causal mediation methods often fall short in simultaneously addressing these complexities, particularly in disentangling mediator-specific effects under interference that are central to studying complex mechanisms. To address this gap, we propose new causal estimands for spillover mediation effects that differentiate the roles of each individual's own mediator and the spillover effects resulting from interactions among individuals within the same cluster. We establish identification results for each estimand and, to flexibly model the complex data structures inherent in CRTs, we develop a new Bayesian nonparametric prior, the Nested Dependent Dirichlet Process Mixture, designed to flexibly capture the outcome and mediator surfaces at different levels. We illustrate our new methods in an analysis of a completed CRT. 

Keywords

Bayesian causal inference

Bayesian Nonparametrics

Interference

Multiple mediators

Spillover Mediation Effect 

Co-Author

Fan Li, Yale School of Public Health

First Author

Yuki Ohnishi, Yale School of Public Health

Presenting Author

Yuki Ohnishi, Yale School of Public Health

Bayesian-based Propensity Score Subclassification Estimator

Subclassification estimators are commonly used to estimate causal effects via the propensity score, offering lower variance compared to weighting methods like inverse probability weighting. Traditionally, the number of strata is set at five without data-driven selection, and even when selected from data, the resulting uncertainty is often ignored. In this study, we propose a novel Bayesian subclassification estimator that accounts for uncertainty in the number of strata rather than selecting a single optimal value. To achieve this, we employ a general Bayesian framework that does not require a likelihood function, avoiding strong assumptions about the outcome model while maintaining flexibility in causal inference. Our proposed method achieves comparable performance to non-Bayesian methods while providing more accurate uncertainty estimation. This approach ensures that uncertainties from the design phase are properly incorporated into the analysis phase, which is often overlooked in conventional methods. 

Keywords

design uncertainty

general Bayes

selection of the number of strata

propensity score

reversible jump MCMC 

Co-Author

Tomotaka Momozaki, Tokyo University of Science

First Author

Shunichiro Orihara, Tokyo Medical University

Presenting Author

Shunichiro Orihara, Tokyo Medical University

Building absolute breast cancer risk prediction models for female Hodgkin lymphoma survivors

Chest radiotherapy strongly increases subsequent breast cancer (BC) risk among female Hodgkin lymphoma (HL) survivors. We aimed to build absolute BC risk prediction models incorporating detailed treatment information and in the process addressed two important challenges in building risk prediction models. First, we proposed a novel weighting approach to estimate relative risks for risk factors that were used to match controls to cases in nested case-control studies to be able to incorporate them into a risk model. Second, we devised an approach to incorporate incidence rates from the general population, accommodating the much higher incidence among cancer survivors through a calibration factor. Both approaches were shown to work well in simulations (unbiased estimates of matching factor relative risks and <10% bias in the calibration factor estimate for many simulation settings) and when building absolute breast cancer risk prediction models. 

Keywords

absolute risk prediction

breast cancer

radiotherapy 

Co-Author(s)

Flora Van Leeuwen, The Netherlands Cancer Institute
Michael Hauptmann, Brandenburg Medical School Theodor Fontane
Ruth Pfeiffer, NIH/NCI

First Author

Sander Roberti, National Cancer Institute

Presenting Author

Sander Roberti, National Cancer Institute

Clustering-Informed Shared-Structure Variational Autoencoder for Missing Data Imputation

Despite advancements in managing healthcare data, missing data in Electronic Health Records (EHR) and patient-reported health data remain a challenge, compromising their usability in healthcare analytics. Conventional imputation methods face limitations such as difficulties in capturing complex non-linear relationships, extended computation times, and constraints in addressing various types of missing data mechanisms. To address this, we propose the clustering-informed shared-structure variational autoencoder (CISS-VAE), building upon the powerful generative Bayesian neural networks. This model can effectively capture complex associations and accommodate various missing data mechanisms, including missing not at random (MNAR). We also develop iterative learning algorithms that further enhance missing data imputation accuracy while preventing overfitting. Comprehensive simulations demonstrate our model's superior accuracy compared to traditional and contemporary methods. We apply our method to EHR data from early-stage breast cancer patients at Memorial Sloan Kettering Cancer Center, aiming to mitigate the impact of missing data and enhance health monitoring and analyses. 

Keywords

Missing Data Imputation

Variational Autoencoder

Missing Not at Random

Electronic Health Records 

Co-Author(s)

Kenneth Seier, Memorial Sloan Kettering Cancer Center
Katherine Panageas, Memorial Sloan Kettering Cancer Center
Mithat Gonen, Memorial Sloan Kettering Cancer Center
Yuan Chen, Memorial Sloan Kettering Cancer Center

First Author

Yasin Khadem Charvadeh, Memorial Sloan Kettering Cancer Center

Presenting Author

Yasin Khadem Charvadeh, Memorial Sloan Kettering Cancer Center

Restricted CAR Model for Reliable Life Expectancy Estimates in Philadelphia Census Tracts

Reliable, and ideally smooth, age-specific all-cause mortality rate estimates are needed when estimating life expectancy. These rates, however, can be difficult to estimate in small areas because death counts become small once the population of each area is subset by age and sex. The conditional autoregressive (CAR) framework allows us to incorporate spatial dependence in the data, which helps produce more reliable estimates even when counts are sparse. We estimated tract-level age- and sex-specific mortality rates using a Bayesian Poisson adaptation of TOPALS (tool for projecting age patterns using linear splines), which is useful for producing smooth age-specific rates, extended to include spatial (CAR) random effects. Although smooth estimates are ideal for calculating life expectancy, this approach carries the risk of oversmoothing the rates. This study builds on recent work that developed a restricted CAR model to guard against overly smooth and overly precise estimated mortality rates, and extends it to the TOPALS-CAR framework for modelling age-specific rates in census tracts. 
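
To show why the age-specific rates matter downstream, here is a minimal sketch of turning a schedule of mortality rates into life expectancy at birth. It assumes a piecewise-constant hazard within each age interval and an open-ended final interval. That assumption, the function name, and the toy rates are illustrative choices, not the life table method used in the study.

```python
import math


def life_expectancy(rates, widths):
    """Life expectancy at birth from age-specific mortality rates.

    rates: deaths per person-year for each age interval; the last
    interval is open-ended. widths: length in years of each closed
    interval, so len(widths) == len(rates) - 1.
    Assumption: the hazard is constant within each interval.
    """
    if len(widths) != len(rates) - 1:
        raise ValueError("need one width per closed interval")
    if rates[-1] <= 0:
        raise ValueError("rate in the open-ended interval must be positive")
    survival = 1.0  # probability of surviving to the start of the interval
    e0 = 0.0
    for m, n in zip(rates[:-1], widths):
        if m > 0:
            # Expected person-years lived in an interval with constant hazard m.
            e0 += survival * (1.0 - math.exp(-m * n)) / m
        else:
            e0 += survival * n
        survival *= math.exp(-m * n)
    # Open-ended interval: expected remaining lifetime is 1 / rate.
    e0 += survival / rates[-1]
    return e0


# Sanity check with a hypothetical flat schedule: a constant rate of
# 0.02 per year at every age implies a life expectancy of 1 / 0.02 years.
flat_e0 = life_expectancy([0.02, 0.02, 0.02, 0.02], [1, 4, 5])
```

Because every interval's contribution is scaled by survival to that age, noise in a single age-specific rate propagates to all later ages, which is one reason smooth and stable rate estimates are preferred.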

Keywords

Bayesian statistics

spatial statistics

spatial epidemiology

disease mapping 

Co-Author

Harrison Quick, University of Minnesota

First Author

Giancarlo Anfuso

Presenting Author

Giancarlo Anfuso

The Bayesian multivariate spatiotemporal approach of Drug Overdose Surveillance in Ohio

Geospatial analysis has provided various insights for surveillance of the substance use disorder (SUD) population. Numerous data sources have been investigated, but the chronic challenges of delayed reporting and data scarcity remain. To overcome these challenges, we conducted a Bayesian multivariate spatiotemporal modeling analysis using real-time urine drug test results for a diverse set of drugs (e.g., fentanyl, cocaine, heroin, and methamphetamine). We use the multivariate Bayesian spatiotemporal approach to investigate the shared geospatial patterns of the substance use population. By examining the shared components, we can investigate the co-evolving patterns of substance use in each county from 2013 to 2023. With this effort, we can confirm existing beliefs about polysubstance use and identify new shared patterns involving newly emerged substances. We also expect that sharing information across multiple drugs can improve estimates for small areas. This talk will discuss the analysis results for various sets of drugs and how the map of the substance use population changed over the 10-year period in Ohio. 

Keywords

opioid overdose

Bayesian spatiotemporal modeling

substance use disorder

public health surveillance 

Co-Author(s)

John Myers, The Ohio State University
Charles Marks, Millennium Health
Penn Whitley, Millennium Health
Brandon Slover, The Ohio State University
Xianhui Chen, The Ohio State University
Neena Thomas, The Ohio State University
Ping Zhang, The Ohio State University
Naleef Fareed, The Ohio State University
Soledad Fernandez, The Ohio State University

First Author

Joanne Kim, The Ohio State University

Presenting Author

Joanne Kim, The Ohio State University