Recent Advancement of Statistical Methods for Environmental Health Studies

Daniel Zilber, Chair
NIEHS
 
Shanshan Zhao, Organizer
NIEHS/NIH
 
Monday, Aug 4: 2:00 PM - 3:50 PM
0178 
Invited Paper Session 
Music City Center 
Room: CC-207A 

Keywords

Environmental Health

Geospatial Data

Environmental Mixtures

Exposome 

Applied

Yes

Main Sponsor

ENAR

Co Sponsors

Section on Statistics and the Environment
Section on Statistics in Epidemiology

Presentations

Bayesian Transfer Learning for Artificially Intelligent Geospatial Systems: A Predictive Stacking Approach

Building artificially intelligent geospatial systems requires rapid delivery of spatial data analysis at massive scales with minimal human intervention. Depending upon the intended use of these systems, data analysis may also entail model assessment and uncertainty quantification. This article devises transfer learning frameworks for deployment in artificially intelligent systems, where a massive data set is split into smaller data sets that stream into the analytical framework to propagate learning and assimilate inference for the entire data set. Specifically, we introduce Bayesian predictive stacking for multivariate and spatial data and demonstrate its effectiveness in rapidly analyzing massive data sets. Furthermore, we make inference feasible in a reasonable amount of time and without excessively demanding hardware settings. We also discuss Bayesian predictive stacking for spatial-temporal models, where the primary objective is inference on the latent spatial random field and spatial prediction at arbitrary locations. We exploit analytically tractable posterior distributions for the regression coefficients of predictors and the realizations of the spatial process, conditional upon process parameters. We subsequently combine such inference by stacking these models across the range of values of the hyper-parameters. We devise predictive stacking in a manner that is computationally efficient, does not resort to iterative algorithms such as Markov chain Monte Carlo (MCMC), and can exploit the benefits of parallel computation. We illustrate the effectiveness of this approach in extensive simulation experiments and subsequently analyze massive data sets from climate science and wearable devices.
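The stacking step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional spatial domain, an exponential covariance with unit variance, a single held-out set in place of cross-validation, and a simple fixed-point update for the stacking weights. All function names and the hyper-parameter grid are hypothetical.

```python
import numpy as np

def exp_cov(a, b, phi):
    """Exponential covariance exp(-phi * |a - b|) between 1-D location vectors."""
    return np.exp(-phi * np.abs(a[:, None] - b[None, :]))

def pointwise_predictive_density(s_tr, y_tr, s_ho, y_ho, phi, delta):
    """Gaussian predictive density of each held-out point for fixed (phi, delta).

    Assumes a zero-mean process with unit variance and nugget delta, so the
    predictive distribution is available in closed form (no MCMC).
    """
    K = exp_cov(s_tr, s_tr, phi) + delta * np.eye(len(s_tr))
    k = exp_cov(s_ho, s_tr, phi)
    mean = k @ np.linalg.solve(K, y_tr)
    var = 1.0 + delta - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return np.exp(-0.5 * (y_ho - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def stacking_weights(dens, n_iter=500):
    """Maximize sum_i log(sum_k w_k * dens[i, k]) over the simplex.

    Uses the standard fixed-point (EM-type) update for mixture weights,
    chosen here only for brevity; any convex solver would also work.
    """
    w = np.full(dens.shape[1], 1.0 / dens.shape[1])
    for _ in range(n_iter):
        resp = dens * w
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)
    return w

# Simulated data (illustrative only).
rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0, 10, 120))
true_cov = exp_cov(s, s, 1.0) + 0.1 * np.eye(len(s))
y = rng.multivariate_normal(np.zeros(len(s)), true_cov)

# Hold out every fourth location and score each candidate (phi, delta) on it.
ho = np.arange(len(s)) % 4 == 0
grid = [(phi, delta) for phi in (0.2, 1.0, 5.0) for delta in (0.05, 0.5)]
dens = np.column_stack([
    pointwise_predictive_density(s[~ho], y[~ho], s[ho], y[ho], phi, delta)
    for phi, delta in grid
])
weights = stacking_weights(dens)
```

Each column of `dens` depends on only one candidate model, so the columns could be computed in parallel before the weights are fitted.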

Keywords

GeoAI

Bayesian Transfer Learning

Predictive Stacking

Spatial-Temporal data 

Co-Author(s)

Luca Presicce, Bicocca University
Sudipto Banerjee, University of California Los Angeles

Speaker

Sudipto Banerjee, University of California Los Angeles

Observational PFAS Studies and Spatial Causal Inference Methods

Unmeasured spatial confounding complicates exposure effect estimation in environmental health studies. This problem is exacerbated in studies with multiple health outcomes and environmental exposure variables, as the source and magnitude of confounding bias may differ across exposure/outcome pairs. We propose to mitigate the effects of spatial confounding in multivariate studies by projecting to the spectral domain to separate relationships by spatial scale, and assuming that the confounding bias dissipates at more local scales. Our model for the exposure effects is a three-way tensor over exposure, outcome, and spatial scale. We use a canonical polyadic decomposition and shrinkage priors to encourage sparsity and borrow strength across the dimensions of the tensor. We demonstrate the performance of our method in an extensive simulation study and in an analysis of data on perfluoroalkyl and polyfluoroalkyl substances (PFAS) and several health outcomes.
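The low-rank structure mentioned in this abstract can be illustrated with a short sketch of a canonical polyadic (CP) representation of a three-way effects tensor. The dimensions, rank, and variable names below are hypothetical, and the shrinkage priors and spectral projection from the abstract are omitted.

```python
import numpy as np

def cp_tensor(A, B, C):
    """Rank-R CP tensor: T[e, o, s] = sum_r A[e, r] * B[o, r] * C[s, r]."""
    return np.einsum("er,or,sr->eos", A, B, C)

rng = np.random.default_rng(1)
n_exposures, n_outcomes, n_scales, rank = 5, 4, 8, 2

# One factor matrix per tensor mode (exposure, outcome, spatial scale).
A = rng.normal(size=(n_exposures, rank))
B = rng.normal(size=(n_outcomes, rank))
C = rng.normal(size=(n_scales, rank))

effects = cp_tensor(A, B, C)           # shape (5, 4, 8)
n_free = effects.size                  # entries in an unstructured tensor
n_cp = A.size + B.size + C.size        # entries in the CP factor matrices
```

The factor matrices hold far fewer values than the full tensor (`n_cp` versus `n_free`), which is one way to see how such a decomposition borrows strength across exposures, outcomes, and scales.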
 

Keywords

Causal inference

Mixtures

PFAS

Tensor decomposition 

Co-Author(s)

Brian Reich, North Carolina State University
Shih-Ni Prim, North Carolina State University
Shu Yang, North Carolina State University, Department of Statistics
Yawen Guan, Colorado State University
Ana Rappold, US EPA

Speaker

Brian Reich, North Carolina State University

Learning High-Dimensional Mechanistic Pathways of Exposome to Health Outcomes using Mixed Integer Optimization Algorithms

This talk will focus on a new approach to studying high-dimensional mechanistic pathways from the exposome to health outcomes in the framework of homogeneity pursuit (HP). HP allows scientists to cluster similar toxicants into mixtures while accommodating high-dimensional mediators (e.g., metabolites) that play different roles in mediating the relationships between mixtures and health outcomes. Statistical learning is built upon integer optimization algorithms that formulate the task of clustering toxicants as an estimation problem. Moreover, we propose an ensemble inference procedure that provides confidence intervals for high-dimensional direct and indirect effects. This new statistical toolbox will be illustrated by simulation studies and real-world data examples.
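The idea of treating toxicant clustering as an estimation problem can be illustrated with a toy sketch. It is not the speaker's algorithm. It replaces a mixed integer optimization solver with exhaustive search over group assignments, which is feasible only for a handful of toxicants, and it omits mediators entirely. All names and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def fit_grouped(X, y, labels, n_groups):
    """Least squares with one shared coefficient per group of columns."""
    Z = np.column_stack([X[:, labels == g].sum(axis=1) for g in range(n_groups)])
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ gamma) ** 2))
    return rss, gamma

def homogeneity_pursuit(X, y, n_groups):
    """Search every assignment of columns to groups; keep the smallest RSS.

    The integer decision variables are the group labels. A mixed integer
    solver would explore this space far more efficiently than brute force.
    """
    p = X.shape[1]
    best_rss, best_labels, best_gamma = np.inf, None, None
    for combo in itertools.product(range(n_groups), repeat=p):
        labels = np.array(combo)
        if len(set(combo)) < n_groups:   # skip assignments with empty groups
            continue
        rss, gamma = fit_grouped(X, y, labels, n_groups)
        if rss < best_rss:
            best_rss, best_labels, best_gamma = rss, labels, gamma
    return best_rss, best_labels, best_gamma

# Simulated data: six toxicants forming two homogeneous groups.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
beta = np.array([1.0, 1.0, 1.0, -2.0, -2.0, -2.0])
y = X @ beta + rng.normal(scale=0.5, size=200)

best_rss, best_labels, best_gamma = homogeneity_pursuit(X, y, n_groups=2)
```

In this simulation the selected labels should separate the first three toxicants from the last three, with the two group coefficients close to 1 and -2.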

Keywords

Exposome

Directed acyclic graph

Constrained optimization 

Co-Author(s)

Peter Song, University of Michigan
Leyao Zhang

Speaker

Peter Song, University of Michigan

Heterogeneous Distributed Lag Mixture Model for Precision Environmental Health with Longitudinally Assessed Mixture Exposures

Precision environmental health seeks to estimate how the effects of the environment vary across the population to inform targeted interventions and public health policy. However, there is a lack of statistical methods to estimate the heterogeneous effects of environmental exposures, particularly mixture exposures that are assessed longitudinally. From this perspective, we examine the heterogeneous exposure effect of average fine particulate matter (PM2.5) and maximal daily temperature assessed weekly during gestation on birth weight using birth registry data in Colorado. To achieve this, we develop a Bayesian additive model represented by an ensemble of tree triplets, where a tree triplet consists of two types of binary trees that interact to model heterogeneous time-structured exposure effects. Our framework provides a tool to estimate individualized and subgroup-specific distributed lag effects of longitudinally assessed mixture exposures. Our method can accommodate a high-dimensional set of candidate modifiers with modifier selection and allows for mixture exposures with time-sensitive interactions. Through simulation, we demonstrate that our model can estimate individualized exposure effects and identify important mixture components and modifying factors. From the Colorado birth registry data, we find a more evident negative association between PM2.5 and birth weight among non-Hispanic Asian, Pacific Islander, and White mothers.
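The notion of a subgroup-specific distributed lag effect can be illustrated with a deliberately simplified sketch. It is not the authors' Bayesian tree ensemble. It assumes a single exposure, a fixed partition of gestational weeks into intervals with constant effects, one binary modifier, and ordinary least squares within each subgroup. The simulated data, break points, and function names are hypothetical.

```python
import numpy as np

def interval_design(X, breaks):
    """Sum weekly exposures within each interval [breaks[j], breaks[j + 1])."""
    return np.column_stack(
        [X[:, a:b].sum(axis=1) for a, b in zip(breaks[:-1], breaks[1:])]
    )

def subgroup_lag_effects(X, y, modifier, breaks):
    """Fit a piecewise-constant distributed lag model separately per subgroup."""
    Z = interval_design(X, breaks)
    effects = {}
    for g in np.unique(modifier):
        idx = modifier == g
        D = np.column_stack([np.ones(idx.sum()), Z[idx]])  # intercept + intervals
        coef, *_ = np.linalg.lstsq(D, y[idx], rcond=None)
        effects[int(g)] = coef[1:]                          # drop the intercept
    return effects

# Simulated data: 37 weekly exposures, independent across weeks for simplicity.
rng = np.random.default_rng(3)
n, n_weeks = 2000, 37
X = rng.normal(size=(n, n_weeks))
modifier = rng.integers(0, 2, size=n)

# Only subgroup 1 has a critical window (weeks 10-19, effect -0.5 per unit).
signal = -0.5 * X[:, 10:20].sum(axis=1) * (modifier == 1)
y = signal + rng.normal(size=n)

est = subgroup_lag_effects(X, y, modifier, breaks=[0, 10, 20, 37])
```

Here `est[1]` should show an effect near -0.5 in the middle interval, while `est[0]` should be near zero throughout. The model in the abstract instead learns both the time partition and the modifier splits from the data.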

Keywords

Precision Environmental Health

Bayesian Additive Regression Trees

Distributed Lag Models 

Co-Author(s)

Ander Wilson, Colorado State University
Daniel Mork, Harvard University

Speaker

Ander Wilson, Colorado State University