Monday, Aug 4: 2:00 PM - 3:50 PM
0178
Invited Paper Session
Music City Center
Room: CC-207A
Environmental Health
Geospatial Data
Environmental Mixtures
Exposome
Applied
Yes
Main Sponsor
ENAR
Co Sponsors
Section on Statistics and the Environment
Section on Statistics in Epidemiology
Presentations
Building artificially intelligent geospatial systems requires rapid delivery of spatial data analysis at massive scales with minimal human intervention. Depending on the intended use, data analysis may also entail model assessment and uncertainty quantification. This article devises transfer learning frameworks for deployment in artificially intelligent systems, where a massive data set is split into smaller data sets that stream into the analytical framework to propagate learning and assimilate inference for the entire data set. Specifically, we introduce Bayesian predictive stacking for multivariate and spatial data and demonstrate its effectiveness in rapidly analyzing massive data sets. Inference is feasible in a reasonable amount of time and without excessively demanding hardware. We also discuss Bayesian predictive stacking for spatial-temporal models, where the primary inferential objective is to provide inference on the latent spatial random field and conduct spatial predictions at arbitrary locations. We exploit analytically tractable posterior distributions for the regression coefficients of predictors and the realizations of the spatial process, conditional on process parameters. We then combine such inference by stacking these models across the range of values of the hyper-parameters. We devise predictive stacking in a manner that is computationally efficient, does not resort to iterative algorithms such as Markov chain Monte Carlo (MCMC), and can exploit the benefits of parallel computation. We illustrate the effectiveness of this approach in extensive simulation experiments and subsequently analyze massive data sets from climate science and wearable devices.
Keywords
GeoAI
Bayesian Transfer Learning
Predictive Stacking
Spatial-Temporal data
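The stacking step described in the abstract above can be illustrated with a minimal sketch. It assumes that a matrix of held-out log predictive densities has already been computed, with one column per fixed hyper-parameter value (for example, from closed-form conjugate posteriors). The function name `stacking_weights`, the toy numbers, and the simple deterministic fixed-point update are assumptions made for brevity, not the authors' implementation or solver.

```python
import numpy as np

def stacking_weights(log_pred_dens, n_iter=500):
    """Find simplex weights w maximizing sum_i log(sum_g w_g * p_ig).

    log_pred_dens has shape (n_obs, n_models): entry [i, g] is the held-out
    log predictive density of observation i under candidate model g.
    Uses a deterministic EM-style fixed-point update (an assumption made
    here for illustration; any convex solver could be used instead).
    """
    lpd = np.asarray(log_pred_dens, dtype=float)
    # Rescale each row by its maximum for numerical stability; this adds a
    # constant to the objective and leaves the maximizer unchanged.
    dens = np.exp(lpd - lpd.max(axis=1, keepdims=True))
    n_models = dens.shape[1]
    w = np.full(n_models, 1.0 / n_models)  # start from uniform weights
    for _ in range(n_iter):
        mix = dens @ w                     # stacked density per observation
        w = w * (dens / mix[:, None]).mean(axis=0)
    return w

# Hypothetical toy input: three candidate hyper-parameter settings,
# four held-out observations.
toy_lpd = np.array([
    [-1.0, -2.0, -4.0],
    [-1.2, -1.1, -3.5],
    [-3.0, -1.0, -2.5],
    [-0.9, -2.2, -4.1],
])
print(stacking_weights(toy_lpd))
```

Because each column can be computed independently of the others, the candidate models can be fitted in parallel before this weighting step is run once.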
Unmeasured spatial confounding complicates exposure effect estimation in environmental health studies. This problem is exacerbated in studies with multiple health outcomes and environmental exposure variables, as the source and magnitude of confounding bias may differ across exposure/outcome pairs. We propose to mitigate the effects of spatial confounding in multivariate studies by projecting to the spectral domain to separate relationships by spatial scale, and by assuming that the confounding bias dissipates at more local scales. Our model for the exposure effects is a three-way tensor over exposure, outcome, and spatial scale. We use a canonical polyadic decomposition and shrinkage priors to encourage sparsity and borrow strength across the dimensions of the tensor. We demonstrate the performance of our method in an extensive simulation study and in an analysis of perfluoroalkyl and polyfluoroalkyl substances (PFAS) and several health outcomes.
Keywords
Causal inference
Mixtures
PFAS
Tensor decomposition
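The canonical polyadic (CP) structure mentioned in the abstract above can be sketched as follows. The factor names `A`, `B`, `C`, the rank, and the dimensions are hypothetical choices for illustration; the shrinkage priors and the spectral projection are not shown.

```python
import numpy as np

def cp_tensor(A, B, C):
    """Build a three-way tensor from CP factor matrices.

    beta[j, k, l] = sum_r A[j, r] * B[k, r] * C[l, r], where j indexes
    exposures, k indexes outcomes, and l indexes spatial scales.
    """
    return np.einsum("jr,kr,lr->jkl", A, B, C)

# Hypothetical sizes: 5 exposures, 4 outcomes, 10 spatial scales, rank 2.
rng = np.random.default_rng(0)
n_exp, n_out, n_scale, rank = 5, 4, 10, 2
A = rng.normal(size=(n_exp, rank))
B = rng.normal(size=(n_out, rank))
C = rng.normal(size=(n_scale, rank))

beta = cp_tensor(A, B, C)
full_params = n_exp * n_out * n_scale          # unstructured tensor
cp_params = rank * (n_exp + n_out + n_scale)   # low-rank factors
print(beta.shape, full_params, cp_params)
```

The low-rank form uses far fewer free parameters than the unstructured tensor, which is one way a model of this kind can borrow strength across exposures, outcomes, and scales.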
This talk will focus on a new approach to studying high-dimensional mechanistic pathways from the exposome to health outcomes in the framework of homogeneity pursuit (HP). HP allows scientists to cluster similar toxicants into mixtures while accommodating high-dimensional mediators (e.g., metabolites) that play different roles in mediating the relationships between mixtures and health outcomes. Statistical learning is built upon integer optimization algorithms that formulate the task of clustering toxicants as an estimation problem. Moreover, we propose an ensemble inference procedure that can provide confidence intervals for high-dimensional direct and indirect effects. This new statistical toolbox will be illustrated by simulation studies and real-world data examples.
Keywords
Exposome
Directed acyclic graph
Constrained optimization
Precision environmental health seeks to estimate how the effects of the environment vary across the population to inform targeted interventions and public health policy. However, there is a lack of statistical methods to estimate the heterogeneous effects of environmental exposures, particularly mixture exposures that are assessed longitudinally. From this perspective, we examine the heterogeneous exposure effects of average fine particulate matter (PM2.5) and maximum daily temperature, assessed weekly during gestation, on birth weight using birth registry data in Colorado. To achieve this, we develop a Bayesian additive model represented by an ensemble of tree triplets, where a tree triplet consists of two types of binary trees that interact to model heterogeneous time-structured exposure effects. Our framework provides a tool to estimate individualized and subgroup-specific distributed lag effects of longitudinally assessed mixture exposures. Our method can accommodate a high-dimensional set of candidate modifiers with modifier selection and allows for mixture exposures with time-sensitive interactions. Through simulation, we demonstrate that our model can estimate individualized exposure effects and identify important mixture components and modifying factors. From the Colorado birth registry data, we find a more evident negative association between PM2.5 and birth weight among non-Hispanic Asian, Pacific Islander, and White mothers.
Keywords
Precision Environmental Health
Bayesian Additive Regression Trees
Distributed Lag Models
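The notion of an individualized distributed lag effect in the abstract above can be illustrated with a minimal sketch: each subject's weekly exposures are weighted by week-specific coefficients and summed. The function name, the three-week window, and all numbers are hypothetical; in the method described, the lag coefficients would come from the tree-based model rather than being supplied by hand.

```python
import numpy as np

def distributed_lag_effect(exposures, lag_coefs):
    """Sum week-specific effects of a time-resolved exposure.

    exposures: shape (n_subjects, n_weeks), one exposure value per week.
    lag_coefs: shape (n_weeks,) for a common lag curve, or
               (n_subjects, n_weeks) for individualized lag curves.
    Returns one cumulative effect per subject.
    """
    exposures = np.asarray(exposures, dtype=float)
    lag_coefs = np.asarray(lag_coefs, dtype=float)
    # Broadcasting handles both the shared and the individualized case.
    return (exposures * lag_coefs).sum(axis=1)

# Hypothetical weekly exposures for two subjects over three weeks.
weekly_exposure = np.array([[1.0, 2.0, 3.0],
                            [0.0, 1.0, 0.0]])

# One lag curve shared by everyone.
shared_curve = np.array([0.5, 0.0, -1.0])
print(distributed_lag_effect(weekly_exposure, shared_curve))

# Subject-specific lag curves, e.g. determined by modifier values.
individual_curves = np.array([[1.0, 1.0, 1.0],
                              [0.0, 2.0, 0.0]])
print(distributed_lag_effect(weekly_exposure, individual_curves))
```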