Bayesian Hierarchical Models for Environmental Data

Robert Richardson, Chair
Brigham Young University
 
Monday, Aug 4: 8:30 AM - 10:20 AM
4035 
Contributed Papers 
Music City Center 
Room: CC-103B 

Main Sponsor

Section on Statistics and the Environment

Presentations

Bayesian Inference for Spatial Extremes under Preferential Sampling

In geostatistics, preferential sampling occurs when a location's inclusion probability correlates with the measured variable. For Gaussian spatial processes, preferential sampling has been shown to impact parameter estimation and degrade out-of-sample predictions. Most proposed solutions model the locations as a realization of a log Gaussian Cox process, with the observed outcome dependent on the same underlying spatial process. Preferentiality in non-Gaussian data is less explored. This study examines its impact on extremes, such as pollution or precipitation maxima. We introduce an intuitive modeling framework to induce a shared underlying spatial process between a location's maximum median value and its probability of being sampled, using the blended GEV distribution (bGEV). Inference is performed in a Bayesian framework, via an MCMC algorithm with a data augmentation step. We show how failing to account for preferentiality leads to biased parameter estimates, and how our solution improves inference compared to a baseline spatial extremes model. We apply our approach to estimate maxima of PM2.5 levels in California.
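The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a one-dimensional Gaussian field with an exponential covariance, a log-linear inclusion probability, and a hypothetical preferentiality strength `beta`. It only shows that sites chosen preferentially tend to overstate the field average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent spatial process on a 1-D grid (assumed exponential covariance, range 0.1).
n = 200
s = np.linspace(0.0, 1.0, n)
dist = np.abs(s[:, None] - s[None, :])
cov = np.exp(-dist / 0.1)
field = np.linalg.cholesky(cov) @ rng.standard_normal(n)

# Preferential design: inclusion probability grows with the field value.
beta = 2.0  # hypothetical strength; beta = 0 would give a non-preferential design
weights = np.exp(beta * field)
probs = weights / weights.sum()
pref_idx = rng.choice(n, size=30, replace=False, p=probs)

# Non-preferential design for comparison: sites chosen uniformly at random.
unif_idx = rng.choice(n, size=30, replace=False)

true_mean = field.mean()
pref_mean = field[pref_idx].mean()
unif_mean = field[unif_idx].mean()

print(f"field mean:          {true_mean:.3f}")
print(f"preferential sample: {pref_mean:.3f}")
print(f"uniform sample:      {unif_mean:.3f}")
```

A naive summary of the preferentially sampled sites sits above the field mean, which is the kind of distortion a joint model of locations and outcomes is meant to correct.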

Keywords

Preferential sampling

Extremes

Point process

Log Gaussian Cox process

Bayesian modeling 

Co-Author

Veronica Berrocal, University of California, Irvine

First Author

Bianca Brusco

Presenting Author

Bianca Brusco

Bayesian Learning of Spatiotemporal Source Distribution for Beached Microplastic in US Gulf Coast

Over the last several decades, plastic waste has gradually accumulated while slowly degrading in terrestrial and oceanic environments. Recently, there has been an increased effort to identify the possible sources of plastic to understand how they affect vulnerable beaches. This study focuses specifically on microplastic beached on the US Gulf Coast. We expand upon existing Bayesian plastic attribution models and develop a rigorous statistical framework to map observed beached microplastics to their sources. Within this framework, we combine Lagrangian backtracking simulations of floating particles using nurdle beaching data with estimates of plastic input from coastlines, rivers, and fisheries. This allows us to build a spatiotemporal microplastic distribution along the Gulf Coast from source to sink. We infer that the main sources of microplastics found on the Gulf beaches in the US are centered around New Orleans, Galveston Bay, Corpus Christi, Mérida, and the Grijalva and Pearl Rivers, along with fishing activities around the Mississippi River Delta. We also find strong seasonal effects of microplastic transport in the Gulf caused by time-varying ocean currents and tourism. 

Keywords

Backtracking simulations

Lagrangian ocean analysis framework

Nurdles

Virtual particles 

Co-Author

Avishek Chakraborty, University of Arkansas

First Author

David Pojunas

Presenting Author

Avishek Chakraborty, University of Arkansas

Exact Bayesian Inference for Multivariate Spatial Data of Any Size with Air Pollution Application

Fine particulate matter and aerosol optical thickness are of interest to atmospheric scientists for understanding air quality and its various health/environmental impacts. The available data are extremely large, making uncertainty quantification in a fully Bayesian framework quite difficult, as traditional implementations do not scale reasonably to the size of the data. We specifically consider roughly 8 million observations from a remote sensing dataset obtained from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. To analyze data on this scale, we introduce Scalable Multivariate Exact Posterior Regression (SM-EPR) which combines the data subset approach and Exact Posterior Regression (EPR). EPR is a Bayesian hierarchical model that allows sampling of fixed and random effects directly from the posterior without Markov chain Monte Carlo (MCMC) or approximate Bayesian techniques. We extend EPR to the multivariate spatial context, where the multiple variables may be distributed according to different distributions. The combination of the data subset approach with EPR allows one to perform exact Bayesian inference without MCMC for effectively any sample size. 

Keywords

Basis functions

Big Data

Multivariate

Uncertainty quantification 

Co-Author

Jonathan Bradley, Florida State University

First Author

Madelyn Clinch, Florida State University

Presenting Author

Madelyn Clinch, Florida State University

Hierarchical Bayesian Modeling of Hurricane Genesis via Poisson Point Process Models

We model hurricane genesis using Poisson processes with finite mixtures for the intensity surface. We use a Bayesian hierarchical framework to estimate the parameters of the model. The marked Poisson point process allows for the inclusion of additional information such as hurricane strength. Based on our formulation, we can answer two important questions: at which locations do we expect hurricane genesis of a certain strength to occur, and, given a specific location, what hurricane strength do we expect to observe? 
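A Poisson process whose intensity surface is a finite mixture can be simulated in two steps: draw the number of events from a Poisson distribution, then draw each location from the mixture. The sketch below assumes a two-component isotropic Gaussian mixture with made-up weights, centers, and spread; none of these values come from the talk, and marks such as hurricane strength are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixture describing the intensity surface (all values illustrative).
total_mass = 50.0                                   # expected number of events
weights = np.array([0.6, 0.4])                      # mixture weights, sum to 1
means = np.array([[-60.0, 15.0], [-85.0, 22.0]])    # component centers (lon, lat)
sd = 4.0                                            # shared isotropic std. dev.


def intensity(points):
    """lambda(s) = total_mass * sum_k w_k * N(s; mean_k, sd^2 I) for each row s."""
    diff = points[:, None, :] - means[None, :, :]   # shape (n, K, 2)
    sq = (diff ** 2).sum(axis=2)                    # squared distances, shape (n, K)
    dens = np.exp(-0.5 * sq / sd**2) / (2.0 * np.pi * sd**2)
    return total_mass * (dens @ weights)


def simulate():
    """Draw one realization: Poisson count, then mixture-distributed locations."""
    n = rng.poisson(total_mass)
    labels = rng.choice(len(weights), size=n, p=weights)
    return means[labels] + sd * rng.standard_normal((n, 2))


events = simulate()
print(events.shape)
print(intensity(means))
```

In a hierarchical Bayesian treatment, the weights, centers, and spreads would receive priors and be inferred from observed genesis locations rather than fixed as they are here.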

Keywords

Bayesian hierarchical model

non-homogeneous Poisson process

spatial statistics

hurricane genesis

environmental science 

Co-Author(s)

Athanasios Micheas, Univ of Missouri-Columbia
Guirong (Grace) Yan, Missouri University of Science and Technology

First Author

Zselyke Talata, USD497 Lawrence High School

Presenting Author

Zselyke Talata, USD497 Lawrence High School

Modeling Extreme Rainfall Events in Taiwan Using a Spatial-Temporal Hierarchical Framework

The PoT-GEV model, which integrates the generalized extreme value distribution with the peaks-over-threshold method, is a robust tool for extreme value analysis. Originally developed by Olafsdottir et al. (2021) for fitting block maxima data, it offers the capability to simultaneously analyze trends in the frequency and intensity of extreme events. In this study, we advance the PoT-GEV framework by introducing a spatial hierarchical structure combined with temporal effects. Spatial dependencies are captured through a latent spatial Gaussian process applied to the PoT-GEV parameters, while temporal covariates are incorporated to model time-varying effects. To address computational challenges, we replace traditional Markov Chain Monte Carlo methods with the Laplace approximation, significantly improving efficiency. The proposed methodology is validated through extensive simulation studies across diverse scenarios. Furthermore, its practical utility is demonstrated by applying the model to analyze extreme rainfall events in Taiwan. 
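The link between threshold exceedances and block maxima can be checked numerically. The sketch below uses the textbook relationship (Poisson exceedance rate plus generalized Pareto excesses imply GEV block maxima) and assumes a nonzero shape parameter. It is not necessarily the parameterization of Olafsdottir et al. (2021), and all numeric values are hypothetical.

```python
import math


def pot_to_gev(u, rate, sigma_u, xi):
    """Map PoT parameters (threshold u, exceedances per block, GPD scale, shape)
    to the GEV parameters of the block maximum. Assumes xi != 0."""
    mu = u + sigma_u * (rate**xi - 1.0) / xi
    sigma = sigma_u * rate**xi
    return mu, sigma, xi


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, valid where 1 + xi * (z - mu) / sigma > 0."""
    t = 1.0 + xi * (z - mu) / sigma
    return math.exp(-(t ** (-1.0 / xi)))


def block_max_cdf_from_pot(z, u, rate, sigma_u, xi):
    """P(block maximum <= z) for z > u under Poisson(rate) exceedances with GPD excesses."""
    t = 1.0 + xi * (z - u) / sigma_u
    return math.exp(-rate * t ** (-1.0 / xi))


# Hypothetical values: threshold 100 mm, 3 exceedances per year, GPD scale 20, shape 0.1.
u, rate, sigma_u, xi = 100.0, 3.0, 20.0, 0.1
mu, sigma, _ = pot_to_gev(u, rate, sigma_u, xi)

for z in (120.0, 150.0, 200.0):
    print(z, gev_cdf(z, mu, sigma, xi), block_max_cdf_from_pot(z, u, rate, sigma_u, xi))
```

Because the exceedance rate governs frequency and the excess distribution governs intensity, a trend in either one changes the implied GEV location and scale, which is why the two can be analyzed jointly.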

Keywords

Block maximum series data

Climate data analysis

Generalized extreme value distribution

Laplace approximation

Latent spatial Gaussian process 

Co-Author(s)

Tzu-Han Peng, Graduate Institute of Statistics, National Central University, Taiwan
Cheng-Ching Lin, Institute of Statistics and Data Science, National Tsing Hua University, Taiwan
Nan-Jung Hsu, Institute of Statistics and Data Science, National Tsing Hua University

First Author

Chun-Shu Chen, National Central University

Presenting Author

Chun-Shu Chen, National Central University

Sample from the Posterior to Stabilize an Unknown Diffusion Process

Multidimensional diffusion processes are classical models for dynamics of highly stochastic phenomena. The range of applications spans from early-stage cancer treatments to monetary policies for controlling inflation. Accordingly, the exogenous control input of the diffusion process needs to be delicately designed to stabilize the dynamics and preclude wild growths. In many situations that involve uncertainties, one needs to apply input signals and estimate the unknown parameters in order to learn stabilizing control policies. We propose a data-driven algorithm for this task that employs random inputs and forms a posterior belief about the model parameters. Then, we show that by treating posterior samples as true parameters, the process can be stabilized with high probability. 
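The idea of exciting the system with random inputs, forming a posterior, and designing a controller from a posterior sample can be sketched for a scalar diffusion. Everything below is an assumption made for illustration, not the authors' algorithm: a scalar model dx = (a x + b u) dt + dW, an Euler discretization, a standard normal prior on (a, b), and a simple pole-placement rule.

```python
import numpy as np

rng = np.random.default_rng(2)

a_true, b_true = 0.5, 1.0   # hypothetical unknown parameters; a > 0 is unstable
dt, n_steps = 0.01, 2000

# Exploration phase: drive the process with random inputs and record increments.
x = 0.0
regressors, increments = [], []
for _ in range(n_steps):
    u = 5.0 * rng.standard_normal()
    dx = (a_true * x + b_true * u) * dt + np.sqrt(dt) * rng.standard_normal()
    regressors.append([x, u])
    increments.append(dx)
    x += dx

Z = np.array(regressors)
Y = np.array(increments)

# Gaussian posterior for theta = (a, b) under a N(0, I) prior.
# Each increment satisfies dx | theta ~ N(theta^T z * dt, dt), which gives
# precision = I + dt * Z^T Z and mean = precision^{-1} Z^T Y.
precision = np.eye(2) + dt * Z.T @ Z
post_mean = np.linalg.solve(precision, Z.T @ Y)

# Draw one posterior sample using the Cholesky factor of the precision matrix.
L = np.linalg.cholesky(precision)
a_s, b_s = post_mean + np.linalg.solve(L.T, rng.standard_normal(2))

# Treat the sample as the truth: choose feedback u = -k x so that the sampled
# closed-loop drift a_s - b_s * k equals -1.
k = (a_s + 1.0) / b_s
closed_loop_true = a_true - b_true * k

print(f"sampled (a, b): ({a_s:.3f}, {b_s:.3f})")
print(f"true closed-loop drift: {closed_loop_true:.3f}")
```

A negative closed-loop drift under the true parameters means the feedback designed from the sample stabilizes the actual process in this toy setting.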

Keywords

Diffusion Processes

Random Input

Stabilization

Posterior Sampling 

Co-Author

Mohamad Kazem Shirani Faradonbeh, Southern Methodist University

First Author

Reza Sadeghi Hafshejani

Presenting Author

Reza Sadeghi Hafshejani

Binary Spatio-Temporal Process for Point-Referenced Data using a Markov Random Field

In this talk, we propose a novel approach for modeling point-referenced spatio-temporal binary data by leveraging Markov kernels to capture both spatial and temporal dependencies. Models for spatio-temporal binary data, which commonly arise in fields such as environmental science, epidemiology, and geospatial analysis, usually rely on a computationally cumbersome latent Gaussian process in point-referenced applications to overcome the difficulty of specifying dependencies over a continuous spatial domain. Through the use of Markov kernels, we provide a flexible mechanism for incorporating spatial and temporal correlations in a unified manner, allowing for non-linear dependencies and transition probabilities that vary over both space and time. Our approach is validated with an application to forest cover in the conterminous United States (CONUS) using the Forest Inventory and Analysis data and the Tree Canopy Cover data product from the Forest Service, demonstrating its ability to accurately capture spatio-temporal dynamics and improve prediction accuracy over existing methods. The results show that Markov kernels offer a powerful and scalable framework for analyzing spatio-temporal binary data. 

Keywords

Spatio-temporal analysis

Binary Data

Markov Random Field

Bayesian statistics 

Co-Author(s)

Andrew Finley, Michigan State University
Paul May

First Author

Romain Boutelet

Presenting Author

Romain Boutelet