Monday, Aug 4: 8:30 AM - 10:20 AM
4035
Contributed Papers
Music City Center
Room: CC-103B
Main Sponsor
Section on Statistics and the Environment
Presentations
In geostatistics, preferential sampling occurs when a location's inclusion probability correlates with the measured variable. For Gaussian spatial processes, preferential sampling has been shown to impact parameter estimation and degrade out-of-sample predictions. Most proposed solutions model the locations as a realization of a log Gaussian Cox process, with the observed outcome dependent on the same underlying spatial process. Preferentiality in non-Gaussian data is less explored. This study examines its impact on extremes, such as pollution or precipitation maxima. We introduce an intuitive modeling framework to induce a shared underlying spatial process between a location's maximum median value and its probability of being sampled, using the blended GEV distribution (bGEV). Inference is performed in a Bayesian framework, via an MCMC algorithm with a data augmentation step. We show how failing to account for preferentiality leads to biased parameter estimates, and how our solution improves inference compared to a baseline extremes spatial model. We apply our approach to estimate maxima of PM2.5 levels in California.
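The bias mechanism described in this abstract can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the authors' bGEV model: the field is a 1-D AR(1) series standing in for a spatial Gaussian process, sites are drawn with replacement with probability proportional to exp(beta * value), and the names `simulate_field`, `preferential_sample`, and `beta` are hypothetical.

```python
import math
import random

def simulate_field(n_sites, rng, rho=0.9):
    """AR(1) series used as a stand-in for a 1-D correlated Gaussian field."""
    s = [rng.gauss(0.0, 1.0)]
    innov_sd = math.sqrt(1.0 - rho ** 2)
    for _ in range(n_sites - 1):
        s.append(rho * s[-1] + rng.gauss(0.0, innov_sd))
    return s

def preferential_sample(field, n_obs, beta, rng):
    """Draw site indices (with replacement) with weight exp(beta * field value).

    beta = 0 gives uniform sampling; beta > 0 favors high-valued sites.
    """
    weights = [math.exp(beta * v) for v in field]
    return rng.choices(range(len(field)), weights=weights, k=n_obs)

rng = random.Random(0)
field = simulate_field(5000, rng)
true_mean = sum(field) / len(field)

idx_pref = preferential_sample(field, 500, beta=1.5, rng=rng)
idx_unif = preferential_sample(field, 500, beta=0.0, rng=rng)

# Naive sample means that ignore how the sites were chosen.
mean_pref = sum(field[i] for i in idx_pref) / len(idx_pref)
mean_unif = sum(field[i] for i in idx_unif) / len(idx_unif)

print(f"field mean {true_mean:.2f}, uniform {mean_unif:.2f}, preferential {mean_pref:.2f}")
```

Under these assumptions the naive mean from the preferentially chosen sites sits well above the field mean, while the uniform sample tracks it. That gap is the kind of bias a joint model of locations and outcomes is meant to correct.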
Keywords
preferential sampling, extremes, point process, log Gaussian Cox process
bayesian modeling
Over the last several decades, plastic waste has gradually accumulated while slowly degrading in terrestrial and oceanic environments. Recently, there has been an increased effort to identify the possible sources of plastic to understand how they affect vulnerable beaches. This study focuses specifically on microplastics beached on the US Gulf Coast. We expand upon existing Bayesian plastic attribution models and develop a rigorous statistical framework to map observed beached microplastics to their sources. Within this framework, we combine Lagrangian backtracking simulations of floating particles using nurdle beaching data with estimates of plastic input from coastlines, rivers, and fisheries. This allows us to build a spatiotemporal microplastic distribution along the Gulf Coast from source to sink. We infer that the main sources of microplastics found on US Gulf beaches are centered around New Orleans, Galveston Bay, Corpus Christi, Mérida, and the Grijalva and Pearl Rivers, along with fishing activities around the Mississippi River Delta. We also find strong seasonal effects in microplastic transport in the Gulf, caused by time-varying ocean currents and tourism.
Keywords
Backtracking simulations
Lagrangian ocean analysis framework
Nurdles
Virtual particles
Fine particulate matter and aerosol optical thickness are of interest to atmospheric scientists for understanding air quality and its various health/environmental impacts. The available data are extremely large, making uncertainty quantification in a fully Bayesian framework quite difficult, as traditional implementations do not scale reasonably to the size of the data. We specifically consider roughly 8 million observations from a remote sensing dataset obtained from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument. To analyze data on this scale, we introduce Scalable Multivariate Exact Posterior Regression (SM-EPR) which combines the data subset approach and Exact Posterior Regression (EPR). EPR is a Bayesian hierarchical model that allows sampling of fixed and random effects directly from the posterior without Markov chain Monte Carlo (MCMC) or approximate Bayesian techniques. We extend EPR to the multivariate spatial context, where the multiple variables may be distributed according to different distributions. The combination of the data subset approach with EPR allows one to perform exact Bayesian inference without MCMC for effectively any sample size.
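The idea of sampling directly from a posterior without MCMC can be seen in the simplest conjugate case. The sketch below is not EPR or SM-EPR: it assumes a one-coefficient Gaussian regression with known noise standard deviation and a Gaussian prior, so the posterior is available in closed form. A single random subset is used purely for illustration; how SM-EPR forms and combines subsets is not stated in the abstract. All function names and numeric settings are hypothetical.

```python
import math
import random

def simulate_data(n, beta_true, noise_sd, rng):
    """Toy data: y = beta_true * x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [beta_true * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def exact_posterior(xs, ys, noise_sd, prior_sd):
    """Closed-form N(mean, sd^2) posterior for beta under a N(0, prior_sd^2) prior."""
    precision = 1.0 / prior_sd ** 2 + sum(x * x for x in xs) / noise_sd ** 2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2) / precision
    return mean, math.sqrt(1.0 / precision)

rng = random.Random(42)
xs, ys = simulate_data(20000, beta_true=2.0, noise_sd=0.5, rng=rng)

# Work on a random subset of the data instead of all 20000 rows.
subset = rng.sample(range(len(xs)), 2000)
post_mean, post_sd = exact_posterior(
    [xs[i] for i in subset], [ys[i] for i in subset], noise_sd=0.5, prior_sd=10.0
)

# Independent posterior draws: no chain, no burn-in, no convergence diagnostics.
draws = [rng.gauss(post_mean, post_sd) for _ in range(1000)]
print(f"posterior mean {post_mean:.3f}, posterior sd {post_sd:.3f}")
```

Because each draw is independent and exact, the cost per draw does not depend on chain mixing, which is the practical appeal of MCMC-free posterior sampling at large sample sizes.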
Keywords
Basis functions
Big Data
Multivariate
Uncertainty quantification
We model hurricane genesis using Poisson processes with finite mixtures for the intensity surface. We use a Bayesian hierarchical framework to estimate the model parameters. The marked Poisson point process allows for the inclusion of additional information such as hurricane strength. Based on our formulation, we can answer two important questions: at which locations do we expect hurricane genesis of a certain strength to occur, and given a specific location, what hurricane strength do we expect to observe.
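A Poisson process whose intensity surface is a finite mixture can be simulated in two steps: draw the number of events from a Poisson distribution, then place each event by sampling from the normalized mixture. The sketch below assumes isotropic bivariate normal components on an abstract plane and omits marks such as hurricane strength; the component values, `TOTAL_RATE`, and all function names are hypothetical and do not come from the abstract.

```python
import math
import random

# Hypothetical mixture components: (weight, mean_x, mean_y, isotropic_sd).
COMPONENTS = [(0.6, -2.0, 0.0, 1.0), (0.4, 3.0, 1.0, 1.5)]
TOTAL_RATE = 40.0  # hypothetical expected number of events over the whole plane

def intensity(x, y):
    """Intensity surface: TOTAL_RATE times a finite mixture of bivariate normals."""
    dens = 0.0
    for w, mx, my, sd in COMPONENTS:
        r2 = ((x - mx) ** 2 + (y - my) ** 2) / sd ** 2
        dens += w * math.exp(-0.5 * r2) / (2.0 * math.pi * sd ** 2)
    return TOTAL_RATE * dens

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(rng):
    """Draw the event count, then place each event from the normalized mixture."""
    n = poisson_draw(TOTAL_RATE, rng)
    weights = [c[0] for c in COMPONENTS]
    points = []
    for _ in range(n):
        _, mx, my, sd = rng.choices(COMPONENTS, weights=weights, k=1)[0]
        points.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
    return points

points = simulate(random.Random(1))
print(len(points), "simulated events; intensity at (-2, 0):", round(intensity(-2.0, 0.0), 3))
```

In this construction the integral of `intensity` over the plane equals `TOTAL_RATE`, so the mixture weights control where events concentrate while the rate controls how many occur.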
Keywords
Bayesian hierarchical model
non-homogeneous Poisson process
spatial statistics
hurricane genesis
environmental science
The PoT-GEV model, which integrates the generalized extreme value distribution with the peaks-over-threshold method, is a robust tool for extreme value analysis. Originally developed by Olafsdottir et al. (2021) for fitting block maxima data, it offers the capability to simultaneously analyze trends in the frequency and intensity of extreme events. In this study, we advance the PoT-GEV framework by introducing a spatial hierarchical structure combined with temporal effects. Spatial dependencies are captured through a latent spatial Gaussian process applied to the PoT-GEV parameters, while temporal covariates are incorporated to model time-varying effects. To address computational challenges, we replace traditional Markov Chain Monte Carlo methods with the Laplace approximation, significantly improving efficiency. The proposed methodology is validated through extensive simulation studies across diverse scenarios. Furthermore, its practical utility is demonstrated by applying the model to analyze extreme rainfall events in Taiwan.
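For orientation, the standard textbook link between the two ingredients named in this abstract can be written out. This is the classical relation, not necessarily the exact parameterization used in the PoT-GEV model; the symbols $u$, $\lambda$, $\sigma_u$, $\mu$, $\sigma$, and $\xi$ are notation assumed here.

```latex
% GEV distribution function for a block maximum M (shape \xi \neq 0):
G(z) = \exp\left\{ -\left[ 1 + \xi \, \frac{z - \mu}{\sigma} \right]_{+}^{-1/\xi} \right\}

% Peaks over threshold: exceedances of u arrive as a Poisson process with
% rate \lambda per block, and excesses follow a generalized Pareto
% distribution with scale \sigma_u and shape \xi. Then, for z > u,
P(M \le z) = \exp\left\{ -\lambda \left[ 1 + \xi \, \frac{z - u}{\sigma_u} \right]_{+}^{-1/\xi} \right\}

% Matching the two expressions gives the GEV parameters:
\sigma = \sigma_u \, \lambda^{\xi}, \qquad
\mu = u + \frac{\sigma_u \left( \lambda^{\xi} - 1 \right)}{\xi}
```

Under this relation, a change in the exceedance rate $\lambda$ (frequency) and a change in $\sigma_u$ (intensity) both shift the block-maximum distribution, which is why a joint formulation can separate trends in frequency from trends in intensity.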
Keywords
Block maximum series data
Climate data analysis
Generalized extreme value distribution
Laplace approximation
Latent spatial Gaussian process
Co-Author(s)
Tzu-Han Peng, Graduate Institute of Statistics, National Central University, Taiwan
Cheng-Ching Lin, Institute of Statistics and Data Science, National Tsing Hua University, Taiwan
Nan-Jung Hsu, Institute of Statistics and Data Science, National Tsing Hua University
First Author
Chun-Shu Chen, National Central University
Presenting Author
Chun-Shu Chen, National Central University
Multidimensional diffusion processes are classical models for dynamics of highly stochastic phenomena. The range of applications spans from early-stage cancer treatments to monetary policies for controlling inflation. Accordingly, the exogenous control input of the diffusion process needs to be delicately designed to stabilize the dynamics and preclude wild growths. In many situations that involve uncertainties, one needs to apply input signals and estimate the unknown parameters in order to learn stabilizing control policies. We propose a data-driven algorithm for this task that employs random inputs and forms a posterior belief about the model parameters. Then, we show that by treating posterior samples as true parameters, the process can be stabilized with high probability.
Keywords
Diffusion Processes
Random Input
Stabilization
Posterior Sampling
In this talk, we propose a novel approach for modeling point-referenced spatio-temporal binary data by leveraging Markov kernels to capture both spatial and temporal dependencies. Models for spatio-temporal binary data, which commonly arise in fields such as environmental science, epidemiology, and geospatial analysis, usually rely on a computationally cumbersome latent Gaussian process in point-referenced applications to overcome the problem of specifying dependencies on a continuous spatial structure. Through the use of Markov kernels, we provide a flexible mechanism for incorporating spatial and temporal correlations in a unified manner, allowing for non-linear dependencies and varying transition probabilities over both space and time. Our approach is validated with an application to forest cover in the CONUS using the Forest Inventory Analysis data and the Tree Canopy Cover data product from the Forest Service, demonstrating its ability to accurately capture spatio-temporal dynamics and improve prediction accuracy over existing methods. The results show that Markov kernels offer a powerful and scalable framework for analyzing spatio-temporal binary data.
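To make the notion of a Markov kernel for binary data concrete, the sketch below shows a two-state transition kernel whose probabilities vary with a site-level covariate. It covers only the temporal transition at a single site and is not the authors' spatio-temporal construction; the logistic form, the coefficients `a`, `b`, `c`, and all function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def kernel(prev_state, covariate, a=-1.0, b=2.5, c=0.8):
    """P(y_t = 1 | y_{t-1} = prev_state) with a covariate-dependent logit."""
    return logistic(a + b * prev_state + c * covariate)

def stationary_prob(covariate):
    """Long-run P(y = 1) implied by the two-state kernel at this covariate value."""
    p01 = kernel(0, covariate)
    p11 = kernel(1, covariate)
    return p01 / (1.0 + p01 - p11)

def propagate(p_init, covariate, steps):
    """Push P(y = 1) forward in time by repeated application of the kernel."""
    p = p_init
    for _ in range(steps):
        p = p * kernel(1, covariate) + (1.0 - p) * kernel(0, covariate)
    return p

for cov in (0.0, 1.0):
    print(f"covariate {cov}: stationary P(y=1) = {stationary_prob(cov):.3f}")
```

Letting the covariate differ across locations gives transition probabilities that vary over space, and no latent Gaussian process has to be sampled to evaluate them.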
Keywords
Spatio-temporal analysis
Binary Data
Markov Random Field
Bayesian statistics