Sunday, Aug 3: 4:00 PM - 5:50 PM
0610
Topic-Contributed Paper Session
Music City Center
Room: CC-106A
Presentations by the winners of the ENVR student paper competition.
Spatial statistics
Bayesian modeling
Neural networks
Spatial extremes
Applied
No
Main Sponsor
Section on Statistics and the Environment
Presentations
Classic Bayesian methods with complex models are frequently infeasible due to an intractable likelihood. Simulation-based inference methods, such as Approximate Bayesian Computing (ABC), calculate posteriors without accessing a likelihood function by leveraging the fact that data can be quickly simulated from the model, but they converge slowly and/or poorly in high-dimensional settings. In this paper, we propose a framework for Bayesian posterior estimation by mapping data to posteriors of parameters using a machine learning model trained on data simulated from the complex model. Posterior distributions of model parameters are efficiently obtained by assuming a parametric form for the posterior, parametrized by the machine learning model, which is trained with the simulated data as inputs and the associated parameters as outputs. We show theoretically that our posteriors converge to the true posteriors in Kullback-Leibler divergence if the correct parametric family of the posterior is identified. We also provide tools to assess whether our parametric assumption is close to the true posterior, and modeling options if that is not the case. Comprehensive simulation studies highlight our method's robustness and accuracy.
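As a minimal sketch of the idea of learning a map from simulated data to a parametric posterior, the toy example below assumes a N(0, 1) prior, Gaussian observations, and a Gaussian posterior family whose mean is linear in the sample mean. The model, the function names, and the use of a linear map in place of a machine learning model are illustrative assumptions, not the authors' implementation. In this conjugate setting the exact posterior is known, so the fitted map can be checked against it.

```python
import random
import statistics


def simulate_pair(rng, n_obs=5, noise_sd=1.0):
    """Draw theta from a N(0, 1) prior, simulate data, return (summary, theta)."""
    theta = rng.gauss(0.0, 1.0)
    data = [rng.gauss(theta, noise_sd) for _ in range(n_obs)]
    return statistics.fmean(data), theta


def fit_gaussian_posterior_map(pairs):
    """Fit q(theta | s) = N(intercept + slope * s, variance) by maximum likelihood.

    With a linear mean and a constant variance, maximizing the average
    log q(theta | s) over simulated pairs has a closed form: least squares
    for the mean and the mean squared residual for the variance.
    """
    summaries = [s for s, _ in pairs]
    thetas = [t for _, t in pairs]
    mean_s = statistics.fmean(summaries)
    mean_t = statistics.fmean(thetas)
    sxx = sum((s - mean_s) ** 2 for s in summaries)
    sxt = sum((s - mean_s) * (t - mean_t) for s, t in pairs)
    slope = sxt / sxx
    intercept = mean_t - slope * mean_s
    variance = statistics.fmean(
        (t - intercept - slope * s) ** 2 for s, t in pairs
    )
    return intercept, slope, variance


rng = random.Random(0)
pairs = [simulate_pair(rng) for _ in range(20000)]
intercept, slope, variance = fit_gaussian_posterior_map(pairs)

# Exact posterior for this toy model (n_obs = 5, noise_sd = 1):
# theta | s ~ N(5/6 * s, 1/6), so slope should be near 5/6 and variance near 1/6.
print(intercept, slope, variance)
```

Once fitted, the map is applied to an observed summary to read off an approximate posterior without any further simulation, which is where the computational savings over ABC-style methods would come from.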
Keywords
Simulation-based Inference
Emulator
Spatial Epidemiology
Spatial Extreme Models
Variational Inference
Approximate Bayesian Computing
Anthropogenically forced climate shifts disrupt the seasonal behavior of climatic and hydrologic processes. The seasonality of streamflow has significant implications for the ecology of riverine ecosystems and for meeting societal demands for water resources. We develop a hierarchical Bayesian model of daily streamflow to quantify how the shapes of seasonal hydrographs are changing and to evaluate temporal trends in model-based hydrologic indices related to shifts in flow timing and magnitude. We apply this model to 1,112 gages across the Northern US over the years 1965-2022. We identify large-scale patterns in temporal changes to streamflow profiles that are consistent with regional changes in hydroclimate, including decreasing seasonal flow variability in the Pacific Northwest and increasing winter flows in the northeastern US. Within these regions we also observe fine-scale heterogeneity in streamflow timing and magnitude shifts, both of which have potentially significant implications for riverine ecosystem function and the ecosystem services that rivers provide.
Keywords
Streamflow
Climate
Bayesian
Modeling the nonstationarity that often prevails in the extremal dependence of spatial data can be challenging. Inference for stationary and isotropic models is considerably easier, but the assumptions that underpin these models are not typically met by data observed over large or topographically complex domains. A simple approach to accommodating spatial nonstationarity under the assumption of Gaussianity is to warp the original spatial domain to a latent space where stationarity and isotropy can be reasonably assumed; this approach has since seen further developments in both the classical Gaussian-based geostatistics and spatial extremes contexts. However, estimation of the warping function can be computationally expensive, and the transformation is not always guaranteed to be injective, which can lead to physically unrealistic transformations. We present a deep compositional model to capture nonstationarity in extremal dependence in exceedances of data functionals by leveraging efficient inference methods for r-Pareto processes. A detailed high-dimensional simulation study demonstrates the superior performance of our model in estimating the warped space, leading to an accurate characterization of the highly nonstationary extremal dependence structure. We apply the proposed approach to UK precipitation data, where we efficiently estimate the extremal dependence pattern with data observed at thousands of locations, which has never been achieved in previous related studies. The model is implemented in R with TensorFlow v2.
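To make the warping idea concrete, the sketch below composes two simple injective maps and measures dependence through distance in the warped space. The warping units (identity plus nonnegative-weighted sigmoids), their parameter values, and the power variogram are hypothetical choices for illustration only; they are not the authors' architecture or their r-Pareto inference procedure, and the sketch is in Python rather than the R and TensorFlow implementation the abstract mentions.

```python
import math


def axial_warp(s, weights, centers, steepness=10.0):
    """Strictly increasing 1-D map: identity plus nonnegative-weighted sigmoids.

    Because every weight is nonnegative, the derivative is at least 1,
    so the map is injective.
    """
    return s + sum(
        w / (1.0 + math.exp(-steepness * (s - c)))
        for w, c in zip(weights, centers)
    )


def warp(point):
    """Composition of two injective layers, one per coordinate."""
    x, y = point
    x = axial_warp(x, weights=[2.0], centers=[0.25])  # layer 1: stretch x near 0.25
    y = axial_warp(y, weights=[0.5], centers=[0.75])  # layer 2: stretch y near 0.75
    return (x, y)


def variogram(p, q, range_=1.0, smooth=1.0):
    """Stationary, isotropic power variogram evaluated in the warped space."""
    h = math.dist(warp(p), warp(q))
    return (h / range_) ** smooth


# Two pairs of sites with the same separation (0.1) in the original domain.
west = variogram((0.2, 0.5), (0.3, 0.5))
east = variogram((0.7, 0.5), (0.8, 0.5))

# Warped x-coordinates on a regular grid, used to check monotonicity.
grid = [i / 50 for i in range(51)]
warped_x = [axial_warp(s, weights=[2.0], centers=[0.25]) for s in grid]

print(west, east)
```

The two pairs are equally far apart in the original domain but not in the warped one, so a model that is stationary after warping is nonstationary in the original coordinates.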
Keywords
Deformation
Nonstationarity
Deep Models
Spatial Extremes
r-Pareto Processes
Ensemble decision tree methods such as XGBoost, random forest, and Bayesian additive regression trees (BART) have gained enormous popularity in data science for their superior performance in machine learning regression and classification tasks. In this paper, we develop a new Bayesian graph-split-based additive decision trees method, called GS-BART, to improve the performance of BART for spatially dependent data. The new method adopts a highly flexible split rule complying with spatial structures to relax the axis-parallel split rule assumption adopted in most existing ensemble decision tree models. We consider a generalized spatial nonparametric regression model using GS-BART and design a scalable informed MCMC algorithm to sample the decision trees of GS-BART; the model applies to both point-referenced and areal unit data as well as Gaussian and non-Gaussian responses. The algorithm leverages a gradient-based recursive algorithm on rooted directed spanning trees or chains (called arborescences). The superior performance of the method over conventional ensemble tree models and Gaussian process regression models is illustrated in various spatial data analyses.
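One way to picture a split that follows spatial structure rather than a coordinate axis is sketched below: build a rooted spanning tree (an arborescence) on a spatial graph and cut one of its edges, which divides the nodes into two contiguous groups. The 4-by-4 grid graph, the breadth-first construction of the tree, and all function names are assumptions made for illustration; this is not the GS-BART split rule or its MCMC sampler.

```python
from collections import deque


def grid_graph(n_rows, n_cols):
    """Adjacency lists for a rook-neighbor grid; nodes are numbered row by row."""
    adj = {}
    for r in range(n_rows):
        for c in range(n_cols):
            node = r * n_cols + c
            nbrs = []
            if r > 0:
                nbrs.append(node - n_cols)
            if r < n_rows - 1:
                nbrs.append(node + n_cols)
            if c > 0:
                nbrs.append(node - 1)
            if c < n_cols - 1:
                nbrs.append(node + 1)
            adj[node] = nbrs
    return adj


def bfs_arborescence(adj, root):
    """Rooted spanning tree from breadth-first search, stored as child lists."""
    children = {node: [] for node in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                children[node].append(nbr)
                queue.append(nbr)
    return children


def subtree_nodes(children, node):
    """All nodes at or below `node`: one side of the cut above `node`."""
    out = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        out.add(cur)
        stack.extend(children[cur])
    return out


def is_connected(adj, nodes):
    """Check that `nodes` induce a connected subgraph of the original graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nbr in adj[cur]:
            if nbr in nodes and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == nodes


adj = grid_graph(4, 4)
children = bfs_arborescence(adj, root=0)
left = subtree_nodes(children, node=5)  # cut the tree edge entering node 5
right = set(adj) - left
print(sorted(left), sorted(right))
```

Both sides of such a cut are connected in the original graph, so the partition respects spatial contiguity without being restricted to axis-parallel boundaries.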
Keywords
Bayesian Nonparametric Regression
Complex Domain
Decision Trees
Informed MCMC
Spatial Prediction
Spanning Tree
Data derived from remote sensing or numerical simulations often have a regular gridded structure and are large in volume. However, it is challenging to find accurate spatial models that can fill in missing grid cells or simulate the process effectively, especially when there is spatial heterogeneity and heavy-tailed marginal distributions. One effective method is to use a spatial autoregressive (SAR) model, which maps a location and its neighbors to spatially independent random variables. This model is flexible and well-suited for non-Gaussian fields. In this study, we assume that the innovations in the SAR model follow a Generalized Extreme Value (GEV) distribution, a heavy-tailed distribution, and incorporate nonlinear maps that combine a central grid location with its neighbors, introducing extreme spatial behavior. While these models are fast to simulate due to the sparseness of the construction, the estimation process is slow because the likelihood is intractable. To overcome this, we suggest training a convolutional neural network (CNN) on a large training set that covers a useful parameter space and then using the trained network for fast estimation. We apply this model to analyze yearly maximum precipitation data from a regional climate model to study spatial extremal behavior across North America.
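The sketch below illustrates why such a model is cheap to simulate: a simple linear SAR field on a grid with GEV innovations, generated by fixed-point iteration on the sparse neighbor structure. The linear form y = rho * W y + e, the four-neighbor averaging weights, and the parameter values are assumptions chosen for illustration; the abstract's nonlinear maps and the CNN-based estimator are not reproduced here.

```python
import math
import random


def sample_gev(rng, mu=0.0, sigma=1.0, xi=0.2):
    """Draw one GEV(mu, sigma, xi) value by inverting the CDF (xi != 0)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi


def neighbor_mean(field, i, j):
    """Average of the available rook neighbors of cell (i, j)."""
    n_rows, n_cols = len(field), len(field[0])
    values = []
    if i > 0:
        values.append(field[i - 1][j])
    if i < n_rows - 1:
        values.append(field[i + 1][j])
    if j > 0:
        values.append(field[i][j - 1])
    if j < n_cols - 1:
        values.append(field[i][j + 1])
    return sum(values) / len(values)


def simulate_sar(innovations, rho, n_iter=80):
    """Solve y = rho * W y + e by fixed-point iteration (converges for |rho| < 1)."""
    field = [row[:] for row in innovations]
    for _ in range(n_iter):
        field = [
            [rho * neighbor_mean(field, i, j) + innovations[i][j]
             for j in range(len(field[0]))]
            for i in range(len(field))
        ]
    return field


def apply_sar_map(field, rho):
    """Map each cell and its neighbors back to its innovation: e = y - rho * W y."""
    return [
        [field[i][j] - rho * neighbor_mean(field, i, j)
         for j in range(len(field[0]))]
        for i in range(len(field))
    ]


rng = random.Random(1)
rho = 0.5
innovations = [[sample_gev(rng) for _ in range(20)] for _ in range(20)]
field = simulate_sar(innovations, rho)
recovered = apply_sar_map(field, rho)
max_error = max(
    abs(recovered[i][j] - innovations[i][j])
    for i in range(20) for j in range(20)
)
print(max_error)
```

Each update touches only a cell and its immediate neighbors, so simulating many fields over a range of parameter values, as would be needed to build a training set for a neural estimator, stays inexpensive.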
Keywords
Spatial Autoregressive Model
Generalized Extreme Value Distribution
Convolutional Neural Networks
Parameter Estimation
Spatial Extremes
Quantile Regression