Recursive Bayesian Methods in Ecology and Environmental Science

Mevin Hooten, Chair
The University of Texas At Austin
 
Mevin Hooten, Organizer
The University of Texas At Austin
 
Tuesday, Aug 5: 10:30 AM - 12:20 PM
0576 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-202B 
Modern computing environments continue to improve, allowing us to fit models to data faster than
ever before. In particular, multi-processor and distributed computing resources have become
widely available in academia, agencies, and industry, and the number of computing cores per
machine continues to increase. More emphasis on parallel computing using graphics processing units
(GPUs) and the demand for massive high-performance computing centers driven by the large
language model (LLM) training needs of the AI-based industry have led to unprecedented
accessibility of such resources.

Ecological and environmental data sets continue to grow in size and complexity due to
technological advances and increased interest. Recursive (i.e., multi-stage) approaches for fitting
statistical models often allow us to leverage modern parallel computing environments, and they
have proliferated in the Bayesian and machine learning fields. Despite these advancements,
however, many approaches are either too complicated or too narrowly applicable for ecologists
and environmental scientists to adopt. Fortunately, ongoing developments in recursive statistical
computing have led to simple and intuitive methods that can be refined and generalized with
ease. Combined with readily available software for parallelization, these new recursive computing
approaches have enabled practitioners and led to improvements in scientific inquiry based on
large data sets. Recursive computing methods have also inspired new approaches to formulate
statistical models that can be both more general and economize implementation at the same
time.

This proposed invited session comprises statisticians across a range of career stages who work on
ecological and environmental applications. Each will present new methodology in recursive
Bayesian computing, developing ways to make implementation of statistical models scalable and
accurate and demonstrating the approaches using a variety of ecological and environmental data
sets.

Keywords

Bayesian

Markov Chain Monte Carlo

Ecology

Environment

Spatial

Hierarchical Models 

Applied

Yes

Main Sponsor

Section on Statistics and the Environment

Co Sponsors

Biometrics Section
Section on Bayesian Statistical Science

Presentations

An improved sampler for recursive Bayesian inference

Recursive Bayesian inference is an important tool for applications in which data arrive sequentially and updated parameter estimates are desired each time new data arrive. Models for which the posterior distribution is estimated via Markov chain Monte Carlo (MCMC) can use Prior-Proposal-Recursive Bayes (PPRB) to resample existing posterior samples using the likelihood of the new data. Like all filtering methods, PPRB will eventually converge to sampling from a degenerate distribution if applied many times, limiting its usefulness for repeated application in longitudinal data settings. We present a sampling strategy for recursive Bayesian inference that extends PPRB to avoid the eventual tendency towards degeneracy by the addition of a transition kernel step run in parallel on each filtered sample. We show that this sampler improves upon PPRB by producing samples from the target posterior distribution that will not tend towards degeneracy. Additionally, we compare the performance of the proposed sampler to other streaming samplers for recursive inference and present an application to ecological species count data.
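
The resampling step described above can be illustrated with a minimal sketch. It is not the authors' sampler: it assumes a toy Gaussian-mean model with known variance, uses exact conjugate draws as a stand-in for first-stage MCMC output, and the names `pprb_update` and `conjugate_posterior` are hypothetical. Because proposals are drawn from the earlier posterior, the Metropolis-Hastings ratio reduces to a likelihood ratio on the new batch, assuming the batches are conditionally independent given the parameter.

```python
import numpy as np

def log_lik(theta, y, sigma=1.0):
    # Gaussian log-likelihood of a batch, up to an additive constant.
    return -0.5 * np.sum((y - theta) ** 2) / sigma**2

def pprb_update(prior_draws, y_new, rng, sigma=1.0):
    # Independence Metropolis-Hastings step that proposes from the stored
    # draws of the previous posterior; accept with the new-data likelihood ratio.
    n = len(prior_draws)
    out = np.empty(n)
    current = prior_draws[rng.integers(n)]
    current_ll = log_lik(current, y_new, sigma)
    for i in range(n):
        proposal = prior_draws[rng.integers(n)]
        proposal_ll = log_lik(proposal, y_new, sigma)
        if np.log(rng.uniform()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        out[i] = current
    return out

def conjugate_posterior(y, prior_mean, prior_var, sigma=1.0):
    # Closed-form Normal posterior for a Normal mean (used only as a check).
    precision = 1.0 / prior_var + len(y) / sigma**2
    mean = (prior_mean / prior_var + y.sum() / sigma**2) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(0)
y1 = rng.normal(2.0, 1.0, size=50)   # first batch (assumed toy data)
y2 = rng.normal(2.0, 1.0, size=10)   # newly arrived batch

# Stage 1: draws from p(theta | y1); exact here, MCMC output in practice.
m1, v1 = conjugate_posterior(y1, 0.0, 100.0)
stage1 = rng.normal(m1, np.sqrt(v1), size=20000)

# Stage 2: update to p(theta | y1, y2) without revisiting y1.
stage2 = pprb_update(stage1, y2, rng)
m_full, v_full = conjugate_posterior(np.concatenate([y1, y2]), 0.0, 100.0)
print(stage2.mean(), m_full)
```

Every value in `stage2` is one of the values already in `stage1`, which is why repeated application of this kind of update shrinks the pool of distinct samples.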

Keywords

Recursive Bayes

Distributed MCMC

Streaming Data 

Co-Author(s)

Ian Taylor
Brenda Betancourt, NORC at The University of Chicago

Speaker

Andee Kaplan, Colorado State University

A Multistage Approach to Posterior Sampling for Bayesian Models

The surge in access to computing resources has attracted attention toward the development of algorithms that can run efficiently on multi-core processing units or in distributed computing environments. In the context of Bayesian inference, MCMC remains the most reliable and widely applicable algorithm to characterize posterior distributions; however, its Markovian nature imposes challenges when it comes to parallelization. Running independent chains is inefficient due to the need to discard a fixed number of observations as burn-in, while parallelizing a single chain leads to additional communication costs at every iteration. To circumvent both of these issues, multistage approaches have been proposed, where a sample from a computationally convenient and parallel-friendly approximation of the posterior distribution is obtained and later corrected using an importance sampling or Metropolis-Hastings post-processing step. In this work, we propose an extension of pre-existing multistage approaches, showcasing the effectiveness of the resulting algorithm considering both simulated experiments and real data.
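
The "approximate first, correct later" idea can be sketched with self-normalized importance sampling. This is an illustration under assumptions, not the proposed extension: the Poisson-Gamma model, the inflated Normal approximation, and the function names are all hypothetical, and the first stage runs in a single process here although in practice it is the part that would be distributed.

```python
import numpy as np

def log_target(theta, y, a=1.0, b=1.0):
    # Unnormalized log posterior: Poisson likelihood with a Gamma(a, b) prior.
    theta = np.asarray(theta, dtype=float)
    out = np.full(theta.shape, -np.inf)   # zero density for theta <= 0
    pos = theta > 0
    out[pos] = (a - 1 + y.sum()) * np.log(theta[pos]) - (b + len(y)) * theta[pos]
    return out

def log_normal_pdf(theta, mean, sd):
    return -0.5 * ((theta - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(1)
y = rng.poisson(3.0, size=40)             # assumed toy counts

# Stage 1: draws from a convenient approximation (variance-inflated Normal).
approx_mean = y.mean()
approx_sd = 1.5 * np.sqrt(y.mean() / len(y))
draws = rng.normal(approx_mean, approx_sd, size=50000)

# Stage 2: importance weights correct the approximation toward the posterior.
log_w = log_target(draws, y) - log_normal_pdf(draws, approx_mean, approx_sd)
w = np.exp(log_w - log_w.max())
w /= w.sum()

is_mean = np.sum(w * draws)               # corrected posterior mean estimate
ess = 1.0 / np.sum(w**2)                  # effective sample size of the weights
corrected = rng.choice(draws, size=5000, replace=True, p=w)  # resampled draws

exact_mean = (1.0 + y.sum()) / (1.0 + len(y))  # conjugate answer, for checking
print(is_mean, exact_mean, ess)
```

The effective sample size gives a rough sense of how much the correction costs: the closer the first-stage approximation is to the posterior, the fewer draws are effectively discarded.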

Keywords

Bayesian inference

embarrassingly parallel MCMC

Gaussian process 

Speaker

Daniel Wurzler Barreto, The University of Texas at Austin

New tools for recursive Bayesian inference applied to high-resolution satellite imagery

Recursive Bayesian inference, in which posterior beliefs are updated in light of accumulating batches of data, is a tool for implementing Bayesian models in applications with streaming and/or very large data sets. Implementation typically proceeds via a sequence of "transient" posteriors characterized by samples obtained using acceptance/rejection algorithms in which draws from one posterior in the sequence are used as proposals for the next. While straightforward to implement, such filtering approaches suffer from particle depletion, degrading each sample's ability to represent its target posterior. Generating proposals from smoothed versions of the transient posteriors' empirical sampling distributions can alleviate particle depletion, but the efficiency of such an approach can be extremely limited for moderate- to high-dimensional parameter spaces. We introduce new tools for smoothed recursive Bayesian inference in the form of blocking and generalized elliptical slice samplers that ensure satisfactory effective sample sizes throughout the sequence of transient posterior samples. We apply the method to satellite imagery to classify forest vegetation in New Mexico.
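
Particle depletion can be made concrete with a small diagnostic sketch. It swaps in plain weighted resampling for the acceptance/rejection step described above, assumes a toy Gaussian-mean model, and uses a hypothetical helper `filter_step`; it demonstrates the problem only and does not implement the smoothing, blocking, or slice-sampling tools of the talk.

```python
import numpy as np

def filter_step(particles, y_batch, rng, sigma=1.0):
    # Weight each particle by the Gaussian likelihood of the new batch,
    # then resample with replacement; no new parameter values are created.
    sq_err = (y_batch[:, None] - particles[None, :]) ** 2
    log_w = -0.5 * sq_err.sum(axis=0) / sigma**2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)              # effective sample size before resampling
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], ess

rng = np.random.default_rng(2)
particles = rng.normal(0.0, 3.0, size=2000)   # draws from an assumed prior
unique_counts = [len(np.unique(particles))]

for _ in range(10):                            # ten batches arriving in sequence
    y_batch = rng.normal(1.0, 1.0, size=20)
    particles, ess = filter_step(particles, y_batch, rng)
    unique_counts.append(len(np.unique(particles)))

print(unique_counts)
```

The count of distinct particles can only stay the same or fall from one batch to the next, which is the depletion that smoothed proposals are meant to counteract.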

Keywords

recursive Bayes

generalized elliptical slice sampler

satellite imagery

vegetation cover 

Speaker

Henry Scharf, University of Arizona

Multi-Stage MCMC for Spatio-Temporal Data with an Application to the U.S. Drought Monitor

Bayesian analyses of large spatio-temporal data are hindered by the computational expense required to implement Markov chain Monte Carlo (MCMC) algorithms. A common solution is for researchers to reduce the dimensionality by making simplifying assumptions in the spatial and/or temporal structure. While this increases computational efficiency, it comes at the cost of model flexibility. Alternatively, we propose a multi-stage MCMC approach that permits Bayesian analysis of the full model. The use of parallel computing alleviates the computational expense associated with large space-time data, making this approach scalable and generalizable to a flexible class of spatio-temporal models. This work is motivated by spatio-temporal ordinal data from the US Drought Monitor. This weekly data product records drought conditions across the United States as one of six ordered levels. We develop a Bayesian spatio-temporal ordinal model for modeling and forecasting drought conditions, and we fit this model with the proposed multi-stage MCMC approach.

Keywords

Bayesian computing

MCMC

parallel computing

spatial

environmental 

Speaker

Staci Hepler, Wake Forest University

A Multi-Stage Approach to Fit Bayesian Spatial Point Process Models

Bayesian point process models are commonly used to analyze presence-only data in ecology. Current methods for fitting these models are computationally expensive because they require numerical quadrature and algorithm supervision. We propose a flexible and efficient multi-stage Bayesian approach to fitting point process models that leverages parallel computing resources to estimate coefficients and predict total abundance. We show how this method can be extended to study designs with compact observation windows and how it allows for posterior prediction in unobserved areas, which can be used for downstream analyses. We demonstrate this approach using a simulation study and imagery data from aerial surveys to learn spatially explicit abundance of harbor seals in Johns Hopkins Inlet, an important glacial fjord in Alaska.
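
To show where numerical quadrature enters, the sketch below evaluates the log-likelihood of an inhomogeneous Poisson point process, which is the sum of log-intensities at the observed points minus the integral of the intensity over the window. It assumes a one-dimensional window [0, 1], a log-linear intensity in location, made-up point locations, and a midpoint rule; the function names are hypothetical and this is not the multi-stage method of the talk.

```python
import numpy as np

def log_intensity(s, beta0, beta1):
    # Assumed log-linear intensity: log lambda(s) = beta0 + beta1 * s.
    return beta0 + beta1 * s

def ipp_log_lik(points, beta0, beta1, n_quad=1000):
    # Midpoint-rule quadrature for the integral of lambda(s) over [0, 1].
    width = 1.0 / n_quad
    mids = (np.arange(n_quad) + 0.5) * width
    integral = np.sum(np.exp(log_intensity(mids, beta0, beta1))) * width
    log_lik = np.sum(log_intensity(points, beta0, beta1)) - integral
    return log_lik, integral

beta0, beta1 = 3.0, 1.5
points = np.array([0.12, 0.47, 0.63, 0.88, 0.95])   # illustrative locations

loglik, quad_integral = ipp_log_lik(points, beta0, beta1)

# This intensity integrates in closed form, which allows a check of the quadrature.
exact_integral = np.exp(beta0) * (np.exp(beta1) - 1.0) / beta1
print(loglik, quad_integral, exact_integral)
```

In realistic settings the intensity depends on spatial covariates over a two-dimensional window, so the integral has no closed form and must be recomputed at every MCMC iteration, which is the cost the abstract refers to.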

Keywords

spatial point process

recursive Bayes

species distribution modeling

presence-only data

parallel processing

Markov Chain Monte Carlo 

Co-Author(s)

Mevin Hooten, The University of Texas At Austin
Toryn Schafer, Texas A&M University
Nicholas Calzada, The University of Texas At Austin
Benjamin Hoose, Texas A&M University
Jamie Womble, National Park Service
Scott Gende, National Park Service

Speaker

Rachael Ren, The University of Texas At Austin