
Advancement in Statistical and Machine Learning Methods for Environmental Data

Nathan Wikle Chair
University of Iowa

Monday, Aug 4: 10:30 AM - 12:20 PM
4045
Contributed Papers

Music City Center

Room: CC-211

Main Sponsor

Section on Statistics and the Environment

Presentations

Decentralized Inference for Spatial Data Using Low-Rank Models

Advancements in information technology have enabled the generation of massive spatial datasets, necessitating scalable distributed methods. Centralized frameworks are prone to vulnerabilities such as single-point failures and communication bottlenecks. This paper introduces a decentralized framework for parameter inference in spatial low-rank models to address these limitations. A key challenge stems from spatial dependence among observations, which prevents the log-likelihood from being expressed as a summation, a critical requirement for decentralized optimization. To overcome this, we propose a novel objective function leveraging the evidence lower bound, facilitating the application of decentralized optimization techniques. Our approach integrates block descent with multi-consensus and dynamic consensus averaging for effective parameter optimization. We prove the new objective's convexity near true parameters, ensuring convergence. Additionally, we establish theoretical results on the consistency and asymptotic normality of the estimator for spatial low-rank models. Extensive simulations and real-world experiments confirm the framework's robustness and scalability.
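
To illustrate why a summable objective matters for decentralized optimization, here is a minimal sketch of plain consensus averaging on a ring network: each node repeatedly mixes its value with those of its two neighbours, and every node ends up holding the global average without any central coordinator. This is not the authors' algorithm (their method combines block descent with multi-consensus and dynamic consensus averaging); the ring topology, the mixing weights, and the `local_values` are assumptions chosen for illustration.

```python
def ring_consensus(values, num_rounds=200, self_weight=0.5):
    """Plain consensus averaging on a ring: each node mixes with its two neighbours."""
    n = len(values)
    x = list(values)
    side = (1.0 - self_weight) / 2.0  # weight placed on each neighbour
    for _ in range(num_rounds):
        # Every node only reads its own value and its two neighbours' values.
        x = [
            self_weight * x[i] + side * x[(i - 1) % n] + side * x[(i + 1) % n]
            for i in range(n)
        ]
    return x


# Hypothetical local quantities, e.g. each node's local term of a summable objective.
local_values = [4.0, -1.0, 2.5, 0.5, 3.0, 1.0]
consensus = ring_consensus(local_values)
global_mean = sum(local_values) / len(local_values)
```

Because the mixing weights are symmetric and sum to one, the network-wide average is preserved at every round, so the values in `consensus` all approach `global_mean`.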

Keywords

Block descent method

Dynamic consensus averaging

Evidence lower bound

Multi-consensus

Spatial dependence

Co-Author(s)

Sameh Abdulah, King Abdullah University of Science and Technology
Ying Sun, King Abdullah University of Science and Technology
Marc Genton, King Abdullah University of Science and Technology

First Author

Jianwei Shi, King Abdullah University of Science and Technology

Presenting Author

Jianwei Shi, King Abdullah University of Science and Technology

Estimation of Impact Ranges for Functional Valued Predictors

Spectroscopy is essential for scientific and industrial applications, enabling the analysis of complex materials and their interactions with radiation. Hyperspectral remote sensing, or imaging spectroscopy, plays a key role in Earth sciences, including ecology, geology, and cryosphere research. With the growing availability of orbital imaging spectrometers, developing methods to enhance data utility is crucial. Identifying diagnostic absorption features in spectra is vital for understanding spectral-response relationships. This study considers a Functional Partial Least Squares (FPLS) approach to model spectral data as smooth functions and analyze their impact within specific impact ranges. We propose a two-stage estimation procedure to determine these ranges' midpoints and half-lengths, along with an iterative algorithm to estimate their number and locations. The method is validated through simulations and applied to real spectral data to identify diagnostic absorption features for predicting soil calcium carbonate (CaCO₃) content, successfully estimating their number and locations.

Keywords

Functional Data Analysis

Functional Partial Least Squares

Spectroscopy

Impact Range

Co-Author(s)

Nimrod Carmon, Jet Propulsion Laboratory
Bledar Komoni, University of Cincinnati
Jonathan Hobbs, Jet Propulsion Laboratory
Amy Braverman, Jet Propulsion Laboratory
Dean Young
Joon Jin Song, Baylor University

First Author

Rory Samuels

Presenting Author

Joon Jin Song, Baylor University

Frequency Domain Empirical Likelihood Using Distributional Constraint

This work introduces a framework for robust Bayesian inference by integrating two methodologies: a Bayesian exponentially tilted empirical likelihood and a frequency domain empirical likelihood, each designed to address different aspects of statistical modeling. The first component leverages a new variant of the Wasserstein metric to concentrate the likelihood near a chosen parametric family, enabling robust inference on model parameters in the presence of outliers. We extend this idea to dependent data through a data transformation (i.e., a Fourier transform) developed in terms of the spectral distribution. In this semi-parametric approach, instead of using moment-based constraints as in the existing literature, we employ distributional constraints so that the distribution is concentrated around a guessed parametric family. Applications extend to robust inference, spectral analysis, Whittle estimation, and goodness-of-fit testing, with implications for trustworthy machine learning.
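
As background for the frequency-domain quantities named above, the following sketch computes a periodogram from a simulated series and obtains a classical Whittle estimate of an AR(1) coefficient by grid search. It illustrates only the standard parametric Whittle approach, not the empirical likelihood or distributional constraints proposed in the talk; the AR(1) model, the seed, the sample size, and the grid are all assumptions made for the example.

```python
import cmath
import math
import random


def periodogram(x):
    """I(w_j) = |sum_t x_t exp(-i w_j t)|^2 / (2 pi n) at w_j = 2 pi j / n, 0 < j < n/2."""
    n = len(x)
    freqs, ordinates = [], []
    for j in range(1, n // 2):
        w = 2.0 * math.pi * j / n
        dft = sum(x[t] * cmath.exp(-1j * w * t) for t in range(n))
        freqs.append(w)
        ordinates.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, ordinates


def whittle_ar1(freqs, ordinates):
    """Grid-search Whittle estimate of the AR(1) coefficient (scale profiled out)."""
    best_phi, best_obj = None, float("inf")
    for k in range(-95, 96):
        phi = k / 100.0
        # AR(1) spectral shape, up to the constant sigma^2 / (2 pi).
        g = [1.0 / (1.0 - 2.0 * phi * math.cos(w) + phi * phi) for w in freqs]
        scale = sum(i / gi for i, gi in zip(ordinates, g)) / len(g)
        obj = math.log(scale) + sum(math.log(gi) for gi in g) / len(g)
        if obj < best_obj:
            best_phi, best_obj = phi, obj
    return best_phi


random.seed(1)
true_phi, n = 0.6, 1024
series, prev = [], 0.0
for t in range(n + 200):  # the first 200 draws are discarded as burn-in
    prev = true_phi * prev + random.gauss(0.0, 1.0)
    if t >= 200:
        series.append(prev)

freqs, ordinates = periodogram(series)
phi_hat = whittle_ar1(freqs, ordinates)
```

The direct DFT loop is quadratic in the series length and is used here only to keep the sketch dependency-free; an FFT would be the usual choice in practice.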

Keywords

Frequency Domain Empirical Likelihood

Robust Inference

Whittle Estimation

Spectral Distribution

Periodograms

Co-Author(s)

Debdeep Pati, University of Wisconsin-Madison
Soutir Bandyopadhyay, Colorado School of Mines

First Author

Souvick Bera, Colorado School of Mines

Presenting Author

Souvick Bera, Colorado School of Mines

Modeling Bayesian Transport Map Uncertainty for Non-Gaussian Spatial Data via Laplace Approximation

Transport maps can be used to describe non-Gaussian multivariate distributions relative to a simple reference distribution, usually Gaussian. Previous work in this area focused on modeling transport maps using Gaussian processes, and computational limitations have led practitioners to focus on learning map parameters via stochastic gradient methods. We extend this idea by employing a Laplace approximation to the posterior distribution of transport map parameters. We first discuss the characteristics of the Laplace approximation in the transport map setting, then explore how capturing and quantifying uncertainty in transport map parameters affects the model's ability to learn the non-Gaussian target distribution. We then compare our new model's performance in learning the distribution of a potentially nonstationary spatial field to established methods using various metrics. Finally, we contrast the Laplace approximation with various other approximation and uncertainty quantification methods.
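
For readers unfamiliar with the Laplace approximation, here is a minimal one-dimensional sketch: Newton's method locates the posterior mode, and the negative inverse Hessian of the log-posterior at the mode gives the variance of the approximating Gaussian. The toy Gamma-shaped posterior and all function names are assumptions for illustration; the talk applies the idea to transport map parameters, which this example does not attempt to reproduce.

```python
def laplace_approximation(grad, hess, theta0, num_iters=50):
    """Newton ascent to the posterior mode, then a Gaussian with variance -1 / Hessian."""
    theta = theta0
    for _ in range(num_iters):
        theta = theta - grad(theta) / hess(theta)
    variance = -1.0 / hess(theta)
    return theta, variance


# Toy unnormalised log-posterior: log p(theta) = (a - 1) log(theta) - b * theta,
# i.e. a Gamma(shape a, rate b) density; a and b are hypothetical values.
a, b = 5.0, 2.0


def log_post_grad(theta):
    return (a - 1.0) / theta - b


def log_post_hess(theta):
    return -(a - 1.0) / theta ** 2


mode, laplace_var = laplace_approximation(log_post_grad, log_post_hess, theta0=1.0)

# Exact moments of the Gamma distribution, for comparison with the approximation.
exact_mean, exact_var = a / b, a / b ** 2
```

Comparing `mode` and `laplace_var` with `exact_mean` and `exact_var` shows that the approximation is centred at the mode rather than the mean and can understate the spread of a skewed posterior.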

Keywords

Gaussian process

Generative modeling

Laplace approximation

Uncertainty quantification

Co-Author(s)

Matthias Katzfuss, University of Wisconsin–Madison
Felix Jimenez

First Author

Jacob Johnson, University of Wisconsin - Madison

Presenting Author

Jacob Johnson, University of Wisconsin - Madison

Multivariate confluent hypergeometric covariance functions with origin and tail flexibility

Spatially indexed multivariate data appear frequently in geostatistics and related fields including oceanography and environmental science, with data often modeled through covariance and cross-covariance functions in the Gaussian process setting. The purpose of this work is to present techniques, based on multivariate mixtures, for establishing validity that are both simple and comprehensive. In particular, cross-covariances are constructed for the recently introduced confluent hypergeometric (CH) class of covariance functions, which has slow (polynomial) decay in the tails of the covariance and so handles large gaps between observations better than other covariance models. The approach leads to valid multivariate cross-covariance models that inherit the desired marginal properties of the confluent hypergeometric model and outperform the multivariate Matérn model in out-of-sample prediction under slowly decaying correlation of the underlying multivariate random field. The model captures heavy tail decay and dependence between variables in an oceanography dataset of temperature, salinity and oxygen, as measured by autonomous floats in the Southern Ocean.

Keywords

Cross-covariances

Multivariate geostatistics

Oceanography

Spectral construction

Co-Author

Anindya Bhadra, Purdue University

First Author

Andrew Yarger, Purdue University

Presenting Author

Andrew Yarger, Purdue University

On modeling discrete lattice data using the Potts model

The analysis of spatial data on a grid is a widely used tool in fields like demography, epidemiology, image analysis, and land management. The Ising and Potts models are often used for such data, for instance in studying protein structures in biology, reconstruction of social networks in social sciences, and image segmentation in computer vision. However, in high-correlation settings, simulations from the fitted models are not able to reproduce the characteristics observed in the data. Furthermore, likelihood-based inference is challenging due to an intractable normalizing constant that is a function of the model parameters. We propose a novel tapered version of the Potts model that builds on work from Fellows and Handcock in the context of exponential-family random graph models. We show that the tapered model is a valuable alternative to the Potts model and provide an algorithm to fit the model. Based on real and simulated data studies, we provide practical guidance on when to use the tapered model, along with a discussion of its potential limitations.
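
The sketch below shows the standard (untapered) Potts model on a small rectangular lattice: the sufficient statistic counts neighbouring pairs that share a label, and a Gibbs sweep updates each site from its full conditional, which sidesteps the intractable normalizing constant. It does not implement the tapered model proposed in the talk; the 4-neighbour structure, the lattice size, the number of labels, and the value of `beta` are assumptions for illustration.

```python
import math
import random


def agreeing_pairs(grid):
    """Potts sufficient statistic: number of 4-neighbour pairs sharing a label."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows and grid[r][c] == grid[r + 1][c]:
                count += 1
            if c + 1 < cols and grid[r][c] == grid[r][c + 1]:
                count += 1
    return count


def gibbs_sweep(grid, num_labels, beta, rng):
    """One in-place Gibbs sweep for p(x) proportional to exp(beta * agreeing_pairs(x))."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # The full conditional depends only on how many neighbours carry each label.
            weights = [math.exp(beta * neighbours.count(k)) for k in range(num_labels)]
            grid[r][c] = rng.choices(range(num_labels), weights=weights)[0]
    return grid


rng = random.Random(0)
num_labels, beta = 3, 0.8  # hypothetical settings
sample = [[rng.randrange(num_labels) for _ in range(8)] for _ in range(8)]
for _ in range(20):
    gibbs_sweep(sample, num_labels, beta, rng)
statistic = agreeing_pairs(sample)
```

Comparing `statistic` across simulated lattices with the value observed in the data is one simple way to check whether a fitted model reproduces the observed spatial agreement.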

Keywords

Potts model

Lattice data modeling

Discrete lattice data

Co-Author(s)

Stephen Berg
Murali Haran, Penn State University

First Author

Maria Paula Duenas Herrera, The Pennsylvania State University

Presenting Author

Maria Paula Duenas Herrera, The Pennsylvania State University

Spatial Hyperspheric Models for Compositional Data

Compositional data are an increasingly prevalent data source in spatial statistics. Analysis of such data is typically done on log-ratio transformations or via Dirichlet regression. However, these approaches often make unnecessarily strong assumptions (e.g., strictly positive components, exclusively negative correlations). An alternative approach uses square-root transformed compositions and directional distributions. Such distributions naturally allow for zero-valued components and positive correlations, yet they may include support outside the non-negative orthant and are not generative for compositional data. To overcome this challenge, we truncate the elliptically symmetric angular Gaussian (ESAG) distribution to the non-negative orthant. Additionally, we propose a spatial hyperspheric regression that contains fixed and random multivariate spatial effects. The proposed model also contains a term that can be used to propagate uncertainty that may arise from precursory stochastic models (e.g., machine learning classification). We used our model in a simulation study and for a spatial analysis of classified bioacoustic signals of the downy woodpecker (Dryobates pubescens).
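
The square-root transformation mentioned above can be sketched in a few lines: taking the square root of each part of a composition yields a vector of unit Euclidean length in the non-negative orthant, and zero-valued parts pose no problem, unlike with log-ratio transforms. The function names and the example proportions are assumptions for illustration; the truncated ESAG distribution and the spatial regression from the talk are not shown.

```python
import math


def sqrt_transform(composition):
    """Map a composition (non-negative parts) to a point on the unit hypersphere."""
    total = sum(composition)
    if total <= 0 or any(p < 0 for p in composition):
        raise ValueError("composition must be non-negative with a positive total")
    # Parts are renormalised to sum to one before taking square roots.
    return [math.sqrt(p / total) for p in composition]


def inverse_sqrt_transform(direction):
    """Square each coordinate of a unit vector to recover the composition."""
    return [u * u for u in direction]


composition = [0.5, 0.3, 0.2, 0.0]  # hypothetical proportions, including a zero part
direction = sqrt_transform(composition)
squared_norm = sum(u * u for u in direction)
recovered = inverse_sqrt_transform(direction)
```

Squaring a unit vector with a negative coordinate also returns a valid composition, so the inverse map is not one-to-one on the full sphere, which is consistent with the abstract's motivation for restricting support to the non-negative orthant.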

Keywords

Bayesian

generative

hyperspheric regression

uncertainty propagation

compositional data

directional data

Co-Author(s)

Mevin Hooten, The University of Texas at Austin
Nicholas Calzada, The University of Texas at Austin
Timothy Keitt, University of Texas at Austin

First Author

Michael Schwob, Virginia Tech

Presenting Author

Michael Schwob, Virginia Tech