Monday, Aug 4: 10:30 AM - 12:20 PM
4045
Contributed Papers
Music City Center
Room: CC-211
Main Sponsor
Section on Statistics and the Environment
Presentations
Advancements in information technology have enabled the generation of massive spatial datasets, necessitating scalable distributed methods. Centralized frameworks are prone to vulnerabilities such as single-point failures and communication bottlenecks. This paper introduces a decentralized framework for parameter inference in spatial low-rank models to address these limitations. A key challenge stems from spatial dependence among observations, which prevents the log-likelihood from being expressed as a summation, a critical requirement for decentralized optimization. To overcome this, we propose a novel objective function leveraging the evidence lower bound, facilitating the application of decentralized optimization techniques. Our approach integrates block descent with multi-consensus and dynamic consensus averaging for effective parameter optimization. We prove the new objective's convexity near true parameters, ensuring convergence. Additionally, we establish theoretical results on the consistency and asymptotic normality of the estimator for spatial low-rank models. Extensive simulations and real-world experiments confirm the framework's robustness and scalability.
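The consensus step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only plain consensus averaging on a hypothetical ring network, using Metropolis mixing weights as an assumed choice. The names `metropolis_weights` and `consensus_average` are illustrative. Each node repeatedly replaces its value with a weighted average of its neighbors' values, and running several rounds per optimization step is what "multi-consensus" usually refers to.

```python
import numpy as np

def metropolis_weights(adjacency):
    """Build a symmetric, doubly stochastic mixing matrix for an undirected graph."""
    n = adjacency.shape[0]
    degree = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j]:
                W[i, j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        # The self-weight makes every row sum to one.
        W[i, i] = 1.0 - W[i].sum()
    return W

def consensus_average(values, W, rounds):
    """Run `rounds` of neighbor averaging; each node only uses its neighbors' values."""
    x = np.asarray(values, dtype=float).copy()
    for _ in range(rounds):
        x = W @ x
    return x

# Hypothetical ring network of six nodes (an assumption for illustration).
n_nodes = 6
adjacency = np.zeros((n_nodes, n_nodes), dtype=int)
for i in range(n_nodes):
    adjacency[i, (i + 1) % n_nodes] = 1
    adjacency[(i + 1) % n_nodes, i] = 1

W = metropolis_weights(adjacency)
local_values = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 10.0])
agreed = consensus_average(local_values, W, rounds=100)
```

After enough rounds every entry of `agreed` is close to the network-wide mean of `local_values`, even though no node ever sees all the data.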
Keywords
Block descent method
Dynamic consensus averaging
Evidence lower bound
Multi-consensus
Spatial dependence
Co-Author(s)
Sameh Abdulah, King Abdullah University of Science and Technology
Ying Sun, King Abdullah University of Science and Technology
Marc Genton, King Abdullah University of Science and Technology
First Author
Jianwei Shi, King Abdullah University of Science and Technology
Presenting Author
Jianwei Shi, King Abdullah University of Science and Technology
Spectroscopy is essential for scientific and industrial applications, enabling the analysis of complex materials and their interactions with radiation. Hyperspectral remote sensing, or imaging spectroscopy, plays a key role in Earth sciences, including ecology, geology, and cryosphere research. With the growing availability of orbital imaging spectrometers, developing methods to enhance data utility is crucial. Identifying diagnostic absorption features in spectra is vital for understanding spectral-response relationships. This study considers a Functional Partial Least Squares (FPLS) approach to model spectral data as smooth functions and analyze their impact within specific impact ranges. We propose a two-stage estimation procedure to determine these ranges' midpoints and half-lengths, along with an iterative algorithm to estimate their number and locations. The method is validated through simulations and applied to real spectral data to identify diagnostic absorption features for predicting soil calcium carbonate (CaCO₃) content, successfully estimating their number and locations.
Keywords
Functional Data Analysis
Functional Partial Least Squares
Spectroscopy
Impact Range
This work introduces a framework for robust Bayesian inference by integrating two methodologies: a Bayesian exponentially tilted empirical likelihood and a frequency domain empirical likelihood, each designed to address different aspects of statistical modeling. The first component leverages a new variant of the Wasserstein metric to concentrate the likelihood near a chosen parametric family, enabling robust inference on model parameters in the presence of outliers. We extend this idea to dependent data through a data transformation (i.e., a Fourier transform) developed in terms of the spectral distribution. In this semi-parametric approach, instead of using moment-based constraints as in the existing literature, we employ distributional constraints so that the distribution is concentrated around a guessed parametric family. Applications extend to robust inference, spectral analysis, Whittle estimation, and goodness-of-fit testing, with implications for trustworthy machine learning.
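For readers unfamiliar with the frequency-domain tools named above, the following is a minimal sketch of classical Whittle estimation from the periodogram. It is not the empirical likelihood method of the abstract. It assumes an AR(1) model, profiles out the innovation variance, and uses a simple grid search. The function names are illustrative.

```python
import numpy as np

def periodogram(x):
    """Periodogram at the positive Fourier frequencies (frequency zero excluded)."""
    n = len(x)
    m = (n - 1) // 2
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - np.mean(x))
    return freqs, np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * n)

def ar1_spectral_shape(phi, freqs):
    """AR(1) spectral density with unit innovation variance."""
    return 1.0 / (2.0 * np.pi * np.abs(1.0 - phi * np.exp(-1j * freqs)) ** 2)

def whittle_ar1(x, grid=np.linspace(-0.95, 0.95, 381)):
    """Grid-search minimizer of the profiled Whittle objective for AR(1)."""
    freqs, pgram = periodogram(x)
    best_phi, best_obj = None, np.inf
    for phi in grid:
        g = ar1_spectral_shape(phi, freqs)
        sigma2 = np.mean(pgram / g)  # innovation variance profiled out
        obj = np.log(sigma2) + np.mean(np.log(g))
        if obj < best_obj:
            best_phi, best_obj = phi, obj
    return best_phi

# Simulated AR(1) series with a known coefficient (illustrative data only).
rng = np.random.default_rng(0)
true_phi, n = 0.6, 4096
noise = rng.standard_normal(n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = true_phi * series[t - 1] + noise[t]

phi_hat = whittle_ar1(series)
```

With a series of this length, `phi_hat` should land near the coefficient used in the simulation.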
Keywords
Frequency Domain Empirical Likelihood
Robust Inference
Whittle Estimation
Spectral Distribution
Periodograms
Transport maps can be used to describe non-Gaussian multivariate distributions relative to a simple reference distribution, usually Gaussian. Previous work in this area focused on modeling transport maps using Gaussian processes, and computational limitations have led practitioners to focus on learning map parameters via stochastic gradient methods. We extend this idea by employing a Laplace approximation to the posterior distribution of transport map parameters. We first discuss the characteristics of the Laplace approximation in the transport map setting, then explore how capturing and quantifying uncertainty in transport map parameters affects the model's ability to learn the non-Gaussian target distribution. We then compare our new model's performance in learning the distribution of a potentially nonstationary spatial field to established methods using various metrics. Finally, we contrast the Laplace approximation with various other approximation and uncertainty quantification methods.
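The Laplace approximation itself can be sketched in one dimension. This example does not involve transport maps: it uses an assumed unnormalized Gamma(a, b) log-density as a stand-in posterior, finds the mode by Newton's method, and takes the negative inverse curvature at the mode as the Gaussian variance. The function name and parameter values are illustrative.

```python
def laplace_approximation(grad, hess, theta0, tol=1e-10, max_iter=100):
    """Return (mode, variance) of the Gaussian approximation to a 1-D log-density.

    grad and hess are the first and second derivatives of the log-density.
    """
    theta = float(theta0)
    for _ in range(max_iter):
        step = grad(theta) / hess(theta)
        theta -= step  # Newton update toward the mode
        if abs(step) < tol:
            break
    variance = -1.0 / hess(theta)  # inverse of the negative curvature at the mode
    return theta, variance

# Stand-in target: log p(theta) = (a - 1) * log(theta) - b * theta + const.
a, b = 5.0, 2.0
grad = lambda theta: (a - 1.0) / theta - b
hess = lambda theta: -(a - 1.0) / theta ** 2

mode, variance = laplace_approximation(grad, hess, theta0=1.0)
```

For this target the mode is (a - 1) / b and the approximating variance is (a - 1) / b², which gives a quick analytic check on the numerical result.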
Keywords
Gaussian process
Generative modeling
Laplace approximation
Uncertainty quantification
Spatially indexed multivariate data appear frequently in geostatistics and related fields, including oceanography and environmental science, and are often modeled through covariance and cross-covariance functions in the Gaussian process setting. This work presents techniques based on multivariate mixtures for establishing the validity of such models that are both simple and comprehensive. In particular, cross-covariances are constructed for the recently introduced confluent hypergeometric (CH) class of covariance functions, whose slow (polynomial) tail decay handles large gaps between observations better than other covariance models. The approach leads to valid multivariate cross-covariance models that inherit the desired marginal properties of the confluent hypergeometric model and outperform the multivariate Matérn model in out-of-sample prediction when the correlation of the underlying multivariate random field decays slowly. The model captures heavy tail decay and dependence between variables in an oceanography dataset of temperature, salinity, and oxygen measured by autonomous floats in the Southern Ocean.
Keywords
Cross-covariances
Multivariate geostatistics
Oceanography
Spectral construction
The analysis of spatial data on a grid is a widely used tool in fields like demography, epidemiology, image analysis, and land management. The Ising and Potts models are often used for such data, for instance in studying protein structures in biology, reconstruction of social networks in social sciences, and image segmentation in computer vision. However, in high-correlation settings, simulations from the fitted models are not able to reproduce the characteristics observed in the data. Furthermore, likelihood-based inference is challenging due to an intractable normalizing constant that is a function of the model parameters. We propose a novel tapered version of the Potts model that builds on work from Fellows and Handcock in the context of exponential family random graph models. We show that the tapered model is a valuable alternative to the Potts model and provide an algorithm to fit the model. Based on real and simulated data studies, we provide practical guidance on when to use the tapered model, along with a discussion of its potential limitations.
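The intractability of the normalizing constant can be made concrete with a small sketch. It covers only the standard Potts model, not the tapered version proposed in the abstract, and it assumes a first-order (horizontal and vertical) neighborhood on a rectangular lattice. The function names are illustrative. The constant is a sum over every possible labeling, so brute-force enumeration is feasible only for tiny lattices.

```python
import itertools
import math

def grid_edges(rows, cols):
    """Horizontal and vertical neighbor pairs on a rows x cols lattice."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges

def potts_normalizing_constant(rows, cols, q, beta):
    """Sum exp(beta * number of equal-label neighbor pairs) over all q**(rows*cols) labelings."""
    edges = grid_edges(rows, cols)
    total = 0.0
    for config in itertools.product(range(q), repeat=rows * cols):
        matches = sum(config[i] == config[j] for i, j in edges)
        total += math.exp(beta * matches)
    return total

# A 2 x 2 lattice with two labels has only 2**4 = 16 labelings.
z_small = potts_normalizing_constant(2, 2, q=2, beta=1.0)
```

The number of terms grows as q raised to the number of sites, so even a 10 x 10 binary lattice would require 2**100 terms, which is why exact likelihood evaluation is out of reach at realistic sizes.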
Keywords
Potts model
Lattice data modeling
Discrete lattice data
Compositional data are an increasingly prevalent data source in spatial statistics. Analysis of such data is typically done on log-ratio transformations or via Dirichlet regression. However, these approaches often make unnecessarily strong assumptions (e.g., strictly positive components, exclusively negative correlations). An alternative approach uses square-root transformed compositions and directional distributions. Such distributions naturally allow for zero-valued components and positive correlations, yet they may include support outside the non-negative orthant and are not generative for compositional data. To overcome this challenge, we truncate the elliptically symmetric angular Gaussian (ESAG) distribution to the non-negative orthant. Additionally, we propose a spatial hyperspheric regression that contains fixed and random multivariate spatial effects. The proposed model also contains a term that can be used to propagate uncertainty that may arise from precursory stochastic models (e.g., machine learning classification). We apply our model in a simulation study and in a spatial analysis of classified bioacoustic signals of Dryobates pubescens (downy woodpecker).
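The square-root transformation referred to above can be sketched briefly. Because the components of a composition sum to one, their square roots form a unit-length vector in the non-negative orthant, and zero components cause no difficulty. The function name and sample values are illustrative assumptions, and the sketch covers only the transformation, not the truncated ESAG model.

```python
import numpy as np

def sqrt_transform(compositions):
    """Map compositions (rows summing to one) to points on the unit hypersphere."""
    comps = np.asarray(compositions, dtype=float)
    if np.any(comps < 0) or not np.allclose(comps.sum(axis=-1), 1.0):
        raise ValueError("each composition must be non-negative and sum to one")
    return np.sqrt(comps)

# Illustrative compositions; the second has a zero-valued component.
samples = np.array([
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])
directions = sqrt_transform(samples)
norms = np.linalg.norm(directions, axis=1)
```

Squaring the entries of `directions` recovers the original compositions, so the mapping loses no information.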
Keywords
Bayesian
generative
hyperspheric regression
uncertainty propagation
compositional data
directional data