Amortized Learning for Environmental Data using Neural Networks

Brian Reich Chair
North Carolina State University
 
Reetam Majumder Organizer
North Carolina State University
 
Brian Reich Organizer
North Carolina State University
 
Monday, Aug 5: 2:00 PM - 3:50 PM
1052 
Invited Paper Session 
Oregon Convention Center 
Room: CC-C122 

Applied: Yes

Main Sponsor

Section on Statistics and the Environment

Co Sponsors

Section on Statistical Computing
Section on Statistical Learning and Data Science

Presentations

Fast Estimation of non-Gaussian Fields

Data derived from remote sensing or from the output of numerical simulations typically have a regular gridded structure but are large in volume. The challenge is to find accurate spatial models that fill in missing grid cells or emulate the process when the spatial fields are heterogeneous and the marginal distributions are heavy-tailed. A spatial autoregressive model is a map from a location and its neighbors to spatially independent random variables, and it can provide a flexible model for non-Gaussian fields. This flexibility comes from heavy-tailed innovation distributions and from maps that combine the central location with its neighbors nonlinearly. These models are fast to simulate because the map is sparse, but estimation is slow for large data fields. An alternative to traditional statistical methods is to train a neural network on a large training set spanning a useful parameter space and then use the network for fast estimation. This approach is applied to high-resolution ecological data from the JPL SHIFT mission and to numerical simulations of urban flooding under different storm forcings.
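As an illustration of why simulation from a spatial autoregressive model is cheap, here is a minimal sketch. It is not the speakers' model: it assumes a linear first-order map on a small grid (the abstract also allows nonlinear maps), Student-t innovations as the heavy-tailed choice, and a dense matrix for readability where a sparse one would be used in practice. The names `sar_matrix`, `simulate_sar_field`, `kappa` and `df` are hypothetical.

```python
import numpy as np


def sar_matrix(n_rows, n_cols, kappa):
    """Build a dense first-order SAR operator B on an n_rows x n_cols grid.

    Row k of B maps grid cell k and its (up to four) neighbors to one
    innovation: (4 + kappa^2) * y_k - sum of neighboring y values.
    """
    n = n_rows * n_cols
    B = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            k = i * n_cols + j
            B[k, k] = 4.0 + kappa**2
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n_rows and 0 <= jj < n_cols:
                    B[k, ii * n_cols + jj] = -1.0
    return B


def simulate_sar_field(n_rows, n_cols, kappa, df, rng):
    """Draw heavy-tailed innovations e and solve B y = e for the field y."""
    B = sar_matrix(n_rows, n_cols, kappa)
    e = rng.standard_t(df, size=n_rows * n_cols)  # assumed innovation law
    y = np.linalg.solve(B, e)
    return y.reshape(n_rows, n_cols), B, e


rng = np.random.default_rng(0)
field, B, e = simulate_sar_field(6, 5, kappa=0.5, df=3.0, rng=rng)
```

Repeating such draws over a grid of `kappa` and `df` values would give the kind of simulated training set the abstract describes, with the parameters as targets for a network.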

Co-Author(s)

Sweta Rai
Soutir Bandyopadhyay, Colorado School of Mines

Speaker

Douglas Nychka, Colorado School of Mines

Scaling Black-Box Inference to Large Spatial Settings: a Distributed Approach

Extreme environmental processes display spatial and temporal dependencies that are computationally expensive to model, even with small datasets. These data are usually modeled using max-stable models, and typical estimators rely on sub-optimal composite likelihoods that lose efficiency in higher dimensions. We propose a novel distributed, multi-step deep learning approach. First, deep neural networks are trained on simulated data over subsets of the spatial domain to estimate parameters locally and quantify their uncertainty. This takes advantage of the fact that simulation from such models is fast and easy on smaller spatial partitions. In the next step, we obtain a meta-estimator that reduces the bias of the parameter estimates relative to a full-data approach without incurring the increase in variance that would otherwise follow. The proposed methodology enables statistical inference for intractable likelihoods with a previously prohibitive number of observations.
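The abstract does not specify the form of the meta-estimator. As a stand-in only, the sketch below combines local estimates by inverse-variance weighting, which assumes the local estimates are independent and that each comes with a variance from the uncertainty quantification step. The function name `combine_local_estimates` is hypothetical.

```python
import numpy as np


def combine_local_estimates(estimates, variances):
    """Inverse-variance weighted combination of per-block estimates.

    estimates, variances: arrays of shape (n_blocks,) or
    (n_blocks, n_params), one row per spatial subset.
    Returns the combined estimate and its variance under independence.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    total_precision = precisions.sum(axis=0)
    combined = (precisions * estimates).sum(axis=0) / total_precision
    combined_var = 1.0 / total_precision
    return combined, combined_var


# Two blocks, two parameters each; the less certain block gets less weight.
est, var = combine_local_estimates(
    [[1.0, 0.5], [3.0, 0.7]],
    [[1.0, 0.1], [1.0, 0.4]],
)
```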

Speaker

Amanda Lenzi

Neural amortized kriging for scalable Gaussian process inference

Spatial statistics often leverages the flexibility and interpretability of Gaussian processes to predict values at unseen spatial locations through Kriging. Unfortunately, determining the Kriging weights requires inverting the process's covariance matrix, which creates a computational bottleneck for large spatial datasets. We propose neural amortized Kriging, which uses feed-forward neural networks (FFNNs) to learn a mapping from scaled spatial location coordinates and covariance function parameters to Kriging weights and the spatial variance. The FFNNs are trained on synthetic data, and the Vecchia approximation is used to ensure scalability to large spatial datasets. Since the FFNNs do not require a matrix inversion step for predictions, our approach bypasses the bottleneck of Gaussian processes entirely. We demonstrate significant speedup over existing frequentist methods with comparable estimation and prediction errors through simulation studies and using the Jason-3 wind speed dataset.
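For reference, the quantities such a network would learn to output can be computed directly on a small problem. The sketch below is the classical zero-mean (simple) Kriging solve, not the proposed FFNN; the exponential covariance and the names `exp_cov`, `kriging_weights`, `variance` and `range_` are assumptions made for illustration. Under a Vecchia-style approximation, `obs_locs` would hold only a few nearest neighbors of the prediction location.

```python
import numpy as np


def exp_cov(a, b, variance, range_):
    """Exponential covariance between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)


def kriging_weights(obs_locs, new_loc, variance, range_, nugget=0.0):
    """Simple-Kriging weights and prediction variance at one new location.

    Solves K w = k, where K is the covariance among observed locations and
    k the covariance between them and the new location. This linear solve
    is the step that becomes expensive when there are many observations.
    """
    K = exp_cov(obs_locs, obs_locs, variance, range_)
    K = K + nugget * np.eye(len(obs_locs))
    k = exp_cov(obs_locs, new_loc[None, :], variance, range_)[:, 0]
    w = np.linalg.solve(K, k)
    pred_var = variance - k @ w
    return w, pred_var


obs_locs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w, pred_var = kriging_weights(obs_locs, np.array([0.5, 0.5]), 1.0, 2.0)
```

The prediction at the new location is then `w @ y` for observed values `y`, so a network that returns `w` and `pred_var` directly avoids the solve at prediction time.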

Speaker

Reetam Majumder, North Carolina State University

Neural Bayes Estimators for Censored Inference with Peaks-over-threshold Models

Making inference with spatial extremal dependence models can be computationally costly, as they involve intractable and/or censored likelihoods. Building upon recent advances in likelihood-free inference with neural Bayes estimators, we develop highly efficient estimators for censored peaks-over-threshold models that encode censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference methods for spatial extremal dependence models. Simulation studies show massive gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when fitting popular extremal dependence models, such as max-stable, Pareto, and random scale mixture process models. We also illustrate that it is possible to train a single neural Bayes estimator for a general censoring level, removing the need to retrain the network when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds of thousands of high-dimensional spatial extremal dependence models to assess high PM2.5 concentration over the whole of Saudi Arabia.
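The abstract does not give the architecture, so the following is only one plausible way to hand censoring information to a network: replace values at or below the threshold with a constant and add a second input channel that flags which cells were censored. The function name `encode_censored` and the `fill_value` parameter are hypothetical.

```python
import numpy as np


def encode_censored(field, threshold, fill_value=0.0):
    """Turn a gridded field into a two-channel network input.

    Channel 0: the field, with values at or below `threshold` replaced by
               `fill_value` (their exact magnitudes are treated as unknown).
    Channel 1: an indicator that is 1.0 where a value was censored.
    """
    field = np.asarray(field, dtype=float)
    censored = field <= threshold
    masked = np.where(censored, fill_value, field)
    return np.stack([masked, censored.astype(float)], axis=0)


field = np.array([[0.2, 1.5], [3.0, 0.1]])
x = encode_censored(field, threshold=1.0)  # shape (2, 2, 2)
```

Because the indicator channel changes with the threshold, the same encoding can be produced for any censoring level, which is in the spirit of training one estimator across levels.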

Speaker

Raphael Huser, KAUST