
Simulation-Based Inference: Robust Methods for Astronomy and the Broader Sciences

Maximilian Autenrieth Chair
University of Cambridge

Maximilian Autenrieth Organizer
University of Cambridge

Wednesday, Aug 5: 2:00 PM - 3:50 PM
1630
Topic-Contributed Paper Session


Applied

Yes

Main Sponsor

Section on Physical and Engineering Sciences

Co Sponsors

Astrostatistics Interest Group

Section on Bayesian Statistical Science

Presentations

An Introduction to Simulation-Based Inference for Astronomical Applications

Many complex datasets in modern astronomy and cosmology are the result of intricate physical phenomena combined with detailed instrumental effects and observational processes. In these cases, accurately specifying the probability distribution of the data given the parameters may be difficult or impossible, making statistical inference using conventional likelihood-based (including Bayesian) methods intractable. An introduction will be given to motivate machine learning-enabled methods for statistical inference that are broadly applicable whenever a forward model encapsulating the complex physical, instrumental, and observational effects can be used to simulate realistic data. Such simulation-based inference methods promise to enable effective statistical inference for many previously intractable data analysis problems in astronomy.
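To make the idea of inference from a forward model concrete, the following is a minimal sketch of rejection approximate Bayesian computation (ABC), a classical simulation-based method and not one of the machine learning-enabled methods of the talk. The toy Gaussian simulator, the uniform prior, the sample-mean summary, and the tolerance `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def simulate(theta, rng, n=50):
    # Hypothetical forward model: n Gaussian draws with mean theta, unit variance.
    # Accepts a scalar theta or an array of thetas (one simulated dataset per theta).
    theta = np.asarray(theta, dtype=float)
    return rng.normal(theta[..., None], 1.0, size=theta.shape + (n,))

def rejection_abc(observed, n_draws=20000, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Draw parameters from an assumed uniform prior on [-5, 5].
    thetas = rng.uniform(-5.0, 5.0, size=n_draws)
    # Simulate one dataset per prior draw and reduce each to its sample mean.
    sim_summaries = simulate(thetas, rng, n=observed.size).mean(axis=-1)
    # Keep the draws whose simulated summary lies within eps of the observed one.
    keep = np.abs(sim_summaries - observed.mean()) < eps
    return thetas[keep]

# "Observed" data generated from the same toy model with true mean 2.0.
observed = simulate(2.0, np.random.default_rng(1))
posterior = rejection_abc(observed)
print(posterior.size, posterior.mean(), posterior.std())
```

The accepted draws approximate the posterior without the likelihood ever being evaluated; only the ability to simulate is used. Neural methods replace the accept/reject step with a learned density or ratio estimator.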

Keywords

astrostatistics

Bayesian methods

machine learning

simulation-based inference

Speaker

Kaisey Mandel, University of Cambridge

Building a reliable simulator at scale, the case of Rubin Supernova Cosmology

Performing Simulation-Based Inference requires reliable simulators of a vast array of physical effects in any modern experimental setting. I will discuss two tools for achieving such reliability. First, I will discuss caskade, which is designed for arbitrarily scalable simulator design. I will demonstrate compelling applications in astronomical image processing, strong gravitational lensing, and supernova cosmology. Second, I will discuss PTED, a tool for evaluating simulated samples against real data. It is an exact, multi-dimensional, two-sample test with a high sensitivity to distribution mismatch. We will see how it effectively spots common errors in simulation products. Further, I will show how it can evaluate the final product of SBI, posterior samples, in full multi-dimensional detail. I will end with some preliminary results using these tools on the case of Rubin Observatory Supernova cosmology at the scale of hundreds of thousands of objects.
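One standard way to build an exact, multi-dimensional two-sample test is a permutation test on the energy distance. The sketch below illustrates that generic construction with NumPy; it is not the PTED package's API, and the function names, sample sizes, and number of permutations are assumptions made for illustration.

```python
import numpy as np

def energy_distance(d, ix, iy):
    # d is the pooled pairwise distance matrix; ix, iy index the two samples.
    dxy = d[np.ix_(ix, iy)].mean()
    dxx = d[np.ix_(ix, ix)].mean()
    dyy = d[np.ix_(iy, iy)].mean()
    return 2.0 * dxy - dxx - dyy

def permutation_energy_test(x, y, n_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    z = np.concatenate([x, y], axis=0)
    # Pairwise Euclidean distances are computed once and reused for every permutation.
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    n = len(x)
    idx = np.arange(len(z))
    observed = energy_distance(d, idx[:n], idx[n:])
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(idx)  # random relabelling of the pooled sample
        if energy_distance(d, p[:n], p[n:]) >= observed:
            count += 1
    # The +1 terms count the observed labelling itself, which keeps the p-value valid.
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=(100, 3))
b_same = rng.normal(0.0, 1.0, size=(100, 3))
b_shift = rng.normal(1.0, 1.0, size=(100, 3))
p_same = permutation_energy_test(a, b_same)
p_shift = permutation_energy_test(a, b_shift)
print(p_same, p_shift)
```

Under the null hypothesis the sample labels are exchangeable, so the p-value is valid at any sample size and dimension. A small p-value, as expected for the shifted sample, flags a mismatch between simulated and reference samples.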

Keywords

Simulation-Based Inference

Simulators

Two-Sample Tests

Python

Rubin Observatory

Speaker

Connor Stone, University of Toronto

Millions of Galaxies, Sparse Information: Reliable SED Inference for HETDEX with Neural Density Estimators

Interpreting galaxy images and spectroscopy is central to studies of galaxy formation and evolution. Widely used forward-modeling approaches infer galaxy parameters via MCMC but take hours per galaxy due to model generation cost; this is prohibitive for modern astronomical surveys with millions of galaxies. Here, we present a simulation-based inference framework for fast, amortized inference of galaxy physical parameters in the HETDEX survey. A unique challenge is that astrophysical inference is not performed on raw measurements but instead on processed observables, which have strongly heteroscedastic uncertainties, epistemic uncertainties, and patchy imaging coverage. Furthermore, galaxy inference requires complex physical forward models whose correctness cannot be empirically verified, making simulation-based inference a natural framework for posterior predictive checks at scale. We address these challenges with a neural posterior estimator trained on ∼10^7 simulated galaxies generated with a 17-parameter model, paired with lin–log asinh magnitudes and a tailored uncertainty model; we contrast the predictive performance and speed with classic MCMC inference.
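For readers unfamiliar with asinh magnitudes, the sketch below uses the common form introduced by Lupton, Gunn and Szalay (1999), which is linear in flux near zero and logarithmic at high flux. Whether the talk uses this exact parameterization is an assumption, and the softening parameter `b` and the flux normalization here are illustrative values only.

```python
import numpy as np

def asinh_mag(flux, b, zero_point_flux=1.0):
    # flux / zero_point_flux is the dimensionless flux; b is the softening parameter.
    x = np.asarray(flux, dtype=float) / zero_point_flux
    return -2.5 / np.log(10.0) * (np.arcsinh(x / (2.0 * b)) + np.log(b))

b = 1e-10  # illustrative softening value, not taken from the abstract

# Bright source: agrees with the ordinary magnitude -2.5 * log10(flux).
print(asinh_mag(1e-4, b), -2.5 * np.log10(1e-4))

# Zero or negative measured flux: still finite, unlike a logarithmic magnitude.
print(asinh_mag(0.0, b), asinh_mag(-1e-10, b))
```

Because noisy low signal-to-noise measurements can have zero or negative flux, this transform keeps every observation usable as a network input while retaining the familiar magnitude scale for bright sources.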

Speaker

Lishan Shi, The Pennsylvania State University

Shedding light on Dark Energy with Weak Lensing and Hybrid Statistics

How much can we learn about Dark Energy and how it behaves from weak gravitational lensing surveys? Two-point functions offer some insight but miss non-Gaussian information. Simulation-based inference (SBI) offers a way to combine and learn higher-order statistics via neural compression, but does not always a) leverage or b) exceed human domain knowledge in physical inference problems in terms of bits extracted from data, especially when simulations are large and limited in number.

I will present an information-theoretic approach to illustrate SBI, which can be naturally extended to derive hybrid statistics, an optimal framework for combining domain knowledge and learned neural summaries. These statistics improve information extraction from the field-level compared to neural summaries alone or their concatenation with existing summaries, and make inference robust in settings with low training data.

I will show an application of hybrid statistics for constraining wCDM from the Dark Energy Survey Year 3 data. By changing the optimisation objective alone, the method is forecast to provide the most competitive Dark Energy and weak lensing parameter constraints to date, showcasing the power of SBI for science applications. Furthermore, the modular nature of hybrid statistics alongside hand-designed statistics may shed light on where signatures of Dark Energy information might lie in massive cosmological datasets, to be exploited in upcoming astronomical surveys.

Keywords

simulation-based inference

cosmology

neural statistics

Speaker

T. Lucas Makinen

Trustworthy Scientific Inference from Limited or Sparse Calibration Data

This talk concerns statistical inference on the internal parameters of complex physical systems, where the likelihood is intractable but encoded by a simulator or in observations of Nature itself. In this so-called likelihood-free inference (LFI) setting, one can estimate key quantities such as likelihoods, posteriors, or likelihood ratios from labeled (e.g. simulated) data. An open question is how to best construct confidence sets with high power for realistic settings with finite sample sizes and model misspecifications. In this work, we leverage estimated posteriors to construct sets with frequentist coverage and high constraining power (small average size) that are robust to several forms of model misspecification and can be estimated using limited or sparse calibration data.

Keywords

Likelihood-free inference

Model misspecification

Limited data

Speaker

James Carzon