Wednesday, Aug 5: 2:00 PM - 3:50 PM
1630
Topic-Contributed Paper Session
Applied
Yes
Main Sponsor
Section on Physical and Engineering Sciences
Co Sponsors
Astrostatistics Interest Group
Section on Bayesian Statistical Science
Presentations
Many complex datasets in modern astronomy and cosmology are the result of intricate physical phenomena combined with detailed instrumental effects and observational processes. In these cases, accurately specifying the probability distribution of the data given the parameters may be difficult or impossible, making statistical inference with conventional likelihood-based (including Bayesian) methods intractable. An introduction will be given to machine learning-enabled methods for statistical inference that are broadly applicable whenever a forward model encapsulating the complex physical, instrumental, and observational effects can be used to simulate realistic data. Such simulation-based inference methods promise to enable effective statistical inference for many previously intractable data analysis problems in astronomy.
Keywords
astrostatistics
Bayesian methods
machine learning
simulation-based inference
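The core loop behind simulation-based inference, draw parameters from the prior, push them through the forward model, and compare the result to the observed data, can be sketched with the simplest such method, rejection ABC. The Gaussian toy simulator and all numbers below are purely illustrative, not any speaker's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=50):
    # Toy forward model: Gaussian data with unknown mean theta.
    # In practice this would encapsulate physics + instrument + selection.
    return rng.normal(theta, 1.0, size=n)

# "Observed" data, generated at a true parameter we pretend not to know.
x_obs = rng.normal(1.5, 1.0, size=50)

def summary(x):
    return x.mean()

# Rejection ABC: draw parameters from the prior, simulate, and keep the
# draws whose simulated summary lies closest to the observed summary.
prior_draws = rng.uniform(-5, 5, size=20000)
distances = np.array([abs(summary(simulator(t)) - summary(x_obs))
                      for t in prior_draws])
eps = np.quantile(distances, 0.01)          # keep the closest 1%
posterior_samples = prior_draws[distances <= eps]

print(posterior_samples.mean())             # concentrates near the truth
```

Neural SBI methods replace the hard accept/reject step with a learned density estimator, but the simulate-then-compare structure is the same.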
Performing simulation-based inference requires reliable simulators of a vast array of physical effects in any modern experimental setting. I will discuss two tools for achieving such reliability. First, I will discuss caskade, which is designed for arbitrarily scalable simulator construction, and demonstrate compelling applications in astronomical image processing, strong gravitational lensing, and supernova cosmology. Second, I will discuss PTED, a tool for evaluating simulated samples against real data. It is an exact, multi-dimensional, two-sample test with high sensitivity to distribution mismatch, and we will see how it effectively spots common errors in simulation products. Further, I will show how it can evaluate the final product of SBI, posterior samples, in full multi-dimensional detail. Finally, I will end with some preliminary results from applying these tools to Rubin Observatory supernova cosmology at the scale of hundreds of thousands of objects.
Keywords
Simulation-Based Inference
Simulators
Two-Sample Tests
Python
Rubin Observatory
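The abstract does not spell out PTED's statistic, so as a generic illustration of the same flavor of test, exact (permutation-based), multi-dimensional, two-sample, here is an energy-distance version. The "simulated" and "real" samples below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

def energy_statistic(x, y):
    # Energy-distance two-sample statistic for multi-dimensional samples:
    # large when the two point clouds occupy different regions.
    def mean_pdist(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()
    return 2 * mean_pdist(x, y) - mean_pdist(x, x) - mean_pdist(y, y)

def permutation_test(x, y, n_perm=500):
    # Exact p-value: under the null the sample labels are exchangeable,
    # so shuffling them draws from the null distribution of the statistic.
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if energy_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

sim = rng.normal(0.0, 1.0, size=(100, 3))    # "simulated" samples
real = rng.normal(0.5, 1.0, size=(100, 3))   # "real" data, shifted mean
p_value = permutation_test(sim, real)
print(p_value)                               # small: distribution mismatch
```

The same machinery applies to checking SBI posterior samples: compare samples drawn from the approximate posterior against reference draws in full dimensionality rather than marginal by marginal.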
Interpreting galaxy images and spectroscopy is central to studies of galaxy formation and evolution. Widely used forward-modeling approaches infer galaxy parameters via MCMC but take hours per galaxy due to the cost of model generation; this is prohibitive for modern astronomical surveys with millions of galaxies. Here, we present a simulation-based inference framework for fast, amortized inference of galaxy physical parameters in the HETDEX survey. A unique challenge is that astrophysical inference is performed not on raw measurements but on processed observables with strongly heteroscedastic uncertainties, epistemic uncertainties, and patchy imaging coverage. Furthermore, galaxy inference requires complex physical forward models whose correctness cannot be empirically verified, making simulation-based inference a natural framework for posterior predictive checks at scale. We address these challenges with a neural posterior estimator trained on ∼10^7 simulated galaxies generated with a 17-parameter model, paired with lin–log asinh magnitudes and a tailored uncertainty model; we contrast its predictive performance and speed with classical MCMC inference.
Speaker
Lishan Shi, The Pennsylvania State University
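The "lin–log asinh magnitudes" mentioned above are presumably of the form introduced by Lupton, Gunn & Szalay (1999): logarithmic for bright sources, linear near zero flux. Assuming that form (the softening parameter b below is illustrative, not the survey's choice), a minimal sketch:

```python
import numpy as np

def asinh_mag(flux, b):
    # Lupton et al. (1999) asinh ("luptitude") magnitude: matches the
    # classical log magnitude for flux >> b, but stays linear, and
    # finite, for small, zero, or negative fluxes, where noisy survey
    # photometry of faint galaxies often lands.
    return -2.5 / np.log(10) * (np.arcsinh(flux / (2 * b)) + np.log(b))

flux = np.array([10.0, 0.1, 0.0, -0.05])   # log magnitude fails for <= 0
mags = asinh_mag(flux, b=0.01)
print(mags)
```

This is what makes such magnitudes convenient inputs for a neural posterior estimator: every measured flux, however noisy, maps to a finite feature.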
How much can we learn about Dark Energy and its behavior from weak gravitational lensing surveys? Two-point functions offer some insight but miss non-Gaussian information. Simulation-based inference (SBI) offers a way to learn and combine higher-order statistics via neural compression, but, in terms of bits extracted from the data, it does not always leverage, let alone exceed, human domain knowledge in physical inference problems, especially when simulations are large and limited in number.
I will present an information-theoretic view of SBI, which naturally extends to hybrid statistics, an optimal framework for combining domain knowledge and learned neural summaries. These statistics improve field-level information extraction compared to neural summaries alone, or to their simple concatenation with existing summaries, and make inference robust in settings with little training data.
I will show an application of hybrid statistics to constraining wCDM with Dark Energy Survey Year 3 data. By changing the optimisation objective alone, the method is forecast to provide the most competitive Dark Energy and weak lensing parameter constraints to date, showcasing the power of SBI for science applications. Furthermore, the modular nature of hybrid statistics, used alongside hand-designed statistics, may shed light on where signatures of Dark Energy might lie in massive cosmological datasets, to be exploited in upcoming astronomical surveys.
Keywords
simulation-based inference
cosmology
neural statistics
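The basic shape of a hybrid statistic, a domain-knowledge summary combined with a learned compression into one summary vector for downstream inference, can be shown in a toy sketch. Everything below, including the linear stand-in for a neural summary and the variance "two-point" proxy, is hypothetical illustration, not the talk's method:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, n=64):
    # Toy "field": a noisy vector whose mean and variance both depend
    # on a parameter theta (stand-in for a simulated lensing map).
    return rng.normal(theta, 1.0 + 0.5 * theta, size=n)

def hand_designed(x):
    # Domain-knowledge summary, playing the role of a two-point function.
    return np.array([x.var()])

def learned(x, w):
    # "Neural" summary, reduced here to a learned linear compression;
    # a trained network would produce w (or a nonlinear map) instead.
    return np.array([w @ x])

def hybrid(x, w):
    # Hybrid statistic: a single summary vector carrying both the
    # hand-designed and the learned information.
    return np.concatenate([hand_designed(x), learned(x, w)])

w = np.full(64, 1.0 / 64)     # placeholder weights
s = hybrid(simulate(0.5), w)
print(s)                      # [variance proxy, compressed mean]
```

The point of the hybrid framework in the abstract is that the learned part is optimised jointly with (rather than merely appended to) the known statistics, so it targets only the information the hand-designed summaries miss.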
This talk concerns statistical inference on the internal parameters of complex physical systems, where the likelihood is intractable but encoded by a simulator or in observations of Nature itself. In this so-called likelihood-free inference (LFI) setting, one can estimate key quantities such as likelihoods, posteriors, or likelihood ratios from labeled (e.g. simulated) data. An open question is how to best construct confidence sets with high power for realistic settings with finite sample sizes and model misspecifications. In this work, we leverage estimated posteriors to construct sets with frequentist coverage and high constraining power (small average size) that are robust to several forms of model misspecification and can be estimated using limited or sparse calibration data.
Keywords
Likelihood-free inference
Model misspecification
Limited data
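A common recipe in this line of work is to threshold the estimated posterior density at a cutoff calibrated, separately for each parameter value, from simulated data, so that the resulting set has frequentist coverage. Below is a minimal sketch with an analytic Gaussian posterior standing in for a learned estimator; all specifics (flat prior, grid, sample sizes) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

N = 20  # observations per dataset

def post_logpdf(theta, x):
    # Toy model: x_i ~ N(theta, 1) with a flat prior, so the posterior
    # is N(mean(x), 1/N). In practice this density would come from a
    # neural posterior estimator.
    m, s2 = x.mean(), 1.0 / N
    return -0.5 * (theta - m) ** 2 / s2 - 0.5 * np.log(2 * np.pi * s2)

def cutoff(theta, alpha=0.1, n_cal=1000):
    # Calibration: under repeated data drawn at theta, find the
    # alpha-quantile of the posterior density evaluated at that theta.
    # Thresholding at this cutoff gives (1 - alpha) coverage at theta.
    vals = [post_logpdf(theta, rng.normal(theta, 1.0, size=N))
            for _ in range(n_cal)]
    return np.quantile(vals, alpha)

# Confidence set for one observed dataset: all grid points whose
# posterior density exceeds their own calibrated cutoff.
x_obs = rng.normal(1.0, 1.0, size=N)
grid = np.linspace(-2, 4, 151)
in_set = np.array([post_logpdf(t, x_obs) >= cutoff(t) for t in grid])
print(grid[in_set].min(), grid[in_set].max())
```

Because the cutoff is calibrated per parameter value rather than globally, the construction retains coverage even when the estimated posterior is imperfect; the calibration step is where limited or sparse calibration data becomes the practical bottleneck the abstract refers to.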