Wednesday, Aug 5: 2:00 PM - 3:50 PM
1630
Topic-Contributed Paper Session
Thomas M. Menino Convention & Exhibition Center
Room: CC-107A
Applied
Yes
Main Sponsor
Section on Physical and Engineering Sciences
Co Sponsors
Astrostatistics Interest Group
Section on Bayesian Statistical Science
Presentations
Many complex datasets in modern astronomy and cosmology are the result of intricate physical phenomena combined with detailed instrumental effects and observational processes. In these cases, accurately specifying and evaluating the probability distribution of the data given the parameters may be difficult or impossible, making statistical inference using conventional likelihood-based (including Bayesian) methods intractable. A conceptual introduction will be given to motivate machine learning-enabled methods for statistical inference broadly applicable when a forward model encapsulating the complex physical, instrumental, and observational effects can be used to simulate realistic data. Such simulation-based inference methods promise to enable effective statistical inference for many previously intractable data analysis problems in astronomy. As time permits, some example applications from recent astronomical research will be illustrated.
Keywords
astrostatistics
Bayesian methods
machine learning
simulation-based inference
Performing simulation-based inference (SBI) requires reliable simulators of a vast array of physical effects in any modern experimental setting. I will discuss two tools for achieving such reliability. First, I will discuss caskade, which is designed for arbitrarily scalable simulator design. I will demonstrate compelling applications in astronomical image processing, strong gravitational lensing, and supernova cosmology. Second, I will discuss PTED, a tool for evaluating simulated samples against real data. It is an exact, multi-dimensional, two-sample test with high sensitivity to distribution mismatch. We will see how it effectively spots common errors in simulation products. Further, I will show how it can evaluate the final product of SBI, posterior samples, in full multi-dimensional detail. Finally, I will present some preliminary results using these tools on Rubin Observatory supernova cosmology at the scale of hundreds of thousands of objects.
Keywords
Simulation-Based Inference
Simulators
Two-Sample Tests
Python
Rubin Observatory
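One way to build an exact, multi-dimensional two-sample test of the kind described in the abstract above is a permutation test on the energy distance. The following is a minimal sketch under that assumption. It does not reproduce the PTED package's API, and the function names `energy_distance` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def energy_distance(x, y):
    """Energy distance between two samples of shape (n, d) and (m, d)."""

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (one point from a, one from b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


def permutation_p_value(x, y, n_permutations=200, seed=0):
    """Permutation p-value for the null hypothesis that x and y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = energy_distance(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        # Reassign the pooled points to two groups at random.
        perm = rng.permutation(len(pooled))
        stat = energy_distance(pooled[perm[:n]], pooled[perm[n:]])
        exceed += stat >= observed
    # The +1 terms count the observed split itself, which keeps the test valid.
    return (exceed + 1) / (n_permutations + 1)


rng = np.random.default_rng(1)
reference = rng.normal(size=(100, 3))
matched = rng.normal(size=(100, 3))            # same distribution as reference
shifted = rng.normal(loc=3.0, size=(100, 3))   # clearly mismatched mean

p_matched = permutation_p_value(reference, matched)
p_shifted = permutation_p_value(reference, shifted)
```

A small p-value (as expected for `shifted`) indicates that the simulated and reference samples are unlikely to come from the same distribution. For matched samples the p-value is roughly uniform on (0, 1].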
Interpreting galaxy imaging and spectroscopy is central to studies of galaxy formation and evolution. Widely used forward-modeling approaches infer galaxy parameters via MCMC but take hours per galaxy due to model-generation costs, making them prohibitive for modern astronomical surveys with millions of galaxies. Here, we present a simulation-based inference (SBI) framework for fast, amortized inference of galaxy physical parameters in the HETDEX survey. A unique challenge is that astrophysical inference is not performed on raw measurements, but rather on processed observables with strongly heteroscedastic uncertainties, epistemic uncertainties, and patchy imaging coverage. We address these challenges using a neural posterior estimator trained on ~10^7 simulated galaxies generated from a 17-parameter model, paired with lin–log asinh magnitudes and a tailored uncertainty model. The resulting amortized approach recovers key galaxy properties, including redshift and stellar mass, in ~0.06 seconds per object, achieving a ~2.5 million-fold speedup over traditional nested sampling methods while preserving comparable accuracy and uncertainty quantification. Characterizing redshift posteriors for faint, poorly constrained galaxies is challenging because they can contain multiple well-separated modes that are difficult for a sampler to traverse. Our SBI framework avoids this traversal entirely by sampling directly from these modes, producing better-calibrated redshift posterior distributions than our state-of-the-art benchmark while maintaining comparable performance for bright objects. This demonstrates that improved posterior calibration does not come at the expense of predictive accuracy. We additionally introduce a realistic masking framework to accommodate missing data.
Speaker
Lishan Shi, The Pennsylvania State University
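The abstract above mentions asinh magnitudes without giving the exact definition used. As an illustration only, the sketch below assumes the commonly used inverse-hyperbolic-sine magnitude form, which behaves logarithmically for bright sources and linearly near zero flux, so it stays finite for zero or negative measured fluxes. The function name `asinh_magnitude` and the softening value are hypothetical choices, not taken from the talk.

```python
import numpy as np


def asinh_magnitude(flux, softening, zero_point=0.0):
    """Asinh magnitude for flux given in units of the zero-point flux.

    `softening` (b > 0) sets the flux scale where the behavior changes
    from linear (|flux| << b) to logarithmic (flux >> b).
    """
    flux = np.asarray(flux, dtype=float)
    scale = 2.5 / np.log(10.0)
    return zero_point - scale * (np.arcsinh(flux / (2.0 * softening)) + np.log(softening))


b = 1e-10  # assumed softening parameter, for illustration only
bright = asinh_magnitude(1.0, b)     # close to -2.5 * log10(1.0) = 0
zero = asinh_magnitude(0.0, b)       # finite: -2.5 * log10(b) = 25
negative = asinh_magnitude(-b, b)    # still finite for negative flux
```

For fluxes much larger than the softening parameter this agrees with the usual logarithmic magnitude, which is why it suits noisy, low signal-to-noise photometry.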
How much can we learn about Dark Energy and how it behaves from weak gravitational lensing surveys? Two-point functions offer some insight but miss non-Gaussian information. Simulation-based inference (SBI) offers a way to combine and learn higher-order statistics via neural compression, but does not always a) leverage or b) exceed human domain knowledge in physical inference problems in terms of bits extracted from data, especially when simulations are large and limited in number.
I will present an information-theoretic approach to illustrate SBI, which can be naturally extended to derive hybrid statistics, an optimal framework for combining domain knowledge and learned neural summaries. These statistics improve information extraction at the field level compared to neural summaries alone or their concatenation to existing summaries, and make inference robust in settings with little training data.
I will show an application of hybrid statistics for constraining wCDM from the Dark Energy Survey Year 3 data. By changing the optimisation objective alone, the method is forecast to provide the most competitive Dark Energy and weak lensing parameter constraints to date, showcasing the power of SBI for science applications. Furthermore, the modular nature of hybrid statistics alongside hand-designed statistics may shed light on where signatures of Dark Energy information might lie in massive cosmological datasets, to be exploited in upcoming astronomical surveys.
Keywords
simulation-based inference
cosmology
neural statistics
This talk concerns statistical inference on the internal parameters of complex physical systems, where the likelihood is intractable but encoded by a simulator or in observations of Nature itself. In this so-called likelihood-free inference (LFI) setting, one can estimate key quantities such as likelihoods, posteriors, or likelihood ratios from labeled (e.g. simulated) data. An open question is how to best construct confidence sets with high power for realistic settings with finite sample sizes and model misspecifications. In this work, we leverage estimated posteriors to construct sets with frequentist coverage and high constraining power (small average size) that are robust to several forms of model misspecification and can be estimated using limited or sparse calibration data.
Keywords
Likelihood-free inference
Model misspecification
Limited data