Monday, Aug 3: 8:30 AM - 10:20 AM
1275
Invited Paper Session
Early developments in Statistics were often driven by astronomical data and analyses. It is well known that Gauss's development of least-squares regression was motivated by the problem of locating the minor planet Ceres after it re-emerged from behind the Sun. In recent times, Astrostatistics has driven new techniques in MCMC and Bayesian methodology. Astronomical data have steadily increased in both quantity and quality, with space telescopes like SDO, Hubble, Chandra, XMM-Newton, Gaia, TESS, and JWST producing copious data at unprecedented resolutions, and massive ground-based surveys like Rubin LSST and the Square Kilometre Array poised to revolutionize both astronomy and data science. Recently, astronomy has also been at the forefront of applying Machine Learning methods.
This session focuses on the interface between Machine Learning, Statistics, and Astronomy, emphasizing advances in statistical machine learning driven by astronomical data and analyses. In keeping with the theme of the JSM, the session will highlight the community of AI-aware astronomers who bring their unique perspectives to bear on combining astronomy and statistical machine learning. This is particularly relevant in the Boston area, which hosts one of the largest astrostatistics and astroinformatics communities in the world and has been buzzing with activity in this field. Several Greater Boston institutions conduct research in space weather and astronomy and, concomitantly, work where these fields overlap with astrostatistics and AI (notably the CHASC Astrostatistics collaboration and the AstroAI initiative). Three researchers working at the nexus of statistical machine learning and astronomy will present talks. In addition, two discussants will synthesize the current state of the art, one from an astronomer's perspective and one from a statistician's perspective.
This session aligns closely with the aims of the ASA's Astrostatistics Interest Group (AIG), which seeks to highlight and advance astrostatistical learning and collaborations.
Applied
Yes
Main Sponsor
Section on Physical and Engineering Sciences
Co-Sponsors
Astrostatistics Interest Group
Section on Statistical Learning and Data Science
Presentations
Score-based generative models (SBGMs) offer a path to mitigating class imbalance in solar flare prediction. Recent and prior work in the community has shown that deep learning models can reliably separate flare-free intervals from flaring ones, but this framing may overestimate performance when the objective is to discriminate strong (M-class and above) flares from weaker events. When models are trained only on intervals containing flares, the scarcity of high-quality examples of strong flares limits model performance for operational use. We study data augmentation for multichannel solar flare images using synthetic strong-flare samples drawn from a conditional SBGM. In a simplified linear-regression setting, we derive conditions under which data augmentation can improve prediction. Using these as a reference, we outline conditions under which augmentation may benefit solar flare forecasting beyond what our empirical results show.
Keywords
generative models
synthetic data augmentation
space weather
Speaker
Kevin Jin, University of Michigan, Ann Arbor
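The trade-off behind synthetic augmentation in a linear-regression setting can be illustrated with a toy Monte Carlo experiment. This is not the speakers' derivation or their stated conditions: it assumes a no-intercept model y = beta * x + noise, a small real sample, and a hypothetical "generator" that samples from the same model but with its slope shifted by gen_bias, standing in for an imperfect SBGM. All function and parameter names are illustrative.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope for a no-intercept model y = b * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def draw(n, slope, noise_sd, rng):
    """Draw n pairs from y = slope * x + Gaussian noise, with x ~ N(0, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def compare(beta=1.0, n_real=10, n_syn=100, gen_bias=0.0,
            noise_sd=1.0, trials=2000, seed=0):
    """Monte Carlo MSE of the slope estimate: real-only vs. real + synthetic.

    The hypothetical generator samples from the true model but with slope
    beta + gen_bias, mimicking a systematically miscalibrated generator.
    Synthetic samples reduce variance but add bias toward the wrong slope.
    """
    rng = random.Random(seed)
    err_real, err_aug = [], []
    for _ in range(trials):
        xr, yr = draw(n_real, beta, noise_sd, rng)
        xs, ys = draw(n_syn, beta + gen_bias, noise_sd, rng)
        err_real.append((fit_slope(xr, yr) - beta) ** 2)
        # Augmented fit: pool real and synthetic pairs, weighted equally.
        err_aug.append((fit_slope(xr + xs, yr + ys) - beta) ** 2)
    return statistics.mean(err_real), statistics.mean(err_aug)


if __name__ == "__main__":
    for bias in (0.0, 0.2, 1.0):
        mse_real, mse_aug = compare(gen_bias=bias)
        print(f"gen_bias={bias}: real-only MSE={mse_real:.4f}, "
              f"augmented MSE={mse_aug:.4f}")
```

In this toy, an unbiased generator lowers the mean squared error of the slope estimate, while a large gen_bias reverses the comparison because the synthetic samples outnumber the real ones and pull the fit toward the wrong slope.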
Foundation models for scientific data must contend with a fundamental challenge: observations often conflate the true underlying physical phenomena with systematic distortions introduced by measurement instruments. This entanglement limits model generalization, especially in heterogeneous or multi-instrument settings. We present a causally motivated foundation model that explicitly disentangles physical and instrumental factors using a dual-encoder architecture trained with structured contrastive learning. Leveraging naturally occurring observational triplets (i.e., where the same target is measured under varying conditions, and distinct targets are measured under shared conditions), the model learns separate latent representations for the underlying physical signal and for instrument effects. Evaluated on simulated astronomical time series designed to resemble the complexity of variable stars observed by missions like NASA's Transiting Exoplanet Survey Satellite (TESS), the method outperforms traditional single-latent-space foundation models on downstream prediction tasks, particularly in low-data regimes. These results demonstrate that our model supports key capabilities of foundation models, including few-shot generalization and efficient adaptation, and highlight the importance of encoding causal structure into representation learning for structured data.
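The triplet structure described in the abstract can be sketched as a pair of margin losses, one per latent space. This is an illustrative assumption, not the authors' published objective: the two encoders' outputs are taken as given vectors, a standard triplet hinge loss on squared Euclidean distance is swapped in, and the names structured_triplet_loss, same_target, and same_instrument are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_margin(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge: the positive should be closer than the
    negative by at least `margin` (in squared distance)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def structured_triplet_loss(phys, inst, margin=1.0):
    """Hypothetical structured contrastive loss for a dual encoder.

    phys / inst: dicts of latent vectors from the (assumed) physical and
    instrument encoders, keyed by
      'a'               - anchor observation (target T1, conditions C1)
      'same_target'     - T1 observed under different conditions C2
      'same_instrument' - a different target T2 under the anchor's C1
    """
    # Physical latents: the same target should match regardless of instrument.
    l_phys = triplet_margin(phys["a"], phys["same_target"],
                            phys["same_instrument"], margin)
    # Instrument latents: shared conditions should match regardless of target.
    l_inst = triplet_margin(inst["a"], inst["same_instrument"],
                            inst["same_target"], margin)
    return l_phys + l_inst


if __name__ == "__main__":
    # Disentangled toy latents: each space groups the "right" observations.
    phys = {"a": [1.0, 0.0], "same_target": [1.0, 0.0], "same_instrument": [0.0, 1.0]}
    inst = {"a": [0.0, 1.0], "same_instrument": [0.0, 1.0], "same_target": [1.0, 0.0]}
    print(structured_triplet_loss(phys, inst))  # disentangled case
    print(structured_triplet_loss(inst, phys))  # spaces swapped: penalized
```

The design choice here is that each observation appears as a positive in one latent space and a negative in the other, which is what pushes physical and instrumental information into separate representations.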
The taxonomy of physical events in the night sky (erupting stars, stellar explosions, and even gravitational lensing) is inherently hierarchical. Broad physical families subdivide into increasingly specific physical classes, forming a tree-structured classification problem. Standard classification objectives typically ignore this structure and treat all misclassifications equally, rather than encouraging classifiers to "make better mistakes". In this talk, I will discuss several approaches to encoding hierarchical class structure in the objective function. I will also discuss open issues of "fuzzy boundaries" within this taxonomy, where both discrete and continuous subclasses exist.
Keywords
Astronomy
Classification
Machine Learning
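One common way to build a class hierarchy into the objective is a hierarchical cross-entropy: leaf probabilities are summed up to each ancestor, and a cross-entropy term is added at every level of the true class's path. The sketch below shows one such approach, not necessarily one discussed in the talk; its two-level taxonomy and all names are hypothetical examples.

```python
import math

# Hypothetical two-level taxonomy for illustration only:
# each leaf class maps to its path (broad family, specific class).
PATHS = {
    "SN Ia": ("stellar explosion", "SN Ia"),
    "SN II": ("stellar explosion", "SN II"),
    "classical nova": ("erupting star", "classical nova"),
    "dwarf nova": ("erupting star", "dwarf nova"),
    "microlensing": ("lensing", "microlensing"),
}


def node_prob(leaf_probs, node, level):
    """Marginal probability of a taxonomy node: sum over the leaves beneath it.
    Leaves missing from leaf_probs are treated as having probability 0."""
    return sum(p for leaf, p in leaf_probs.items() if PATHS[leaf][level] == node)


def hierarchical_xent(leaf_probs, true_leaf, weights=(1.0, 1.0)):
    """Weighted sum of level-wise cross-entropies along the true leaf's path.

    Mass placed on a sibling of the true class is penalized only at the leaf
    level; mass placed in a different family is also penalized at the family
    level. Raises ValueError if a node on the true path has zero probability.
    """
    loss = 0.0
    for level, node in enumerate(PATHS[true_leaf]):
        loss -= weights[level] * math.log(node_prob(leaf_probs, node, level))
    return loss


if __name__ == "__main__":
    sibling_mistake = {"SN Ia": 0.5, "SN II": 0.5}
    cross_family_mistake = {"SN Ia": 0.5, "classical nova": 0.5}
    print(hierarchical_xent(sibling_mistake, "SN Ia"))
    print(hierarchical_xent(cross_family_mistake, "SN Ia"))
```

With equal weights, confusing a Type Ia supernova with another supernova subtype costs less than placing the same probability on a nova, whereas flat cross-entropy on the leaves scores both mistakes identically. The per-level weights are a tunable assumption controlling how strongly family-level errors are penalized.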