Wednesday, Aug 6: 10:30 AM - 12:20 PM
4157
Contributed Posters
Music City Center
Room: CC-Hall B
Main Sponsor
Astrostatistics Interest Group
Presentations
In astrophysics, the C statistic, a likelihood ratio statistic, has been widely adopted for model fitting and goodness-of-fit assessment of Poisson-count data with heterogeneous rates. In many astronomy and high-energy physics applications the observations are sparse, which makes the theoretical properties of the C statistic questionable. Over the past decade, researchers have gradually recognized the problems of applying the C statistic directly to such small-count data and have published approximate solutions. In this paper, we comprehensively study the properties of the C statistic and evaluate various algorithms for goodness-of-fit assessment based on it, with emphasis on low-count scenarios. Theoretical results, computational algorithms, extensive simulation studies, and real-data applications will be presented. We show both theoretically and numerically that (a) classical χ²-based goodness-of-fit assessment is not effective in low-count settings, (b) a vanilla bootstrap with moment estimators of the mean and variance produces biased estimates of the null distribution, and (c) high-order asymptotics achieve good precision at a much lower computational cost.
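To illustrate point (a), recall the likelihood-ratio form of the C statistic for N bins with observed counts n_i and model means μ_i: C = 2 Σ [μ_i − n_i + n_i ln(n_i/μ_i)], with the last term set to 0 when n_i = 0. The classical approximation treats C as χ² with N degrees of freedom, so each bin should contribute 1 on average. The minimal sketch below, which assumes a known (not fitted) model, computes the exact per-bin expectation by summing over the Poisson distribution. The function names are illustrative only, and this is not the algorithm studied in the paper.

```python
import math


def cash_c(counts, model):
    """Likelihood-ratio form of the C statistic: 2 * sum(mu - n + n * ln(n / mu)).

    The n * ln(n / mu) term is taken as 0 when n == 0.
    """
    total = 0.0
    for n, mu in zip(counts, model):
        total += mu - n
        if n > 0:
            total += n * math.log(n / mu)
    return 2.0 * total


def poisson_pmf(n, mu):
    # Work in log space so large n does not overflow the factorial
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def expected_c_per_bin(mu):
    """Exact E[C_i] for one bin with n ~ Poisson(mu), i.e. when the model is correct."""
    n_max = int(mu + 20.0 * math.sqrt(mu) + 30)  # Poisson mass beyond this is negligible
    return sum(poisson_pmf(n, mu) * cash_c([n], [mu]) for n in range(n_max + 1))


for mu in (0.1, 1.0, 100.0):
    # The chi-square(1) approximation predicts a per-bin mean of exactly 1
    print(mu, expected_c_per_bin(mu))
```

For μ = 0.1 the exact per-bin mean is about 0.47, while near μ = 1 it exceeds 1; only for large μ does it approach 1. When many bins have small expected counts, a χ² reference distribution can therefore be badly calibrated.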
Keywords
Goodness-of-fit
Low Count Data
C-statistics
Bootstrap
High-energy Physics
A new method that enables the use of systematic errors for the maximum-likelihood regression of integer-count Poisson data is presented. The method is based on the use of a phenomenological intrinsic model variance that describes the variability of the model, and it results in a goodness-of-fit statistic that is a simple modification of the usual Poisson deviance, which is also known in the astronomical community as the Cash statistic. A related statistic that is used for testing nested model components is also presented. The new methods presented in this talk aim to overcome the difficulty associated with the regression of integer-count data when there are sources of error that go beyond those of the data-generating process. The method is shown to be formally equivalent to the regression with data that are distributed according to a compounded and therefore overdispersed Poisson variable. Simple analytic forms for the null-hypothesis distributions of the statistics are also presented.
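The abstract states that the method is formally equivalent to regression on data drawn from a compounded, overdispersed Poisson variable. One way to see the overdispersion is to simulate a single assumed compounding: a gamma-distributed rate R with mean μ and intrinsic variance μ²/k, so that the counts are negative binomial with total variance μ + μ²/k instead of μ. The gamma choice, the shape parameter k, and all names below are assumptions for illustration; the abstract does not say which compounding distribution the talk uses, and this sketch is not its regression method.

```python
import math
import random
import statistics


def poisson_knuth(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def compound_poisson_counts(mu, shape, size, seed=0):
    """Counts n ~ Poisson(R) with R ~ Gamma(shape, scale=mu/shape): E[R] = mu, Var[R] = mu**2/shape."""
    rng = random.Random(seed)
    counts = []
    for _ in range(size):
        rate = rng.gammavariate(shape, mu / shape)  # alpha=shape, beta=scale
        counts.append(poisson_knuth(rate, rng))
    return counts


mu, shape = 20.0, 10.0
counts = compound_poisson_counts(mu, shape, size=50_000)
intrinsic_var = mu ** 2 / shape  # hypothetical intrinsic model variance
print(statistics.mean(counts))      # close to mu
print(statistics.variance(counts))  # close to mu + intrinsic_var, not mu
```

The law of total variance gives Var[n] = E[Var(n | R)] + Var(E[n | R]) = μ + μ²/k, which is why a pure Poisson deviance understates the scatter once an intrinsic model variance is present.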
Keywords
regression methods
Poisson distribution
systematic errors
goodness of fit statistics
Class imbalance is a common challenge in datasets where one category significantly outnumbers the others. This issue is particularly relevant in the prediction of extreme tail events, which are vastly outnumbered by non-events, especially in scientific data. Generative learning models help address this issue by enabling users to generate synthetic samples from the learned joint distribution to augment the training data.
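A short worked example shows why imbalance is a problem for evaluation as well as training. With a hypothetical 1% flaring class, a classifier that never predicts a flare scores 99% accuracy, while the True Skill Statistic (TSS = TP/(TP+FN) − FP/(FP+TN)), a score often used in the solar flare forecasting literature, correctly gives it zero skill. The data and function names here are illustrative assumptions; the abstract does not list which metrics the project uses.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels (1 = flare, 0 = no flare)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def true_skill_statistic(tp, fn, fp, tn):
    """Hit rate minus false-alarm rate; does not reward predicting only the majority class."""
    return tp / (tp + fn) - fp / (fp + tn)


# Hypothetical data set: 10 flaring regions among 1000 samples
y_true = [1] * 10 + [0] * 990
always_quiet = [0] * 1000  # trivial classifier that never predicts a flare
counts = confusion_counts(y_true, always_quiet)
print(accuracy(*counts))              # 0.99
print(true_skill_statistic(*counts))  # 0.0
```

Augmenting the minority class with synthetic samples targets exactly this failure mode: it gives the classifier enough flare examples to trade a few false alarms for real hits.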
In this project, we train diffusion models on space weather data to mitigate class imbalance. Specifically, we train a diffusion model on multi-channel images captured by the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI) aboard the Solar Dynamics Observatory (SDO). These images capture active regions six hours before peak flare time. We assess the fidelity and quality of the generated synthetic images and evaluate their effectiveness in improving flare forecasting, using common techniques and models from the space weather community for solar flare prediction.
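For readers new to diffusion models, the sketch below shows the forward (noising) half of a standard DDPM-style model: a linear variance schedule β_t, its cumulative product ᾱ_t = Π(1 − β_s), and the closed-form draw x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε. A denoising network (not shown) is trained to predict ε from x_t, and new images are generated by running the learned reverse chain from pure noise. The schedule values follow the original DDPM setup; whether this project uses that variant is an assumption, and the 16-value "image" is a hypothetical stand-in for an AIA/HMI patch.

```python
import math
import random


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): fraction of signal variance left at step t."""
    out = []
    running = 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def q_sample(x0, alpha_bar_t, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one step."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5] * 16  # hypothetical flattened, normalized image patch
x_mid = q_sample(x0, alpha_bar[499], rng)  # partly noised
x_end = q_sample(x0, alpha_bar[-1], rng)   # essentially pure Gaussian noise
```

Because ᾱ_T is close to 0, the final step is nearly independent of x_0, which is what lets the reverse process start sampling from a standard normal.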
Keywords
Diffusion Models
Data Augmentation
Space Weather
Co-Author
Yang Chen, University of Michigan
First Author
Kevin Jin, University of Michigan, Ann Arbor
Presenting Author
Kevin Jin, University of Michigan, Ann Arbor
Detecting Earth-like exoplanets presents a significant statistical challenge due to their weak signals, which can be obscured or even mimicked by stellar activity (e.g., sunspots, faculae). In this work, we propose a statistical approach to enhance exoplanet detection by analyzing a time series of stellar spectra while accounting for the confounding effects of stellar activity. We model the stellar spectra as a functional time series, using local spectral features to estimate planetary signals. However, stellar activity distorts the shape of these local features over time, introducing variability that can interfere with planetary detection. To address this, we apply dissimilarity metrics and dimension reduction techniques to characterize shape changes in the local features; the resulting embeddings are then incorporated into a statistical model to produce a clearer exoplanet signal. We leverage data from hundreds of local spectral features to disentangle the effects of stellar activity from true planetary signals.
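As a concrete example of turning pairwise dissimilarities into a low-dimensional embedding, the sketch below applies classical (Torgerson) multidimensional scaling to toy absorption-line profiles whose depth changes between epochs. The Euclidean dissimilarity, the Gaussian line shape, the depths, and the one-dimensional target are all assumptions chosen for illustration; the abstract does not name the metrics or reduction method used.

```python
import math


def classical_mds_1d(dist, iters=500):
    """1D classical MDS: double-center squared dissimilarities, then take the top eigenpair.

    B = -1/2 * J D^2 J; coordinates are sqrt(lambda) * v for the leading eigenvector v,
    found here by power iteration (assumes the leading eigenvalue is positive).
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]


def line_profile(depth, width=0.5):
    """Hypothetical normalized absorption line: 1 - depth * Gaussian on a fixed grid."""
    grid = [x / 10.0 for x in range(-20, 21)]
    return [1.0 - depth * math.exp(-0.5 * (x / width) ** 2) for x in grid]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


depths = [0.30, 0.32, 0.35, 0.31, 0.28]  # hypothetical line depths over five epochs
profiles = [line_profile(d) for d in depths]
dist = [[euclidean(p, q) for q in profiles] for p in profiles]
embedding = classical_mds_1d(dist)  # one shape coordinate per epoch (sign is arbitrary)
```

In this toy case the shape change is exactly one-dimensional, so the embedding recovers the pairwise distances and orders the epochs by line depth; with real spectra one would keep several components and treat them as activity covariates.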
Keywords
astrostatistics
dimension reduction
time-series analysis
statistical shape analysis