Tuesday, Aug 5: 10:30 AM - 12:20 PM
4109
Contributed Papers
Music City Center
Room: CC-208A
Main Sponsor
IMS
Presentations
In order for probabilistic forecasts to be useful to decision makers, the forecasts should be calibrated – given a sequence of 90% quantile forecasts, we want the true value to be less than the forecast 90% of the time. Existing online calibration procedures, such as the quantile tracking algorithm from online conformal prediction (Angelopoulos et al., 2023), can effectively calibrate a single quantile but, when applied to multiple quantiles, can produce invalid probability distributions due to crossings – e.g., the calibrated 50% quantile forecast lies above the calibrated 75% quantile forecast. In this work, we consider the problem of online calibration with order constraints. We propose intuitive ways of combining the quantile tracking algorithm with an order-enforcing method (such as sorting or isotonic regression) that produce a sequence of forecasts with no crossings while still guaranteeing the correct long-run coverage under mild assumptions. We demonstrate our methods on COVID-19 forecasting data.
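A minimal sketch of the kind of procedure the abstract describes – the quantile tracking update combined with sorting to remove crossings. The learning rate, quantile levels, and Gaussian outcomes below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.array([0.5, 0.75, 0.9])   # target quantile levels
q = np.zeros_like(alphas)             # current (raw) quantile forecasts
eta = 0.05                            # learning rate
T = 20000
hits = np.zeros_like(alphas)

for _ in range(T):
    q_mono = np.sort(q)               # order-enforcing step: no crossings
    y = rng.normal()                  # observe the outcome
    covered = (y <= q_mono).astype(float)
    hits += covered
    # quantile tracking update: online (sub)gradient step on the pinball loss
    q = q + eta * (alphas - covered)

coverage = hits / T                   # long-run empirical coverage per level
```

The telescoping argument behind quantile tracking still applies here: since the forecasts stay bounded, the empirical coverage of each sorted forecast deviates from its target level by at most O(1/(eta * T)).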
Keywords
forecasting
calibration
conformal prediction
online learning
I will present a Physics-Informed multiple quantile regression model. The method features a regularizing term involving a partial differential equation that encodes the available problem-specific information about the phenomenon under study. It permits joint estimation of multiple quantiles while preserving their monotonicity. Moreover, it can handle spatial data observed over non-Euclidean domains, such as linear networks, two-dimensional manifolds, and non-convex volumes. The method will be illustrated through an application to the study of nitrogen dioxide over the Lombardy region in Italy.
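As a toy one-dimensional analogue of penalized multiple quantile estimation (with a generic second-difference roughness penalty standing in for the problem-specific PDE term, pointwise sorting for monotonicity, and all tuning values illustrative – this is not the presented method):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)  # noisy signal

taus = [0.25, 0.5, 0.75]   # quantile levels estimated jointly
lam = 20.0                 # roughness-penalty weight (stand-in for the PDE term)

# gradient of the quadratic roughness penalty lam * ||D f||^2
D = np.diff(np.eye(n), n=2, axis=0)   # discrete second-difference operator
P = 2 * lam * (D.T @ D)

curves = []
for tau in taus:
    f = np.zeros(n)
    step = 1e-3
    for _ in range(10000):
        r = y - f
        g = -(tau - (r < 0)) + P @ f   # pinball subgradient + penalty gradient
        f -= step * g
    curves.append(f)

# pointwise sorting across levels removes any residual quantile crossings
curves = np.sort(np.vstack(curves), axis=0)
```

The penalty couples neighboring nodes, so each curve behaves like a smoothed local quantile rather than interpolating its single observation per location.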
Keywords
spatial data analysis
smoothing with roughness penalties
quantile regression
Co-Author(s)
Ilenia Di Battista, Politecnico di Milano
Marco De Sanctis, Politecnico di Milano
Eleonora Arnone, Università degli Studi di Torino
Cristian Castiglione, Bocconi University
Mauro Bernardi, Università degli Studi di Padova
Francesca Ieva, Politecnico di Milano
First Author
Laura Maria Sangalli, MOX - Dipartimento Di Matematica, Politecnico Di Milano
Presenting Author
Laura Maria Sangalli, MOX - Dipartimento Di Matematica, Politecnico Di Milano
The origin of COVID-19 remains unclear despite extensive research. Theoretical models can simplify complex epigenetic landscapes by reducing vast numbers of methylation sites to manageable sets, revealing fundamental pathogen interactions and enabling, for the first time in the literature and in practice, advances in tracing a virus's origin. In our study, a max-logistic intelligence classifier analyzed 865,859 Infinium MethylationEPIC sites (CpGs), identifying eight CpGs that achieved 100% accuracy in distinguishing COVID-19 patients from patients with other respiratory diseases and from healthy controls. One CpG, cg07126281, linked to the SAMM50 gene, shares genetic ties with rare infectious diseases such as Sennetsu fever and glanders, suggesting a potential connection between COVID-19 and these diseases, possibly transmitted through contaminated seafood or glanders-infected individuals. Identifying such links among 865,859 CpG sites is challenging: the probability of a random correlation is less than one in ten million, and requiring a meaningful association with rare diseases lowers it to one in one hundred million, reinforcing the credibility of our findings.
Keywords
Biomarkers
virus tracing
DNA methylations
site-site interaction effects
rare diseases
Sennetsu fever and glanders
First Author
Zhengjun Zhang, University of Chinese Academy of Sciences
Presenting Author
Zhengjun Zhang, University of Chinese Academy of Sciences
We analyze the statistical problem of recovering a discrete signal, modeled as a k-atomic uniform distribution μ, from a binned Poisson convolution model. This question is motivated by super-resolution microscopy, where precise estimation of μ provides insights into spatial configurations, such as protein colocalization in cellular imaging. Our main result quantifies the minimax risk of estimating μ under the Wasserstein distance for Gaussian and compactly supported, smooth convolution kernels. Specifically, we show that the global minimax risk scales with t^{-1/2k} for t→∞, where t denotes the illumination time of the probe, and that this rate is achieved by the method of moments and the maximum likelihood estimator. To address practical settings where atoms of μ may be partially separated, we also analyze a regime with structured clusters and show faster adaptive rates for both estimators as well as local minimax optimality. As an application, we use our methods on experimental STED microscopy data to locate single DNA origami. In addition, we complement our findings with numerical experiments that showcase the practical performance of both estimators and their trade-offs.
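A hedged toy version of the method-of-moments idea for k = 2 atoms with equal weights and a Gaussian kernel of known width (simplifying away the binned Poisson observation model analyzed in the paper; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 0.3, 1.1      # true atoms of the 2-atomic uniform distribution mu
sigma = 0.5          # known width of the Gaussian convolution kernel
n = 200000

x = rng.choice([a, b], size=n)            # latent atom positions (weights 1/2)
y = x + rng.normal(scale=sigma, size=n)   # noisy observations

# method of moments: de-noise the empirical moments, then solve for the atoms
m1 = y.mean()                    # estimates (a + b) / 2
m2 = y.var() + m1**2 - sigma**2  # estimates (a^2 + b^2) / 2
s = 2 * m1                       # a + b
prod = (s**2 - 2 * m2) / 2       # a * b, since 2ab = s^2 - (a^2 + b^2)
atoms = np.sort(np.roots([1.0, -s, prod]).real)
```

The atoms are recovered as the roots of the monic quadratic with the estimated elementary symmetric polynomials as coefficients; the general k-atomic case replaces this with a degree-k polynomial built from the first 2k − 1 corrected moments.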
Keywords
Gaussian Mixture Models
Method of Moments
Maximum Likelihood Estimation
Microscopy
Polynomial Root Stability
Chebyshev Systems
Adjusting for confounding and imbalance when establishing statistical relationships is an increasingly important task, and causal inference methods have emerged as the most popular tool to achieve this. Causal inference has been developed mainly for regression relationships with scalar responses and also for distributional responses. We introduce here a general framework for causal inference when responses reside in general geodesic metric spaces, where we draw on a novel geodesic calculus that facilitates scalar multiplication for geodesics and the quantification of treatment effects through the concept of geodesic average treatment effect. Using ideas from Fréchet regression, we obtain a doubly robust estimation of the geodesic average treatment effect and results on consistency and rates of convergence for the proposed estimators. We also study uncertainty quantification and inference for the treatment effect. Examples and practical implementations include simulations and data illustrations for compositional responses, as encountered for U.S. statewise energy source data, where we study the effect of coal mining; network data corresponding to New York taxi trips, where the effect of the COVID-19 pandemic is of interest; and brain connectivity networks, where we study the effect of Alzheimer's disease.
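For intuition, here is the familiar scalar (Euclidean) special case of a doubly robust (AIPW) average treatment effect estimator; the geodesic framework in the abstract generalizes this construction to metric-space responses. The simulated models below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50000
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))   # confounded treatment
y = 1.0 + 2.0 * a + x + rng.normal(size=n)        # true ATE = 2

# outcome regressions: OLS with a treatment interaction
X = np.column_stack([np.ones(n), a, x, a * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
m1 = beta[0] + beta[1] + (beta[2] + beta[3]) * x  # predicted outcome if treated
m0 = beta[0] + beta[2] * x                        # predicted outcome if control

# propensity model: logistic regression fit by plain gradient ascent
Z = np.column_stack([np.ones(n), x])
g = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-Z @ g))
    g += 0.5 * Z.T @ (a - p) / n
e = 1 / (1 + np.exp(-Z @ g))

# AIPW estimator: consistent if either the outcome or propensity model is right
ate = np.mean(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e))
```

The double robustness comes from the augmentation terms: if the outcome models are correct the inverse-propensity corrections average to zero, and vice versa.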
Keywords
Doubly robust estimation
Fréchet regression
geodesic average treatment effect
metric statistics
network
random object
Studies of T cells and their clonally unique receptors have shown promise in elucidating the association between immune response and human disease. Methods to identify T-cell receptor clones that expand or contract in response to certain therapeutic strategies have so far been limited to longitudinal pairwise comparisons of clone frequency with multiplicity adjustment. Here we develop a more general mixture model approach for arbitrary follow-up and missingness that partitions dynamic longitudinal clone-frequency behavior from static behavior. While it is common to mix on the location or scale parameter of a family of distributions, the model takes a different approach, mixing on the parameterization itself: the dynamic component allows a variable, Gamma-distributed Poisson mean parameter over longitudinal follow-up, while the static component's mean is time-invariant. We leverage Gamma-Poisson conjugacy to evaluate the model with the respective component posterior predictive distributions and develop an EM algorithm to estimate the empirical Bayes hyperparameters and component membership. We demonstrate the model in simulation and in a prostate cancer patient cohort.
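A small sketch of the Gamma-Poisson conjugacy underlying the two components: closed-form marginal likelihoods for a "static" clone (one shared Gamma-distributed mean across follow-up) versus a "dynamic" clone (an independent Gamma-distributed mean at each time point), used here to classify simulated clones. The hyperparameters and simulation settings are illustrative, not the authors':

```python
import math
import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, 0.5   # Gamma(shape a, rate b) hyperparameters; illustrative values
T = 8             # number of longitudinal follow-up time points

def log_marginal_static(y):
    """Counts share one Gamma-distributed Poisson mean across all time points."""
    S = int(np.sum(y))
    return (-sum(math.lgamma(v + 1) for v in y)
            + math.lgamma(a + S) - math.lgamma(a)
            + a * math.log(b) - (a + S) * math.log(b + T))

def log_marginal_dynamic(y):
    """Each time point gets its own independent Gamma-distributed mean."""
    return sum(math.lgamma(a + v) - math.lgamma(a) - math.lgamma(v + 1)
               + a * math.log(b) - (a + v) * math.log(b + 1) for v in y)

correct = 0
for _ in range(50):   # static clones: one latent mean, constant over time
    y = rng.poisson(rng.gamma(a, 1 / b), size=T)
    correct += log_marginal_static(y) > log_marginal_dynamic(y)
for _ in range(50):   # dynamic clones: a fresh latent mean at every time point
    y = rng.poisson(rng.gamma(a, 1 / b, size=T))
    correct += log_marginal_dynamic(y) > log_marginal_static(y)
accuracy = correct / 100
```

Both marginals follow from integrating the Poisson likelihood against the Gamma prior, once jointly (static) and once per time point (dynamic); in the full model these would enter an EM step as component densities weighted by mixing proportions.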
Keywords
mixture model
hierarchical model
Bayesian conjugacy
EM algorithm
T-cell receptor
First Author
David Swanson, University of Texas MD Anderson Cancer Center
Presenting Author
David Swanson, University of Texas MD Anderson Cancer Center
Series regression estimates the conditional mean of a response variable by regressing it on features derived from basis functions evaluated at covariate values. Ordinary least squares (OLS)-based series estimators achieve minimax rate optimality but impose stringent assumptions on basis functions. To address this, prior work introduced the Forster-Warmuth (FW) learner, which relaxes these conditions using a unified pseudo-outcome framework to minimize bias from nuisance function estimation, achieving minimax rates under mild assumptions. While these results relied on an i.i.d. sample condition, we extend the FW framework to dependent data settings, including time series and spatial structures. Our analysis shows that under specific dependence conditions, the ℓ2 error rate aligns with the i.i.d. case, preserving minimax optimality. This extension broadens the applicability of FW-inspired methods to high-dimensional and structured data. We demonstrate its utility by estimating dose-response curves for continuous treatments under both unconfounded and confounded scenarios. We model air pollution's immediate effects on heart attack rates to identify actionable public health insights.
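A minimal sketch of the baseline OLS series estimator that the abstract builds on, using a polynomial basis on simulated i.i.d. data (basis, degree, and data-generating process are illustrative; the FW learner itself modifies this construction via pseudo-outcomes):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 2000, 6                      # sample size, number of basis functions
x = rng.uniform(-1, 1, size=n)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)

# design matrix of polynomial basis functions x^0, ..., x^{K-1}
Phi = np.vander(x, K, increasing=True)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # OLS series coefficients

# evaluate the estimated conditional mean on a grid
grid = np.linspace(-0.9, 0.9, 50)
fitted = np.vander(grid, K, increasing=True) @ beta
```

The estimator regresses the response on features derived from basis functions, exactly as described; extending the guarantees from this i.i.d. setting to dependent (time-series or spatial) samples is the contribution of the work.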
Keywords
Series regression
Forster-Warmuth (FW) learner
Minimax rate optimality
Dependent data
Dose-response curves
Air pollution and heart attack rates