Flexible Statistical Inference with Machine Learning

Lucas Janson Chair
Harvard University
 
Lucas Janson Organizer
Harvard University
 
Tuesday, Aug 5: 8:30 AM - 10:20 AM
0169 
Invited Paper Session 
Music City Center 
Room: CC-214 

Applied

No

Main Sponsor

IMS

Co Sponsors

Section on Nonparametric Statistics
Section on Statistical Learning and Data Science

Presentations

Conformal changepoint localization

Offline changepoint localization is the problem of estimating the index at which the data-generating distribution of an ordered sequence of observations changed, or declaring that no change occurred. We present the broadly applicable CONCH (CONformal CHangepoint localization) algorithm, which uses a matrix of conformal p-values to produce a confidence interval for the changepoint under the mild assumption that the pre-change and post-change distributions are each exchangeable. We demonstrate CONCH on a variety of synthetic and real-world datasets, including using black-box classifiers to detect changes in sequences of images or text.
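The abstract does not spell out CONCH itself, so the block below is only a simplified illustration of the same ingredients, not the CONCH algorithm: conformal p-values computed separately for each candidate changepoint t, then inverted into a set of plausible indices. Everything specific here is an assumption of this sketch: the score (squared distance to the left-segment mean minus squared distance to the right-segment mean), the sequential "smoothed" conformal p-values, Fisher's method for combining them, and a Bonferroni step over the two segments. It further assumes the two segments are independent and each exchangeable, and it does not handle the "no change" case.

```python
import math
import random


def split_scores(x, t):
    """Scores for the split x[:t] | x[t:]: positive when a point is closer to the
    right segment's mean than to the left segment's mean. Each mean is symmetric
    in its own segment, so within-segment exchangeability carries over to scores."""
    m_left = sum(x[:t]) / t
    m_right = sum(x[t:]) / (len(x) - t)
    return [(v - m_left) ** 2 - (v - m_right) ** 2 for v in x]


def sequential_pvalues(z, rng):
    """Smoothed conformal p-values in online order: p_r ranks z[r] among z[0..r],
    breaking ties uniformly at random. Under exchangeability of z these are
    independent Uniform(0, 1); unusually large late scores give small p_r."""
    pvals = []
    for r, zr in enumerate(z):
        prefix = z[: r + 1]
        greater = sum(1 for v in prefix if v > zr)
        ties = sum(1 for v in prefix if v == zr)  # always >= 1 (zr itself)
        u = 1.0 - rng.random()  # Uniform(0, 1], keeps p strictly positive
        pvals.append((greater + u * ties) / (r + 1))
    return pvals


def fisher_pvalue(pvals):
    """Fisher's method: X = -2 * sum(log p) is chi-squared with 2k degrees of
    freedom under the null, and P(X > x) = sum_{i<k} e^{-x/2} (x/2)^i / i!."""
    half_x = -sum(math.log(p) for p in pvals)  # equals x / 2
    if half_x <= 0.0:
        return 1.0
    logs = [-half_x + i * math.log(half_x) - math.lgamma(i + 1) for i in range(len(pvals))]
    top = max(logs)  # log-sum-exp for numerical stability
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def changepoint_pvalue(x, t, rng):
    """p-value for H0: the change occurs at index t (x[:t] pre-change, x[t:] post)."""
    s = split_scores(x, t)
    # Left segment, forward: post-change points leaking in sit at its end and score high.
    p_left = fisher_pvalue(sequential_pvalues(s[:t], rng))
    # Right segment, reversed and negated: leaked pre-change points (low scores)
    # move to the end and, after negation, also score high.
    p_right = fisher_pvalue(sequential_pvalues([-v for v in reversed(s[t:])], rng))
    return min(1.0, 2.0 * min(p_left, p_right))  # Bonferroni over the two sides


def confidence_set(x, alpha=0.1, seed=0):
    """All candidate changepoints t in 1..n-1 whose test is not rejected at level alpha."""
    rng = random.Random(seed)
    return [t for t in range(1, len(x)) if changepoint_pvalue(x, t, rng) > alpha]


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical series: the mean shifts from 0 to 5 at index 60.
    x = [data_rng.gauss(0.0, 1.0) for _ in range(60)] + [data_rng.gauss(5.0, 1.0) for _ in range(60)]
    print(confidence_set(x, alpha=0.1))
```

Because each test is valid at level alpha under its own null, the true changepoint is excluded with probability at most alpha. The result is a set of indices that need not be contiguous, and its width depends heavily on the chosen score.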

Speaker

Aaditya Ramdas, Carnegie Mellon University

Gradient Equilibrium in Online Learning

We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of the gradients of the losses along the sequence converges to zero. In general, this condition neither implies nor is implied by sublinear regret. Gradient equilibrium turns out to be achievable by standard online learning methods, such as gradient descent and mirror descent, with constant step sizes (rather than the decaying step sizes usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online gradient updates. We also show that post hoc gradient updates can calibrate predicted quantiles under distribution shift, and that the framework yields unbiased Elo scores for pairwise preference prediction.
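A minimal sketch of why constant-step gradient descent gives gradient equilibrium, assuming the simplest post hoc scheme: a single additive offset b added to a black-box prediction and updated online. With b_1 = 0 and update b_{t+1} = b_t - eta * g_t, the gradients telescope, so (1/T) * sum_t g_t = -b_{T+1} / (eta * T); whenever the offset stays bounded, the average gradient shrinks like 1/T. For squared loss the gradient is the residual, so equilibrium means zero average bias; for the pinball loss it means long-run coverage equal to the target level. The data stream, step sizes, and function names are hypothetical and not the authors' implementation.

```python
import math
import random


def debias_online(base_preds, ys, eta=0.1):
    """Post hoc online gradient descent, with a constant step size eta, on an
    additive offset b for the squared loss 0.5 * (f_t + b - y_t)^2.
    Returns the adjusted predictions, the per-step gradients, and the final offset."""
    b = 0.0
    adjusted, grads = [], []
    for f, y in zip(base_preds, ys):
        pred = f + b
        g = pred - y  # d/db of 0.5 * (f + b - y)^2
        adjusted.append(pred)
        grads.append(g)
        b -= eta * g  # constant step: sum of gradients = (b_initial - b_final) / eta
    return adjusted, grads, b


def calibrate_quantile_online(base_preds, ys, tau=0.9, eta=0.05):
    """Same update for the pinball loss at level tau, whose subgradient in the
    offset is 1{y_t <= q_t} - tau. Returns the coverage indicators and final offset."""
    b = 0.0
    covered = []
    for f, y in zip(base_preds, ys):
        hit = 1.0 if y <= f + b else 0.0
        covered.append(hit)
        b -= eta * (hit - tau)
    return covered, b


def make_shifted_stream(n=5000, seed=0):
    """Hypothetical data: the black box predicts f(x) = x, but the truth drifts as
    y_t = x_t + 2 + sin(t / 300) + noise, so f is biased by a moving amount."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x + 2.0 + math.sin(t / 300.0) + rng.gauss(0.0, 1.0) for t, x in enumerate(xs)]
    return xs, ys


if __name__ == "__main__":
    base, ys = make_shifted_stream()
    n = len(ys)
    _, grads, _ = debias_online(base, ys)
    print("mean residual, raw:     ", sum(f - y for f, y in zip(base, ys)) / n)
    print("mean residual, adjusted:", sum(grads) / n)
    covered, _ = calibrate_quantile_online(base, ys)
    print("empirical coverage of the 0.9 quantile:", sum(covered) / n)
```

The step size is deliberately not decayed: a constant eta lets the offset keep tracking the drift, while the telescoping identity still drives the average gradient toward zero.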

Keywords

Online learning

Debiasing

Recalibration 

Speaker

Ryan Tibshirani, UC Berkeley

Probably Approximately Correct Labels

Obtaining high-quality labeled datasets is often costly, requiring either extensive human annotation or expensive experiments. We propose a method that reduces this cost by using AI predictions where the model is confident and collecting expert labels only where needed. Our procedure outputs a labeled dataset with a probably approximately correct (PAC) guarantee: with high probability, the labeling error is small. This approach enables rigorous, cost-effective dataset curation with modern AI models. We demonstrate the benefits of the methodology on text annotation with large language models, image labeling with pre-trained vision models, and protein folding studies with AlphaFold.
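The abstract does not describe the procedure in detail, so the block below is a simple stand-in, not the authors' method, showing how a PAC-style guarantee can be obtained for confidence-thresholded labeling. Its assumptions: expert labels are error-free; "labeling error" means the fraction of mislabeled items in the final dataset; a uniform calibration sample (with replacement) is sent to the expert; a one-sided Hoeffding bound gives an upper confidence bound on the error at each threshold; and thresholds are tested from strictest to loosest (fixed-sequence testing), stopping at the first failure. Since the error can only grow as the threshold drops, the returned dataset has error at most eps with probability at least 1 - delta. The callable `expert` and all names are hypothetical.

```python
import math
import random


def pac_label(ai_labels, confidences, expert, n_cal=1000, eps=0.1, delta=0.05, seed=0):
    """Trust AI labels above a data-chosen confidence threshold, ask the expert elsewhere.
    expert(i) is a hypothetical (costly) oracle for the true label of item i.
    Returns (labels, threshold, number_of_distinct_expert_queries)."""
    n = len(ai_labels)
    rng = random.Random(seed)
    cache = {}

    def ask(i):
        if i not in cache:
            cache[i] = expert(i)
        return cache[i]

    # i.i.d. uniform calibration draws, so the one-sided Hoeffding bound applies.
    cal = [rng.randrange(n) for _ in range(n_cal)]
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n_cal))

    chosen = math.inf  # inf means: trust the AI on no items
    for u in sorted(set(confidences), reverse=True):  # strictest threshold first
        # Unbiased estimate of the dataset-wide fraction that is AI-labeled AND wrong.
        err_hat = sum(1 for i in cal if confidences[i] >= u and ai_labels[i] != ask(i)) / n_cal
        if err_hat + slack <= eps:
            chosen = u  # bound holds: keep loosening the threshold
        else:
            break  # fixed-sequence testing stops at the first failure
    labels = [ai_labels[i] if confidences[i] >= chosen else ask(i) for i in range(n)]
    return labels, chosen, len(cache)


def make_demo(n=2000, seed=1):
    """Hypothetical binary task: the model is right with probability equal to its confidence."""
    rng = random.Random(seed)
    truth = [rng.randrange(2) for _ in range(n)]
    conf = [rng.uniform(0.5, 1.0) for _ in range(n)]
    ai = [t if rng.random() < c else 1 - t for t, c in zip(truth, conf)]
    return truth, conf, ai


if __name__ == "__main__":
    truth, conf, ai = make_demo()
    labels, u, calls = pac_label(ai, conf, lambda i: truth[i])
    err = sum(l != t for l, t in zip(labels, truth)) / len(truth)
    print(f"threshold={u:.3f} expert queries={calls} of {len(truth)} realized error={err:.3f}")
```

Hoeffding is conservative, so this sketch spends more expert labels than a sharper bound would; the point is only the structure: estimate the error of trusting the model, bound it with high probability, and route everything else to the expert.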

Keywords

black-box machine learning

statistical inference 

Speaker

Tijana Zrnic, University of California