Tuesday, Aug 5: 8:30 AM - 10:20 AM
0215
Invited Paper Session
Music City Center
Room: CC-Davidson Ballroom B
Rashomon
Model set selection
Stability
Applied
No
Main Sponsor
Section on Statistical Learning and Data Science
Co-Sponsors
Section on Statistical Computing
Presentations
Most scientific publications follow the familiar recipe of (i) obtaining data, (ii) fitting a model, and (iii) commenting on the scientific relevance of the effects of particular covariates in that model. This approach, however, ignores the fact that there may exist a multitude of similarly accurate models in which the implied effects of individual covariates are vastly different. The problem of finding an entire collection of plausible models has received relatively little attention in the statistics community, with nearly all proposed methodologies being narrowly tailored to a particular model class and/or requiring an exhaustive search over all possible models, making them largely infeasible in the current big data era. The idea of forward stability is developed, and a novel, computationally efficient approach to finding collections of accurate models, referred to as model path selection (MPS), is proposed. MPS builds up a plausible model collection via a forward selection approach and is entirely agnostic to the model class and loss function employed. The resulting model collection can be displayed in a simple and intuitive graphical fashion, allowing practitioners to easily visualize whether some covariates can be swapped for others with minimal loss of accuracy.
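The forward-branching idea behind MPS can be illustrated with a small sketch. This is an illustrative approximation, not the authors' implementation: the function name, the additive tolerance `epsilon`, and the depth cap are all choices made here. At each forward-selection step, instead of keeping only the single best covariate, every near-optimal covariate spawns its own branch of the model path.

```python
import numpy as np

def forward_model_paths(X, y, fit_loss, epsilon=0.05, max_depth=3):
    """Enumerate plausible forward-selection paths (illustrative sketch).

    At each step, rather than keeping only the argmin covariate, keep
    every covariate whose resulting loss is within `epsilon` of the
    best, and branch the path on each of them.  `fit_loss` may be any
    model-fitting routine returning a scalar loss, reflecting MPS's
    agnosticism to model class and loss function.
    """
    n_features = X.shape[1]
    paths = [()]  # each path is a tuple of selected feature indices
    for _ in range(max_depth):
        new_paths = []
        for path in paths:
            remaining = [j for j in range(n_features) if j not in path]
            if not remaining:
                new_paths.append(path)
                continue
            losses = {j: fit_loss(X[:, list(path) + [j]], y) for j in remaining}
            best = min(losses.values())
            # branch on every near-optimal candidate, not just the argmin
            for j, loss in losses.items():
                if loss <= best + epsilon:
                    new_paths.append(path + (j,))
        paths = new_paths
    return paths
```

When two covariates carry essentially the same information, both survive the tolerance check and the path branches, which is exactly the "can this covariate be swapped for that one?" question the MPS display is meant to answer.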
Keywords
Stability
Forward Selection
Trees
Ranking and Selection
I will present the Rashomon set paradigm for interpretable machine learning. In this paradigm, machine learning algorithms do not focus on finding a single optimal model, but instead capture the full collection of good (i.e., low-loss) models, the "Rashomon set." I will show how the Rashomon set paradigm solves the interaction bottleneck for users of sparse decision trees and sparse risk scores, and discuss other benefits of Rashomon sets described in the following paper:
Amazing Things Come From Having Many Good Models. ICML spotlight, 2024.
https://arxiv.org/abs/2407.04846
Keywords
interpretable machine learning
decision trees
sparsity
human-computer interaction
AI
From applications in structural biology to the analysis of electronic health record data, predictions from machine learning models increasingly complement costly gold-standard data in scientific inquiry. While "using predictions as data" enables biomedical studies to scale in an unprecedented manner, appropriately accounting for inaccuracies in the predictions is critical to achieving trustworthy conclusions from downstream statistical inference.
In this talk, I will explore the methodological and practical impacts of using predictions as data on statistical inference across various biomedical applications. I will introduce our recently proposed method for bias correction and draw connections with classical statistical approaches dating back to the 1960s. Time permitting, I will also discuss ethical, social, and cultural challenges of using predictions as data, underscoring the need for careful and thoughtful adoption of this practice in biomedical research.
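As a deliberately simple instance of this kind of bias correction, consider estimating a population mean when a small labeled set carries gold-standard outcomes and a large unlabeled set carries only model predictions. The estimator below, written in the style of prediction-powered inference, is a hedged sketch of the general idea and not necessarily the speaker's proposed method; the function name and arguments are assumptions.

```python
import numpy as np

def pp_mean_estimate(y_lab, preds_lab, preds_unlab):
    """Bias-corrected mean using predictions as data (sketch).

    Combine the mean of model predictions on a large unlabeled set with
    a correction term, the average prediction error, estimated on a
    small gold-standard labeled set:
        theta_hat = mean(preds_unlab) + mean(y_lab - preds_lab)
    A systematic bias in the predictions cancels out of the estimate.
    """
    return preds_unlab.mean() + (y_lab - preds_lab).mean()
```

The naive estimate `preds_unlab.mean()` inherits any bias in the prediction model; the labeled-set correction removes it, at the cost of extra variance controlled by the labeled sample size.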
Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. Important examples include variants of stochastic gradient descent. These algorithms typically take one sample point at a time and incrementally update the parameter estimate of interest. In this work, we consider model selection and hyperparameter tuning for such online algorithms. We propose weighted rolling validation (wRV), an online variant of leave-one-out cross-validation that incurs minimal extra computation for many typical stochastic gradient descent estimators and preserves their online nature. We study the model selection behavior of wRV under a general stability framework and reveal some unexpected advantages of wRV over its batch counterpart.
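The rolling flavor of this kind of validation can be sketched as follows. This is a minimal illustration under assumed details: least-squares SGD, polynomial weights w_t = t^ξ that up-weight recent errors, and the function name are all choices made here, not taken from the paper. Each incoming point scores the current iterate before it is used for the update, so validation costs only one extra prediction per step and the procedure stays fully online.

```python
import numpy as np

def weighted_rolling_validation(stream, candidate_steps, xi=1.0):
    """Pick an SGD step size by weighted rolling validation (sketch).

    For each candidate step size, run online least-squares SGD on the
    stream; before each update, score the current iterate on the
    incoming point (a rolling, out-of-sample error), weighting later
    errors more heavily via w_t = t**xi.
    """
    scores = {}
    for eta in candidate_steps:
        d = stream[0][0].shape[0]
        theta = np.zeros(d)
        total, norm = 0.0, 0.0
        for t, (x, y) in enumerate(stream, start=1):
            pred = x @ theta
            w = t ** xi
            total += w * (y - pred) ** 2   # rolling validation error
            norm += w
            theta += eta * (y - pred) * x  # one SGD update
        scores[eta] = total / norm
    return min(scores, key=scores.get), scores
```

Up-weighting recent errors matters because early iterates are poor for every candidate; the late-stream errors are what distinguish a well-tuned estimator from a slowly converging one.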
Keywords
model selection
cross-validation
online learning
nonparametric regression
Speaker
Jing Lei, Carnegie Mellon University