When Many Models Are Equally Accurate: Exploring Interpretability and Uncertainty Quantification

Chair
Srikar Katta

Discussant
Zhenke Wu, University of Michigan

Organizer
Harsh Parikh, Johns Hopkins University
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
Session 0456: Invited Paper Session
Music City Center, Room CC-202C

Keywords

Machine Learning

Causal Inference

Interpretability 

Applied: Yes

Main Sponsor

Social Statistics Section

Co-Sponsors

Section on Nonparametric Statistics
Section on Statistical Learning and Data Science

Presentations

Who Are We Missing? A Principled Approach to Characterizing the Underrepresented Population

Randomized controlled trials (RCTs) are the cornerstone for understanding causal effects, yet extending inferences to target populations is challenging due to effect heterogeneity and underrepresentation. Our paper addresses the critical problem of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, the Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT refines the target subpopulation by minimizing the variance of the target average treatment effect (TATE) estimate, yielding more precise treatment effect estimates. Notably, ROOT produces interpretable characterizations of the underrepresented population, helping researchers communicate results effectively. Synthetic-data experiments show that our approach improves precision and interpretability relative to alternatives. We apply the methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial, which investigated the effectiveness of medication for opioid use disorder, to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations with ROOT, our framework offers a systematic way to improve decision-making accuracy and inform future trials in diverse populations.
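To make the variance-minimization idea concrete, here is a minimal Python sketch, not the authors' ROOT implementation (which searches Rashomon sets of decision trees): strata are greedily excluded from the target population whenever doing so reduces the estimated variance of the weighted TATE, subject to a cap on how much target mass may be dropped. The function name, greedy strategy, and all constants are illustrative assumptions.

```python
import numpy as np

def prune_target_strata(var_hat, weights, budget=0.2):
    """Greedily exclude strata from the target population while the
    estimated variance of the weighted target average treatment effect
    (TATE) decreases, excluding at most `budget` of the target mass.
    Hypothetical sketch only; ROOT itself optimizes over tree-structured
    subpopulation characterizations.
    """
    included = np.ones(len(weights), dtype=bool)

    def tate_var(mask):
        w = weights[mask]
        return np.sum(w ** 2 * var_hat[mask]) / np.sum(w) ** 2

    excluded_mass = 0.0
    while True:
        current = tate_var(included)
        best_gain, best_s = 0.0, None
        for s in np.flatnonzero(included):
            if excluded_mass + weights[s] > budget:
                continue  # would exclude too much of the target population
            trial = included.copy()
            trial[s] = False
            if not trial.any():
                continue
            gain = current - tate_var(trial)
            if gain > best_gain:
                best_gain, best_s = gain, s
        if best_s is None:
            break  # no affordable exclusion reduces the variance
        included[best_s] = False
        excluded_mass += weights[best_s]
    return included

# Toy example: the small, high-variance stratum gets excluded.
weights = np.array([0.4, 0.3, 0.2, 0.1])   # target-population shares
var_hat = np.array([0.5, 0.4, 2.0, 5.0])   # per-stratum variance estimates
print(prune_target_strata(var_hat, weights))  # -> [ True  True  True False]
```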

Keywords

Causal Inference

Generalizability

Interpretability

Machine Learning 

Speaker

Harsh Parikh, Johns Hopkins University

A path to simpler models starts with noise

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that fall in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and other areas, which has practical implications for whether simpler models can attain the same accuracy as more complex models. An open question is why Rashomon ratios tend to be large. In this work, we propose and study a mechanism of the data-generation process, coupled with choices the analyst typically makes during learning, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way practitioners train models. We also introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often perform as well as black-box models on complex, noisy datasets.
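As a concrete illustration of the quantities in the abstract, the sketch below estimates an empirical Rashomon ratio and pattern diversity on toy data by sampling random linear classifiers. The hypothesis space, sample sizes, noise level, and epsilon are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two features, a linear ground truth, and flipped labels.
n, d, label_noise = 200, 2, 0.2
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
flip = rng.random(n) < label_noise
y[flip] = 1 - y[flip]

# Hypothesis space: random linear classifiers through the origin.
m, eps = 5000, 0.02
W = rng.normal(size=(m, d))
preds = (X @ W.T > 0).astype(int)            # (n, m) predictions
losses = (preds != y[:, None]).mean(axis=0)  # 0-1 loss per model

in_set = losses <= losses.min() + eps        # Rashomon set membership
print("Rashomon ratio:", in_set.mean())

# Pattern diversity: average pairwise disagreement between distinct
# prediction patterns inside the Rashomon set.
patterns = np.unique(preds[:, in_set], axis=1)
k = patterns.shape[1]
if k > 1:
    disagree = [(patterns[:, i] != patterns[:, j]).mean()
                for i in range(k) for j in range(i + 1, k)]
    print("Pattern diversity:", np.mean(disagree))
```

Raising `label_noise` in this toy setup tends to enlarge the Rashomon set, mirroring the mechanism the abstract describes.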

Keywords

Machine Learning

Interpretability

Uncertainty 

Speaker

Lesia Semenova, Rutgers University

Quantifying and Correcting for Model Space Bias from AI-Synthesized Data in Streaming Data

Hoffman et al. (2024) investigate how the inclusion of synthetic AI- or ML-generated data can bias the space of feasible models, potentially leading to erroneous downstream decision-making. This work demonstrates how to quantify and correct for this bias by combining small amounts of real data with a correction factor from the framework of Inference on Predicted Data (IPD). With this procedure, we show how to obtain valid statistical inference in the context of streaming data even when much of the data is machine-biased. Furthermore, Bayesian optimal experimental design is leveraged to determine the optimal sample sizes of real and synthetic data that best control the space of feasible models.
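For intuition, here is a hedged sketch of the bias-correction idea in its simplest form, estimating a population mean: the large predicted/synthetic sample carries the bulk of the estimate, and the small real sample supplies a rectifier that removes the machine bias. This is one standard IPD-style estimator for mean estimation, not the paper's streaming procedure; the function name and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def ipd_mean_estimate(y_real, f_real, f_synth, alpha=0.05):
    """Bias-corrected mean in the spirit of inference on predicted data
    (IPD). `y_real` are observed outcomes on a small real sample,
    `f_real` the model's predictions on that same sample, and `f_synth`
    the predictions/synthetic values on a large machine-generated pool.
    Illustrative mean-estimation instance only.
    """
    n, N = len(y_real), len(f_synth)
    rectifier = np.mean(y_real - f_real)        # average prediction bias
    theta = np.mean(f_synth) + rectifier        # corrected point estimate
    se = np.sqrt(np.var(f_synth, ddof=1) / N
                 + np.var(y_real - f_real, ddof=1) / n)
    z = norm.ppf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)

# Toy check: predictions biased upward by 0.3; the rectifier removes it.
rng = rng = np.random.default_rng(1)
y_real = rng.normal(1.0, 1.0, size=100)          # small real sample
f_real = y_real + 0.3 + rng.normal(0, 0.1, 100)  # biased predictions
f_synth = rng.normal(1.3, 1.0, size=10_000)      # large biased synthetic pool
print(ipd_mean_estimate(y_real, f_real, f_synth))  # ~1.0 with a valid CI
```

Minimizing the standard-error expression above under a per-sample cost budget gives a classical square-root allocation between real and synthetic sample sizes; the abstract's Bayesian optimal experimental design addresses the same allocation question within a Bayesian framework.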

Keywords

Artificial Intelligence

Inference on Predicted Data

Statistical Inference

Streaming Data

Bayesian Optimal Experimental Design 

Co-Author(s)

Kentaro Hoffman, University of Washington Department of Statistics
Tyler McCormick, University of Washington

Speaker

Kentaro Hoffman, University of Washington Department of Statistics