Wednesday, Aug 6: 8:30 AM - 10:20 AM
0456
Invited Paper Session
Music City Center
Room: CC-202C
Machine Learning
Causal Inference
Interpretability
Applied: Yes
Main Sponsor
Social Statistics Section
Co Sponsors
Section on Nonparametric Statistics
Section on Statistical Learning and Data Science
Presentations
Randomized controlled trials (RCTs) serve as the cornerstone for understanding causal effects, yet extending their inferences to target populations is challenging due to effect heterogeneity and underrepresentation. Our paper addresses the critical issue of identifying and characterizing underrepresented subgroups in RCTs, proposing a novel framework for refining target populations to improve generalizability. We introduce an optimization-based approach, the Rashomon Set of Optimal Trees (ROOT), to characterize underrepresented groups. ROOT optimizes the target subpopulation distribution by minimizing the variance of the target average treatment effect estimate, ensuring more precise treatment effect estimation. Notably, ROOT generates interpretable characteristics of the underrepresented population, aiding researchers in communicating their findings effectively. Our approach demonstrates improved precision and interpretability compared to alternatives, as illustrated in synthetic-data experiments. We apply our methodology to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- which investigated the effectiveness of medication for opioid use disorder -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations with ROOT, our framework offers a systematic approach to enhancing decision-making accuracy and informing future trials in diverse populations.
Keywords
Causal Inference
Generalizability
Interpretability
Machine Learning
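The core computation in the abstract above, choosing a target subpopulation to minimize the variance of the target average treatment effect (TATE) estimate, can be illustrated compactly. The sketch below is not the authors' ROOT implementation: it replaces the Rashomon set of optimal trees with a brute-force scan over hypothetical axis-aligned rules, and the synthetic data, variable names, and plug-in variance formula are all assumptions made for the example.

```python
# Minimal sketch of the idea behind ROOT, not the authors' implementation:
# scan simple, interpretable subgroup rules for the target population and
# keep the rule that minimizes the variance of the weighted TATE estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic trial (RCT) and target samples with two covariates.
n_rct, n_tgt = 500, 2000
X_rct = rng.normal(0.5, 1.0, size=(n_rct, 2))
X_tgt = rng.normal(0.0, 1.0, size=(n_tgt, 2))
T = rng.integers(0, 2, n_rct)                       # randomized treatment
Y = X_rct[:, 0] + T * (1 + X_rct[:, 1]) + rng.normal(0, 1, n_rct)

def tate_variance(keep_tgt):
    """Plug-in variance of an inverse-odds-weighted TATE estimate when
    the target population is restricted to the boolean mask keep_tgt."""
    X_t = X_tgt[keep_tgt]
    if len(X_t) < 50:                               # rule removes too much
        return np.inf
    # Estimate P(target | x) to form inverse-odds weights for RCT units.
    X_all = np.vstack([X_rct, X_t])
    s = np.r_[np.zeros(n_rct), np.ones(len(X_t))]
    p = LogisticRegression().fit(X_all, s).predict_proba(X_rct)[:, 1]
    w = p / (1 - p)
    # Weighted Horvitz-Thompson-style pseudo-outcomes; variance of their mean.
    psi = w * (T * Y / T.mean() - (1 - T) * Y / (1 - T).mean())
    return psi.var() / n_rct

# Candidate rules: axis-aligned thresholds, a stand-in for ROOT's tree search.
best = (np.inf, "keep everyone")
for j in (0, 1):
    for c in np.quantile(X_tgt[:, j], np.linspace(0.1, 0.9, 9)):
        for desc, keep in ((f"x{j} <= {c:.2f}", X_tgt[:, j] <= c),
                           (f"x{j} > {c:.2f}", X_tgt[:, j] > c)):
            v = tate_variance(keep)
            if v < best[0]:
                best = (v, desc)

print("lowest-variance target subgroup:", best[1])
```

The winning rule doubles as the interpretable description of which part of the target population should be trimmed or reported separately.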
The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and other areas, which has practical implications for whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black-box models on complex, noisier datasets.
Keywords
Machine Learning
Interpretability
Uncertainty
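Both quantities defined in the abstract above have simple empirical versions. The following is a minimal sketch on a toy hypothesis space of decision stumps (an illustrative assumption, not the paper's experimental setup): it computes a Rashomon set, the Rashomon ratio, and pattern diversity as the average disagreement between distinct classification patterns in that set.

```python
# Minimal sketch of the Rashomon ratio and pattern diversity on a toy
# hypothesis space of decision stumps; the dataset and the stump space
# are illustrative assumptions.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)   # noisy labels

# Hypothesis space: all stumps "x_j > c" and their negations.
stumps = []
for j in range(3):
    for c in np.quantile(X[:, j], np.linspace(0.05, 0.95, 19)):
        stumps.append((X[:, j] > c).astype(int))
        stumps.append((X[:, j] <= c).astype(int))
preds = np.array(stumps)                  # one prediction vector per model

acc = (preds == y).mean(axis=1)
eps = 0.02                                # Rashomon parameter
rashomon = preds[acc >= acc.max() - eps]  # models within eps of the best
ratio = len(rashomon) / len(preds)        # Rashomon ratio

# Pattern diversity: average disagreement (normalized Hamming distance)
# between the distinct classification patterns in the Rashomon set.
patterns = np.unique(rashomon, axis=0)
dists = [np.mean(a != b) for a, b in combinations(patterns, 2)]
diversity = float(np.mean(dists)) if dists else 0.0
print(f"Rashomon ratio: {ratio:.3f}, pattern diversity: {diversity:.3f}")
```

Rerunning the sketch with more label noise (for example, a larger coefficient on the noise term in y) shows the mechanism the abstract describes: the Rashomon set grows and its patterns disagree more.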
Hoffman et al. (2024) investigate how the inclusion of synthetic, AI- or ML-generated data can bias the space of feasible models, potentially leading to erroneous downstream decision-making. This work demonstrates how to quantify and correct for that bias by including small amounts of real data together with a correction factor from the framework of Inference on Predicted Data (IPD). With this procedure, we show how to obtain valid statistical inference in the context of streaming data even when much of the data is machine-biased. Furthermore, Bayesian optimal experimental design is leveraged to determine the optimal sample sizes of real and synthetic data that best control the space of feasible models.
Keywords
Artificial Intelligence
Inference on Predicted Data
Statistical Inference
Streaming Data
Bayesian Optimal Experimental Design
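The correction described above is easiest to see for the simplest estimand, a population mean. The sketch below follows the general rectifier idea used in inference on predicted data; the simulated data, the injected prediction bias, and all variable names are illustrative assumptions, and the streaming and Bayesian experimental design components of the work are not shown.

```python
# Minimal sketch of an IPD-style bias correction for a mean, in the spirit
# of prediction-powered inference. The data, the injected 0.5 prediction
# bias, and all names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, n = 10_000, 200                  # many machine outputs, few real labels
theta = 2.0                         # true mean we want to infer

y_real = rng.normal(theta, 1.0, n)                 # small labeled real sample
f_real = y_real + 0.5 + rng.normal(0, 0.3, n)      # predictions on that sample
f_synth = rng.normal(theta, 1.0, N) + 0.5 + rng.normal(0, 0.3, N)

# A naive estimate from the machine-generated data inherits the bias.
naive = f_synth.mean()

# Correction factor: average prediction error on the labeled subset.
rectifier = (f_real - y_real).mean()
corrected = f_synth.mean() - rectifier

# Standard error combines variation from both data sources.
se = np.sqrt(f_synth.var() / N + (f_real - y_real).var() / n)
print(f"naive={naive:.3f}, corrected={corrected:.3f} +/- {1.96 * se:.3f}")
```

The standard error makes the trade-off explicit: the abundant machine-generated data shrinks the first variance term, while the scarce real data controls the second, which is the quantity an experimental design step would balance.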