Wednesday, Aug 6: 8:30 AM - 10:20 AM
4141
Contributed Papers
Music City Center
Room: CC-105B
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
The traditional statistical/machine-learning paradigm generally seeks a single best model for prediction and interpretation. However, the Rashomon effect, introduced by Leo Breiman, challenges this by highlighting how multiple equally good predictive models can exist for the same problem. This has significant implications for interpretation, usability, variable importance, and replicability. The collection of such models within a function class is called the Rashomon set, and recent research has focused on identifying and analyzing these sets. Motivated by sparse latent representations in high-dimensional problems, we propose a heuristic method that finds sets of sparse models with strong predictive power. Using a greedy forward search, the algorithm builds progressively larger models by leveraging good low-dimensional ones. These sparse models, which maintain near-equal performance to common reference models (i.e., they form Rashomon sets), can be connected into networks that provide deeper insights into variable interactions and into how latent variables contribute to the Rashomon effect.
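As a rough illustration of the kind of greedy forward search described above, the sketch below grows candidate models one variable at a time, starting from all one-dimensional models, and keeps every model whose cross-validated error falls within a quantile of the best at each size (a Rashomon-style set). The binary outcome, the logistic base learner, the 20% retention quantile, and all function names are assumptions made for illustration, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def swag_like_search(X, y, max_dim=3, keep_quantile=0.2):
    """Greedy forward search keeping all near-best sparse models at each size."""
    n_features = X.shape[1]
    candidates = [(j,) for j in range(n_features)]      # all one-variable models
    rashomon_sets = {}
    for d in range(1, max_dim + 1):
        scored = []
        for model in candidates:
            est = LogisticRegression(max_iter=1000)
            err = 1 - cross_val_score(est, X[:, list(model)], y, cv=5).mean()
            scored.append((err, model))
        cutoff = np.quantile([e for e, _ in scored], keep_quantile)
        good = [m for e, m in scored if e <= cutoff]     # near-best models of size d
        rashomon_sets[d] = good
        # extend each retained model by one extra variable for the next pass
        candidates = sorted({tuple(sorted(m + (j,)))
                             for m in good for j in range(n_features) if j not in m})
    return rashomon_sets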
Keywords
SWAG: Sparse Wrapper Algorithm
Latent Representations
Replication Crisis
High-Dimensional Problems
Variable Importance
Explainable AI
Understanding the population-level natural history of chronic diseases is essential for developing effective targeted prevention strategies. Intensity-based multistate analyses estimate transition rates between disease states but often rely on longitudinal data, which may be impractical when resource constraints limit data collection to a cross-sectional sample from the target population. We propose an innovative framework for augmenting cross-sectional data from the target population with auxiliary longitudinal data from other populations, so that transition intensities can be identified and estimated. Importantly, this data augmentation approach facilitates the specification of semi-Markov models for a three-state process accommodating recurrent transitions between disease-free and diseased states. The method is evaluated through extensive simulation studies, and we apply it to study the population-level natural history of cervical precancer using cross-sectional data from cervical cancer screening studies in two populations.
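To make the data structure concrete, the sketch below simulates a three-state semi-Markov process with recurrent transitions between a disease-free state and a diseased state, plus an absorbing state, and records only the state occupied at a single screening time per subject, i.e., cross-sectional (current status) observation. The Weibull sojourn distributions, rates, and screening-age range are toy assumptions; this illustrates the data setting, not the proposed augmentation estimator.

import numpy as np

rng = np.random.default_rng(0)

def simulate_subject(t_screen, shape01=1.2, scale01=8.0,
                     shape10=1.0, scale10=4.0, shape12=1.5, scale12=10.0):
    """Return the state (0, 1, or 2) occupied at the screening time t_screen."""
    t, state = 0.0, 0
    while t < t_screen:
        if state == 0:                       # disease-free: can only become diseased
            t += scale01 * rng.weibull(shape01)
            if t < t_screen:
                state = 1
        elif state == 1:                     # diseased: regress to 0 or progress to 2
            s10 = scale10 * rng.weibull(shape10)
            s12 = scale12 * rng.weibull(shape12)
            t += min(s10, s12)
            if t < t_screen:
                state = 0 if s10 < s12 else 2
        else:                                # absorbing state
            break
    return state

# one cross-sectional sample: screening ages drawn uniformly, state observed once
ages = rng.uniform(20, 60, size=5000)
states = np.array([simulate_subject(a) for a in ages])
print(np.bincount(states, minlength=3) / len(states))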
Keywords
multistate current status data
portable natural history models
recurrent processes
data integration
semi-Markov models
We estimate the probability of many (possibly dependent) binary outcomes, which is at the core of many applications. Without further conditions, the distribution of an M-dimensional binary vector is characterized by a number of coefficients that grows exponentially in M -- a high-dimensional problem even without covariates. Understanding the (in)dependence structure substantially improves the estimation, as it allows an effective factorization of the distribution. To estimate this distribution, we leverage a Bahadur representation connecting the sparsity of its coefficients with independence across components. We use regularized and adversarially regularized estimators, adaptive to the dependence structure, allowing rates of convergence to depend on the intrinsic (lower) dimension. We propose a locally penalized estimator in the presence of (low-dimensional) covariates and provide rates of convergence, addressing several challenges that arise in the theoretical analysis when striving for a computationally tractable formulation. We apply our results to estimating causal effects with multiple binary treatments and show how our estimators improve finite-sample performance compared with non-adaptive estimators.
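For readers unfamiliar with it, the Bahadur representation writes the joint pmf of a binary vector as the independence pmf times a correction 1 + sum_{j<k} r_jk z_j z_k + higher-order terms, where z_j is the standardized j-th component and the coefficients vanish under independence, which is the sparsity-independence link used above. The truncation at pairwise terms and the plain empirical estimates in the sketch below are illustrative assumptions, not the regularized or adversarial estimators proposed in the abstract.

import numpy as np

def bahadur_pairwise(Y):
    """Estimate marginals and pairwise Bahadur coefficients from an n x M binary matrix."""
    p = Y.mean(axis=0)
    Z = (Y - p) / np.sqrt(p * (1 - p))
    R = Z.T @ Z / len(Y)                      # r_jk = E[z_j z_k]; diagonal is ~1
    return p, R

def pmf_second_order(y, p, R):
    """Second-order Bahadur approximation to P(Y = y)."""
    base = np.prod(p**y * (1 - p)**(1 - y))   # independence pmf
    z = (y - p) / np.sqrt(p * (1 - p))
    correction = 1 + np.sum(np.triu(np.outer(z, z) * R, k=1))
    return base * correction

# toy check: coordinates 1 and 2 are dependent, coordinate 3 is independent
rng = np.random.default_rng(1)
U = rng.binomial(1, 0.5, size=(10000, 1))
Y = np.hstack([U, (U + rng.binomial(1, 0.2, size=(10000, 1))) % 2,
               rng.binomial(1, 0.3, size=(10000, 1))])
p, R = bahadur_pairwise(Y)
print(np.round(R, 2))                         # large r_12, near-zero entries for coordinate 3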
Keywords
many binary classification
penalized regression
adversarial lasso
In statistical learning, algorithms that fit a lasso over a set of basis functions can achieve desirable theoretical properties. For example, the Highly Adaptive Lasso (HAL) estimator applies the lasso to a very high-dimensional indicator basis, attaining dimension-free rates of convergence across a large class of functions. However, the time complexity of such algorithms is often exponential in the number of features, meaning they are too computationally intensive for most practical data analysis problems. In this work, we show that a lasso-based empirical risk minimizer over a growing set of basis functions retains its asymptotic rate even if fit only on a random, relatively small subset of the basis. Applying this idea, we propose RandomHAL: a fast approximation to the HAL estimator that retains its desirable properties, yet can be fit on datasets with many more features. The empirical performance of RandomHAL is evaluated using simulation experiments.
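The sketch below illustrates the idea of fitting a lasso over a random subset of a HAL-style zero-order indicator basis: each basis function is an indicator that the selected coordinates of x exceed a knot taken from an observed data point, and only a random sample of (feature subset, knot) pairs is evaluated. The sampling scheme, basis size, and lasso tuning are assumptions for illustration, not the RandomHAL implementation itself.

import numpy as np
from sklearn.linear_model import LassoCV

def random_hal_basis(X, n_basis=500, rng=None):
    """Sample random (feature-subset, knot) pairs and evaluate the indicator basis."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    cols, specs = [], []
    for _ in range(n_basis):
        size = rng.integers(1, d + 1)                    # random interaction order
        subset = rng.choice(d, size=size, replace=False)
        knot = X[rng.integers(n), subset]                # knot taken from an observed point
        cols.append(np.all(X[:, subset] >= knot, axis=1).astype(float))
        specs.append((subset, knot))
    return np.column_stack(cols), specs

def fit_random_hal(X, y, n_basis=500, rng=0):
    H, specs = random_hal_basis(X, n_basis=n_basis, rng=rng)
    model = LassoCV(cv=5).fit(H, y)
    return model, specs

# usage on synthetic regression data
rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 5))
y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=300)
model, specs = fit_random_hal(X, y)
print("nonzero basis functions:", np.sum(model.coef_ != 0))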
Keywords
lasso
function estimation
empirical risk minimization
statistical learning
high-dimensional statistics
Co-Author
Nima Hejazi, Harvard T.H. Chan School of Public Health
First Author
Salvador Balkus, Harvard T.H. Chan School of Public Health
Presenting Author
Salvador Balkus, Harvard T.H. Chan School of Public Health
The Rashomon effect, introduced by Leo Breiman, highlights the existence of multiple plausible models explaining the same problem. The Sparse Wrapper Algorithm (SWAG) addresses this effect in high-dimensional settings by selecting sets of strong yet sparse models. However, as a heuristic method, its effectiveness requires formal validation. We propose a statistical testing framework to assess whether SWAG extracts informative models. Instead of testing individual models, we leverage the SWAG network and apply graph-theoretical tools to evaluate its informativeness. By comparing SWAG networks under null and alternative hypotheses, we examine whether selected models concentrate around significant variables, using network measures to quantify these differences. Using a bootstrap approach to approximate the distributions of the test statistics, we confirm that the entropies of eigenvector centrality and of variable frequency are two strong candidates for differentiating the SWAG network under the null and alternative hypotheses. This framework offers a foundation for inference on Rashomon sets and for interpreting model diversity.
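As an illustration of the two candidate statistics named above, the sketch below builds a co-selection network over variables from a collection of sparse models and computes the Shannon entropy of the eigenvector centralities and of the variable selection frequencies. The co-occurrence edge weights, the normalization, and the toy inputs are assumptions; the bootstrap comparison of null and alternative networks is only indicated in a comment.

from collections import Counter
from itertools import combinations
import numpy as np
import networkx as nx

def swag_network_stats(models, n_features):
    """models: iterable of tuples/lists of selected variable indices."""
    freq = Counter()
    G = nx.Graph()
    G.add_nodes_from(range(n_features))
    for m in models:
        freq.update(m)
        for i, j in combinations(sorted(m), 2):
            w = G[i][j]["weight"] + 1 if G.has_edge(i, j) else 1
            G.add_edge(i, j, weight=w)           # edge weight = number of co-selections
    def entropy(values):
        p = np.asarray(values, dtype=float)
        p = p[p > 0] / p[p > 0].sum()
        return -np.sum(p * np.log(p))
    cent = nx.eigenvector_centrality_numpy(G, weight="weight")
    freq_vec = [freq.get(j, 0) for j in range(n_features)]
    return entropy(list(cent.values())), entropy(freq_vec)

# usage: models concentrated on a few variables give lower entropies than models
# drawn uniformly at random; a bootstrap over models would compare these statistics
informative = [(0, 1), (0, 2), (1, 2), (0, 1, 2)]
print(swag_network_stats(informative, n_features=10))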
Keywords
SWAG
Multi-model selection
Statistical testing
Rashomon effect
Sparse models
The double descent phenomenon observed in overparameterized machine learning models appears to defy classical prediction risk theory and has spurred considerable research. Recently, a notion of predictive model degrees of freedom (PDOF) has been proposed as an alternative measure of model complexity to explain the double descent phenomenon with a focus on linear modeling procedures. We extend PDOF to the nonlinear case by first studying the lasso model. The PDOF for lasso involves the covariance matrix of the lasso estimator, for which no closed-form expression exists. Furthermore, existing covariance matrix estimators only work in the under-parameterized case. To fill this gap, we explore two estimators: one based on the iterative soft-thresholding algorithm, and the other based on the infinitesimal jackknife. In a simulation study, we compare these estimators with bootstrap and other covariance matrix estimators based on approximate lasso solutions. Beyond lasso, the infinitesimal jackknife approach can be used to quantify the PDOF of other algorithmic models such as random forests and neural networks.
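As a rough sketch of one way an infinitesimal-jackknife-style covariance estimate for lasso coefficients could be approximated, the code below perturbs one case weight at a time and takes finite differences of the refitted coefficients, comparing the result with a nonparametric bootstrap estimate. The step size, weighting convention, and fixed penalty level are assumptions; this is not the estimator developed in the work described above.

import numpy as np
from sklearn.linear_model import Lasso

def ij_covariance_fd(X, y, alpha=0.1, eps=1e-3):
    """Finite-difference approximation to sum_i U_i U_i^T with U_i = d(coef)/d(w_i)."""
    n = len(y)
    base = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    U = np.zeros((n, X.shape[1]))
    for i in range(n):
        w = np.ones(n)
        w[i] += eps                                  # perturb one case weight
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y, sample_weight=w).coef_
        U[i] = (coef - base) / eps                   # directional derivative estimate
    return U.T @ U

def bootstrap_covariance(X, y, alpha=0.1, B=200, rng=0):
    rng = np.random.default_rng(rng)
    n = len(y)
    coefs = [Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
             for idx in (rng.integers(n, size=n) for _ in range(B))]
    return np.cov(np.array(coefs), rowvar=False)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
print(np.round(np.diag(ij_covariance_fd(X, y)), 3))
print(np.round(np.diag(bootstrap_covariance(X, y)), 3))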
Keywords
Double Descent
Infinitesimal Jackknife
Lasso
Model Degrees of Freedom
Variance Estimation
In high-dimensional linear models, sparsity is often assumed to control variability and improve model performance. Equi-sparsity, where one assumes that predictors can be aggregated into groups sharing the same effects, is an alternative parsimonious structure that may be more suitable for many applications. Previous work has also shown a benefit of such structures for prediction in the presence of "rare features". This paper proposes a tree-guided penalty for simultaneous estimation and group aggregation. Unlike existing methods, our estimator avoids overparametrization and the unfair group selection problem that comes with it. We provide a closed-form solution to the proximal operator, allowing for efficient computation despite hierarchically overlapping groups. Novel techniques are developed to study the finite-sample error bound of this seminorm-induced penalty under least squares and binomial deviance losses. Compared to existing methods, the proposed approach is often more favorable in high-dimensional settings, as verified by extensive simulation studies. The method is further illustrated with an application to microbiome data, where we conduct post-selection inference on group effects.
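The sketch below illustrates the equi-sparsity structure itself: when leaf-level coefficients are constant within groups induced by a tree over the predictors (for example, a taxonomy over microbiome features), the linear predictor coincides with a regression on group-aggregated features, which is why estimating and aggregating jointly is attractive. The toy tree, groups, and data are assumptions; this is not the proposed tree-guided penalty or its proximal operator.

import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 6
X = rng.poisson(5, size=(n, p)).astype(float)           # leaf-level counts

# a toy tree whose internal nodes induce the groups {0,1,2}, {3,4}, {5}
groups = [[0, 1, 2], [3, 4], [5]]
group_effects = np.array([0.5, -1.0, 0.0])               # one effect per group

# leaf-level coefficients that are equi-sparse: constant within each group
beta = np.zeros(p)
for g, effect in zip(groups, group_effects):
    beta[g] = effect

# the leaf-level predictor equals the predictor built from aggregated features
X_agg = np.column_stack([X[:, g].sum(axis=1) for g in groups])
print(np.allclose(X @ beta, X_agg @ group_effects))      # True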
Keywords
feature aggregation
equi-sparsity
tree-guided regularization
high-dimensional linear models
post-selection inference
proximal operator