Advances in Sparse Learning

Chair: Tathagata Dutta, Michigan State University

Wednesday, Aug 6: 8:30 AM - 10:20 AM
Session 4141: Contributed Papers
Music City Center, Room: CC-105B

Main Sponsor

Section on Statistical Learning and Data Science

Presentations

More of Less: A Rashomon Algorithm for Sparse Model Sets

The traditional statistical/machine-learning paradigm generally seeks a single best model for prediction and interpretation. However, the Rashomon Effect, introduced by Leo Breiman, challenges this by highlighting how multiple equally good predictive models can exist for the same problem. This has significant implications for interpretation, usability, variable importance, and replicability. The collection of such models within a function class is called the Rashomon set, and recent research has focused on identifying and analyzing these sets. Motivated by sparse latent representations in high-dimensional problems, we propose a heuristic method that finds sets of sparse models with strong predictive power. Using a greedy forward search, the algorithm builds progressively larger models by leveraging good low-dimensional ones. These sparse models, which maintain near-equal performance to common reference models and thus form Rashomon sets, can be connected into networks that provide deeper insights into variable interactions and into how latent variables contribute to the Rashomon Effect.
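
The abstract leaves implementation details open; the following minimal sketch (Python with scikit-learn) illustrates one way such a greedy forward search could look. The performance measure (cross-validated accuracy), the number of models kept per dimension, and the tolerance eps defining "near-equal" performance are illustrative assumptions, not the authors' choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rashomon_forward_search(X, y, max_size=3, keep=20, eps=0.01):
    """Greedy forward search for a set of strong sparse models.

    Extends the best models of each size by one variable, then
    returns every visited subset whose CV accuracy is within `eps`
    of the best found (an empirical Rashomon set).
    """
    p = X.shape[1]
    scores = {}  # variable subset (tuple) -> CV accuracy

    def cv_acc(subset):
        model = LogisticRegression(max_iter=1000)
        return cross_val_score(model, X[:, list(subset)], y, cv=5).mean()

    frontier = [(j,) for j in range(p)]  # all one-variable models
    for _ in range(max_size):
        for s in frontier:
            if s not in scores:
                scores[s] = cv_acc(s)
        # Keep the strongest models of the current size ...
        best = sorted(frontier, key=lambda s: -scores[s])[:keep]
        # ... and extend each of them by one additional variable.
        frontier = sorted({tuple(sorted(s + (j,)))
                           for s in best for j in range(p) if j not in s})

    best_acc = max(scores.values())
    return {s for s, a in scores.items() if a >= best_acc - eps}
```

Connecting the returned subsets through shared variables then yields the model networks described in the abstract.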

Keywords

SWAG: Sparse Wrapper Algorithm

Latent Representations

Replication Crisis

High-Dimensional Problems

Variable Importance

Explainable AI 

Co-Author(s)

Cesare Miglioli, Purdue University
Gaetan Bakalli, Emlyon Business School
Stephane Guerrier
Samuel Orso

First Author

Roberto Molinari, Auburn University

Presenting Author

Roberto Molinari, Auburn University

An augmented pseudo likelihood approach for multistate modeling in cross-sectional studies

Understanding the population-level natural history of chronic diseases is essential for developing effective targeted prevention strategies. Intensity-based multistate analyses estimate transition rates between disease states but often rely on longitudinal data, which may be impractical when resource constraints limit data collection to a cross-sectional sample from the target population. We propose an innovative framework that augments cross-sectional data from the target population with auxiliary longitudinal data from other populations, so that the transition intensities can be identified and estimated. Importantly, this data augmentation approach facilitates the specification of semi-Markov models for a three-state process accommodating recurrent transitions between disease-free and diseased states. The method is evaluated through extensive simulation studies, and we apply it to study the population-level natural history of cervical precancer using cross-sectional data from cervical cancer screening studies in two populations.
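
In generic notation (assumed here for illustration, not taken from the paper), the augmentation idea can be written as a pseudo log-likelihood combining two kinds of contributions: let \(\pi_s(a;\theta)\) denote the model-implied probability of occupying state \(s\) at age \(a\) under transition intensities parameterized by \(\theta\). Each cross-sectional subject contributes only their current state, while the auxiliary longitudinal samples contribute ordinary intensity-based likelihood terms:

\[
\ell(\theta) \;=\; \underbrace{\sum_{i=1}^{n} \log \pi_{s_i}(a_i;\theta)}_{\text{cross-sectional target sample}} \;+\; \underbrace{\sum_{k=1}^{m} \ell_k^{\mathrm{aux}}(\theta)}_{\text{auxiliary longitudinal data}}.
\]

The auxiliary terms identify intensity components that current status information alone cannot; the paper's semi-Markov specification with recurrent transitions is more involved than this generic display.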

Keywords

multistate current status data

portable natural history models

recurrent processes

data integration

semi-Markov models 

Co-Author(s)

Nicole Campos, Center for Health Decision Science, Harvard T.H. Chan School of Public Health
Li Cheung, National Cancer Institute

First Author

Fangya Mao

Presenting Author

Fangya Mao

Joint Probability Estimation of Many Binary Outcomes via Localized Adversarial Lasso

We estimate the joint probability of many (possibly dependent) binary outcomes, a problem at the core of many applications. Without further conditions, the distribution of an M-dimensional binary vector is characterized by a number of coefficients that is exponential in M -- a high-dimensional problem even without covariates. Understanding the (in)dependence structure substantially improves the estimation, as it allows an effective factorization of the distribution. To estimate this distribution, we leverage the Bahadur representation, which connects the sparsity of its coefficients with independence across components. We use regularized and adversarial regularized estimators that are adaptive to the dependence structure, allowing rates of convergence to depend on the intrinsic (lower) dimension. We propose a locally penalized estimator in the presence of (low-dimensional) covariates and provide rates of convergence, addressing several theoretical challenges that arise when striving for a computationally tractable formulation. We apply our results to estimating causal effects with multiple binary treatments and show how our estimators improve finite-sample performance compared with non-adaptive estimators.
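
For context, the Bahadur representation writes the joint distribution of \(Y \in \{0,1\}^M\) relative to the independence model: with marginals \(p_j = \Pr(Y_j = 1)\) and standardized coordinates \(z_j = (y_j - p_j)/\sqrt{p_j(1-p_j)}\),

\[
\Pr(Y = y) \;=\; \prod_{j=1}^{M} p_j^{y_j}(1-p_j)^{1-y_j}
\Bigl(1 + \sum_{|S| \ge 2} \rho_S \prod_{j \in S} z_j\Bigr),
\qquad \rho_S = \mathbb{E}\Bigl[\prod_{j \in S} Z_j\Bigr],
\]

so independence across components corresponds to vanishing higher-order correlations \(\rho_S\), and sparsity of these coefficients is precisely the structure the regularized estimators adapt to.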

Keywords

many binary classification

penalized regression

adversarial lasso 

Co-Author

Matthew Harding, University of California Irvine

First Author

Alexandre Belloni, Duke University

Presenting Author

Yan Chen, Duke University

Lasso rates hold even after randomly discarding most of the predictors

In statistical learning, algorithms that fit a lasso over a set of basis functions can achieve desirable theoretical properties. For example, the Highly Adaptive Lasso (HAL) estimator applies the lasso to a very high-dimensional indicator basis, attaining dimension-free rates of convergence across a large class of functions. However, the time complexity of such algorithms is often exponential in the number of features, meaning they are too computationally intensive for most practical data analysis problems. In this work, we show that a lasso-based empirical risk minimizer over a growing set of basis functions retains its asymptotic rate even if fit only on a random, relatively small subset of the basis. Applying this idea, we propose RandomHAL: a fast approximation to the HAL estimator that retains its desirable properties, yet can be fit on datasets with many more features. The empirical performance of RandomHAL is evaluated using simulation experiments. 
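
A minimal sketch of the random-subsetting idea (Python with scikit-learn): only the main-effect indicator basis is constructed, and the subset size n_basis is an illustrative assumption rather than the authors' recommendation.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def hal_indicator_basis(X):
    """Zero-order HAL basis restricted to main effects: one indicator
    1{x_j >= t} per feature j and observed knot t (interaction
    indicators are omitted here for brevity)."""
    cols = []
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            cols.append((X[:, j] >= t).astype(float))
    return np.column_stack(cols)

def random_hal(X, y, n_basis=500, seed=None):
    """Fit a lasso on a random subset of the HAL basis columns."""
    rng = np.random.default_rng(seed)
    H = hal_indicator_basis(X)
    # Randomly discard most of the basis before fitting.
    keep = rng.choice(H.shape[1], size=min(n_basis, H.shape[1]),
                      replace=False)
    fit = LassoCV(cv=5).fit(H[:, keep], y)
    return fit, keep
```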

Keywords

lasso

function estimation

empirical risk minimization

statistical learning

high-dimensional statistics 

Co-Author

Nima Hejazi, Harvard T.H. Chan School of Public Health

First Author

Salvador Balkus, Harvard T.H. Chan School of Public Health

Presenting Author

Salvador Balkus, Harvard T.H. Chan School of Public Health

A Statistical Test for SWAG: Assessing the Rashomon Effect in Sparse Model Selection

The Rashomon effect, introduced by Leo Breiman, highlights the existence of multiple plausible models explaining the same problem. The Sparse Wrapper Algorithm (SWAG) addresses this effect in high-dimensional settings by selecting sets of strong yet sparse models. However, as a heuristic method, its effectiveness requires formal validation. We propose a statistical testing framework to assess whether SWAG extracts informative models. Instead of testing individual models, we leverage the SWAG network and apply graph-theoretical tools to evaluate its informativeness. By comparing SWAG networks under the null and alternative hypotheses, we examine whether selected models concentrate around significant variables, using network measures to quantify these differences. Using a bootstrap approach to approximate the distributions of the test statistics, we confirm that the entropies of eigenvector centrality and of variable frequency are two strong candidates for differentiating the SWAG network under the null and alternative hypotheses. This framework offers a foundation for inference on Rashomon sets and for interpreting model diversity.
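
The first test statistic can be sketched as follows (Python with networkx). The construction of the network from SWAG output -- variables as nodes, edges weighted by how often two variables are selected together -- is an assumption for illustration.

```python
import itertools
import numpy as np
import networkx as nx

def swag_network(model_sets, p):
    """Variable co-selection network from a collection of selected
    variable subsets (the SWAG output): nodes are variables, edge
    weights count how often two variables appear together."""
    G = nx.Graph()
    G.add_nodes_from(range(p))
    for s in model_sets:
        for i, j in itertools.combinations(sorted(s), 2):
            if G.has_edge(i, j):
                G[i][j]["weight"] += 1
            else:
                G.add_edge(i, j, weight=1)
    return G

def centrality_entropy(G):
    """Shannon entropy of normalized eigenvector centralities: low
    entropy means centrality concentrates on a few variables, as
    expected when the selected models cluster around relevant ones."""
    c = nx.eigenvector_centrality_numpy(G, weight="weight")
    q = np.abs(np.array(list(c.values())))
    q = q / q.sum()
    q = q[q > 0]
    return float(-np.sum(q * np.log(q)))
```

A bootstrap over resampled datasets would then approximate the null distribution of this statistic.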

Keywords

SWAG

Multi-model selection

Statistical testing

Rashomon effect

Sparse models 

Co-Author

Roberto Molinari, Auburn University

First Author

Yagmur Yavuzozdemir

Presenting Author

Yagmur Yavuzozdemir

The Predictive Degrees of Freedom of LASSO

The double descent phenomenon observed in overparameterized machine learning models appears to defy classical prediction risk theory and has spurred considerable research. Recently, a notion of predictive model degrees of freedom (PDOF) has been proposed as an alternative measure of model complexity to explain the double descent phenomenon with a focus on linear modeling procedures. We extend PDOF to the nonlinear case by first studying the lasso model. The PDOF for lasso involves the covariance matrix of the lasso estimator, for which no closed-form expression exists. Furthermore, existing covariance matrix estimators only work in the under-parameterized case. To fill this gap, we explore two estimators: one based on the iterative soft-thresholding algorithm, and the other based on the infinitesimal jackknife. In a simulation study, we compare these estimators with bootstrap and other covariance matrix estimators based on approximate lasso solutions. Beyond lasso, the infinitesimal jackknife approach can be used to quantify the PDOF of other algorithmic models such as random forests and neural networks. 
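
For reference, the iterative soft-thresholding algorithm (ISTA) underlying the first estimator is sketched below; the step size and iteration count are illustrative, and the covariance construction built on these iterates is the paper's contribution and is not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator S_t(z)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=1000):
    """ISTA for the lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```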

Keywords

Double Descent

Infinitesimal Jackknife

Lasso

Model Degrees of Freedom

Variance Estimation 

Co-Author

Yoonkyung Lee, The Ohio State University

First Author

Xuerong Wang, The Ohio State University

Presenting Author

Xuerong Wang, The Ohio State University

Tree-guided equi-sparsity pursuit for high-dimensional regression and classification

In high-dimensional linear models, sparsity is often assumed to control variability and improve model performance. Equi-sparsity, where one assumes that predictors can be aggregated into groups sharing the same effects, is an alternative parsimonious structure that may be more suitable for many applications. Previous work has also shown a benefit of such structures for prediction in the presence of "rare features". This paper proposes a tree-guided penalty for simultaneous estimation and group aggregation. Unlike existing methods, our estimator avoids overparametrization and the unfair group selection problem it entails. We provide a closed-form solution to the proximal operator, allowing for efficient computation despite the hierarchically overlapping groups. Novel techniques are developed to study the finite-sample error bound of this seminorm-induced penalty under least squares and binomial deviance losses. Compared to existing methods, the proposed approach often performs more favorably in high-dimensional settings, as verified by extensive simulation studies. The method is further illustrated with an application to microbiome data, where we conduct post-selection inference on group effects.
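
As a point of reference (not the paper's exact tree-guided operator), the proximal operator of a single group's l2 seminorm lam * ||beta||_2 has the familiar block soft-thresholding closed form sketched below; for tree-structured nested groups, such block operators can be composed along the tree, which is the kind of structure that makes closed-form computation possible here.

```python
import numpy as np

def block_soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_2 for one group:
    shrinks the whole block toward zero and sets it exactly to
    zero once its norm falls below lam (group-level selection)."""
    nrm = np.linalg.norm(beta)
    if nrm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / nrm) * beta
```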

Keywords

feature aggregation

equi-sparsity

tree-guided regularization

high-dimensional linear models

post-selection inference

proximal operator 

Co-Author(s)

Aaron Molstad, University of Minnesota
Hui Zou, University of Minnesota

First Author

Jinwen Fu

Presenting Author

Jinwen Fu