Lasso rates hold even after randomly discarding most of the predictors

Salvador Balkus (First Author, Presenting Author)
Harvard T.H. Chan School of Public Health

Nima Hejazi (Co-Author)
Harvard T.H. Chan School of Public Health

Wednesday, Aug 6: 9:20 AM - 9:35 AM
Session 1652: Contributed Papers
Music City Center
In statistical learning, algorithms that fit a lasso over a set of basis functions can achieve desirable theoretical properties. For example, the Highly Adaptive Lasso (HAL) estimator applies the lasso to a very high-dimensional indicator basis, attaining dimension-free rates of convergence over a large class of functions. However, the time complexity of such algorithms often grows exponentially in the number of features, rendering them too computationally intensive for most practical data analyses. In this work, we show that a lasso-based empirical risk minimizer over a growing set of basis functions retains its asymptotic rate even when fit on only a random, relatively small subset of the basis. Applying this idea, we propose RandomHAL: a fast approximation to the HAL estimator that retains its desirable properties yet can be fit to datasets with many more features. We evaluate the empirical performance of RandomHAL in simulation experiments.
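
To make the subsampling idea concrete, the sketch below shows one way such a scheme could look in code: build a zero-order HAL-style indicator basis from observation-defined knots, keep a random fraction of its columns, and fit a cross-validated lasso on the retained columns. This is a minimal illustration under assumed details (a zero-order indicator basis, a hypothetical keep_frac subsampling fraction, scikit-learn's LassoCV), not the authors' implementation.

    # Sketch only: assumes a zero-order HAL indicator basis; keep_frac is a
    # hypothetical tuning parameter, not part of the abstract's method.
    import itertools
    import numpy as np
    from sklearn.linear_model import LassoCV

    def hal_basis(X):
        """Zero-order HAL design: one indicator column 1{x[S] >= knot} per
        observation-defined knot and nonempty feature subset S."""
        n, d = X.shape
        cols = []
        for size in range(1, d + 1):
            for S in itertools.combinations(range(d), size):
                idx = list(S)
                for i in range(n):
                    cols.append(np.all(X[:, idx] >= X[i, idx], axis=1).astype(float))
        return np.column_stack(cols)  # n x (n * (2^d - 1)) design matrix

    def random_hal_fit(X, y, keep_frac=0.05, seed=0):
        """Fit a cross-validated lasso on a random subset of the basis columns."""
        rng = np.random.default_rng(seed)
        H = hal_basis(X)
        m = max(1, int(keep_frac * H.shape[1]))
        keep = rng.choice(H.shape[1], size=m, replace=False)
        model = LassoCV(cv=5).fit(H[:, keep], y)
        return model, keep

    # Toy usage: a smooth signal plus an interaction, three features.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(200, 3))
    y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)
    model, kept_columns = random_hal_fit(X, y, keep_frac=0.05)

Because the full zero-order design has n(2^d - 1) columns, even a small keep_frac shrinks the lasso problem substantially; the kept column indices must be stored so that new inputs can be expanded with the same knot-defined indicators and subsetted identically at prediction time.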

Keywords: lasso, function estimation, empirical risk minimization, statistical learning, high-dimensional statistics

Main Sponsor: Section on Statistical Learning and Data Science