Using sufficiency and sparsity for more powerful controlled variable selection in the linear model

Presented During: Advances in Variable Selection

Lucas Janson Co-Author
Harvard University

Souhardya Sengupta First Author
Harvard University

Souhardya Sengupta Presenting Author
Harvard University

Wednesday, Aug 6: 3:05 PM - 3:20 PM
1062
Contributed Papers

Music City Center

We show that for the problem of controlled variable selection in the Gaussian linear
model, informative and valid weights (for weighted multiple testing) can be derived
entirely from sufficient statistics and a belief in sparsity using only the data itself and
no external quantitative side information. This idea yields new procedures that have
strict guarantees on the (unweighted) familywise error rate or false discovery rate and
are more powerful than existing methods when the model is sparse. A naive
implementation of our idea is computationally intensive, so we propose computational
improvements that maintain strict validity while having little impact on the power.
We show that the same idea extends asymptotically to any setting with a Gaussian
limit and consistently estimable covariance matrix, such as any M-estimation problem.
We demonstrate the performance of our methods in simulations and an application to
HIV drug resistance.
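
For readers unfamiliar with weighted multiple testing, the sketch below illustrates the generic building blocks the abstract refers to: a weighted Bonferroni rule (familywise error rate) and a weighted Benjamini-Hochberg rule (false discovery rate). It is not the authors' procedure. The function names, the example p-values, and the weights are all hypothetical, and the weights here are simply fixed numbers chosen in advance, whereas the abstract's contribution is a way to derive valid weights from the data itself. The classical guarantees for these two rules assume weights that are nonnegative, average to one, and do not depend on the p-values (with independent p-values in the weighted Benjamini-Hochberg case).

```python
import numpy as np


def _normalize(weights, m):
    """Rescale nonnegative weights so that they average to one."""
    w = np.asarray(weights, dtype=float)
    if w.shape != (m,) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be nonnegative, length m, not all zero")
    return w * m / w.sum()


def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_j when p_j <= alpha * w_j / m (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    w = _normalize(weights, m)
    return p <= alpha * w / m


def weighted_bh(pvals, weights, alpha=0.05):
    """Benjamini-Hochberg applied to p_j / w_j (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    w = _normalize(weights, m)

    # A hypothesis with zero weight can never be rejected.
    q = np.full(m, np.inf)
    pos = w > 0
    q[pos] = p[pos] / w[pos]

    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.nonzero(below)[0].max()  # largest rank passing its threshold
        reject[order[: k_max + 1]] = True
    return reject


# Illustrative inputs only: four hypotheses, with the first one up-weighted.
pvals = [0.02, 0.02, 0.5, 0.5]
weights = [2.0, 0.5, 1.0, 0.5]
fwer_rejections = weighted_bonferroni(pvals, weights, alpha=0.05)
fdr_rejections = weighted_bh(pvals, weights, alpha=0.05)
```

Passing equal weights recovers the ordinary unweighted procedures, which makes the effect of the weights easy to compare on the same p-values. Up-weighting a hypothesis loosens its threshold at the expense of the others, so weights concentrated on true signals increase power while weights concentrated on nulls reduce it.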

Keywords

Variable selection

Weighted multiple testing

Sparsity

Familywise error rate

False discovery rate

Main Sponsor

Section on Statistical Learning and Data Science