Using sufficiency and sparsity for more powerful controlled variable selection in the linear model

Lucas Janson Co-Author
Harvard University
 
Souhardya Sengupta First Author
Harvard University
 
Souhardya Sengupta Presenting Author
Harvard University
 
Wednesday, Aug 6: 3:05 PM - 3:20 PM
1062 
Contributed Papers 
Music City Center 
We show that for the problem of controlled variable selection in the Gaussian linear
model, informative and valid weights (for weighted multiple testing) can be derived
entirely from sufficient statistics and a belief in sparsity, using only the data itself
and no external quantitative side information. This idea yields new procedures that
have strict guarantees on the (unweighted) familywise error rate or false discovery rate
and are more powerful than existing methods when the model is sparse. A naive
implementation of our idea is computationally intensive, so we propose computational
improvements that maintain strict validity while having little impact on the power.
We show that the same idea extends asymptotically to any setting with a Gaussian
limit and consistently estimable covariance matrix, such as any M-estimation problem.
We demonstrate the performance of our methods in simulations and an application to
HIV drug resistance.
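
For readers unfamiliar with weighted multiple testing, the sketch below shows how weights enter two standard rules, weighted Bonferroni (familywise error rate) and weighted Benjamini-Hochberg (false discovery rate). It is not the procedure proposed in the talk: the weights here are treated as given, nonnegative, and averaging to one, whereas the abstract's contribution is deriving valid weights from the data itself. The function names and the example p-values and weights are hypothetical.

```python
from typing import List, Sequence


def _check(pvals: Sequence[float], weights: Sequence[float]) -> None:
    # Assumption of this sketch: weights are nonnegative and average to 1.
    if len(pvals) != len(weights):
        raise ValueError("pvals and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if abs(sum(weights) / len(weights) - 1.0) > 1e-8:
        raise ValueError("weights must average to 1")


def weighted_bonferroni(pvals: Sequence[float], weights: Sequence[float],
                        alpha: float = 0.05) -> List[bool]:
    """Reject hypothesis j when p_j <= alpha * w_j / m (never if w_j = 0)."""
    _check(pvals, weights)
    m = len(pvals)
    return [w > 0 and p <= alpha * w / m for p, w in zip(pvals, weights)]


def weighted_bh(pvals: Sequence[float], weights: Sequence[float],
                alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up rule applied to q_j = p_j / w_j."""
    _check(pvals, weights)
    m = len(pvals)
    # A zero weight means the hypothesis can never be rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvals, weights)]
    order = sorted(range(m), key=lambda j: q[j])
    # Find the largest rank k with q_(k) <= alpha * k / m.
    k = 0
    for rank, j in enumerate(order, start=1):
        if q[j] <= alpha * rank / m:
            k = rank
    rejected = set(order[:k])
    return [j in rejected for j in range(m)]


# Hypothetical example: four hypotheses, larger weight on the first.
pvals = [0.001, 0.02, 0.04, 0.30]
weights = [2.0, 1.0, 0.5, 0.5]
print(weighted_bonferroni(pvals, weights))
print(weighted_bh(pvals, weights))
```

A larger weight relaxes the rejection threshold for that hypothesis at the expense of the others, which is why informative weights can raise power when only a few variables are truly relevant.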

Keywords

Variable selection

Weighted multiple testing

Sparsity

Familywise error rate

False discovery rate 

Main Sponsor

Section on Statistical Learning and Data Science