Active multiple testing with proxy p-values and e-values

Catherine Wang Co-Author
Carnegie Mellon University
 
Kathryn Roeder Co-Author
Carnegie Mellon University
 
Larry Wasserman Co-Author
Carnegie Mellon University
 
Aaditya Ramdas Co-Author
Carnegie Mellon University
 
Ziyu Xu First Author
 
Ziyu Xu Presenting Author
 
Monday, Aug 4: 8:35 AM - 8:40 AM
2704 
Contributed Speed 
Music City Center 
Researchers often lack the resources to test every hypothesis of interest directly or to compute
every test statistic, but they often possess auxiliary data from which they can compute
an estimate of the experimental outcome. We introduce a novel approach for selecting the
hypotheses for which to query the true statistic, using these estimates to compute proxy
statistics. Our framework allows a scientist to propose a proxy statistic and then query the
true statistic with a probability that depends on the value of the proxy. We make no
assumptions about how the proxy is derived, and it can be arbitrarily dependent on the true
statistic. If the true statistic is not queried, the proxy is used in its place. We characterize
"active" methods that produce valid p-values and e-values in this setting, and we use this
framework to create procedures with false discovery rate (FDR) control. Through simulations
and a real data analysis of causal effects in scCRISPR screen experiments, we demonstrate
empirically that our proxy framework achieves both high power and low resource usage when the
proxies are accurate estimates of their respective true statistics.
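
The query-or-substitute mechanism described above can be illustrated with a small sketch for e-values. The abstract does not state the construction, so everything here is an assumption made for illustration: the function name `active_e_value`, the tuning parameter `gamma`, the query probability max(0, 1 - gamma / F), and the rescaling of a queried e-value by (1 - gamma). Under these choices, if the true e-value E has expectation at most 1 under the null and the query decision depends only on the proxy F, then the unqueried branch contributes at most gamma in expectation and the queried branch at most 1 - gamma, so the output is again a valid e-value however F depends on E.

```python
import random


def active_e_value(proxy_e, query_true_e, gamma=0.5, rng=random):
    """Return (e_value, queried) for a single hypothesis.

    Illustrative sketch only; not taken from the abstract.
    proxy_e      -- nonnegative proxy e-value F, derived in any way
    query_true_e -- zero-argument callable that pays the cost of
                    computing the true e-value E and returns it
    gamma        -- assumed tuning parameter in (0, 1)
    rng          -- any object with a random() method returning [0, 1)
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if proxy_e < 0.0:
        raise ValueError("proxy_e must be nonnegative")

    # Larger proxies are more likely to trigger a query of the truth.
    # A proxy at or below gamma is never queried.
    query_prob = max(0.0, 1.0 - gamma / proxy_e) if proxy_e > 0.0 else 0.0

    if rng.random() < query_prob:
        # Queried: use the true e-value, shrunk by (1 - gamma) so that
        # the two branches together have expectation at most 1.
        return (1.0 - gamma) * query_true_e(), True

    # Not queried: the proxy stands in for the true statistic.
    return proxy_e, False
```

With `gamma=0.5`, a proxy of 0.3 is never queried and is returned unchanged, while a proxy of 10 is queried with probability 0.95. The resulting e-values could then be passed to an e-value-based multiple testing procedure to target FDR control.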

Keywords

multiple testing

e-values

false discovery rate (FDR)

active sampling 

Main Sponsor

Section on Statistical Learning and Data Science