
Active multiple testing with proxy p-values and e-values

Presented During: SPEED 2: Statistical Design of Experiments and Sample Size Considerations, Part 1

Catherine Wang Co-Author
Carnegie Mellon University

Kathryn Roeder Co-Author
Carnegie Mellon University

Larry Wasserman Co-Author
Carnegie Mellon University

Aaditya Ramdas Co-Author
Carnegie Mellon University

Ziyu Xu First Author

Ziyu Xu Presenting Author

Monday, Aug 4: 8:35 AM - 8:40 AM
2704
Contributed Speed

Music City Center

Researchers often lack the resources to test every hypothesis of interest directly or to compute
every test statistic, but they often possess auxiliary data from which they can compute
an estimate of the experimental outcome. We introduce a novel approach for selecting the
hypotheses for which to query the true statistic, using these estimates to compute proxy
statistics. Our framework allows a scientist to
propose a proxy statistic, and then query the true statistic with some probability based on
the value of the proxy. We make no assumptions about how the proxy is derived, and it can be
arbitrarily dependent on the true statistic. If the true statistic is not queried, the proxy is used
in its place. We characterize "active" methods that produce valid p-values and e-values in this
setting and use this framework to construct procedures with false discovery rate (FDR) control.
Through simulations and real data analysis of causal effects in scCRISPR screen experiments,
we empirically demonstrate that our proxy framework achieves both high power and low resource
usage when the proxies are accurate estimates of their respective true statistics.
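
The query-or-substitute mechanism described above can be illustrated with a minimal sketch. The abstract does not state the exact rule, so this is one construction that yields a valid p-value under the stated conditions, not necessarily the authors' procedure. The function name `active_p_value`, the callback `query_true_p`, and the tuning parameter `gamma` are all assumptions made for illustration. The true p-value is skipped with probability `gamma * proxy_p`, so a small (promising) proxy is almost always verified, and a queried p-value is inflated by `1 / (1 - gamma)` to pay for the chance of reporting the proxy unverified. Under the null, P(output <= s) <= gamma * s + (1 - gamma) * s = s for any dependence between the proxy and the true p-value.

```python
import random


def active_p_value(proxy_p, query_true_p, gamma=0.5, rng=random):
    """Return (p_value, queried) for a single hypothesis.

    Illustrative sketch only; names and the rule itself are assumptions.
    proxy_p      -- proxy p-value in [0, 1], derived in any way
    query_true_p -- zero-argument callable that computes the (costly) true p-value
    gamma        -- trade-off in (0, 1): larger values skip more queries but
                    inflate queried p-values more
    rng          -- object with a random() method returning a float in [0, 1)
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not 0.0 <= proxy_p <= 1.0:
        raise ValueError("proxy_p must lie in [0, 1]")

    # Skip the query with probability gamma * proxy_p and report the proxy.
    if rng.random() < gamma * proxy_p:
        return proxy_p, False

    # Otherwise pay for the true statistic and inflate it by 1 / (1 - gamma).
    true_p = query_true_p()
    return min(1.0, true_p / (1.0 - gamma)), True
```

The outputs for many hypotheses could then be passed to a standard multiple testing procedure such as Benjamini-Hochberg, subject to that procedure's own dependence assumptions.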

Keywords

multiple testing

e-values

false discovery rate (FDR)

active sampling

Main Sponsor

Section on Statistical Learning and Data Science