New developments in hypothesis testing

Jacqueline Johnson, Chair
SAS Institute
 
Tuesday, Aug 5, 2:00 PM - 3:50 PM
Session 4132
Contributed Papers
Music City Center
Room: CC-101C

Main Sponsor

IMS

Presentations

A unifying framework for selective inference with applications in rank verification

Upon observing n-dimensional multivariate Gaussian data, when can we infer that the largest K observations came from the largest K means? When K=1 and the covariance is isotropic, Gutmann and Maymin (1987) argue that this inference is justified when the two-sided difference-of-means test comparing the largest and second-largest observations rejects. After developing a unifying framework for selective inference centered on p-values, we provide a generalization of their procedure that applies for any K and any covariance structure. We show that our procedure draws the desired inference whenever the two-sided difference-of-means test comparing the pair of observations inside and outside the top K with the smallest standardized difference rejects, and sometimes even when this test fails to reject. Using this insight, we argue that our procedure renders existing simultaneous inference approaches inadmissible when n > 2. When the observations are independent (with possibly unequal variances), our procedure corresponds exactly to running the two-sided difference-of-means test comparing the pair of observations inside and outside the top K with the smallest standardized difference.
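
In the independent case the procedure is simple enough to state in a few lines. The following is a minimal Python sketch, assuming independent Gaussian observations with known standard deviations; verify_top_k is an illustrative name, not the authors' code.

import numpy as np
from scipy import stats

def verify_top_k(x, sigma, k, alpha=0.05):
    # Verify that the top-k observations came from the top-k means by
    # testing the least-separated pair across the top-k boundary.
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    order = np.argsort(x)[::-1]                  # indices, largest first
    top, rest = order[:k], order[k:]
    z = min(abs(x[i] - x[j]) / np.hypot(sigma[i], sigma[j])
            for i in top for j in rest)          # smallest standardized difference
    p = 2 * stats.norm.sf(z)                     # two-sided p-value
    return p, p <= alpha

# e.g., verify the top 2 of five independent observations with unit variance
print(verify_top_k([5.1, 4.9, 2.0, 1.5, 0.3], sigma=[1, 1, 1, 1, 1], k=2))

For K=1 and isotropic covariance this reduces to the Gutmann-Maymin test comparing the largest and second-largest observations.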

Keywords

Selective inference

Winner's curse

Rank Verification

Publication bias

Data carving

Conditional inference 

First Author

Anav Sood, Stanford University

Presenting Author

Anav Sood, Stanford University

Aggregating Dependent Signals with Heavy-Tailed Combination Tests

Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise asymptotic dependence, such as with bivariate t-distributed test statistics, these combination tests can remain valid with certain choices of heavy-tailed distributions and exhibit notable power gains over Bonferroni, even as the significance level diminishes.  
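
For a concrete point of comparison, here is a minimal sketch of the three tests discussed above, assuming Python with NumPy/SciPy; the harmonic mean p-value is shown in its raw form, without the calibration needed for exact validity.

import numpy as np
from scipy import stats

def cauchy_combination(p):
    # average the Cauchy-transformed p-values, then invert the Cauchy CDF
    t = np.mean(np.tan((0.5 - np.asarray(p)) * np.pi))
    return stats.cauchy.sf(t)

def harmonic_mean_p(p):
    # raw harmonic mean; the exact (Landau-distribution) calibration is omitted
    p = np.asarray(p)
    return len(p) / np.sum(1.0 / p)

def bonferroni(p):
    return min(1.0, len(p) * float(np.min(p)))

p = [0.001, 0.2, 0.8, 0.04]
print(cauchy_combination(p), harmonic_mean_p(p), bonferroni(p))

Under pairwise asymptotic independence, the equivalence described in the abstract means that the heavy-tailed combination tests and Bonferroni return essentially interchangeable rejection decisions once the significance level is small enough.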

Keywords

Cauchy combination test

Dependent p-values combination

Quasi-asymptotic independence

t-copula 

Abstract 1511

Co-Author(s)

Tiantian Mao, The University of Science and Technology of China
Yuchao Jiang, Texas A&M University
Jingshu Wang
Ruodu Wang, University of Waterloo

First Author

Lin Gui

Presenting Author

Lin Gui

Approximation methods for global hypothesis testing with discrete and non-independent p-values

Mathematically sound, straightforward, and computationally inexpensive methods for combining continuous and independent p-values are the building blocks of global hypothesis testing, with the seminal work of Fisher, Pearson, and Tukey, among others, still widely applied by practitioners almost a century after its initial proposal. Discrete and/or dependent p-values, on the other hand, present a wide array of mathematical issues that make their modelling less straightforward, with contemporary proposals such as the Cauchy Combination Statistic, the GFisher statistic, and the Wasserstein projection method each addressing these issues separately. This presentation proposes a bridge between the Cauchy Combination Statistic and the closest-to-continuous Wasserstein projection that allows for combining discrete and exchangeable p-values with accurate asymptotic type I error control. Statistical power and further mathematical properties are demonstrated via extensive simulation studies, with particular focus on contingency table data arising from highly unbalanced case-control studies.
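
The discreteness issue can be made concrete with a small simulation. The sketch below (an illustration of the problem, not the authors' proposed bridge) combines Fisher's exact-test p-values from sparse 2x2 tables with the Cauchy Combination Statistic; because discrete p-values are super-uniform, the naive combination is conservative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def null_fisher_pvals(m=5, n_cases=10, n_controls=200, p0=0.05):
    # m independent 2x2 tables simulated under the global null
    ps = []
    for _ in range(m):
        a = rng.binomial(n_cases, p0)        # exposed cases
        b = rng.binomial(n_controls, p0)     # exposed controls
        table = [[a, n_cases - a], [b, n_controls - b]]
        ps.append(stats.fisher_exact(table)[1])
    return np.array(ps)

def cauchy_combine(p):
    p = np.clip(p, 1e-15, 0.99)              # cap p = 1, a common practical fix
    return stats.cauchy.sf(np.mean(np.tan((0.5 - p) * np.pi)))

rejections = [cauchy_combine(null_fisher_pvals()) <= 0.05 for _ in range(2000)]
print(f"empirical size at alpha = 0.05: {np.mean(rejections):.4f}")
# typically lands well below the nominal 0.05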

Keywords

Global hypothesis testing

Approximation Theory

Cauchy Combination Statistic

Discrete and dependent p-values

Wasserstein Metric

Case-control studies 

Abstract 2295

First Author

Gonzalo Contador

Presenting Author

Gonzalo Contador

Boosting e-BH via conditional calibration

The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less (and are often more) powerful than the original ones. Our general framework is explicitly instantiated in three classes of problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformal selection. Numerical experiments show that our proposal significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties. 
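
For reference, here is a minimal sketch of the base e-BH rule that the framework boosts, assuming only NumPy; the conditional-calibration step that produces the boosted e-values is problem-specific and not sketched here.

import numpy as np

def e_bh(e, alpha=0.05):
    # e-BH: with e-values sorted as e_(1) >= ... >= e_(n), reject the
    # k* largest, where k* = max { k : e_(k) >= n / (alpha * k) }
    e = np.asarray(e, dtype=float)
    n = len(e)
    order = np.argsort(e)[::-1]
    ok = e[order] >= n / (alpha * np.arange(1, n + 1))
    k = ok.nonzero()[0].max() + 1 if ok.any() else 0
    return order[:k]                         # indices of rejected hypotheses

print(e_bh([40.0, 1.2, 55.0, 0.3, 21.0], alpha=0.1))   # rejects indices 2, 0, 4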

Keywords

e-values

multiple testing

variable selection

conformal inference

novelty detection

model-X knockoffs 

Abstract 1999

Co-Author

Zhimei Ren

First Author

Junu Lee, University of Pennsylvania

Presenting Author

Junu Lee, University of Pennsylvania

Testing multivariate normality using angular skewness and kurtosis processes

Given a sample X_1, ..., X_n from a common distribution P in R^d, d ≥ 2, we develop a method to test multivariate normality based on two random processes S_n and K_n indexed by the unit sphere S^{d-1}, which stand respectively for the skewness and kurtosis of linear combinations. We show that the limit processes are Gaussian and can be represented as random finite linear combinations of the spherical harmonics. We consider test statistics based on the suprema of these processes and numerically obtain their limit distributions. We also show that the Bayesian bootstrap can consistently estimate the cutoffs. We obtain the limiting power of the test under contiguous alternative hypotheses. Through an extensive simulation study, we show that our proposed method performs well for moderate and large sample sizes.
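
A Monte Carlo stand-in for the sup-type statistics is easy to sketch: scan random directions on the sphere in place of the full supremum. The snippet below assumes Python with NumPy/SciPy and omits the cutoff calibration (handled in the talk by the Bayesian bootstrap); sup_skew_kurt is an illustrative name.

import numpy as np
from scipy import stats

def sup_skew_kurt(X, n_dirs=2000, seed=0):
    # approximate the sup over S^{d-1} by scanning n_dirs random directions
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # points on the unit sphere
    proj = X @ U.T                                  # projections <X_i, u>
    S = np.abs(stats.skew(proj, axis=0))            # directional skewness
    K = np.abs(stats.kurtosis(proj, axis=0))        # directional excess kurtosis
    return S.max(), K.max()

X = np.random.default_rng(1).standard_normal((500, 3))   # data under the null
print(sup_skew_kurt(X))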

Keywords

multivariate normality test

skewness and kurtosis

stochastic processes

spherical harmonics

empirical processes

Bayesian bootstrap 

Abstract 1785

Co-Author

Subhashis Ghoshal, North Carolina State University

First Author

Jisu Oh, North Carolina State University

Presenting Author

Jisu Oh, North Carolina State University

Testing Random Effects for Binomial Data: Minimax Goodness-of-Fit Testing and Meta-analyses

In many modern scientific investigations, researchers conduct numerous small-scale studies with few participants. Since individual participant outcomes can be difficult to interpret, combining data across studies via random effects has become standard practice for drawing broader scientific conclusions. In this talk, we introduce an optimal methodology for testing properties of random effects arising from binomial counts. Using the minimax framework, we characterize how the worst-case power of the best goodness-of-fit test depends on the number of studies and participants. Interestingly, the optimal test is related to a debiased version of Pearson's chi-squared test.

We then turn to meta-analyses, where a central question is to determine whether multiple studies agree on a treatment's effectiveness before pooling all data. We show how the difficulty of this problem depends on the underlying effect size and demonstrate that a debiased version of Cochran's chi-squared test is minimax-optimal. Finally, we illustrate how the proposed methodology improves the construction of p-values and confidence intervals for assessing the safety of drugs associated with rare adverse outcomes. 
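
As a baseline for the debiased variants described above, here is a minimal sketch of the classical chi-squared homogeneity test for binomial studies, assuming Python with NumPy/SciPy; the debiasing that yields minimax optimality is the subject of the talk and is not reproduced here.

import numpy as np
from scipy import stats

def chisq_homogeneity(successes, trials):
    # classical test that K binomial proportions are equal
    x = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    p_pool = x.sum() / n.sum()                   # pooled proportion
    var0 = n * p_pool * (1.0 - p_pool)           # null variance per study
    q = np.sum((x - n * p_pool) ** 2 / var0)     # chi-squared statistic
    return q, stats.chi2.sf(q, df=len(x) - 1)

# e.g., four small studies (successes out of trials)
print(chisq_homogeneity([1, 0, 3, 2], [12, 10, 15, 11]))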

Keywords

hypothesis testing

meta-analysis

local minimax

critical separation

Wasserstein distance

homogeneity testing 

Abstract 2152

Co-Author(s)

Sivaraman Balakrishnan, Carnegie Mellon University
Larry Wasserman, Carnegie Mellon University

First Author

Lucas Kania, Carnegie Mellon University

Presenting Author

Lucas Kania, Carnegie Mellon University