Tuesday, Aug 5: 2:00 PM - 3:50 PM
4132
Contributed Papers
Music City Center
Room: CC-101C
Main Sponsor
IMS
Presentations
Upon observing n-dimensional multivariate Gaussian data, when can we infer that the largest K observations came from the largest K means? When K=1 and the covariance is isotropic, Gutmann and Maymin (1987) argue that this inference is justified when the two-sided difference-of-means test comparing the largest and second largest observations rejects. After developing a unifying framework for selective inference centered on p-values, we provide a generalization of their procedure that applies to any K and any covariance structure. We show that our procedure draws the desired inference whenever the two-sided difference-of-means test comparing the pair of observations inside and outside the top K with the smallest standardized difference rejects, and sometimes even when this test fails to reject. Using this insight, we argue that our procedure renders existing simultaneous inference approaches inadmissible when n>2. When the observations are independent (with possibly unequal variances), our procedure corresponds exactly to running the two-sided difference-of-means test comparing the pair of observations inside and outside the top K with the smallest standardized difference.
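As a concrete illustration of the independent case described in the abstract, here is a minimal sketch (function name and interface are illustrative, not the authors' code) of the two-sided difference-of-means test on the least-separated pair straddling the top-K boundary, assuming Gaussian observations with known standard errors:

```python
import numpy as np
from scipy.stats import norm

def rank_verification_pvalue(x, se, K=1):
    """Two-sided difference-of-means p-value for verifying that the top-K
    observations identify the top-K means, in the independent (possibly
    heteroskedastic) special case described in the abstract.

    x  : observed values, shape (n,)
    se : standard errors of the observations, shape (n,)
    K  : number of top observations to verify
    """
    x, se = np.asarray(x, float), np.asarray(se, float)
    order = np.argsort(x)[::-1]          # indices sorted by decreasing value
    top, rest = order[:K], order[K:]
    # standardized differences for every (inside top K, outside top K) pair
    z = (x[top][:, None] - x[rest][None, :]) / np.sqrt(
        se[top][:, None] ** 2 + se[rest][None, :] ** 2
    )
    z_min = z.min()                      # least-separated pair drives the test
    return 2 * norm.sf(abs(z_min))       # two-sided normal p-value

# Example: can we verify that the largest of five independent estimates
# has the largest mean (K = 1)? Reject at level alpha if the p-value is small.
x = np.array([3.1, 1.4, 0.2, -0.5, -1.1])
se = np.ones(5)
print(rank_verification_pvalue(x, se, K=1))
```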
Keywords
Selective inference
Winner's curse
Rank Verification
Publication bias
Data carving
Conditional inference
First Author
Anav Sood, Stanford University
Presenting Author
Anav Sood, Stanford University
Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as for bivariate normal test statistics that are not perfectly correlated, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero, for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise asymptotic dependence, such as for bivariate t-distributed test statistics, these combination tests can remain valid with certain choices of heavy-tailed distributions and exhibit notable power gains over Bonferroni, even as the significance level diminishes.
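For reference, the two combination tests under study, together with Bonferroni, are simple to compute. The sketch below (illustrative, equal weights, assuming continuous p-values) shows how closely they agree once one p-value is very small, consistent with the equivalence discussed in the abstract:

```python
import numpy as np
from scipy.stats import cauchy

def cauchy_combination(pvals):
    """Cauchy combination test (Liu and Xie, 2020), equal weights: under the
    null each tan(pi*(0.5 - p_i)) is standard Cauchy, and so is their average,
    robustly across many dependence structures."""
    t = np.mean(np.tan(np.pi * (0.5 - np.asarray(pvals))))
    return cauchy.sf(t)

def harmonic_mean_pvalue(pvals):
    """Raw harmonic mean p-value (Wilson, 2019); only approximately valid --
    exact calibration uses a Landau-distribution tail."""
    p = np.asarray(pvals)
    return len(p) / np.sum(1.0 / p)

def bonferroni(pvals):
    return min(1.0, len(pvals) * np.min(pvals))

# With one very small p-value the three methods nearly agree (all ~4e-6 here),
# illustrating the asymptotic equivalence to Bonferroni.
p = np.array([1e-6, 0.3, 0.5, 0.7])
print(cauchy_combination(p), harmonic_mean_pvalue(p), bonferroni(p))
```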
Keywords
Cauchy combination test
Dependent p-values combination
Quasi-asymptotic independence
t-copula
Mathematically sound, straightforward, and computationally inexpensive methods for combining continuous and independent p-values are the building blocks of global hypothesis testing, with the seminal work of Tukey, Fisher, and Pearson, among others, still widely applied by practitioners almost a century after its initial proposal. Discrete and/or dependent p-values, on the other hand, present a wide array of mathematical issues that make their modelling less straightforward, with contemporary proposals such as the Cauchy Combination Statistic, the GFisher statistic, and the Wasserstein projection method having addressed these issues separately. This presentation proposes a bridge between the Cauchy Combination Statistic and the closest-to-continuous Wasserstein projection that allows for combining discrete and exchangeable p-values with accurate asymptotic type I error control. Statistical power and further mathematical properties are demonstrated via extensive simulation studies, with particular focus on contingency table data, including highly unbalanced case-control studies.
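To fix ideas, the sketch below is illustrative only: it applies the continuous-input Cauchy Combination Statistic naively, without the Wasserstein projection proposed in the talk, to the kind of discrete p-values at issue, obtained from Fisher's exact test on 2x2 contingency tables:

```python
import numpy as np
from scipy.stats import fisher_exact, cauchy

# Discrete p-values from Fisher's exact test on a few 2x2 contingency tables.
tables = [
    [[8, 2], [3, 7]],
    [[5, 5], [4, 6]],
    [[9, 1], [2, 8]],
]
pvals = np.array([fisher_exact(t, alternative="greater")[1] for t in tables])

# Naive (continuous-input) Cauchy combination: discrete p-values are
# super-uniform, so this is typically conservative -- the talk's Wasserstein
# projection adjusts the inputs before combining.
t_stat = np.mean(np.tan(np.pi * (0.5 - pvals)))
print("naive combined p-value:", cauchy.sf(t_stat))
```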
Keywords
Global hypothesis testing
Approximation Theory
Cauchy Combination Statistic
Discrete and dependent p-values
Wasserstein Metric
Case-control studies
The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less (and are often more) powerful than the original ones. Our general framework is explicitly instantiated in three classes of problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformal selection. Numerical experiments show that our proposal significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties.
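For context, the base e-BH procedure that the proposal boosts is straightforward to state and run; a minimal sketch follows (illustrative names, not the paper's code, and without the conditional-calibration step):

```python
import numpy as np

def ebh(evalues, alpha=0.05):
    """Base e-BH procedure (Wang and Ramdas, 2022): with n e-values, reject
    the k* hypotheses with the largest e-values, where k* is the largest k
    such that the k-th largest e-value is >= n/(alpha*k). This controls the
    FDR at level alpha under arbitrary dependence."""
    e = np.asarray(evalues, float)
    n = len(e)
    order = np.argsort(e)[::-1]                    # e-values in decreasing order
    thresholds = n / (alpha * np.arange(1, n + 1)) # n/(alpha*k) for k = 1..n
    ok = e[order] >= thresholds
    k = np.max(np.nonzero(ok)[0]) + 1 if ok.any() else 0
    return order[:k]                               # indices of rejected hypotheses

# Example: one strong signal among five hypotheses; only the first is rejected.
print(ebh([120.0, 4.0, 1.2, 0.8, 0.3], alpha=0.05))
```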
Keywords
e-values
multiple testing
variable selection
conformal inference
novelty detection
model-X knockoffs
Given a sample X_1, ... , X_n from a common distribution P in R^d, d ≥ 2, we develop a method to test multivariate normality based on two random processes S_n and K_n indexed by the unit sphere S^{d-1}, which stand respectively for the skewness and kurtosis of linear combinations. We show that the limit processes are Gaussian and can be represented as random finite linear combinations of the spherical harmonics. We consider test statistics based on the suprema of these processes and numerically obtain their limit distributions. We also show that the Bayesian bootstrap can consistently estimate the cutoffs. We obtain the limiting power of the test under contiguous alternative hypotheses. Through an extensive simulation study, we show that our proposed method performs well for moderate and large sample sizes.
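A minimal sketch of the underlying statistics, using a Monte Carlo approximation over random directions (illustrative only, and without the Bayesian bootstrap calibration used in the talk):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def sup_projection_stats(X, n_dirs=2000, seed=0):
    """Monte Carlo approximation to sup over u in S^{d-1} of the absolute
    skewness and absolute excess kurtosis of the projections u^T X_i --
    the building blocks of the S_n and K_n processes in the abstract."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # uniform directions on the sphere
    proj = X @ U.T                                 # (n, n_dirs) projected samples
    s = np.abs(skew(proj, axis=0))
    k = np.abs(kurtosis(proj, axis=0))             # excess kurtosis: 0 under normality
    return s.max(), k.max()

# Under multivariate normality both suprema should be small.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
print(sup_projection_stats(X))
```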
Keywords
multivariate normality test
skewness and kurtosis
stochastic processes
spherical harmonics
empirical processes
Bayesian bootstrap
Co-Author
Subhashis Ghoshal, North Carolina State University
First Author
Jisu Oh, North Carolina State University
Presenting Author
Jisu Oh, North Carolina State University
In many modern scientific investigations, researchers conduct numerous small-scale studies with few participants. Since individual participant outcomes can be difficult to interpret, combining data across studies via random effects has become standard practice for drawing broader scientific conclusions. In this talk, we introduce an optimal methodology for testing properties of random effects arising from binomial counts. Using the minimax framework, we characterize how the worst-case power of the best goodness-of-fit test depends on the number of studies and participants. Interestingly, the optimal test is related to a debiased version of Pearson's chi-squared test.
We then turn to meta-analyses, where a central question is to determine whether multiple studies agree on a treatment's effectiveness before pooling all data. We show how the difficulty of this problem depends on the underlying effect size and demonstrate that a debiased version of Cochran's chi-squared test is minimax-optimal. Finally, we illustrate how the proposed methodology improves the construction of p-values and confidence intervals for assessing the safety of drugs associated with rare adverse outcomes.
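For reference, the classical (non-debiased) Cochran homogeneity test that the talk builds on can be sketched as follows (illustrative code, not the proposed minimax-optimal debiased variant):

```python
import numpy as np
from scipy.stats import chi2

def cochran_q(effects, variances):
    """Classical Cochran's Q test of homogeneity across studies:
    Q = sum_i w_i (theta_i - theta_bar)^2 with weights w_i = 1/v_i,
    compared to a chi-squared distribution with k-1 degrees of freedom."""
    theta = np.asarray(effects, float)
    w = 1.0 / np.asarray(variances, float)
    theta_bar = np.sum(w * theta) / np.sum(w)      # precision-weighted mean effect
    Q = np.sum(w * (theta - theta_bar) ** 2)
    return Q, chi2.sf(Q, df=len(theta) - 1)

# Example: five studies with similar estimated effects -- a large p-value
# indicates no evidence against pooling.
effects = [0.30, 0.25, 0.35, 0.28, 0.31]
variances = [0.010, 0.020, 0.015, 0.012, 0.020]
print(cochran_q(effects, variances))
```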
Keywords
hypothesis testing
meta-analysis
local minimax
critical separation
Wasserstein distance
homogeneity testing