Mind your zeros: accurate p-value approximation in permutation testing

Martin Depner Co-Author
Institute of Asthma and Allergy Prevention, Helmholtz Zentrum München, Munich
 
Erika von Mutius Co-Author
Institute of Asthma and Allergy Prevention, Helmholtz Zentrum München
 
Anne-Laure Boulesteix Co-Author
Institute for Medical Information Processing, Biometry and Epidemiology, LMU München
 
Christian L. Müller Co-Author
Helmholtz Munich
 
Stefanie Peschel First Author
Munich Center for Machine Learning, Munich, Germany
 
Stefanie Peschel Presenting Author
Munich Center for Machine Learning, Munich, Germany
 
Tuesday, Aug 6: 9:30 AM - 9:35 AM
3424 
Contributed Speed 
Oregon Convention Center 
Permutation procedures are essential for hypothesis testing when the distributional assumptions about the considered test statistic are not met or are unknown, but they are challenging in scenarios with limited permutations, such as complex biomedical studies. P-values may either be zero, making multiple testing adjustment problematic, or too large to remain significant after adjustment. A common heuristic solution is to approximate extreme p-values by fitting a Generalized Pareto Distribution (GPD) to the tail of the distribution of the permutation test statistics. In practice, an estimated negative shape parameter combined with extreme test statistics can again result in zero p-values. To address this issue, we present a comprehensive workflow for accurate permutation p-value approximation that fits a constrained GPD and strictly avoids zero p-values. We also propose new methods that address the challenges of determining an optimal GPD threshold and correcting for multiple testing. Through extensive simulations, our approach demonstrates considerably higher accuracy than existing methods. The computational framework will be available as the open-source R package "permAprox".
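
The zero p-value problem described above can be illustrated with a minimal Python sketch. It is not the workflow of the abstract or the API of the R package: the function names, the toy permutation statistics, the threshold, and the GPD shape and scale values are all hypothetical and chosen by hand rather than fitted. The sketch only shows why a GPD with a negative shape parameter, which has a finite upper endpoint at threshold - scale/shape, can still return a p-value of exactly zero when the observed statistic lies beyond that endpoint.

```python
import math


def empirical_p_value(perm_stats, t_obs):
    """Fraction of permutation statistics at least as large as t_obs."""
    exceed = sum(1 for t in perm_stats if t >= t_obs)
    return exceed / len(perm_stats)


def gpd_survival(excess, shape, scale):
    """P(Y > excess) for a GPD with shape xi and scale sigma."""
    if excess <= 0:
        return 1.0
    if abs(shape) < 1e-12:
        # Limiting case xi -> 0: exponential tail.
        return math.exp(-excess / scale)
    base = 1.0 + shape * excess / scale
    if base <= 0.0:
        # Only reachable for shape < 0: the excess lies beyond the
        # finite endpoint -scale/shape, so the tail probability is 0.
        return 0.0
    return base ** (-1.0 / shape)


def gpd_tail_p_value(perm_stats, t_obs, threshold, shape, scale):
    """Tail p-value: exceedance rate of the threshold times GPD survival."""
    n_exc = sum(1 for t in perm_stats if t > threshold)
    return n_exc / len(perm_stats) * gpd_survival(t_obs - threshold, shape, scale)


# Toy data (hypothetical): 100 permutation statistics on a grid in [0, 0.99]
# and an observed statistic far outside their range.
perm_stats = [i / 100 for i in range(100)]
t_obs = 2.0
threshold = 0.9  # hand-picked, not an optimized threshold

p_emp = empirical_p_value(perm_stats, t_obs)
# Hand-picked negative shape: endpoint is 0.9 + 0.1 / 0.5 = 1.1 < t_obs.
p_neg = gpd_tail_p_value(perm_stats, t_obs, threshold, shape=-0.5, scale=0.1)
# Hand-picked positive shape: unbounded support, so the p-value stays positive.
p_pos = gpd_tail_p_value(perm_stats, t_obs, threshold, shape=0.1, scale=0.1)

print(p_emp, p_neg, p_pos)
```

In this toy setting both the empirical p-value and the negative-shape GPD approximation are exactly zero, while the positive-shape approximation is small but strictly positive. How the constrained fit in the abstract enforces a non-zero result is not specified here, so the sketch makes no attempt to reproduce it.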

Keywords

Non-parametric hypothesis testing

Permutation test

Generalized Pareto Distribution (GPD)

Multiple testing correction

Differential abundance and differential association testing in microbiome studies

R package 

Main Sponsor

Section on Nonparametric Statistics