Testing a large number of composite null hypotheses for mediation, pleiotropy, and replication analyses in genome-wide studies

Ryan Sun Co-Author
University of Texas, MD Anderson Cancer Center
 
Zachary McCaw Co-Author
Harvard School of Public Health
 
Xihong Lin Co-Author
Harvard T.H. Chan School of Public Health
 
Ryan Sun Speaker
University of Texas, MD Anderson Cancer Center
 
Wednesday, Aug 6: 8:35 AM - 9:00 AM
Invited Paper Session 
Music City Center 
Causal mediation, pleiotropy, and replication analyses are three highly popular genetic study designs. Although these analyses address different scientific questions, the underlying statistical inference problems all involve large-scale testing of composite null hypotheses. The goal is to determine whether all null hypotheses in a set of individual tests (as opposed to at least one) should simultaneously be rejected. Recently, various methods have been proposed for each of these situations, including an appealing two-group empirical Bayes approach that calculates local false discovery rates (lfdr). However, lfdr estimation is difficult due to the need for multivariate density estimation. Furthermore, the multiple testing rules for the empirical Bayes lfdr approach can disagree with conventional frequentist z-statistics, which is troubling for a field that ubiquitously uses summary statistics. This work proposes a framework to unify two-group testing in genetic association composite null settings, the conditionally symmetric multidimensional Gaussian mixture model (csmGmm). Crucially, the csmGmm offers interpretability guarantees by harmonizing lfdr and z-statistic testing rules. We apply the model to a collection of translational lung cancer genetic association studies that motivated this work.
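
For readers less familiar with the term, a composite null with two individual tests can be written out roughly as follows. This is a generic sketch of the standard two-group setup, not the csmGmm itself; the symbols \theta_k, h, \pi_h, and f_h are illustrative assumptions and do not appear in the abstract.

```latex
% Illustrative notation only (assumed, not taken from the abstract).
% K = 2 individual tests, e.g. two mediation paths, two traits, or two studies.
% theta_k is the effect tested by the k-th individual test.
\[
  H_0^{\mathrm{comp}}:\ \theta_1 = 0 \ \text{ or } \ \theta_2 = 0
  \qquad \text{versus} \qquad
  H_1:\ \theta_1 \neq 0 \ \text{ and } \ \theta_2 \neq 0 .
\]

% Generic two-group lfdr for z = (z_1, z_2).
% h = (h_1, h_2) in {0,1}^2 is the association configuration, with h_k = 1 when theta_k != 0;
% pi_h is the proportion of configuration h and f_h is the density of z under h.
\[
  \operatorname{lfdr}(z)
    = \Pr\{\, h \neq (1,1) \mid z \,\}
    = \frac{\sum_{h \neq (1,1)} \pi_h \, f_h(z)}
           {\sum_{h \in \{0,1\}^2} \pi_h \, f_h(z)} .
\]
```

In this notation the composite null pools three of the four configurations, which is one way to see why lfdr estimation requires a multivariate density for z rather than a separate univariate fit for each test.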

Keywords

Composite null

Empirical Bayes

Mediation analysis

Pleiotropy

Replication analysis

Genome-wide association study