Efficient mixed model association test for nonlinear effects in large-scale human genetic studies

Han Chen (Co-Author, Speaker)
The University of Texas Health Science Center at Houston

Tuesday, Aug 5: 11:15 AM - 11:35 AM
Topic-Contributed Paper Session 
Music City Center 
Generalized linear mixed model (GLMM)-based genetic association tests have been widely applied in human genetic studies that include related individuals to identify genetic variants associated with complex diseases and quantitative traits. In recent years, efficient GLMM-based tests have been used in genome-wide association studies (GWAS) of biobank-scale cohorts with hundreds of thousands of individuals, such as the UK Biobank and All of Us. These methods and software programs often assume an additive coding scheme for bi-allelic genetic variants such as single nucleotide polymorphisms (SNPs). Although additive coding is convenient and computationally efficient, its linearity assumption may be violated for multi-allelic genetic variations such as structural variants (SVs), copy number variations (CNVs), and tandem repeats (TRs). In the presence of nonlinear genetic effects, GLMM-based tests with additive coding may suffer substantial power loss, especially when the linear effects are weak. Here we develop a generalized additive mixed model (GAMM)-based framework that uses smoothing splines to test for nonlinear effects of multi-allelic genetic variations. To improve computational efficiency, instead of fitting a separate GAMM for each genetic variant, we fit a null model without any genetic effects only once per GWAS. We then develop a variance component score test for nonlinear effects after projecting out linear effects, as well as a joint test for linear and nonlinear genetic effects. When sample relatedness is modeled by a sparse kinship matrix with a bounded maximum cluster size, and each genetic variant has a limited number of observed alleles, the computational complexity scales linearly with both the sample size and the number of variants in a GWAS.
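The idea of fitting the null model once and then testing nonlinear effects after projecting out linear effects can be illustrated with a minimal pure-Python sketch. It rests on simplifying assumptions that are ours, not the authors': unrelated individuals (an identity matrix in place of the sparse kinship matrix), a Gaussian trait, a truncated-line penalized-spline basis with knots at the interior observed genotype values as a stand-in for the smoothing-spline construction, and a Satterthwaite moment-matching approximation for the p-value. For each variant, the spline basis is projected onto the orthogonal complement of the covariates and the linear genotype term, so its inner products with the stored null residuals equal those with the raw trait and any purely linear genotype effect drops out of the score. All function names here (fit_null, nonlinear_score_test, and the helpers) are hypothetical.

```python
import math
import random

# Assumptions (ours): unrelated samples, Gaussian trait, truncated-line spline
# basis with knots at observed genotype values, Satterthwaite p-value.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(v, basis):
    """Remove from v its components along each orthonormal vector in basis."""
    out = list(v)
    for q in basis:
        c = dot(out, q)
        out = [o - c * qi for o, qi in zip(out, q)]
    return out

def extend_basis(basis, columns, tol=1e-8):
    """Modified Gram-Schmidt: append orthonormalized columns, dropping dependent ones."""
    basis = list(basis)
    for col in columns:
        v = project_out(col, basis)
        norm, norm0 = math.sqrt(dot(v, v)), math.sqrt(dot(col, col))
        if norm > tol * norm0:
            basis.append([x / norm for x in v])
    return basis

def gammaincc(s, x, eps=1e-14, itmax=500):
    """Regularized upper incomplete gamma Q(s, x): series or continued fraction."""
    if x <= 0:
        return 1.0
    log_pre = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1:  # series for the lower function P(s, x), then Q = 1 - P
        term = total = 1.0 / s
        for n in range(1, itmax):
            term *= x / (s + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_pre))
    tiny = 1e-300  # modified Lentz continued fraction for Q(s, x)
    b, c = x + 1.0 - s, 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, itmax):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return math.exp(log_pre) * h

def fit_null(y, covariates):
    """Fit the null model y ~ 1 + covariates once; reuse it for every variant."""
    n = len(y)
    q0 = extend_basis([], [[1.0] * n] + [list(c) for c in covariates])
    r = project_out(y, q0)
    return r, dot(r, r) / (n - len(q0)), q0

def nonlinear_score_test(g, r, sigma2, q0):
    """Variance component score test for the nonlinear part of the effect of g."""
    n = len(g)
    mu = sum(g) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in g) / n)
    if sd == 0.0:  # monomorphic variant: nothing to test
        return 0.0, 1.0
    gs = [(x - mu) / sd for x in g]  # standardize so the basis scale is unit-free
    q1 = extend_basis(q0, [gs])  # add the linear genotype term to the projection
    knots = sorted(set(gs))[1:-1]  # interior observed values as knots
    z = [project_out([max(x - k, 0.0) for x in gs], q1) for k in knots]
    if not z:  # at most two observed values: no nonlinear component
        return 0.0, 1.0
    u = [dot(zk, r) for zk in z]  # equals zk . y because zk is orthogonal to q0
    stat = sum(uk * uk for uk in u) / sigma2
    gram = [[dot(a, b) for b in z] for a in z]
    tr_a = sum(gram[k][k] for k in range(len(z)))
    tr_a2 = sum(v * v for row in gram for v in row)
    if tr_a <= 1e-12:
        return 0.0, 1.0
    scale, df = tr_a2 / tr_a, tr_a * tr_a / tr_a2  # match mean and variance
    return stat, gammaincc(df / 2.0, stat / (2.0 * scale))

# Usage on simulated data (hypothetical TR genotype = sum of two allele lengths).
random.seed(1)
n = 2000
cov = [random.gauss(0.0, 1.0) for _ in range(n)]
g = [random.randint(5, 14) + random.randint(5, 14) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
y_null = [0.5 * c + e for c, e in zip(cov, noise)]
y_u = [yn + 0.02 * (x - 19) ** 2 for yn, x in zip(y_null, g)]  # U-shaped effect
for label, y in (("no genetic effect", y_null), ("U-shaped effect", y_u)):
    r, s2, q0 = fit_null(y, [cov])  # one null fit per trait, reused across variants
    stat, p = nonlinear_score_test(g, r, s2, q0)
    print(f"{label}: Q = {stat:.2f}, p = {p:.3g}")
```

In this sketch, adding a purely linear genotype effect to the trait leaves the projected score vector unchanged; it only inflates the null-model variance estimate, which makes the nonlinear test conservative rather than anti-conservative. The joint linear-plus-nonlinear test and the mixed-model machinery for related samples are not shown.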
We perform simulation studies to evaluate the type I error control of GAMM-based association tests and their power gain over GLMM-based tests in the presence of nonlinear genetic effects of multi-allelic genetic variations. We also illustrate the new method with a real-data example: a GWAS of TRs.