53 Estimating association of CNVs using penalized regression with Lasso and weighted fusion penalties

Wenbin Lu Co-Author
North Carolina State University
 
Albert Tucci Co-Author
 
Hui Wang Co-Author
Perelman School of Medicine, University of Pennsylvania
 
Yuhuan Cheng Co-Author
 
Li-San Wang Co-Author
Perelman School of Medicine, University of Pennsylvania
 
Gerard Schellenberger Co-Author
Perelman School of Medicine, University of Pennsylvania
 
Wan-Ping Lee Co-Author
Perelman School of Medicine, University of Pennsylvania
 
Jung-Ying Tzeng Co-Author
North Carolina State University
 
Yaqin Si First Author
 
Yaqin Si Presenting Author
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
3306 
Contributed Posters 
Oregon Convention Center 
Copy number variants (CNVs) are DNA gains or losses of 50 or more base pairs. Estimating CNV association effects requires accounting for at least two factors: 1) CNVs vary in dosage and length, and both must be modeled; and 2) all CNVs in a genomic region should be assessed jointly. Here we propose a penalized regression model for CNV association analysis. We model each individual's CNVs as a piecewise constant curve, which naturally captures CNV length and dosage. To jointly model all CNVs in a genomic region, we use a Lasso penalty to select CNVs associated with the outcome and add a weighted fusion penalty that encourages adjacent CNVs to have similar effects when the data support it. In simulations under different settings, the proposed model identifies causal CNVs more effectively than the baseline methods (Lasso and gBridge) without introducing additional false positives, and it yields more precise effect size estimates. In a real-data application to identify CNVs associated with Alzheimer's disease (AD), the CNVs identified by our method overlap genes that are significantly enriched in pathways related to neuron structure and function, and the model achieves higher predictive accuracy.
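The two ingredients described above, a piecewise constant CNV curve and a Lasso plus weighted fusion penalty, can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: each CNV is a hypothetical `(start, end, dosage)` record on a half-open interval, with dosage as the copy-number change from the diploid baseline; the curve is summarized by averaging it over equal-width bins (so each feature reflects both dosage and coverage); the loss is squared error with no intercept (for a case/control outcome such as AD status, a logistic loss would take its place); and the fusion weights are simply supplied as inputs, since the abstract does not say how they are chosen. The function names are hypothetical, and the sketch only evaluates the objective rather than minimizing it.

```python
from typing import List, Sequence, Tuple

# Hypothetical CNV record: (start_bp, end_bp, dosage) on the half-open
# interval [start_bp, end_bp); dosage is the copy-number change relative to
# the diploid baseline, e.g. -1 for a one-copy deletion, +1 for a duplication.
CNV = Tuple[int, int, float]


def cnv_curve_features(cnvs: Sequence[CNV], region_start: int,
                       region_end: int, n_bins: int) -> List[float]:
    """Average an individual's piecewise constant dosage curve over
    equal-width bins of the region. Each feature equals the integral of the
    curve over the bin divided by the bin width, so a longer CNV or a larger
    dosage change both produce a larger feature value."""
    width = (region_end - region_start) / n_bins
    features = [0.0] * n_bins
    for start, end, dosage in cnvs:
        for b in range(n_bins):
            lo = region_start + b * width
            hi = lo + width
            overlap = max(0.0, min(end, hi) - max(start, lo))
            features[b] += dosage * overlap / width
    return features


def penalized_objective(X: Sequence[Sequence[float]], y: Sequence[float],
                        beta: Sequence[float], lam1: float, lam2: float,
                        weights: Sequence[float]) -> float:
    """Squared-error loss + lam1 * sum_j |beta_j|
    + lam2 * sum_j w_j * |beta_{j+1} - beta_j|.
    `weights` holds one (assumed, user-supplied) weight per adjacent pair."""
    if len(weights) != len(beta) - 1:
        raise ValueError("need one fusion weight per adjacent coefficient pair")
    n = len(y)
    rss = 0.0
    for xi, yi in zip(X, y):
        fitted = sum(x * b for x, b in zip(xi, beta))
        rss += (yi - fitted) ** 2
    loss = rss / (2 * n)
    lasso = lam1 * sum(abs(b) for b in beta)
    fusion = lam2 * sum(w * abs(beta[j + 1] - beta[j])
                        for j, w in enumerate(weights))
    return loss + lasso + fusion
```

For example, on a region from 0 to 1000 bp split into four bins, a one-copy deletion at [100, 350) and a duplication at [500, 1000) give the features `[-0.6, -0.4, 1.0, 1.0]`. In the objective, the fusion term is zero when adjacent coefficients are equal and grows with their difference scaled by the pair's weight. That is how the penalty pulls neighboring effects together. The Lasso term, in turn, drives irrelevant coefficients to zero.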

Keywords

Penalized Regression

Association

Weighted Fusion

Lasso

Effect estimation

Copy number variants 


Main Sponsor

Section on Statistics in Genomics and Genetics