cn-RNN: A Supervised Learning Framework for CNV Detection with Sequencing Data

Wenhan Bao, Co-Author

Fei Qin, Co-Author

Feifei Xiao, Co-Author
University of Florida

Dayuan Wang, First Author and Presenting Author

Tuesday, Aug 5: 10:05 AM - 10:20 AM
1653 
Contributed Papers 
Music City Center 
Copy number variants (CNVs), which involve genomic duplications and deletions, play a critical role in many human diseases. Accurate CNV detection is essential but challenging because of high dimensionality, technical biases, and low signal-to-noise ratios, which lead to inconsistent calls and high false-positive rates. Existing deep learning-based methods employ Convolutional Neural Networks (CNNs), which rely on image-based recognition and are prone to domain shift. In addition, accurate supervised learning requires a large, validated variant set to distinguish true CNVs from false positives.
We therefore developed cn-RNN, a novel deep learning model that uses Recurrent Neural Networks (RNNs) to estimate copy number from sequencing data. Unlike CNNs, RNNs inherently preserve the sequential structure of genomic data, enabling more accurate and biologically meaningful processing of sequencing data. We also used a publicly available trio dataset to construct a large, high-confidence CNV training set. Compared with CNN-based methods, cn-RNN achieved a 20% higher F1-score with significantly fewer false positives. Our work enables more reliable CNV detection from sequencing data.
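
The sequential argument above can be illustrated with a minimal sketch, assuming read depth has already been summarized into fixed-size genomic bins. It is a generic bidirectional Elman RNN that emits a copy-number class for every bin, so each call depends on neighboring bins on both sides rather than on an image-like window. This is not the published cn-RNN architecture: the per-bin log2 depth-ratio feature, the three-class label set, the hidden size, and every name (`BiRNNTagger`, `rnn_pass`, and so on) are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical per-bin labels; cn-RNN's actual output space is not specified here.
CLASSES = ("deletion", "normal", "duplication")


def init_matrix(rows, cols, rng, scale=0.5):
    """Random uniform weights; an untrained placeholder for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_pass(xs, w_x, w_h, b):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), returning every h_t."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
        states.append(h)
    return states


class BiRNNTagger:
    """Hypothetical bidirectional RNN that assigns a copy-number class to every bin."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.fwd = (init_matrix(n_hidden, n_features, rng),
                    init_matrix(n_hidden, n_hidden, rng),
                    [0.0] * n_hidden)
        self.bwd = (init_matrix(n_hidden, n_features, rng),
                    init_matrix(n_hidden, n_hidden, rng),
                    [0.0] * n_hidden)
        self.w_out = init_matrix(len(CLASSES), 2 * n_hidden, rng)
        self.b_out = [0.0] * len(CLASSES)

    def predict_proba(self, xs):
        # Left-to-right pass: the state at bin t summarizes bins 0..t.
        forward = rnn_pass(xs, *self.fwd)
        # Right-to-left pass, re-reversed so the state at bin t summarizes bins t..end.
        backward = rnn_pass(xs[::-1], *self.bwd)[::-1]
        probs = []
        for hf, hb in zip(forward, backward):
            # Concatenate both directions, then project to class probabilities.
            logits = [a + c for a, c in zip(matvec(self.w_out, hf + hb), self.b_out)]
            probs.append(softmax(logits))
        return probs

    def predict(self, xs):
        # Most probable class per bin.
        return [CLASSES[max(range(len(p)), key=p.__getitem__)]
                for p in self.predict_proba(xs)]


if __name__ == "__main__":
    # Hypothetical log2 read-depth ratios, one feature per bin.
    bins = [[0.0], [0.05], [-1.0], [-0.95], [0.02], [0.58]]
    model = BiRNNTagger(n_features=1, n_hidden=4)
    print(model.predict(bins))  # arbitrary labels: the weights are untrained
```

Training (for example, minimizing per-bin cross-entropy against labels from a curated call set such as the trio-derived one described above) is omitted. In practice, a framework layer such as PyTorch's `nn.LSTM` with `bidirectional=True` would typically replace the hand-written recurrence.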

Keywords

Copy Number Variants (CNV) Detection

Recurrent Neural Networks (RNN)

Supervised Learning

Statistical Genetics

Deep Learning in Genomics 

Main Sponsor

Section on Statistics in Genomics and Genetics