02. A hierarchical Bayesian model for the identification and removal of technical length isomiRs in miRNA sequencing data

Conference: Women in Statistics and Data Science 2024
10/17/2024: 11:45 AM - 1:15 PM EDT
Speed 

Description

MicroRNAs (miRNAs) are small, single-stranded, non-coding RNA molecules with important gene regulatory functions. MiRNA biogenesis is a multi-step process, and certain steps of the pathway, such as cleavage by Drosha and Dicer, can produce miRNA isoforms that differ from the canonical miRNA in nucleotide sequence, length, or both. These isoforms, called isomiRs, may differ from the canonical sequence by as few as one or two nucleotides, yet they can have different mRNA targets and stability than the corresponding canonical miRNA. As the body of research demonstrating the role of isomiRs in disease grows, so does the need for differential expression analysis of miRNA data at a scale finer than the miRNA level. Unfortunately, errors during amplification and sequencing can produce technical isomiRs that are identical to biological isomiRs, which makes resolving variation at this scale challenging. We present a novel algorithm for the identification and correction of technical miRNA length variants in miRNA sequencing data. The algorithm assumes that the transformed degradation rate of canonical miRNA sequences in a sample follows a hierarchical normal Bayesian model. It then draws from the posterior predictive distribution and constructs 95% posterior predictive intervals to determine whether the observed counts of degraded sequences are consistent with the error model. We present the theory underlying the model and assess its performance using an experimental benchmark data set.
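
To make the posterior predictive check concrete, the following is a minimal sketch and not the authors' implementation. It assumes a logit transform of the degraded fraction, a flat prior on the population mean, and variance components (`sigma2`, `tau2`) that are treated as known; the abstract specifies none of these, and all function and parameter names are hypothetical.

```python
import numpy as np


def logit(p):
    """Map a proportion in (0, 1) to the real line (assumed transform)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def posterior_predictive_interval(y, sigma2, tau2, level=0.95,
                                  n_draws=10_000, seed=0):
    """Interval for a new transformed degradation rate.

    Assumed model (illustrative only):
        y_j | theta_j ~ Normal(theta_j, sigma2)
        theta_j | mu  ~ Normal(mu, tau2)
        p(mu) flat, with sigma2 and tau2 treated as known.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    # Marginally y_j ~ Normal(mu, sigma2 + tau2), so under a flat prior
    # mu | y ~ Normal(mean(y), (sigma2 + tau2) / n).
    mu = rng.normal(y.mean(), np.sqrt((sigma2 + tau2) / n), size=n_draws)
    # Propagate each draw of mu down the hierarchy to a new observation.
    theta_new = rng.normal(mu, np.sqrt(tau2))
    y_new = rng.normal(theta_new, np.sqrt(sigma2))
    alpha = 1.0 - level
    lower, upper = np.quantile(y_new, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)


def is_consistent(observed, interval):
    """True if the observed transformed rate lies inside the interval."""
    lower, upper = interval
    return bool(lower <= observed <= upper)
```

In this sketch, a sequence whose transformed degraded fraction falls inside the interval would be treated as consistent with technical error alone, while one outside it would be a candidate biological isomiR. A fuller treatment would also place priors on the variance components and sample them rather than fixing them.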

Presenting Author

Hannah Swan, University of Rochester School of Medicine and Dentistry

First Author

Hannah Swan, University of Rochester School of Medicine and Dentistry

Target Audience

Mid-Level

Tracks

Knowledge
Women in Statistics and Data Science 2024