58: Sparse Bayesian Partially Identified Models Enhance Differential Abundance and Expression Analyses

Justin Silverman Co-Author
Penn State University
 
Won Gu First Author
 
Won Gu Presenting Author
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
1711 
Contributed Posters 
Music City Center 
In genomics, differential expression and differential abundance analyses are challenging because of the compositional structure of the data. These data provide information only about the relative abundance of taxa or the relative expression of genes, not about absolute amounts. While many authors have approached this problem through data normalizations, we have shown that such methods are flawed because they imply strong, often implausible assumptions about total microbial load or total gene expression. Even slight errors in these assumptions often lead to Type-I and/or Type-II error rates in excess of 70%. Here, we show similar flaws in currently available sparse estimators, which attempt to overcome compositional problems by assuming that few taxa (or genes) change in abundance (or expression) between conditions. In contrast, we show that a novel sparse Bayesian Partially Identified Model overcomes the limitations of existing methods by accounting for uncertainty in the sparsity assumptions themselves. We prove the consistency of our novel estimator. Moreover, through analyses of both simulated and real data, we show that our methods can drastically reduce Type-I and Type-II errors compared to existing methods.
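
The compositional problem described in the abstract can be illustrated with a small numerical sketch. This is not the authors' model or estimator. It is a toy example with hypothetical abundances for three taxa, showing that a log fold change computed from relative abundances differs from the absolute log fold change by the (unobserved) log fold change in total load.

```python
import math


def proportions(amounts):
    """Convert absolute amounts to relative abundances that sum to 1."""
    total = sum(amounts)
    return [a / total for a in amounts]


# Hypothetical absolute abundances for three taxa in two conditions.
# Only taxon 0 changes in absolute terms (it doubles).
absolute_a = [100.0, 200.0, 700.0]
absolute_b = [200.0, 200.0, 700.0]

# Sequencing-style data reveal only the proportions, not the totals.
relative_a = proportions(absolute_a)
relative_b = proportions(absolute_b)

# Log2 fold changes on the absolute and relative scales.
lfc_absolute = [math.log2(b / a) for a, b in zip(absolute_a, absolute_b)]
lfc_relative = [math.log2(b / a) for a, b in zip(relative_a, relative_b)]

# Log2 fold change in total load, which the relative data cannot identify.
lfc_total = math.log2(sum(absolute_b) / sum(absolute_a))

for taxon, (abs_lfc, rel_lfc) in enumerate(zip(lfc_absolute, lfc_relative)):
    # Identity: absolute LFC = relative LFC + LFC of the total.
    print(f"taxon {taxon}: absolute={abs_lfc:+.3f}, "
          f"relative={rel_lfc:+.3f}, relative+total={rel_lfc + lfc_total:+.3f}")
```

In this toy example, taxa 1 and 2 do not change in absolute terms, yet their relative abundances decrease because the total load grows. Any normalization amounts to choosing a value for `lfc_total`, so an error in that choice shifts every estimated fold change by the same amount.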

Keywords

Compositional Data

Bayesian Partially Identified Model

Sparsity Assumption

Type-I and Type-II Errors

Uncertainty Quantification 

Main Sponsor

Section on Statistics in Genomics and Genetics