Microbiome Data Integration via Shared Dictionary Learning

Shulei Wang Co-Author
 
Bo Yuan First Author
 
Bo Yuan Presenting Author
 
Monday, Aug 4: 11:50 AM - 12:05 PM
0991 
Contributed Papers 
Music City Center 
Data integration is a powerful tool for gaining a comprehensive understanding of microbial communities and their associations with outcomes of interest. However, integrating data sets from different studies remains challenging because of severe batch effects, unobserved confounding variables, and high heterogeneity across data sets. We propose a new data integration method, MetaDICT, which first estimates batch effects using weighting methods from the causal inference literature and then refines this estimate through a novel shared dictionary learning approach. Compared with existing methods, MetaDICT is better able to avoid overcorrecting batch effects and to preserve biological variation when there are unobserved confounding variables or when data sets are highly heterogeneous across studies. Applications to synthetic and real microbiome data sets demonstrate the robustness and effectiveness of MetaDICT in integrative analysis. Using MetaDICT in an integrative analysis of colorectal cancer metagenomics studies, we characterize microbial interactions, identify generalizable microbial signatures, and improve the accuracy of disease prediction.
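The abstract does not spell out the weighting step. As a rough, hypothetical illustration of why covariate-balancing weights help when study composition is confounded with batch, the Python sketch below assumes a single taxon, a multiplicative batch effect, and one discrete observed covariate (for example, disease status). All function names and data are made up for illustration; this is not MetaDICT's estimator, and it omits the shared dictionary learning refinement entirely.

```python
from collections import Counter

# Each study is a list of (covariate, abundance) pairs for ONE taxon.
# Hypothetical model (an assumption, not MetaDICT's model):
# observed abundance = batch multiplier * true abundance,
# and the true abundance depends only on the observed covariate.

def balancing_weights(covariates, target_freq):
    """Weight each sample by target_freq[x] / study_freq[x], so the weighted
    covariate distribution of the study matches the target distribution."""
    counts = Counter(covariates)
    n = len(covariates)
    return [target_freq[x] / (counts[x] / n) for x in covariates]

def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def naive_ratio(study_a, study_b):
    """Ratio of raw mean abundances; confounded by covariate imbalance."""
    mean_a = sum(y for _, y in study_a) / len(study_a)
    mean_b = sum(y for _, y in study_b) / len(study_b)
    return mean_b / mean_a

def estimate_batch_ratio(study_a, study_b):
    """Ratio of covariate-balanced mean abundances (study_b / study_a).
    Assumes every covariate level occurs in both studies (positivity)."""
    pooled = Counter(x for x, _ in study_a + study_b)
    total = sum(pooled.values())
    target = {x: c / total for x, c in pooled.items()}
    means = []
    for study in (study_a, study_b):
        xs = [x for x, _ in study]
        ys = [y for _, y in study]
        means.append(weighted_mean(ys, balancing_weights(xs, target)))
    return means[1] / means[0]

# Toy data: covariate 0 = healthy (true abundance 10), 1 = disease (30).
# Study B measures everything at 2x the scale of study A, and the two
# studies enroll very different case/control proportions.
study_a = [(0, 10.0)] * 8 + [(1, 30.0)] * 2
study_b = [(0, 20.0)] * 2 + [(1, 60.0)] * 8

print(round(naive_ratio(study_a, study_b), 3))           # 3.714
print(round(estimate_batch_ratio(study_a, study_b), 3))  # 2.0
```

Because the raw comparison mixes the batch effect with the different case/control mix, it overstates the scale difference; reweighting both studies to the pooled covariate distribution recovers the multiplier of 2 built into the toy data. With many or continuous covariates, the frequency ratio would typically be replaced by estimated propensity scores, and weighting on observed covariates alone cannot account for unobserved confounders, the setting in which the abstract reports MetaDICT's advantages.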

Keywords

data integration

shared dictionary learning

batch effect

microbiome

embedding 

Main Sponsor

Section on Statistics in Genomics and Genetics