Microbiome Data Integration via Shared Dictionary Learning
Monday, Aug 4: 11:50 AM - 12:05 PM
0991
Contributed Papers
Music City Center
Data integration is a powerful tool for building a comprehensive understanding of microbial communities and their associations with outcomes of interest. However, integrating data sets from different studies remains challenging because of severe batch effects, unobserved confounding variables, and high heterogeneity across data sets. We propose a new data integration method, MetaDICT, which first estimates batch effects using weighting methods from the causal inference literature and then refines the estimates via a novel shared dictionary learning approach. Compared with existing methods, MetaDICT better avoids overcorrection of batch effects and preserves biological variation when unobserved confounding variables are present or when data sets are highly heterogeneous across studies. Applications to synthetic and real microbiome data sets demonstrate the robustness and effectiveness of MetaDICT in integrative analysis. Using MetaDICT, we characterize microbial interactions, identify generalizable microbial signatures, and improve the accuracy of disease prediction in an integrative analysis of colorectal cancer metagenomics studies.
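The general idea of a dictionary shared across studies can be illustrated with a toy sketch. It is not the MetaDICT algorithm: the abstract does not specify the objective, so the model below is an assumption made purely for illustration. Each study's taxa-by-samples matrix Y_k is approximated as diag(w_k) D R_k, where D is a dictionary common to all studies, R_k holds study-specific coefficients, and w_k is a taxon-specific multiplicative batch factor. The function name `fit_shared_dictionary`, the alternating least-squares updates, and the synthetic data are all hypothetical, and the causal-inference weighting step used for initialization in the abstract is omitted (w_k simply starts at one).

```python
import numpy as np

def fit_shared_dictionary(studies, rank, n_iter=50, seed=0):
    """Toy alternating least squares for Y_k ~ diag(w_k) @ D @ R_k.

    Assumed model for illustration only, not the MetaDICT objective.
    studies: list of (n_taxa, n_samples_k) arrays sharing the same taxa.
    """
    rng = np.random.default_rng(seed)
    n_taxa = studies[0].shape[0]
    D = rng.standard_normal((n_taxa, rank))      # shared dictionary
    w = [np.ones(n_taxa) for _ in studies]       # per-study batch factors

    def coefficients():
        # Study-specific coefficients given the current D and w_k.
        return [np.linalg.lstsq(wk[:, None] * D, Y, rcond=None)[0]
                for wk, Y in zip(w, studies)]

    for _ in range(n_iter):
        R = coefficients()
        # Update each dictionary row by pooling information across studies.
        for j in range(n_taxa):
            A = sum(wk[j] ** 2 * (Rk @ Rk.T) for wk, Rk in zip(w, R))
            b = sum(wk[j] * (Rk @ Y[j]) for wk, Rk, Y in zip(w, R, studies))
            D[j] = np.linalg.solve(A + 1e-8 * np.eye(rank), b)
        # Update the per-taxon scaling of each study in closed form.
        for k, (Y, Rk) in enumerate(zip(studies, R)):
            X = D @ Rk
            w[k] = np.sum(Y * X, axis=1) / (np.sum(X * X, axis=1) + 1e-12)
    return D, coefficients(), w

# Synthetic example: two studies share a dictionary but differ in batch factors.
rng = np.random.default_rng(1)
n_taxa, rank = 30, 3
D_true = rng.random((n_taxa, rank))
studies = []
for n_samples in (40, 50):
    R_true = rng.random((rank, n_samples))
    w_true = rng.uniform(0.5, 1.5, size=n_taxa)
    studies.append(w_true[:, None] * (D_true @ R_true))

D_hat, R_hat, w_hat = fit_shared_dictionary(studies, rank)
residual = sum(np.sum((Y - wk[:, None] * (D_hat @ Rk)) ** 2)
               for Y, wk, Rk in zip(studies, w_hat, R_hat))
rel_err = np.sqrt(residual / sum(np.sum(Y ** 2) for Y in studies))
print("relative reconstruction error:", rel_err)
```

In this toy model the scale of w_k and D is not separately identifiable, which is one reason a practical method would need additional structure, such as an informed initial estimate of the batch effects.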
data integration
shared dictionary learning
batch effect
microbiome
embedding
Main Sponsor
Section on Statistics in Genomics and Genetics