
10 - Batch effect in microbiome data

Conference: Women in Statistics and Data Science 2022

10/07/2022: 2:30 PM - 4:00 PM CDT
Speed

Room: Grand Ballroom Salon G

Description

Microbiome studies have gained enormous popularity among scientists as a way to characterize human health and disease. While many statistical analysis tools work similarly well across most high-dimensional data, such as gene-expression data, microbiome data require attention to compositionality: the data are relative abundances derived from taxon counts. Because reproducibility is difficult to achieve with such data, we examine the batch effect, i.e., systematic bias arising from datasets collected at different sites or times. In microbiome experiments, several datasets are often combined to increase statistical power, in the hope of discovering reliable biomarkers and establishing more robust prognostic models. The unique challenge in microbiome data, however, is the sum-to-one constraint: the relative abundance of a taxon depends on the set of taxa measured, which can differ from one experiment to another. For example, certain transformations in Euclidean space are not robust to sub-compositionality. Therefore, simply adding samples measured on a different subset of features risks misleading conclusions rather than a gain in power. In this talk, we offer practical advice on using statistical methods in multi-batch settings, covering sub-compositionality, the false discovery rate, and dependency among features.
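As a minimal sketch of the sub-compositionality issue described above, and not the method presented in the talk, the Python snippet below rescales one hypothetical sample to relative abundances twice: once over all measured taxa and once over a subset, as might happen when a second batch profiles fewer taxa. The taxon names, the counts, and the helper `closure` are assumptions made up for illustration. The relative abundance of the same taxon changes with the feature set, while the log-ratio between two shared taxa does not.

```python
import math

def closure(counts):
    """Rescale counts to relative abundances that sum to one."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}

# Hypothetical taxon counts for a single sample (illustrative numbers only).
sample = {"taxon_A": 40, "taxon_B": 10, "taxon_C": 50}

# Assumed scenario: batch 1 measures all three taxa, batch 2 only A and B.
full = closure(sample)
sub = closure({t: sample[t] for t in ("taxon_A", "taxon_B")})

# The relative abundance of taxon_A depends on which taxa were measured.
rel_full = full["taxon_A"]
rel_sub = sub["taxon_A"]

# The log-ratio of taxon_A to taxon_B is unaffected by dropping taxon_C.
lr_full = math.log(full["taxon_A"] / full["taxon_B"])
lr_sub = math.log(sub["taxon_A"] / sub["taxon_B"])

print(f"relative abundance of A: full={rel_full:.2f}, subset={rel_sub:.2f}")
print(f"log-ratio A/B: full={lr_full:.4f}, subset={lr_sub:.4f}")
```

In this toy case the two batches disagree on the relative abundance of the same taxon even though the underlying counts are identical, which is one way pooled samples can mislead rather than add power.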

Keywords

microbiome

high dimensional compositional data

batch effect

subcompositionality

false discovery rate

reproducibility

Presenting Author

Jung Ae Lee, University of Massachusetts Chan Medical School

First Author

Jung Ae Lee, University of Massachusetts Chan Medical School

Target Audience

Mid-Level

Tracks

Knowledge

Women in Statistics and Data Science 2022