A multiple imputation method for compositional microbiome data

Michael Sohn, Speaker
University of Rochester
 
Sunday, Aug 3: 4:25 PM - 4:45 PM
Topic-Contributed Paper Session 
Music City Center 
High sparsity (i.e., an excess of zeros) in microbiome data is unavoidable and can substantially alter analysis results. However, efforts to address this sparsity have been limited, in part because zeros in microbiome data can arise from multiple sources, which makes it impossible to justify the validity of any one method. In this study, we first demonstrate theoretically and empirically that, when the source of zeros is unknown, treating all zeros as missing values is more robust than treating them as structural zeros (i.e., true absence) or rounded zeros (i.e., values below the detection limit). We then introduce a novel multiple imputation method developed specifically for highly sparse, high-dimensional compositional data. Extensive simulation studies demonstrate the robustness of the proposed approach and its beneficial effects on downstream analyses. Finally, we reanalyze a type 2 diabetes (T2D) dataset to identify species that are differentially abundant between T2D patients and non-diabetic controls.
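
As a rough illustration of the general workflow the abstract refers to (treat zeros as missing, impute several times, analyze each completed dataset, pool the results), the following Python sketch may help. It is not the method presented in this talk: the imputation step is a deliberately simple stand-in (per-taxon log-normal draws), and the function names `impute_zeros_once` and `rubin_pool`, the simulated counts, and the centered log-ratio contrast are all assumptions made for this example. Only the pooling step follows the standard Rubin's rules for multiple imputation.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total

def impute_zeros_once(counts, rng):
    """Toy stand-in imputation (NOT the talk's method): treat zeros as missing,
    draw replacements per taxon from a log-normal fitted to that taxon's nonzero
    values, then re-close each row so it sums to 1."""
    x = np.asarray(counts, dtype=float).copy()
    floor = 0.5 * x[x > 0].min()         # fallback when a taxon is almost all zeros
    for j in range(x.shape[1]):
        missing = x[:, j] == 0
        observed = np.log(x[~missing, j])
        if not missing.any():
            continue
        if observed.size >= 2:
            mu, sd = observed.mean(), observed.std(ddof=1)
            x[missing, j] = np.exp(rng.normal(mu, sd, missing.sum()))
        else:
            x[missing, j] = floor
    return x / x.sum(axis=1, keepdims=True)

# Simulated sparse counts: 40 samples, 6 taxa, two groups of 20 (hypothetical data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(40, 6)) * rng.binomial(1, 0.6, size=(40, 6))
counts[:, 0] += 1                        # keep every sample non-empty
group = np.repeat([0, 1], 20)

estimates, variances = [], []
for _ in range(20):                      # M = 20 imputed datasets
    comp = impute_zeros_once(counts, rng)
    log_comp = np.log(comp)
    clr = log_comp - log_comp.mean(axis=1, keepdims=True)  # centered log-ratio
    a, b = clr[group == 1, 1], clr[group == 0, 1]
    estimates.append(a.mean() - b.mean())
    variances.append(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)

pooled, total_var = rubin_pool(estimates, variances)
print(f"pooled CLR difference for taxon 1: {pooled:.3f} (SE {np.sqrt(total_var):.3f})")
```

The pooled standard error combines the variability within each completed dataset with the variability across imputations, which is what lets uncertainty about the imputed zeros carry through to the downstream comparison.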

Keywords

Excess zeros

Composition

High dimension

Microbiome

Multiple imputation