
Efficient clustering of microbiome compositional data using mixtures of logistic normal multinomial models and their extensions

Presented During: Better clustering leads to better understanding: genes, microbes, or foods show risk subgroups

Yuan Fang, Speaker
Binghamton University

Thursday, Aug 7: 9:00 AM - 9:25 AM
Invited Paper Session

Music City Center

Microbiome taxa count data, derived from next-generation sequencing, are inherently high-dimensional and over-dispersed. They also carry information only about relative abundance, which makes them compositional and constrained to a simplex. To model such data, the logistic normal multinomial (LNM) approach maps relative abundances from the simplex to real Euclidean space using the additive log-ratio transformation. We have developed mixtures of LNM models for clustering microbiome data, with an efficient parameter-estimation framework that uses variational approximations to reduce computational overhead. In this talk, we will illustrate that LNM mixture models provide a flexible framework that can easily be adapted by assuming different data structures and distributions in the latent space of the hidden layer. Specifically, we present a matrix-LNM model that places a matrix variate normal distribution at the latent layer and is designed for time-course microbiome data. This model captures both temporal dependencies and inter-sample correlations, offering a structured approach to longitudinal microbiome analysis. In addition, we propose a family of models obtained by applying the modified Cholesky decomposition and imposing constraints on the components of the covariance matrix. Through simulation studies and real data analysis, we demonstrate the effectiveness of the proposed models in identifying dynamic patterns and clustering temporal microbiome profiles.
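To make the generative structure described above concrete, here is a minimal NumPy sketch, not the authors' implementation. It shows the additive log-ratio (ALR) map and its inverse, sampling counts from a G-component LNM mixture, building a covariance matrix from a modified Cholesky decomposition, and drawing one time-course sample from a matrix-LNM model. Several choices are assumptions made for illustration: the last taxon serves as the ALR reference, every function and parameter name is hypothetical, the MCD follows the common parameterization T Sigma T' = D with T unit lower triangular, and applying it to the temporal (column) covariance V is only one plausible use. The sketch simulates data only; it does not implement the variational estimation used for clustering.

```python
import numpy as np

def alr(p):
    """Additive log-ratio transform (assumption: last taxon is the reference)."""
    p = np.asarray(p, dtype=float)
    return np.log(p[..., :-1] / p[..., -1:])

def alr_inv(y):
    """Inverse ALR: map points in R^(K-1) back onto the K-part simplex."""
    y = np.asarray(y, dtype=float)
    z = np.concatenate([y, np.zeros(y.shape[:-1] + (1,))], axis=-1)
    z = np.exp(z - z.max(axis=-1, keepdims=True))  # max-shift for numerical stability
    return z / z.sum(axis=-1, keepdims=True)

def sample_lnm_mixture(n, weights, means, covs, depth, rng):
    """Hypothetical sampler: z ~ Categorical(weights),
    y | z=g ~ N(means[g], covs[g]), counts ~ Multinomial(depth, alr_inv(y))."""
    labels = rng.choice(len(weights), size=n, p=weights)
    counts = np.empty((n, means.shape[1] + 1), dtype=np.int64)
    for i, g in enumerate(labels):
        y = rng.multivariate_normal(means[g], covs[g])
        counts[i] = rng.multinomial(depth, alr_inv(y))
    return counts, labels

def cov_from_mcd(phi, d):
    """Modified Cholesky decomposition T Sigma T' = D: T is unit lower triangular
    with -phi below the diagonal (autoregressive coefficients), D = diag(d)."""
    T = np.eye(len(d)) - np.tril(phi, -1)
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T

def sample_matrix_lnm(M, U, V, depth, rng):
    """One time-course sample: Y ~ MN(M, U, V), i.e. vec(Y) ~ N(vec(M), V kron U),
    with rows = ALR coordinates (taxa) and columns = time points."""
    A, B = np.linalg.cholesky(U), np.linalg.cholesky(V)
    Y = M + A @ rng.standard_normal(M.shape) @ B.T
    P = alr_inv(Y.T)  # one composition per time point
    return np.stack([rng.multinomial(depth, p) for p in P])
```

For example, `sample_lnm_mixture(100, [0.4, 0.6], means, covs, 500, np.random.default_rng(0))` with `means` of shape (2, 3) returns a (100, 4) count matrix whose rows each sum to 500, together with the true component labels. Clustering reverses this process: given only the counts, the latent ALR vectors and component memberships must be inferred, which is the step the abstract addresses with variational approximations.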

Keywords

Microbiome data

Model-based clustering

Matrix-variate normal