Clustering high dimensional RNA-seq data
Thursday, Aug 7: 8:35 AM - 9:00 AM
Invited Paper Session
Music City Center
Multivariate count data are commonly encountered in bioinformatics. Although the Poisson distribution seems a natural fit to these count data, its multivariate extension is computationally expensive. Recently, mixtures of multivariate Poisson lognormal (MPLN) models have been used to analyze these multivariate count measurements. In the MPLN model, the counts, conditional on a latent variable, are modelled using a Poisson distribution, and the latent variable follows a multivariate Gaussian distribution. Because of this hierarchical structure, the MPLN model can account for over-dispersion, unlike the traditional Poisson distribution, and allows for correlation between the variables. Here, we extend the mixture of MPLN distributions for clustering high-dimensional RNA-seq data by incorporating a factor analyzer structure in the latent space. A family of parsimonious mixtures of MPLN distributions is proposed by decomposing the covariance matrix and imposing constraints on the decomposition. Applications to simulated data sets and to a real data set are presented.
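The hierarchical structure described in the abstract can be illustrated with a minimal simulation sketch. It draws from a single MPLN component (no mixture, no model fitting) and checks the two properties the abstract names: over-dispersion and correlation between variables. The function name `simulate_mpln`, the parameter values, and the use of the standard factor analyzer form Sigma = Lambda Lambda^T + Psi for the latent covariance are assumptions made for illustration; the parameterization used in the talk may differ.

```python
import numpy as np

def simulate_mpln(n, mu, sigma, rng):
    """Draw n observations from one MPLN component (hypothetical helper)."""
    # Latent layer: theta_i ~ N_d(mu, sigma)
    theta = rng.multivariate_normal(mu, sigma, size=n)
    # Observed layer: Y_ij | theta_ij ~ Poisson(exp(theta_ij))
    return rng.poisson(np.exp(theta))

rng = np.random.default_rng(0)

# Illustrative values only (not taken from the abstract).
mu = np.array([2.0, 2.5, 3.0])
loadings = np.array([[0.5], [0.5], [0.0]])  # d = 3 variables, q = 1 latent factor
psi = np.diag([0.1, 0.1, 0.3])              # diagonal noise covariance

# Assumed factor analyzer structure for the latent covariance matrix.
sigma = loadings @ loadings.T + psi

counts = simulate_mpln(5000, mu, sigma, rng)

means = counts.mean(axis=0)
variances = counts.var(axis=0)
corr = np.corrcoef(counts, rowvar=False)

print("means:    ", means)
print("variances:", variances)  # larger than the means: over-dispersion
print("corr:\n", corr)          # variables 0 and 1 share a factor, variable 2 does not
```

A plain Poisson model would force each variance to equal its mean. Here the latent Gaussian layer inflates the variances, and the shared factor induces correlation between the first two variables only.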
Model-based clustering
RNA-seq data
Mixture models
Multivariate Poisson lognormal distribution