Identifiable Nonlinear Group Factor Analysis

Gemma Moran (Speaker)
Rutgers University
 
Thursday, Aug 7: 11:35 AM - 11:55 AM
Topic-Contributed Paper Session 
Music City Center 
Given data from multiple groups or environments, one goal is to understand which underlying factors of variation are common to all groups and which are group-specific. As a motivating example, we consider platelet gene expression data from patients in different disease groups. In these data, factors correspond to clusters of genes that are expressed together; we may expect some clusters (or biological pathways) to be active in all diseases, while others are active only in a specific disease. To learn these factors, we consider a nonlinear multi-group factor model that allows for both shared and group-specific factors. To fit this model, we propose a multi-group sparse variational autoencoder. The underlying model is sparse in that each observed feature (i.e., each dimension of the data) depends on only a small subset of the latent factors. In the genomics example, this means each gene is active in only a few biological processes. Further, the model implicitly induces a penalty on the number of latent factors, which helps separate the shared factors from the group-specific factors. We prove that the shared factors are identified, and we demonstrate that our method recovers meaningful factors in the platelet gene expression data.
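
To make the structure described in the abstract concrete, the following is a minimal illustrative sketch of a sparse, multi-group decoder. It is not the speaker's implementation: the function names (factor_mask, decode), the layout of latent factors (shared factors first, then one block per group), and the single tanh layer are all assumptions made for illustration. The sketch shows only how a group mask and a per-feature selection matrix W restrict which latent factors can influence each observed feature; it omits the encoder, the sparsity-inducing prior on W, and training.

```python
import numpy as np

def factor_mask(n_groups, n_shared, n_specific):
    """Hypothetical layout: row g marks the factors group g may use
    (all shared factors plus its own block of specific factors)."""
    k = n_shared + n_groups * n_specific
    mask = np.zeros((n_groups, k))
    mask[:, :n_shared] = 1.0                      # shared factors are open to every group
    for g in range(n_groups):
        start = n_shared + g * n_specific
        mask[g, start:start + n_specific] = 1.0   # group g's own factors
    return mask

def decode(z, group, W, group_mask, A, b):
    """Map one latent vector z (K,) to J features.

    W (J, K) holds per-feature selection weights; a zero entry W[j, k]
    means feature j cannot depend on factor k. A (K, H) and b (H,) are
    the weights of an assumed one-hidden-layer nonlinear map.
    """
    allowed = group_mask[group] * z   # zero out factors belonging to other groups
    masked = W * allowed              # (J, K): each feature sees only its selected factors
    hidden = np.tanh(masked @ A)      # (J, H): same nonlinear map applied to every feature
    return hidden @ b                 # (J,)

# Usage: 2 groups, 2 shared factors, 1 specific factor per group (K = 4).
rng = np.random.default_rng(0)
mask = factor_mask(2, 2, 1)
W = np.array([[1, 0, 1, 0],          # feature 0: shared factor 0, group-0 factor
              [0, 1, 0, 1],          # feature 1: shared factor 1, group-1 factor
              [1, 0, 0, 0]], float)  # feature 2: shared factor 0 only
A = rng.normal(size=(4, 3))
b = rng.normal(size=3)
z = rng.normal(size=4)
x_group0 = decode(z, 0, W, mask, A, b)
```

In this sketch, changing the latent coordinate that belongs to group 1 leaves every decoded feature for group 0 unchanged, which is the sense in which a factor is group-specific; sparsity in W plays the analogous role at the level of individual features.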