Structurally aware robust rank selection for probabilistic matrix factorization
Tuesday, Aug 5: 8:55 AM - 9:15 AM
Topic-Contributed Paper Session
Music City Center
Matrix factorization models (e.g., factor analysis, PCA, and nonnegative matrix factorization) are widely used to find latent structure in data by assuming that the parameters of the data-generating model can be expressed as the product of two low-rank matrices. The rank K of these matrices is typically interpreted as the number of "processes" or "activities" that generated the observed data, so in practice determining K is a critical inferential step for scientific understanding. However, because the assumed observation model is only an approximation to the true data-generating process, inferences do not improve as the number of observations grows. The opposite occurs: the data are explained by adding spurious new "activities" that compensate for the shortcomings of the observation model. Fortunately, there are two important sources of prior knowledge that we can exploit to obtain well-defined results regardless of dataset size. The first is known causal structure (e.g., knowing that the latent activities cause the observed signal but not vice versa). The second is a rough sense of how wrong the observation model is (e.g., based on small amounts of expert-labeled data or some understanding of the data-generating process). We propose a new model selection criterion that, while model-based, uses this knowledge to obtain inferences about the rank K that are robust to misspecification of the observation model. We provide theoretical support for our approach by proving a consistency result under intuitive assumptions. Numerical experiments demonstrate that our criterion consistently finds an appropriate number of latent activities in two applications: mutational signature discovery and hyperspectral unmixing.
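To make the setup concrete, the following is a minimal sketch of one common probabilistic matrix factorization, Poisson NMF (a frequent choice for count data such as mutation catalogs). It fits X ~ Poisson(W H) for a fixed rank K with the Lee-Seung multiplicative updates for the generalized KL divergence. This is not the proposed selection criterion, which the abstract does not specify. The function names `poisson_nmf` and `kl_divergence` and the simulated data are hypothetical. The sketch only illustrates that the mean matrix is modeled as a product whose inner dimension is K, and that K must be chosen separately.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-10):
    # Generalized KL divergence D(X || WH); equal to the Poisson negative
    # log-likelihood up to a constant that does not depend on W or H.
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))


def poisson_nmf(X, K, n_iter=500, seed=0, eps=1e-10):
    """Hypothetical helper: fit X ~ Poisson(W @ H), W (N x K), H (K x M) >= 0."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    W = rng.uniform(0.1, 1.0, size=(N, K))
    H = rng.uniform(0.1, 1.0, size=(K, M))
    losses = []
    for _ in range(n_iter):
        # Multiplicative update for H (keeps entries nonnegative).
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Multiplicative update for W given the new H.
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        losses.append(kl_divergence(X, W @ H))
    return W, H, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated counts with 3 true latent activities (illustrative only).
    W_true = rng.gamma(1.0, 1.0, size=(30, 3))
    H_true = rng.gamma(1.0, 5.0, size=(3, 20))
    X = rng.poisson(W_true @ H_true).astype(float)
    for K in range(1, 6):
        _, _, losses = poisson_nmf(X, K)
        print(K, round(losses[-1], 2))
```

In-sample fit typically keeps improving as K increases, so the final loss alone cannot identify K. Under a misspecified observation model (for example, overdispersed counts fit with a Poisson likelihood), the extra components tend to absorb the mismatch, which is the failure mode the abstract describes.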
Model selection
Mutational signature discovery
Nonnegative matrix factorization
Probabilistic matrix factorization
Misspecified model