A Novel Statistical Method for Dynamic Topic Modeling
Qing Nie
Co-Author
University of California, Irvine: Mathematics; Developmental and Cell Biology
Annie Qu
Co-Author
University of California, Irvine
Hanjia Gao
First Author
University of California, Irvine
Hanjia Gao
Presenting Author
University of California, Irvine
Monday, Aug 4: 11:35 AM - 11:50 AM
1512
Contributed Papers
Music City Center
Topic modeling aims to extract a low-rank semantic structure from a large corpus of text documents. Most existing methods fall within either the Latent Dirichlet Allocation (LDA) framework or the Probabilistic Latent Semantic Indexing (pLSI) framework, and they often neglect the underlying word co-occurrence pattern. Motivated by this limitation, we propose a novel statistical method that incorporates word co-occurrence. Specifically, we use a hypergraph structure to model word interactions and node heterogeneity to model word frequency. We then learn a latent low-rank factorization of the hypergraph parameters to recover the topics. Moreover, the proposed method generalizes flexibly to dynamic topic modeling of a sequence of corpora over multiple time windows via a temporal constraint on the hypergraph structure. Overall, the proposed method is easy to implement, and its versatility is supported by numerical studies on semi-synthetic data and a real corpus.
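To make the idea of factorizing word co-occurrence concrete, the sketch below builds a pairwise co-occurrence matrix from a toy corpus and factorizes it with a plain nonnegative matrix factorization (Lee-Seung multiplicative updates). This is a deliberately simplified stand-in, not the hypergraph method described in the abstract: it uses pairwise rather than higher-order word interactions, has no node-heterogeneity or temporal component, and every name in it (`cooccurrence`, `nmf`, the toy `docs` and `vocab`) is a hypothetical chosen for illustration.

```python
import numpy as np


def cooccurrence(docs, vocab):
    """Count, for each pair of distinct words, the documents containing both."""
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[w] for w in doc.split() if w in index})
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1.0
    return C


def nmf(C, rank, iters=300, seed=0, eps=1e-9):
    """Approximate C by W @ H with W, H >= 0 via multiplicative updates.

    Assumption: a generic NMF is used here only to illustrate a low-rank
    factorization of co-occurrence counts; it is not the authors' estimator.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((C.shape[0], rank))
    H = rng.random((rank, C.shape[1]))
    for _ in range(iters):
        H *= (W.T @ C) / (W.T @ W @ H + eps)
        W *= (C @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy corpus (hypothetical) with two loosely separated themes.
vocab = ["gene", "cell", "protein", "market", "stock", "price"]
docs = [
    "gene cell protein",
    "cell protein gene",
    "gene protein",
    "market stock price",
    "stock price market",
    "market price",
]

C = cooccurrence(docs, vocab)
W, H = nmf(C, rank=2)

# List the highest-loading words for each latent factor.
for k in range(W.shape[1]):
    top = np.argsort(W[:, k])[::-1][:3]
    print(f"factor {k}:", [vocab[i] for i in top])
```

Each column of `W` gives a nonnegative loading of every word on one latent factor, so sorting a column yields a candidate topic. A dynamic variant in the spirit of the abstract would fit one such factorization per time window and penalize changes between consecutive windows, but that extension is not shown here.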
Topic model
Latent representation
Nonnegative matrix factorization
Vertex hunting algorithm
Anchor word
Dynamic textual data
Main Sponsor
Section on Text Analysis