A Novel Statistical Method for Dynamic Topic Modeling

Qing Nie Co-Author
University of California, Irvine: Mathematics; Developmental and Cell Biology
 
Annie Qu Co-Author
University of California, Irvine
 
Hanjia Gao First Author
University of California, Irvine
 
Hanjia Gao Presenting Author
University of California, Irvine
 
Monday, Aug 4: 11:35 AM - 11:50 AM
1512 
Contributed Papers 
Music City Center 
Topic modeling aims to extract a low-rank semantic structure from a large corpus of text documents. Most existing methods fall under either the Latent Dirichlet Allocation (LDA) framework or the Probabilistic Latent Semantic Indexing (pLSI) framework. However, these methods often neglect the underlying word co-occurrence pattern. Motivated by this limitation, we propose a novel statistical method that incorporates word co-occurrence. Specifically, we use a hypergraph structure to model word interactions and node heterogeneity to model word frequency. We then learn a latent low-rank factorization of the hypergraph parameters to recover the topics. Moreover, the proposed method generalizes flexibly to dynamic topic modeling of a sequence of corpora over multiple time windows via a temporal constraint on the hypergraph structure. Overall, the proposed method is easy to implement, and its versatility is supported by numerical studies on semi-synthetic data and a real corpus.
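
To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' method. It assumes each document can be treated as one hyperedge over the words it contains, uses node degree as a crude stand-in for word frequency, and swaps in a plain multiplicative-update nonnegative matrix factorization for the latent low-rank step. The toy corpus and the helper names `incidence_matrix` and `nmf` are hypothetical and chosen only for illustration.

```python
import numpy as np

def incidence_matrix(docs, vocab):
    """Words-by-documents 0/1 matrix; each document is one hyperedge
    joining the distinct words it contains (an illustrative assumption)."""
    index = {w: i for i, w in enumerate(vocab)}
    H = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in set(doc.split()):
            H[index[w], j] = 1.0
    return H

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Generic multiplicative-update NMF, X ~ W @ V with W, V >= 0.
    This is a stand-in for the abstract's low-rank factorization,
    not the estimator proposed by the authors."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    V = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        V *= (W.T @ X) / (W.T @ W @ V + eps)
        W *= (X @ V.T) / (W @ V @ V.T + eps)
    return W, V

# Hypothetical toy corpus with two loosely separated themes.
docs = [
    "gene cell protein",
    "cell protein expression",
    "gene expression cell",
    "market stock price",
    "stock price trade",
    "market trade price",
]
vocab = sorted({w for d in docs for w in d.split()})

H = incidence_matrix(docs, vocab)
degree = H.sum(axis=1)      # number of hyperedges containing each word
W, V = nmf(H, rank=2)       # W: word loadings, V: document loadings

for k in range(W.shape[1]):
    top = [vocab[i] for i in np.argsort(-W[:, k])[:3]]
    print(f"topic {k}: {top}")
```

A dynamic variant along the lines the abstract describes would presumably fit one such factorization per time window and penalize changes between consecutive windows; that extension is not sketched here because the abstract does not specify the form of the temporal constraint.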

Keywords

Topic model

Latent representation

Nonnegative matrix factorization

Vertex hunting algorithm

Anchor word

Dynamic textual data 

Main Sponsor

Section on Text Analysis