Enhancing Research Discovery with LLMs: A Comparative Study of Traditional Topic Modeling Algorithms
Larry Tang
Co-Author
University of Central Florida
Monday, Aug 4: 10:35 AM - 10:50 AM
1798
Contributed Papers
Music City Center
Topic modeling is essential for uncovering latent themes in scientific literature, aiding research discovery. Traditional models like Latent Dirichlet Allocation (LDA) rely on probabilistic word distributions, often producing incoherent topics, while BERTopic improves clustering with transformer embeddings but requires manual post-processing. TopicGPT, a prompt-based framework powered by Large Language Models (LLMs), generates interpretable topics as natural language descriptions rather than ambiguous word clusters.
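As a rough illustration of the prompt-based approach described above, the sketch below builds a topic-assignment prompt of the kind a TopicGPT-style framework sends to an LLM. The template, function name, and seed topics are hypothetical placeholders, not the actual TopicGPT prompts; the point is that the model's output is a natural-language topic description rather than a word cluster.

```python
# Hypothetical sketch of a TopicGPT-style prompt (assumed template;
# the real TopicGPT prompts differ). The LLM's reply would be a short
# natural-language topic description, not a list of top words.
def build_topic_prompt(document: str, seed_topics: list[str]) -> str:
    topic_list = "\n".join(f"- {t}" for t in seed_topics)
    return (
        "You will receive a document and a list of existing topics.\n"
        "Assign the document to one existing topic, or propose a new\n"
        "topic as a short natural-language description.\n\n"
        f"Existing topics:\n{topic_list}\n\n"
        f"Document:\n{document}\n\n"
        "Topic:"
    )

prompt = build_topic_prompt(
    "A survey of cyberbullying detection methods on social media.",
    ["Forensic science methodology", "Statistical topic models"],
)
print(prompt)
```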
This study compares TopicGPT, LDA, and BERTopic across cyberbullying research, forensic science literature, and general scientific papers. The models are evaluated on coherence, diversity, redundancy, interpretability, and research discovery efficiency. Results suggest TopicGPT produces more interpretable and distinct topics, improving classification accuracy and reducing redundancy. While BERTopic excels in semantic clustering, it shows higher topic overlap, and LDA struggles with coherence and interpretability.
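Two of the evaluation criteria above can be made concrete with a toy example. The sketch below uses commonly assumed definitions (not necessarily the study's exact ones): topic diversity as the fraction of unique words across all topics' top-word lists, and redundancy as the mean pairwise Jaccard overlap between topics' word sets.

```python
# Toy illustration with assumed metric definitions, not the paper's own:
# diversity = unique words / total words across top-k lists;
# redundancy = mean pairwise Jaccard overlap between topic word sets.
from itertools import combinations

def topic_diversity(topics: list[list[str]]) -> float:
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

def topic_redundancy(topics: list[list[str]]) -> float:
    sets = [set(t) for t in topics]
    pairs = list(combinations(sets, 2))
    overlap = lambda a, b: len(a & b) / len(a | b)  # Jaccard index
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

topics = [
    ["bullying", "online", "harassment", "teen"],
    ["forensic", "evidence", "dna", "court"],
    ["bullying", "cyber", "online", "school"],  # overlaps with topic 1
]
print(topic_diversity(topics))   # 10 unique words out of 12 -> 0.833...
print(topic_redundancy(topics))  # only one overlapping pair -> 0.111...
```

Higher topic overlap, as reported for BERTopic, would show up here as a lower diversity score and a higher redundancy score.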
These findings highlight LLM-driven topic modeling as a benchmark for enhancing literature analysis, research workflows, and knowledge discovery.
AI-Powered Topic Modeling
Latent Dirichlet Allocation (LDA)
BERTopic
Large Language Models (LLMs)
Natural Language Processing (NLP)
Scientific Literature Analysis
Main Sponsor: Section on Text Analysis