Enhancing Research Discovery with LLMs: A Comparative Study of Traditional Topic Modeling Algorithms

Larry Tang (Co-Author)
University of Central Florida

Amir Alipour Yengejeh (First Author, Presenting Author)
University of Central Florida
 
Monday, Aug 4: 10:35 AM - 10:50 AM
1798 
Contributed Papers 
Music City Center 
Topic modeling is essential for uncovering latent themes in scientific literature, aiding research discovery. Traditional models such as Latent Dirichlet Allocation (LDA) rely on probabilistic word distributions and often produce incoherent topics, while BERTopic improves clustering with transformer embeddings yet still requires manual post-processing. TopicGPT, a prompt-based framework powered by Large Language Models (LLMs), generates interpretable topics as natural language descriptions rather than ambiguous word clusters.
This study compares TopicGPT, LDA, and BERTopic on three corpora: cyberbullying research, forensic science literature, and general scientific papers. The models are evaluated on coherence, diversity, redundancy, interpretability, and research discovery efficiency. Results suggest that TopicGPT produces more interpretable and distinct topics, improving classification accuracy and reducing redundancy. BERTopic excels in semantic clustering but shows higher topic overlap, while LDA struggles with both coherence and interpretability.
These findings position LLM-driven topic modeling as a benchmark for enhancing literature analysis, research workflows, and knowledge discovery.
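Two of the evaluation criteria named above, diversity and redundancy, are commonly computed from each model's top-k topic words. A minimal sketch of one standard formulation (the word lists and exact metric definitions here are illustrative assumptions, not the study's protocol):

```python
from itertools import combinations

def topic_diversity(topics):
    """Fraction of unique words among all topics' top-k words (higher is better)."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

def mean_pairwise_overlap(topics):
    """Average Jaccard similarity between topic word sets (lower means less redundancy)."""
    word_sets = [set(t) for t in topics]
    pairs = list(combinations(word_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Illustrative top-3 word lists from three hypothetical topics
topics = [
    ["cyberbullying", "online", "victim"],
    ["forensic", "dna", "evidence"],
    ["cyberbullying", "online", "harassment"],
]

print(round(topic_diversity(topics), 3))        # 7 unique words over 9 total -> 0.778
print(round(mean_pairwise_overlap(topics), 3))  # one overlapping pair of three -> 0.167
```

Under these definitions, topics 1 and 3 above would be flagged as redundant, which is the kind of overlap the abstract reports for BERTopic.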

Keywords

AI-Powered Topic Modeling

Latent Dirichlet Allocation (LDA)

BERTopic

Large Language Models (LLMs)

Natural Language Processing (NLP)

Scientific Literature Analysis 

Main Sponsor

Section on Text Analysis