Enhancing Research Discovery with LLMs: A Comparative Study of Traditional Topic Modeling Algorithms
Larry Tang
Co-Author
University of Central Florida
Monday, Aug 4: 10:35 AM - 10:50 AM
1798
Contributed Papers
Music City Center
Topic modeling is essential for uncovering latent themes in scientific literature, aiding research discovery. Traditional models like Latent Dirichlet Allocation (LDA) rely on probabilistic word distributions, often producing incoherent topics, while BERTopic improves clustering with transformer embeddings but requires manual post-processing. TopicGPT, a prompt-based framework powered by Large Language Models (LLMs), generates interpretable topics as natural language descriptions rather than ambiguous word clusters.
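As a rough illustration of the prompt-based approach described above, the sketch below builds a topic-assignment prompt of the kind a TopicGPT-style framework sends to an LLM. The template, function name, and seed topics are hypothetical placeholders, not the actual TopicGPT prompts; the point is that the model's output is a natural-language topic description rather than a word cluster.

```python
# Hypothetical sketch of a TopicGPT-style prompt (assumed template;
# the real TopicGPT prompts differ). The LLM's reply would be a short
# natural-language topic description, not a list of top words.
def build_topic_prompt(document: str, seed_topics: list[str]) -> str:
    topic_list = "\n".join(f"- {t}" for t in seed_topics)
    return (
        "You will receive a document and a list of existing topics.\n"
        "Assign the document to one existing topic, or propose a new\n"
        "topic as a short natural-language description.\n\n"
        f"Existing topics:\n{topic_list}\n\n"
        f"Document:\n{document}\n\n"
        "Topic:"
    )

prompt = build_topic_prompt(
    "A survey of cyberbullying detection methods on social media.",
    ["Forensic science methodology", "Statistical topic models"],
)
print(prompt)
```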
This study compares TopicGPT, LDA, and BERTopic across cyberbullying research, forensic science literature, and general scientific papers. The models are evaluated on coherence, diversity, redundancy, interpretability, and research discovery efficiency. Results suggest TopicGPT produces more interpretable and distinct topics, improving classification accuracy and reducing redundancy. While BERTopic excels in semantic clustering, it shows higher topic overlap, and LDA struggles with coherence and interpretability.
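Two of the evaluation criteria above can be made concrete with a toy example. The sketch below uses commonly assumed definitions (not necessarily the study's exact ones): topic diversity as the fraction of unique words across all topics' top-word lists, and redundancy as the mean pairwise Jaccard overlap between topics' word sets.

```python
# Toy illustration with assumed metric definitions, not the paper's own:
# diversity = unique words / total words across top-k lists;
# redundancy = mean pairwise Jaccard overlap between topic word sets.
from itertools import combinations

def topic_diversity(topics: list[list[str]]) -> float:
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words)

def topic_redundancy(topics: list[list[str]]) -> float:
    sets = [set(t) for t in topics]
    pairs = list(combinations(sets, 2))
    overlap = lambda a, b: len(a & b) / len(a | b)  # Jaccard index
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)

topics = [
    ["bullying", "online", "harassment", "teen"],
    ["forensic", "evidence", "dna", "court"],
    ["bullying", "cyber", "online", "school"],  # overlaps with topic 1
]
print(topic_diversity(topics))   # 10 unique words out of 12 -> 0.833...
print(topic_redundancy(topics))  # only one overlapping pair -> 0.111...
```

Higher topic overlap, as reported for BERTopic, would show up here as a lower diversity score and a higher redundancy score.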
These findings highlight LLM-driven topic modeling as a benchmark for enhancing literature analysis, research workflows, and knowledge discovery.
AI-Powered Topic Modeling
Latent Dirichlet Allocation (LDA)
BERTopic
Large Language Models (LLMs)
Natural Language Processing (NLP)
Scientific Literature Analysis
Main Sponsor: Section on Text Analysis