AI-Generated Text Detection in the Context of Domain- and Prompt-Specific Essays

Bruno de Melo, First Author and Presenting Author

Angela Lui, Co-Author
City University of New York

Jason Bryer, Co-Author
City University of New York
 
Wednesday, Aug 6: 10:50 AM - 11:05 AM
1180
Contributed Papers
Music City Center
The widespread adoption of large language models (LLMs) has made distinguishing between human- and AI-generated essays increasingly challenging. This study explores AI-detection methods for domain- and prompt-specific essays within the Diagnostic Assessment and Achievement of College Skills (DAACS) framework, applying both random forest and fine-tuned ModernBERT classifiers. Our approach incorporates pre-ChatGPT essays, which are very likely human-written, alongside synthetic datasets of essays generated and modified by AI. The random forest classifier was trained, using a one-versus-one strategy, on embeddings from open-source models such as MiniLM and RoBERTa as well as a low-cost OpenAI embedding model. The ModernBERT method employed a novel two-level fine-tuning strategy, incorporating essay-level and sentence-pair classifications that combine global text features with detailed sentence-transition signals through coherence scoring and style-consistency detection. Together, these methods effectively identify whether essays have been altered by AI. Our approach provides a cost-effective solution for specific domains and serves as a robust alternative to generic AI-detection tools, while enabling local execution on consumer-grade hardware.
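To make the first pipeline concrete, the following is a minimal sketch, not the authors' exact code, of embedding essays with an open-source model and training a one-versus-one random forest. The checkpoint name all-MiniLM-L6-v2, the three-class label scheme (human / AI-generated / AI-modified), and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of the embedding + one-vs-one random forest pipeline.
# Model name, labels, and hyperparameters are assumptions, not the study's setup.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsOneClassifier

essays = [
    "A student's reflective essay written before ChatGPT ...",
    "An essay produced entirely by an LLM ...",
    "A human draft later polished by an LLM ...",
]
labels = ["human", "ai_generated", "ai_modified"]  # assumed three-class scheme

# Embed each essay with an open-source sentence-embedding model (MiniLM here).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
X = embedder.encode(essays)

# One-versus-one trains a binary random forest for every pair of classes.
clf = OneVsOneClassifier(RandomForestClassifier(n_estimators=500, random_state=42))
clf.fit(X, labels)

print(clf.predict(embedder.encode(["A new, unseen essay ..."])))
```

Swapping the embedder for RoBERTa or an OpenAI embedding endpoint changes only the encoding step; the classifier stage is unchanged.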
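Similarly, a hedged sketch of the two-level ModernBERT idea: one sequence classifier over whole essays, and a second over adjacent sentence pairs, so per-transition scores can flag incoherent or stylistically inconsistent spans. The checkpoint answerdotai/ModernBERT-base is the public release; the label schemes, the aggregation of transition scores, and the fine-tuning loop (omitted here) are assumptions.

```python
# Sketch of two-level classification with ModernBERT; fine-tuning loop omitted.
# Label schemes and scoring conventions are assumptions, not the authors' code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
essay_clf = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2)  # level 1: human vs. AI essay
pair_clf = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=2)  # level 2: transition consistency

def essay_score(text):
    """Level 1: probability (after fine-tuning) that the whole essay is AI-touched."""
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return torch.softmax(essay_clf(**batch).logits, dim=-1)[0, 1].item()

def transition_scores(sentences):
    """Level 2: score each adjacent sentence pair for coherence/style consistency."""
    scores = []
    for a, b in zip(sentences, sentences[1:]):
        batch = tok(a, b, return_tensors="pt", truncation=True)  # sentence-pair input
        with torch.no_grad():
            logits = pair_clf(**batch).logits
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())
    return scores  # after fine-tuning, high values would mark suspicious transitions
```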

Keywords

artificial intelligence, machine learning, large language models, text classification, AI detection, ModernBERT, random forest, embedding models, academic integrity, AI ethics, ChatGPT, text analysis, coherence detection, style consistency, AI-generated content, synthetic data, DAACS, self-regulated learning, deep learning, essay assessment, diagnostic assessment, educational technology, sentence-pair analysis, MiniLM, RoBERTa, OpenAI

Main Sponsor

Section on Statistics and Data Science Education