
AI-Generated Text Detection in the Context of Domain- and Prompt-Specific Essays

Presented During: Using New Technology to Enhance Statistics and Data Science Education

Angela Lui Co-Author
City University of New York

Jason Bryer Co-Author
City University of New York

Bruno de Melo First Author

Bruno de Melo Presenting Author

Wednesday, Aug 6: 10:50 AM - 11:05 AM
1180
Contributed Papers

Music City Center

The widespread adoption of large language models (LLMs) has made it harder to distinguish human-written essays from AI-generated ones. This study explores AI detection methods for domain- and prompt-specific essays within the Diagnostic Assessment and Achievement of College Skills (DAACS) framework, applying both random forest and fine-tuned ModernBERT classifiers. Our approach combines essays written before the release of ChatGPT, which are likely human-generated, with synthetic datasets of essays generated or modified by AI. The random forest classifier was trained on embeddings from open-source models such as MiniLM and RoBERTa and from a low-cost OpenAI model, using a one-versus-one strategy. The ModernBERT method employed a novel two-level fine-tuning strategy with essay-level and sentence-pair classifications, which combine global text features with detailed sentence transitions through coherence scoring and style consistency detection. Together, these methods effectively identify whether essays have been altered by AI. Our approach provides a cost-effective solution for specific domains and a robust alternative to generic AI detection tools, and it can run locally on consumer-grade hardware.
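As a rough illustration of the one-versus-one strategy mentioned above, the sketch below trains one binary model per pair of classes and lets the pairwise models vote. It is not the authors' implementation: the class labels (`human`, `ai_generated`, `ai_modified`), the 2-D toy vectors standing in for essay embeddings, and the nearest-centroid rule standing in for the random forest are all assumptions made to keep the example small and dependency-free.

```python
from collections import Counter
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_vs_one(X, y):
    """Build one binary model per unordered pair of classes.

    Each "model" is just the pair of class centroids, a simple
    stand-in (assumption) for the random forest named in the abstract.
    """
    by_class = {}
    for vec, label in zip(X, y):
        by_class.setdefault(label, []).append(vec)
    models = {}
    for a, b in combinations(sorted(by_class), 2):
        models[(a, b)] = (centroid(by_class[a]), centroid(by_class[b]))
    return models


def predict_one_vs_one(models, vec):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if sq_dist(vec, ca) <= sq_dist(vec, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" and hypothetical labels, made up for illustration.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.0, 1.0], [0.1, 0.9]]
y = ["human", "human", "ai_generated", "ai_generated",
     "ai_modified", "ai_modified"]

models = fit_one_vs_one(X, y)
prediction = predict_one_vs_one(models, [0.95, 1.0])
print(prediction)
```

With three classes this produces three pairwise models; in practice each pair would be handled by a classifier trained on real embedding vectors rather than by centroids.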

Keywords

artificial intelligence, machine learning, large language models, text classification, AI detection, ModernBERT, random forest, embedding models, academic integrity

AI ethics, ChatGPT, text analysis, coherence detection, style consistency, AI-generated content, synthetic data, DAACS, self-regulated learning, deep learning

essay assessment, diagnostic assessment, educational technology, sentence-pair analysis, MiniLM, RoBERTa, OpenAI

Main Sponsor

Section on Statistics and Data Science Education