Automated and Secure Text Scraping with Generative AI of School Transcripts for Surveys
Monday, Aug 4: 11:20 AM - 11:35 AM
1830
Contributed Papers
Music City Center
We present TranscriptGenie, a prototype application developed to meet the need for efficient and accurate text extraction from PDF school transcripts for several large federal surveys. Secondary and postsecondary transcript data are crucial for understanding students' educational journeys and outcomes. Yet extracting meaningful data from PDF school transcripts has long been a labor-intensive process, complicated by variability in transcript formats, embedded tables, and diverse data structures. In this session, we will give an overview of TranscriptGenie's development process, highlighting the requirements that drove its design and the novel solutions that underpin its capabilities. These solutions include integrating generative AI technology to handle text variations and applying natural language processing techniques for data annotation. We will discuss how the tool is designed to comply with security standards and how it uses a graph database to manage and query the extracted data efficiently. Finally, we will discuss the next steps needed for deployment and the broader implications for transcript analysis in surveys.
Text analysis
Surveys
Generative Artificial Intelligence
Natural Language Processing
Education
Graph database
Main Sponsor
Section on Text Analysis