Implementing retrieval-augmented generation with survey question evaluation reports

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 2:25 PM - 2:30 PM EDT
Lightning 

Description

Survey question evaluation studies play a crucial role in improving questionnaire design and in supporting the interpretation and analysis of survey data. The Collaborating Center for Questionnaire Design and Evaluation Research at the Centers for Disease Control and Prevention's National Center for Health Statistics maintains an online repository, Q-Bank, which houses extensive research reports on survey questions dating back to 1990. Many of these reports are validity studies. They delineate the construct(s) captured by individual questions by describing the phenomena respondents consider when formulating their answers in an interview setting. This research gives data users a better understanding of the data and supports a more sophisticated interpretation of findings. The objective of this project is to determine the feasibility of using AI tools to improve user navigation within Q-Bank. We developed a retrieval-augmented generation (RAG) based interface that leverages generative AI tools to help users access relevant information in Q-Bank. The RAG system aims to index information about the research documents in the repository and to retrieve salient details, such as citation information and links, in response to user queries. Improved indexing and information retrieval increase the usefulness of Q-Bank because they allow a more comprehensive search of questions. This enables researchers and survey methodologists to access insights on question validity and construct capture. We also implemented an evaluation framework to derive performance metrics for the RAG system. These findings can inform approaches to indexing other data sources, disseminating research, and streamlining literature reviews, saving time and effort while supporting informed decision-making.
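The retrieval step described above can be illustrated without any model dependencies. The following is a minimal, hypothetical sketch, not the authors' implementation. It swaps in a TF-IDF cosine-similarity index for whatever embedding model and vector store the project actually used. The `Report` fields, the placeholder records and example.org links, and the `build_prompt` and `hit_rate_at_k` helpers are all assumptions made for illustration. The sketch retrieves citation and link details for a query and assembles them into grounding context for a generative model; the model call itself is not shown. It also computes a simple hit rate at k as one example of a retrieval performance metric.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Report:
    """One indexed research document (hypothetical fields, not Q-Bank's schema)."""
    title: str
    citation: str
    link: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Minimal TF-IDF index with cosine-similarity retrieval (a stand-in for embeddings)."""

    def __init__(self, reports: list[Report]):
        self.reports = reports
        docs = [tokenize(r.title + " " + r.text) for r in reports]
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        # Weight term counts by IDF (ignoring unseen terms), then L2-normalize.
        counts = Counter(tokens)
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def search(self, query: str, k: int = 3) -> list[Report]:
        # Cosine similarity reduces to a dot product of normalized vectors.
        q = self._vectorize(tokenize(query))
        scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in self.vectors]
        ranked = sorted(range(len(self.reports)), key=lambda i: scores[i], reverse=True)
        return [self.reports[i] for i in ranked[:k]]


def build_prompt(query: str, hits: list[Report]) -> str:
    """Assemble retrieved citations and links as grounding context for an LLM (call not shown)."""
    context = "\n".join(f"- {h.citation} ({h.link}): {h.text}" for h in hits)
    return f"Answer using only the reports below and cite them.\n{context}\n\nQuestion: {query}"


def hit_rate_at_k(index: TfidfIndex, labeled: list[tuple[str, str]], k: int = 3) -> float:
    """Share of queries whose expected report title appears in the top-k results."""
    found = sum(any(r.title == want for r in index.search(q, k)) for q, want in labeled)
    return found / len(labeled)


# Placeholder records for illustration only; these are not real Q-Bank reports.
reports = [
    Report("Report A", "Author A (2019). Report A.", "https://example.org/a",
           "cognitive interviews on self-rated health questions"),
    Report("Report B", "Author B (2015). Report B.", "https://example.org/b",
           "evaluation of disability and functioning questions"),
    Report("Report C", "Author C (2021). Report C.", "https://example.org/c",
           "testing questions about sleep duration and quality"),
]
index = TfidfIndex(reports)
labeled = [
    ("self-rated health", "Report A"),
    ("sleep quality questions", "Report C"),
    ("disability questions", "Report B"),
]

top = index.search("how do respondents interpret self-rated health", k=1)
print(top[0].citation)  # Author A (2019). Report A.
print(hit_rate_at_k(index, labeled, k=1))  # 1.0
```

A production RAG system would more likely use dense embeddings in place of TF-IDF. Its evaluation would also need metrics for the generated answer, for example whether the cited links are correct, and this sketch does not cover those.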

Keywords

Generative AI

Artificial Intelligence

Large Language Models 

Presenting Author

Priyam Patel, Centers for Disease Control and Prevention

First Author

Priyam Patel, Centers for Disease Control and Prevention

CoAuthor(s)

Justin Mezetin, Swan Solutions/NCHS
Benjamin Rogers, NCHS

Tracks

Practice and Applications