Exploring Computational Approaches for Coding Qualitative Responses in the Medical Expenditure Panel Survey

Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/06/2024: 2:10 PM - 2:15 PM EDT
Lightning 

Description

The Medical Expenditure Panel Survey (MEPS) is a widely used, nationally representative survey designed to explore healthcare utilization and expenditure patterns within the U.S. Information in the MEPS, such as the use of healthcare services, is captured in both quantitative (close-ended) and qualitative (open-ended) responses. One of the primary challenges when working with MEPS data is coding open-ended responses into standardized categories. Manual coding of text from open-ended questions is time-consuming and costly. The manually coded data accumulated in MEPS over the years make it possible to train computational models that automate the coding of qualitative responses. However, such efforts have not yet been undertaken within the context of MEPS.

To accelerate the preprocessing of MEPS data, we explored computational approaches to automatically code the qualitative responses. We began by transforming qualitative responses into word embeddings using BERT-based models. We then predicted the category code with two approaches: (1) identifying the most similar responses from previous years by embedding similarity and assigning the current response the code given to those prior responses, and (2) using the embeddings as features to train machine learning models that predict the code.
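
As a rough illustration of approach (1), the sketch below assigns a new response the code of its most similar previously coded response. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the function names, three-dimensional vectors, and code labels are all hypothetical stand-ins for real BERT embeddings and MEPS codes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_code_by_similarity(query_vec, prior_vecs, prior_codes):
    """Return the code of the prior response whose embedding is most similar to query_vec."""
    best_index = max(
        range(len(prior_vecs)),
        key=lambda i: cosine_similarity(query_vec, prior_vecs[i]),
    )
    return prior_codes[best_index]


# Hypothetical toy embeddings for previously coded responses (assumption:
# in practice these would come from a BERT-based encoder).
prior_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
prior_codes = ["CODE_A", "CODE_B", "CODE_C"]  # hypothetical category labels

# Embedding of a new, uncoded response (also a toy vector).
new_vec = [0.8, 0.2, 0.1]
predicted = predict_code_by_similarity(new_vec, prior_vecs, prior_codes)
print(predicted)
```

A variant could take a majority vote over the k most similar prior responses rather than the single nearest one; the abstract does not say which was used.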

We evaluated our approaches on two open-ended questions. For both questions, the responses collected from 2018 to 2021, along with their coding results, served as the training dataset, while the data from 2022 served as the testing dataset. Both approaches consistently achieved high accuracy, ranging from 90.7% to 95.4%, in coding responses to the two questions. Our results indicate that computational models hold significant promise for coding qualitative responses in MEPS and warrant further exploration in future studies.
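
The year-based evaluation described above can be sketched as follows. This is an illustrative sketch only: the records are made-up toy data, not MEPS responses, and a simple nearest-centroid classifier is swapped in as a stand-in for the unspecified machine learning models of approach (2). The resulting accuracy on the toy data says nothing about the reported results.

```python
from collections import defaultdict

# Hypothetical records: (survey year, embedding vector, manually assigned code).
records = [
    (2018, [0.9, 0.1], "CODE_A"),
    (2019, [0.8, 0.3], "CODE_A"),
    (2020, [0.1, 0.9], "CODE_B"),
    (2021, [0.2, 0.8], "CODE_B"),
    (2022, [0.7, 0.2], "CODE_A"),
    (2022, [0.3, 0.9], "CODE_B"),
]

# Temporal split: earlier years for training, the latest year held out for testing.
train = [(vec, code) for year, vec, code in records if 2018 <= year <= 2021]
test = [(vec, code) for year, vec, code in records if year == 2022]


def fit_centroids(examples):
    """Average the embeddings of each code to get one centroid per code."""
    grouped = defaultdict(list)
    for vec, code in examples:
        grouped[code].append(vec)
    return {
        code: [sum(column) / len(vecs) for column in zip(*vecs)]
        for code, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the code whose centroid is closest to vec (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda code: squared_distance(vec, centroids[code]))


centroids = fit_centroids(train)
correct = sum(predict(centroids, vec) == code for vec, code in test)
accuracy = correct / len(test)  # share of held-out responses coded correctly
print(f"accuracy on held-out year: {accuracy:.3f}")
```

Splitting by year rather than at random mirrors how such a model would be used in practice, since each new survey year would be coded with a model trained only on earlier years.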

Keywords

Natural Language Processing

Machine Learning

Qualitative Coding

Transfer Learning

Survey Statistics 

Presenting Author

Mengshi Zhou

First Author

Mengshi Zhou

CoAuthor(s)

Oliva He, Westat
Chris Barzola, Westat
Alexandra Marin, Westat
Michael Raithel, Westat
Jeannie Hudnall, Westat
Kevin Wilson, Westat

Tracks

Statistical Data Science
Symposium on Data Science and Statistics (SDSS) 2024