Analyzing multi-label open-ended questions: combining texts from individual answer boxes improves classification with language models

Ruben Bach (Co-Author)
University of Mannheim
 
Katharina Meitinger (Co-Author)
GESIS - Leibniz Institute for the Social Sciences
 
Matthias Schonlau (Speaker)
University of Waterloo
 
Wednesday, Aug 6: 3:05 PM - 3:25 PM
Topic-Contributed Paper Session 
Music City Center 
Multi-label or check-all-that-apply open-ended questions allow for multiple answers. Previous research on the design of such questions found that providing multiple small answer boxes yields more and richer answers than providing one larger answer box. Using a series of classifiers based on the BERT language model, we empirically study how this design choice affects the classification of such answers. We design a 2x2 factorial experiment: 1) analysis with a multi-label vs. a single-label classifier and 2) answers obtained from one larger answer box vs. multiple smaller answer boxes. We find that the multi-label classifier gives more accurate results than the single-label classifier (1% vs. 9% misclassification of individual labels), regardless of how the answers were obtained. Surprisingly, classification also improves when the texts from the individual answer boxes are combined into a single text for analysis. We attribute this success to the multi-label classifier's ability to take advantage of correlations among labels. We conclude that multi-label open-ended questions should continue to provide multiple answer boxes because they yield better data quality; for analysis, however, the texts from the answer boxes should be concatenated to improve classification performance.
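
To give a concrete picture of the setup described above, the following is a minimal sketch of a multi-label BERT classifier applied to concatenated answer-box texts. It assumes the Hugging Face Transformers library; the label names, example answers, and the 0.5 decision threshold are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch (not the authors' code): multi-label BERT classification of
# the concatenated texts from several answer boxes.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["economy", "health", "environment"]  # hypothetical answer categories

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid per label instead of softmax
)

# Concatenate the texts from the individual answer boxes into one input.
answer_boxes = ["rising prices", "waiting times at the doctor"]  # hypothetical answers
text = " ".join(answer_boxes)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

probs = torch.sigmoid(logits)[0]  # independent probability for each label
predicted = [label for label, p in zip(LABELS, probs) if p > 0.5]
print(predicted)  # classification head is untrained here, so output is arbitrary

In the single-label arm of the experiment, one would instead fit a separate classifier to each answer box; the multi-label setup sketched here predicts all labels from the combined text in one pass, which is where correlations among labels can be exploited.
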

Keywords

open-ended question

multi-label

large language model

survey methodology

statistical learning

check-all-that-apply