Analyzing multi-label open-ended questions: combining texts from individual answer boxes improves classification with language models
Wednesday, Aug 6: 3:05 PM - 3:25 PM
Topic-Contributed Paper Session
Music City Center
Multi-label or check-all-that-apply open-ended questions allow for multiple answers. Previous research on the design of such questions found that providing multiple small answer boxes yields more and richer answers than providing one larger answer box. Using a series of classifiers based on the BERT language model, we empirically study how this design choice affects the classification of the answers. We use a 2x2 factorial design: 1) analysis with a multi-label vs. a single-label classifier, and 2) answers obtained from one larger answer box vs. multiple smaller answer boxes. We find that the multi-label classifier gives more accurate results than the single-label classifier (1% vs. 9% misclassification of individual labels) regardless of how the answers were obtained. Surprisingly, even when answers come from multiple smaller boxes, combining their texts and analyzing them jointly with a multi-label classifier is preferable to classifying each box separately. We attribute this success to the classifier's ability to exploit correlations among labels. We conclude that multi-label open-ended questions should continue to provide multiple answer boxes because they yield better data quality; for analysis, however, the answer-box texts should be concatenated to improve classification performance.
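The following is a minimal sketch, not the authors' code, of the multi-label setup the abstract describes: texts from the individual answer boxes are concatenated into one input and classified with a BERT model whose head predicts all labels jointly (sigmoid per label rather than a softmax over labels). It assumes the Hugging Face transformers library; the label names and example answers are invented, and the model would need to be fine-tuned on labeled survey answers before its predictions are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["price", "quality", "service"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # per-label sigmoid + BCE loss
)

# Concatenate the texts from the individual answer boxes into a single input,
# as the abstract recommends for analysis.
answer_boxes = ["too expensive", "staff was unfriendly"]  # hypothetical answers
text = " ".join(answer_boxes)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_labels)

# Independent per-label probabilities; threshold at 0.5 to assign labels.
probs = torch.sigmoid(logits)[0]
predicted = [lab for lab, p in zip(labels, probs) if p > 0.5]
print(predicted)
```

Because each label gets its own sigmoid output, a single concatenated answer can receive several labels at once, which is what lets the classifier exploit label correlations.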
Keywords: open-ended question, multi-label, large language model, survey methodology, statistical learning, check-all-that-apply