Predicting Quality of a Survey Item from the Question Text

Matthias Schonlau Co-Author
University of Waterloo
 
Lydia Repke Co-Author
GESIS – Leibniz Institute for the Social Sciences
 
Barbara Felderer Co-Author
GESIS – Leibniz Institute for the Social Sciences
 
Tiancheng Yang First Author
University of Waterloo
 
Tiancheng Yang Presenting Author
University of Waterloo
 
Tuesday, Aug 5: 3:20 PM - 3:35 PM
1280 
Contributed Papers 
Music City Center 
The Survey Quality Predictor (SQP) predicts the quality of survey questions based on 72 question characteristics (e.g., domain, nouns word count, answer scale, length of question). These characteristics are coded manually. We evaluate whether the quality of a survey question can be predicted directly from its natural language text rather than from the 72 manually coded characteristics. We found that a language model can predict survey item quality directly from the text of the question and its answer options, and that it does so as well as a random forest model based on the 72 manually coded characteristics.
Specifically, we fine-tuned XLM-RoBERTa, a multilingual transformer-based model trained on multiple text corpora in over 100 languages, on our SQP dataset. The current web interface of the Survey Quality Predictor (https://sqp.gesis.org) requires users to enter the 72 features manually, coding each one themselves with the help of a coding manual. Our work shows that the current implementation can be replaced with a much more user-friendly web interface: users simply enter the question text (and answer choices), and our natural language model predicts the question quality.
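
To illustrate the workflow described above, the following is a minimal sketch of how question text and answer options might be passed to an XLM-RoBERTa model with a single regression output. It is not the authors' implementation. The checkpoint name `xlm-roberta-base`, the helper names `format_item` and `predict_quality`, the " | " separator, and the 256-token limit are all assumptions made for illustration, and the sketch uses the Hugging Face `transformers` and `torch` packages, which the abstract does not mention.

```python
from typing import List, Tuple

# Assumption: the abstract does not say which XLM-RoBERTa checkpoint was used.
MODEL_NAME = "xlm-roberta-base"


def format_item(question: str, answer_options: List[str]) -> Tuple[str, str]:
    """Return (question text, answer options joined into one string).

    Hypothetical helper: blank options are dropped and the rest are
    joined with " | " so they can be passed as a second text segment.
    """
    options = " | ".join(opt.strip() for opt in answer_options if opt.strip())
    return question.strip(), options


def predict_quality(question: str, answer_options: List[str]) -> float:
    """Score one survey item with a single-output (regression) head."""
    # Imported lazily so format_item can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # num_labels=1 gives one continuous output; the head is randomly
    # initialised here and would first need fine-tuning on quality labels.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=1, problem_type="regression"
    )
    model.eval()

    text, options = format_item(question, answer_options)
    # Question and options are encoded as a text pair; None means no pair.
    enc = tokenizer(
        text, options or None,
        truncation=True, max_length=256, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits  # shape (1, 1)
    return logits.squeeze().item()
```

Without fine-tuning on quality labels, the value returned by `predict_quality` is not meaningful; the sketch only shows the shape of the text-in, score-out interface that would replace manual coding of the 72 characteristics.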

Keywords

Survey Quality

Language Model

Natural Language Processing

Transformer Model

Random Forest

Deep Learning 

Main Sponsor

Section on Statistical Learning and Data Science