Predicting Quality of a Survey Item from the Question Text

Presented During: Applications of AI and Machine Learning in Science and Business

Matthias Schonlau Co-Author
University of Waterloo

Lydia Repke Co-Author
GESIS – Leibniz Institute for the Social Sciences

Barbara Felderer Co-Author
GESIS – Leibniz Institute for the Social Sciences

Tiancheng Yang First Author
University of Waterloo

Tiancheng Yang Presenting Author
University of Waterloo

Tuesday, Aug 5: 3:20 PM - 3:35 PM
1280
Contributed Papers

Music City Center

The Survey Quality Predictor (SQP) predicts the quality of survey questions based on 72 question characteristics (e.g., domain, number of nouns, answer scale, question length). These characteristics are manually coded. We evaluate whether the quality of a survey question can be predicted directly from its natural language text rather than from the 72 manually coded characteristics. We found that a language model can predict survey item quality directly from the text of the question and its answer options, and does so as well as the random forest model based on the 72 manually coded characteristics.
Specifically, we fine-tuned XLM-RoBERTa, a multilingual transformer-based model trained on multiple text corpora in over 100 languages, on our SQP dataset. The current web interface of the Survey Quality Predictor (https://sqp.gesis.org) requires users to code the 72 features themselves, following a coding manual, and enter them manually. Our work shows that the current implementation can be replaced with a much more user-friendly web interface: users simply enter the question text (and answer choices), and our natural language model predicts the question quality.
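
As a rough illustration of the text-in, quality-out workflow described above, the following minimal Python sketch shows one way a question and its answer options might be passed to an XLM-RoBERTa model with a single-output regression head, using the Hugging Face transformers library. This is not the authors' implementation. The checkpoint name `xlm-roberta-base`, the ` | ` separator, and the helper names `build_model_input`, `load_quality_regressor`, and `predict_quality` are all assumptions made for illustration; the abstract does not state the model size, the input format, or the training setup.

```python
from typing import Sequence


def build_model_input(question: str, answer_options: Sequence[str]) -> str:
    """Join the question text and its answer options into one string.

    The " | " separator is a hypothetical choice, not taken from the abstract.
    Empty or whitespace-only options are dropped.
    """
    parts = [question.strip()]
    parts += [opt.strip() for opt in answer_options if opt.strip()]
    return " | ".join(parts)


def load_quality_regressor(checkpoint: str = "xlm-roberta-base"):
    """Load a tokenizer and a model with a one-output regression head.

    The checkpoint name is an assumption. The import is deferred so that
    build_model_input can be used without transformers installed.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # num_labels=1 with problem_type="regression" yields a single continuous
    # output trained with mean squared error.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=1, problem_type="regression"
    )
    return tokenizer, model


def predict_quality(tokenizer, model, question: str,
                    answer_options: Sequence[str]) -> float:
    """Return the model's raw regression output for one survey item."""
    import torch

    text = build_model_input(question, answer_options)
    batch = tokenizer(text, truncation=True, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.squeeze().item()
```

Note that the regression head loaded this way is randomly initialized, so its output is meaningless until the model has been fine-tuned on labeled quality scores such as those in the SQP dataset.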

Keywords

Survey Quality

Language Model

Natural Language Processing

Transformer Model

Random Forest

Deep Learning

Main Sponsor

Section on Statistical Learning and Data Science