Predicting the Need to Recontact in Household Survey Data: A Machine Learning Approach
Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/26/2023: 12:25 PM - 12:50 PM CDT
Refereed
The Spanish Survey of Household Finances (EFF) is a large-scale survey and a complex statistical operation. Data editing is a major task in the production of survey data: the revision team manually checks the consistency among questions and draws on interviewer comments and audio recordings to edit the data where necessary. Household interviews are sometimes filled with data omissions and inconsistencies. When this occurs, households are recontacted and re-asked certain parts of the questionnaire. The manual revision process entails several costs, namely time and measurement error. In this paper, using structured and unstructured survey-generated data, we examine machine learning techniques for classifying interviews whose questionnaires require careful analysis and whose households may need to be recontacted. We find an algorithm, or score function, that predicts such interviews with relatively high accuracy. Our contribution to the survey data production literature is twofold. First, we provide a way to shorten revision and data production time. Second, we propose a methodology to reduce the time between first and second contact for recontacted households, potentially also reducing measurement error.
machine learning
survey data
household survey
Presenting Author
Nicolás Forteza, Bank of Spain
First Author
Nicolás Forteza, Bank of Spain
CoAuthor
Sandra García Uribe, Bank of Spain
Target Audience
Mid-Level
Tracks
Practice and Applications