Predicting the Need to Recontact in Household Survey Data: A Machine Learning Approach

Conference: Symposium on Data Science and Statistics (SDSS) 2023
05/26/2023: 12:25 PM - 12:50 PM CDT
Refereed 

Description

The Spanish Survey of Household Finances (EFF) is a large-scale survey and a complex statistical operation. Data editing is a major task in the production of survey data: the revision team manually checks the consistency among questions and uses interviewer comments and audio records to edit the data if necessary. Household interviews sometimes contain data omissions and inconsistencies. When this occurs, households are recontacted and re-asked certain parts of the questionnaire. The manual revision process entails several costs, namely time and measurement error. In this paper, using structured and unstructured survey-generated data, we examine machine learning techniques for classifying interviews whose questionnaires require careful analysis and whose households may need to be recontacted. We find an algorithm, or score function, that predicts such household interviews with relatively high accuracy. Our contribution to the survey data production literature is twofold. First, we provide a way to shorten revision and data production time. Second, we propose a methodology to reduce the time between first and second contact for recontacted households, potentially also reducing measurement error.
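
To make the idea of a score function concrete, the following is a minimal illustrative sketch and not the authors' model. It assumes three hypothetical features (counts of missing answers, failed consistency checks, and alert words in the interviewer comment), hand-set hypothetical weights, and a logistic link. In practice the weights would be learned from past interviews labeled by the revision team, and the feature set would be far richer.

```python
import math
from dataclasses import dataclass


@dataclass
class Interview:
    household_id: str
    n_missing: int        # unanswered items (structured data)
    n_inconsistent: int   # failed consistency checks (structured data)
    comment: str          # free-text interviewer comment (unstructured data)


# Hypothetical keyword list and weights, chosen only for illustration.
ALERT_WORDS = {"unsure", "refused", "confused", "estimate"}
WEIGHTS = {"bias": -3.0, "n_missing": 0.4, "n_inconsistent": 0.8, "alert_words": 0.6}


def features(iv: Interview) -> dict:
    """Turn one interview into numeric features."""
    tokens = iv.comment.lower().split()
    n_alert = sum(1 for t in tokens if t.strip(".,;:!?") in ALERT_WORDS)
    return {
        "n_missing": iv.n_missing,
        "n_inconsistent": iv.n_inconsistent,
        "alert_words": n_alert,
    }


def score(iv: Interview) -> float:
    """Logistic score in (0, 1); higher means more likely to need review."""
    x = features(iv)
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def review_queue(interviews, threshold=0.5):
    """Return household ids scoring at or above the threshold, highest first."""
    flagged = [(score(iv), iv.household_id) for iv in interviews]
    flagged = [pair for pair in flagged if pair[0] >= threshold]
    return [hid for _, hid in sorted(flagged, reverse=True)]


if __name__ == "__main__":
    sample = [
        Interview("A", 0, 0, "All fine."),
        Interview("B", 5, 2, "Respondent unsure, gave an estimate."),
        Interview("C", 3, 1, ""),
    ]
    print(review_queue(sample))
```

Ranking interviews by score lets a revision team examine the most doubtful questionnaires first, which is one way the time between first and second contact could be shortened. The threshold trades missed problem interviews against unnecessary reviews and would need to be tuned on held-out data.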

Keywords

machine learning

survey data

household survey 

Presenting Author

Nicolás Forteza, Bank of Spain

First Author

Nicolás Forteza, Bank of Spain

CoAuthor

Sandra García Uribe, Bank of Spain

Target Audience

Mid-Level

Tracks

Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2023