The Use of Big Data-Based Model Prediction for Stratification of Household Addresses

Noah Bassel First Author
NORC
 
Noah Bassel Presenting Author
NORC
 
Tuesday, Aug 6: 9:50 AM - 10:05 AM
2919 
Contributed Papers 
Oregon Convention Center 
The National Survey of Early Care and Education (NSECE) is the most comprehensive study of the availability and use of early care and education (ECE) in the U.S. Because the target population of the NSECE's household survey is a relatively small proportion of all households, the cost of screening households to determine eligibility has always been an important constraint for the NSECE. Like many household surveys, the NSECE also faces the twin challenges of declining response rates and rising data collection costs. In response, the 2024 NSECE incorporates big data classification and disproportionate stratification into its frame construction and sampling design. Commercial household data are used as inputs to a machine learning model that predicts the probability that a given household on the frame falls within the target population. Household addresses are then stratified by predicted probability, and households with a high probability of eligibility are oversampled. In this study, we will evaluate the tradeoff between cost savings and survey precision and compare the eligibility rates realized during data collection with those predicted at the design stage.
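
The cost-versus-precision tradeoff described above can be illustrated with a minimal sketch. It is not the NSECE design: the function names, the single probability cutoff, the sampling rates, and the toy frame are all hypothetical, and the model's predicted probabilities are assumed to equal true eligibility probabilities. The loss in precision is approximated with Kish's design effect for unequal weighting, n * sum(w^2) / (sum(w))^2, computed over the expected eligible sample.

```python
def assign_strata(probabilities, cutoffs):
    """Assign each address a stratum index from its predicted eligibility
    probability. `cutoffs` is an ascending list of hypothetical boundaries;
    stratum h is the number of cutoffs the probability meets or exceeds."""
    return [sum(p >= c for c in cutoffs) for p in probabilities]


def design_summary(probabilities, strata, sampling_rates):
    """Return (expected eligibility rate among sampled addresses,
    Kish's approximate design effect from unequal weighting).

    Assumes predicted probabilities are well calibrated and that every
    sampling rate is strictly positive."""
    n_strata = len(sampling_rates)
    frame_count = [0] * n_strata      # addresses on the frame, per stratum
    frame_eligible = [0.0] * n_strata  # expected eligible addresses, per stratum
    for p, h in zip(probabilities, strata):
        frame_count[h] += 1
        frame_eligible[h] += p

    # Expected sampled and expected sampled-and-eligible addresses per stratum.
    sampled = [frame_count[h] * sampling_rates[h] for h in range(n_strata)]
    eligible = [frame_eligible[h] * sampling_rates[h] for h in range(n_strata)]
    eligibility_rate = sum(eligible) / sum(sampled)

    # Kish approximation with base weight w_h = 1 / sampling rate.
    n = sum(eligible)
    sum_w = sum(e / f for e, f in zip(eligible, sampling_rates))
    sum_w2 = sum(e / f ** 2 for e, f in zip(eligible, sampling_rates))
    design_effect = n * sum_w2 / sum_w ** 2
    return eligibility_rate, design_effect


# Toy frame (illustrative numbers only): 800 low-probability addresses
# and 200 high-probability addresses, split at a cutoff of 0.3.
toy_probabilities = [0.05] * 800 + [0.60] * 200
toy_strata = assign_strata(toy_probabilities, cutoffs=[0.3])

for label, rates in [("proportionate", [0.10, 0.10]),
                     ("oversample high stratum", [0.05, 0.30])]:
    rate, deff = design_summary(toy_probabilities, toy_strata, rates)
    print(f"{label}: eligibility rate {rate:.3f}, design effect {deff:.3f}")
```

Both allocations in the toy example draw the same expected number of addresses, so the comparison isolates the tradeoff: oversampling the high-probability stratum raises the share of screened addresses that are eligible, while the unequal weights push the design effect above 1.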

Keywords

Big Data

Machine Learning

Stratification

Sample Design 

Main Sponsor

Survey Research Methods Section