The Use of Big Data-Based Model Prediction for Stratification of Household Addresses
Tuesday, Aug 6: 9:50 AM - 10:05 AM
2919
Contributed Papers
Oregon Convention Center
The National Survey of Early Care and Education (NSECE) is the most comprehensive study of the availability and use of early care and education (ECE) in the U.S. Because the target population of the NSECE's household survey is a relatively small proportion of all households, the cost of screening households to determine eligibility has always been an important constraint for the NSECE. Like many household surveys, the NSECE also faces the twin challenges of declining response rates and rising data collection costs. In response, the 2024 NSECE incorporates big data classification and disproportionate stratification into its frame construction and sampling design. Commercial household data are used as inputs to a machine learning model that predicts the probability that a given household on the frame falls within the target population. Household addresses are then stratified by predicted probability, and households with a high probability of eligibility are oversampled. In this study, we evaluate the tradeoff between cost savings and survey precision, and we compare the eligibility rates realized during data collection with the rates predicted at the design stage.
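The cost and precision tradeoff described above can be illustrated with a minimal Python sketch. It is not the NSECE design: the synthetic predicted probabilities, the stratum cutoffs, the sampling rates, and the helper names `stratify` and `design_summary` are all assumptions made for illustration. The sketch assigns addresses to strata by predicted eligibility, then compares a proportional design with an oversampled one on the expected eligibility rate among screened addresses and on Kish's approximate design effect from unequal weighting.

```python
import random


def stratify(probs, cutoffs):
    """Assign each predicted probability to a stratum index (0 = lowest)."""
    return [sum(p >= c for c in cutoffs) for p in probs]


def design_summary(probs, strata, rates):
    """Expected screening yield and Kish design effect for given stratum sampling rates."""
    frame = [0] * len(rates)        # addresses on the frame, per stratum
    eligible = [0.0] * len(rates)   # expected eligible addresses, per stratum
    for p, h in zip(probs, strata):
        frame[h] += 1
        eligible[h] += p
    screened = sum(n * r for n, r in zip(frame, rates))
    found = sum(e * r for e, r in zip(eligible, rates))
    # Kish approximation: deff = m * sum(w^2) / (sum w)^2 over the m expected
    # sampled eligibles, with base weight w_h = 1 / rate_h in stratum h.
    sum_w = sum(eligible)
    sum_w2 = sum(e / r for e, r in zip(eligible, rates))
    deff = found * sum_w2 / sum_w ** 2
    return {
        "screened": screened,
        "eligible": found,
        "eligibility_rate": found / screened,
        "deff": deff,
        "effective_eligible": found / deff,
    }


# Synthetic model scores (assumption): most addresses have a low predicted probability.
random.seed(0)
probs = [random.betavariate(1, 6) for _ in range(10_000)]
strata = stratify(probs, cutoffs=[0.1, 0.3])  # hypothetical cutoffs

proportional = design_summary(probs, strata, rates=[0.03, 0.03, 0.03])
oversampled = design_summary(probs, strata, rates=[0.01, 0.03, 0.09])

for name, d in [("proportional", proportional), ("oversampled", oversampled)]:
    print(name, {k: round(v, 3) for k, v in d.items()})
```

Under these assumptions, oversampling the high-probability stratum raises the share of screened addresses that are eligible, which lowers screening cost per eligible household, while the unequal weights push the design effect above 1. Comparing `effective_eligible` per screened address across designs is one simple way to express the tradeoff.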
Big Data
Machine Learning
Stratification
Sample Design
Main Sponsor
Survey Research Methods Section