Wednesday, Aug 6: 2:00 PM - 3:50 PM
0318
Invited Paper Session
Music City Center
Room: CC-Davidson Ballroom B
Applied: Yes
Main Sponsor
Government Statistics Section
Co-Sponsors
Social Statistics Section
Survey Research Methods Section
Presentations
Title: Integration of Concepts, Methodology and Empirical Results on Biases from Incomplete Data in Survey and Non-Survey Information Sources
Author: John L. Eltinge, United States Census Bureau
Keywords: auxiliary data; data quality; incomplete frame coverage; total survey error model; total uncertainty analysis; unit, wave and item survey nonresponse
Abstract:
This paper reviews and integrates the wide range of literature on concepts, methodology and empirical results related to biases from incomplete data in survey and non-survey information sources. Two areas receive principal attention.
The first area focuses on analyses of incomplete-data biases as such, and on related mitigation efforts. This includes work with incomplete-data patterns arising from:
- unit, wave and item nonresponse in sample surveys; and
- problems with administrative records and other organic-data sources used to develop and refine survey frames, weighting and imputation procedures, and also used as direct inputs for production of statistical information.
The discussion of incomplete-data bias places special emphasis on:
- availability, costs and limitations of auxiliary data used for evaluation of biases (one common auxiliary-data diagnostic is sketched after this list);
- development and evaluation of models used for those evaluations; and
- reporting of empirical results from those evaluations.
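To make the auxiliary-data point concrete, below is a minimal illustrative sketch (not drawn from the paper) of one widely used diagnostic for unit nonresponse bias: compare the respondent mean of a variable known for the whole frame with the full-frame mean. All data and variable names here are simulated and hypothetical.

import numpy as np

rng = np.random.default_rng(7)

# Frame-level auxiliary variable known for every sampled unit
# (e.g., administrative payroll), plus a simulated response indicator.
n = 5_000
payroll = rng.lognormal(mean=10.0, sigma=1.0, size=n)

# Response propensity declines with size: larger units respond less often.
p_respond = 1.0 / (1.0 + np.exp(0.5 * (np.log(payroll) - 10.0)))
responded = rng.random(n) < p_respond

# Diagnostic: respondent mean vs. full-frame mean of the auxiliary variable.
# A large gap signals that weighting or imputation should condition on it.
bias_hat = payroll[responded].mean() - payroll.mean()
rel_bias = bias_hat / payroll.mean()

print(f"response rate        : {responded.mean():.1%}")
print(f"estimated bias (aux) : {bias_hat:,.0f}")
print(f"relative bias        : {rel_bias:+.1%}")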
The second area focuses on integration of nonresponse bias into a broader context, including:
- Comparison of the magnitudes of incomplete-data biases with the magnitudes of other components of total survey error models, e.g., measurement error and modeling error (a stylized decomposition is shown after this list)
- Quantitative and qualitative assessment of the ways in which incomplete-data biases, and related mitigation efforts, may affect multiple dimensions of data quality, e.g., accuracy; comparability; temporal and cross-sectional granularity; interpretability; and relevance
- Evaluation of the impact of incomplete-data bias on the value delivered to stakeholders through a specified suite of statistical information products
- Transparent and actionable communication with stakeholders regarding the above-mentioned concepts and empirical results
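To make the first comparison concrete, one stylized accounting (illustrative, not the paper's own model) treats total mean squared error as variance plus a squared sum of bias components, with the incomplete-data terms (nonresponse and coverage) entering alongside measurement and modeling biases:

\mathrm{MSE}(\hat\theta) \;=\; \mathrm{Var}(\hat\theta) \;+\; \bigl( B_{\mathrm{nonresponse}} + B_{\mathrm{coverage}} + B_{\mathrm{measurement}} + B_{\mathrm{model}} \bigr)^{2}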
Keywords
Nonresponse Bias
Incomplete Frame
Diagnostics and Sensitivity Analyses
Total Survey Error Model
Stakeholder Utility Functions
Transparency, Reproducibility and Replicability
The suite of 80+ ONS business surveys has developed organically over the past 80+ years; differing launch dates mean their statistical methods were optimised for a variety of data collection processes and for evolving best-practice guidelines. This has led to a lack of consistency and to esoteric practices, which present challenges both in ongoing support and in transformation activities such as combining existing surveys to meet new requirements.
A specific example of the problems caused by this evolution was the Business Enterprise Research and Development (BERD) survey. Comparisons with a similar output, produced by a different government department, showed an increasing divergence over time. The root cause was an incomplete sampling frame developed specifically for the BERD survey, which had worked historically by targeting the specific businesses engaged in R&D activity, but had failed to adapt when the economic landscape changed and many more small businesses began undertaking R&D. A complex bias adjustment methodology was developed, accounting for several methodological issues unique to the survey, including non-response in a source responsible for updating the incomplete frame.
Following lengthy consultation, communication with interested bodies both nationally and internationally, and thorough internal QA and scrutiny, the bias adjustment was successfully implemented, and nine years of data were revised and republished in 2022. The new figures attracted immense public and press interest which, although it has diminished over time, still makes headlines regularly. Despite our extensive pre-release efforts to inform users, there was some early surprise, but the new figures have been widely welcomed and seen as an improvement.
To continue the improvement, the BERD survey was radically redesigned for 2023, again with extensive QA. The approach was to apply a robust sample design that could easily be combined with other standard ONS surveys, and to implement modern best-practice methods throughout. One of the biggest changes was to increase the sample size ten-fold and to sample directly from the ONS business register to identify R&D hotspots. The sample increase, and its associated costs, were planned for the first year only; sample optimisation based on the first year's data then aimed to reduce the sample by half. How this succeeded will be revealed in this talk.
The approach taken for BERD blazed the trail for a wider, holistic transformation of ONS business surveys, ongoing since late 2023, with the aims of increased quality, reduced business burden, improved response rates, and consistent, best-practice methods. The vision is of integrated modular surveys built on a core foundation of our largest annual survey, updated and re-invigorated for that over-arching purpose.
Keywords
Bias
Frame
Methodology
Sampling
Revision
Future
Speaker
Gary Brown, Office for National Statistics
The increasing popularity of non-probability sampling is attributed to its convenience and cost-effectiveness. However, these samples often suffer from selection bias, making it essential to assess the extent of this bias to inform comparisons of sample quality. In this paper, we propose a method for evaluating selection bias in non-probability samples using multiple propensity score and outcome regression models. Our approach ensures that the estimator of selection bias remains consistent if at least one of the models is correctly specified. We further assess the effectiveness of our proposed methods through a Monte Carlo simulation study and an application to real data from the Household Pulse Surveys.
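As a hedged illustration of the underlying idea (not the authors' multiply robust estimator), the sketch below implements the doubly robust special case: a single propensity score model and a single outcome regression model are combined so that the estimated population mean, and hence the estimated selection bias, remains consistent if either working model is correctly specified. All data are simulated, and names such as mu_dr are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)

# Simulate a finite population: one covariate x, outcome y linear in x.
N = 100_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)

# Self-selection into the non-probability sample depends on x,
# so the naive sample mean of y is biased upward.
true_pi = 1.0 / (1.0 + np.exp(-(-3.0 + 1.2 * x)))
in_np = rng.random(N) < true_pi
x_np, y_np = x[in_np], y[in_np]

naive_mean = y_np.mean()
true_mean = y.mean()

# Propensity score model: P(selected | x). For simplicity it is fit on the
# full simulated population; in practice one would use a pseudo-weighted fit
# against a probability reference sample.
ps = LogisticRegression().fit(x.reshape(-1, 1), in_np.astype(int))
pi_hat = np.clip(ps.predict_proba(x_np.reshape(-1, 1))[:, 1], 1e-3, 1.0)

# Outcome regression model, fit within the non-probability sample.
orm = LinearRegression().fit(x_np.reshape(-1, 1), y_np)
m_np = orm.predict(x_np.reshape(-1, 1))    # fitted values, NP sample
m_pop = orm.predict(x.reshape(-1, 1))      # predictions, whole population

# Doubly robust mean: consistent if either working model is correct.
mu_dr = np.sum((y_np - m_np) / pi_hat) / N + m_pop.mean()

print(f"naive NP mean    : {naive_mean:.3f}")
print(f"DR-adjusted mean : {mu_dr:.3f}")
print(f"true mean        : {true_mean:.3f}")
print(f"estimated bias   : {naive_mean - mu_dr:.3f}")

The multiply robust extension described in the abstract replaces this single pair of working models with candidate sets of propensity and outcome models, retaining consistency of the bias estimator if any one candidate is correctly specified.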
Keywords
Multiply robust
Non-probability sample
Variance estimation