Biases Arising from Incomplete Data and Selection Effects in Survey and Non-Survey Data Sources

Chair: Jennifer Parker, National Center for Health Statistics

Organizer: John Eltinge, United States Census Bureau (retired)
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
Session 0318: Invited Paper Session
Music City Center, Room: CC-Davidson Ballroom B

Applied: Yes

Main Sponsor

Government Statistics Section

Co-Sponsors

Social Statistics Section
Survey Research Methods Section

Presentations

Integration of Concepts, Methodology and Empirical Results on Biases from Incomplete Data in Survey and Non-Survey Information Sources

Author: John L. Eltinge, United States Census Bureau

Keywords: auxiliary data; data quality; incomplete frame coverage; total survey error model; total uncertainty analysis; unit, wave and item survey nonresponse

Abstract:

This paper reviews and integrates the wide range of literature on concepts, methodology and empirical results related to biases from incomplete data in survey and non-survey information sources. Two areas receive principal attention.
The first area focuses on analyses of incomplete-data biases as such, and on related mitigation efforts. This includes work with incomplete-data patterns arising from:

- unit, wave and item nonresponse in sample surveys; and

- problems with administrative records and other organic-data sources used to develop and refine survey frames, weighting and imputation procedures, and used directly as inputs for the production of statistical information.

The discussion of incomplete-data bias places special emphasis on:

- availability, costs and limitations of auxiliary data used for evaluation of biases;

- development and evaluation of models used for those evaluations; and

- reporting of empirical results from those evaluations.

The second area focuses on integration of nonresponse bias into a broader context, including:

- Comparison of the magnitudes of incomplete-data biases with the magnitudes of other components of total survey error models, e.g., measurement error and modeling error (an illustrative decomposition follows this list)

- Quantitative and qualitative assessment of the ways in which incomplete-data biases, and related mitigation efforts, may affect multiple dimensions of data quality, e.g., accuracy; comparability; temporal and cross-sectional granularity; interpretability; and relevance

- Evaluation of the impact of incomplete-data bias on the value delivered to stakeholders through a specified suite of statistical information products

- Transparent and actionable communication with stakeholders regarding the above-mentioned concepts and empirical results
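
As a minimal illustration of the comparison in the first bullet above (the notation is assumed here, not taken from the paper), suppose the bias components of an estimator act additively; then a simple total survey error decomposition of its mean squared error is

\[
\mathrm{MSE}(\hat{\theta}) \;=\; \bigl(B_{\mathrm{nonresponse}} + B_{\mathrm{frame}} + B_{\mathrm{measurement}} + B_{\mathrm{model}}\bigr)^{2} \;+\; \mathrm{Var}(\hat{\theta}),
\]

so the incomplete-data terms \(B_{\mathrm{nonresponse}}\) and \(B_{\mathrm{frame}}\) can be compared directly in magnitude against the measurement and modeling terms.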


 

Keywords

Nonresponse Bias

Incomplete Frame

Diagnostics and Sensitivity Analyses

Total Survey Error Model

Stakeholder Utility Functions

Transparency, Reproducibility and Replicability 

Speaker

John Eltinge, United States Census Bureau (retired)

From bias adjustment to bias avoidance: an unexpected journey

The suite of 80+ ONS business surveys has developed organically over the past 80+ years; their different launch dates mean their statistical methods were optimized for a variety of data collection processes and for evolving best-practice guidelines. This has led to a lack of consistency and to some esoteric practices, which present challenges both in ongoing support and in transformation activities such as combining existing surveys to meet new requirements.

A specific example of the problems caused by this evolution was the Business Enterprise Research and Development (BERD) survey. Comparisons with a similar output, produced by a different government department, showed an increasing divergence over time. The root cause was found to be an incomplete sampling frame developed specifically for the BERD survey: it had worked historically because it targeted the specific businesses engaged in R&D activity, but it failed to adapt when the economic landscape changed and many more small businesses began undertaking R&D. A complex bias adjustment methodology was developed, accounting for several methodological issues unique to the survey, including nonresponse in a source responsible for updating the incomplete frame.

Following lengthy consultation, communication with interested bodies both nationally and internationally, and thorough internal QA and scrutiny, the bias adjustment was successfully implemented, and nine years of data were revised and republished in 2022. The new figures attracted immense public and press interest, which, although it has diminished over time, still generates headlines regularly. After some early surprise, despite our extensive pre-release efforts to inform users, the new figures have been widely welcomed and seen as an improvement.

To continue the improvement, the BERD survey was radically redesigned for 2023, again with extensive QA. The approach was to apply a robust sample design that could easily be combined with other standard ONS surveys, and to implement modern best-practice methods throughout. One of the biggest changes was to increase the sample size ten-fold and to sample directly from the ONS business register to identify R&D hotspots. The sample increase, and its associated costs, were planned for the first year only; after that, sample optimisation based on the first year's data aimed to reduce the sample by half (one standard allocation approach is sketched below). How this succeeded will be revealed in this talk.
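
The abstract does not specify the optimisation method used. As a purely illustrative sketch, Neyman allocation is one standard way to re-allocate (and reduce) a stratified sample once stratum variances can be estimated from a first year's data; all names and numbers below are hypothetical.

import numpy as np

# Neyman allocation: split a total sample size across strata in
# proportion to N_h * S_h (stratum population size times stratum
# standard deviation), which minimises the variance of the
# stratified estimator for a fixed total sample size.
def neyman_allocation(N, S, n_total):
    """N: stratum population sizes; S: estimated stratum standard
    deviations (e.g. from the first year's data); n_total: target
    total sample size."""
    N = np.asarray(N, dtype=float)
    S = np.asarray(S, dtype=float)
    weights = N * S
    n_h = n_total * weights / weights.sum()
    # Round and cap at stratum size; production designs also enforce
    # minimum stratum sample sizes and response-rate adjustments.
    return np.minimum(np.round(n_h).astype(int), N.astype(int))

# Hypothetical example: three business strata, where the high-variance
# R&D "hotspot" stratum receives proportionally more of the sample.
print(neyman_allocation(N=[5000, 2000, 300], S=[1.0, 4.0, 10.0], n_total=1000))

In practice the reduced allocation would also be checked against precision targets for key R&D estimates before being adopted.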

The approach taken for BERD blazed the trail for a wider, holistic transformation of ONS business surveys, ongoing since late 2023, with the aims of increased quality, reduced burden on businesses, improved response rates, and consistent best-practice methods. The vision is of integrated modular surveys built on a core foundation of our largest annual survey, updated and re-invigorated for that over-arching purpose.
 

Keywords

Bias

Frame

Methodology

Sampling

Revision

Future 

Speaker

Gary Brown, Office for National Statistics

WITHDRAWN: Evaluation of selection bias for non-probability samples with multiply robust methods

The increasing popularity of non-probability sampling is attributed to its convenience and cost-effectiveness. However, these samples often suffer from selection bias, making it essential to assess the extent of this bias to inform comparisons of sample quality. In this paper, we propose a method for evaluating selection bias in non-probability samples using multiple propensity score and outcome regression models. Our approach ensures that the estimator of selection bias remains consistent if at least one of the models is correctly specified. We further assess the effectiveness of our proposed methods through a Monte Carlo simulation study and an application to real data from the Household Pulse Surveys.
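
A minimal sketch of the core idea, under assumptions of ours rather than the paper's: the doubly robust special case combines one propensity model and one outcome model so that the estimated selection bias is consistent if either model is correctly specified; the paper's multiply robust extension allows several candidate models of each type. All variable names and the simulation below are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Simulated finite population: one covariate drives both the outcome
# and selection into the non-probability sample.
N = 100_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)

# Non-probability sample B: selection probability increases with x,
# so the naive mean of y over B is biased upward.
in_B = rng.random(N) < 1.0 / (1.0 + np.exp(-(x - 2.0)))

# Reference probability sample A: a simple random sample with weights.
idx_A = rng.choice(N, size=2_000, replace=False)
d_A = N / idx_A.size  # design weight of each unit in A

# Propensity model for P(unit in B | x). For brevity this sketch fits
# on the full simulated population; in practice it is estimated from
# B together with the weighted reference sample.
ps = LogisticRegression().fit(x.reshape(-1, 1), in_B)
pi_B = ps.predict_proba(x[in_B].reshape(-1, 1))[:, 1]

# Outcome model for E[y | x], fit on B and predicted on A.
om = LinearRegression().fit(x[in_B].reshape(-1, 1), y[in_B])
m_B = om.predict(x[in_B].reshape(-1, 1))
m_A = om.predict(x[idx_A].reshape(-1, 1))

# Doubly robust population mean: inverse-propensity-weighted residuals
# from B plus design-weighted model predictions over A.
mu_dr = (np.sum((y[in_B] - m_B) / pi_B) + np.sum(d_A * m_A)) / N

naive = y[in_B].mean()
print(f"naive B mean: {naive:.3f}")
print(f"doubly robust mean: {mu_dr:.3f} (population mean: {y.mean():.3f})")
print(f"estimated selection bias of the naive mean: {naive - mu_dr:.3f}")

A multiply robust version would fit several such propensity and outcome models and calibrate the weights for B to all of them at once (e.g. via empirical likelihood), retaining consistency if any one of the candidate models is correct.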

 

Keywords

Multiply robust

Non-probability sample

Variance estimation 

Co-Author

Sixia Chen