Challenges in Error Estimation for Survey Data

Adriana Perez, Chair
University of Texas At Houston, Health Science Center
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
5110 
Contributed Papers 
Oregon Convention Center 
Room: CC-G132 

Main Sponsor

Survey Research Methods Section

Presentations

Comparison of Variance Estimators for Self-Representing Primary Sample Units

Many surveys estimate variances with the balanced repeated replication (BRR) variance estimator. For self-representing (SR) Primary Sample Units (PSUs), surveys sometimes split each PSU into parts, pair the parts into pseudo strata, and then apply BRR to the pseudo strata. However, there is little guidance on how many pseudo strata the SR strata should be split into, or on how (or whether) the sort order should be used to split the sample when it was selected with systematic random sampling. Our research considered twelve different applications of the BRR variance estimator that varied by the number of pseudo strata formed and by how the sort order of a systematic random sample was used to split the PSU. We also included variations of the delete-a-group jackknife and successive difference replication variance estimators. Using simulations involving data from the Consumer Expenditures Survey, we found that the BRR variance estimator that split the sample of the SR PSUs into the most replicates possible and split the sample using the sort order was the best overall variance estimator for both national-level estimates and individual PSU-level estimates.
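As a rough illustration of the general BRR idea (not the specific estimators compared in this paper), the sketch below computes a BRR variance for an estimated total once pseudo strata with two half-samples each have already been formed. The function names `hadamard` and `brr_variance_of_total`, the input layout, and the example numbers are assumptions made for illustration only.

```python
import numpy as np


def hadamard(order):
    """Sylvester construction of a Hadamard matrix; order must be a power of two."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h


def brr_variance_of_total(half_totals):
    """BRR variance of an estimated total (illustrative sketch).

    half_totals: array of shape (H, 2) holding the weighted totals of the
    two half-samples in each of H pseudo strata (hypothetical layout).
    """
    half_totals = np.asarray(half_totals, dtype=float)
    n_strata = half_totals.shape[0]

    # Smallest power of two strictly greater than H, so that H mutually
    # orthogonal +/-1 columns remain after skipping the all-ones column.
    order = 1
    while order <= n_strata:
        order *= 2
    signs = hadamard(order)[:, 1:n_strata + 1]  # shape (R, H)

    full_estimate = half_totals.sum()

    # Each replicate keeps one half-sample per pseudo stratum and doubles it.
    kept = np.where(signs > 0, half_totals[:, 0], half_totals[:, 1])
    replicate_estimates = 2.0 * kept.sum(axis=1)

    return np.mean((replicate_estimates - full_estimate) ** 2)


# Made-up half-sample totals for three pseudo strata.
example = [[10.0, 12.0], [7.0, 4.0], [20.0, 25.0]]
print(brr_variance_of_total(example))
```

With a fully balanced set of replicates, the BRR variance of a total reduces to the sum of squared differences between the two half-sample totals in each pseudo stratum, which gives a simple check on the sketch.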

Keywords

Variance estimation

Self-representing strata

Balanced-repeated replication

Delete-a-group jackknife

Successive difference replication 

View Abstract 2731

First Author

Stephen Ash, Bureau of Labor Statistics

Presenting Author

Stephen Ash, Bureau of Labor Statistics

Improved estimators of variance of the regression estimator in two-phase sampling

This paper considers several estimators of the variance of the regression estimator in two-phase sampling. An improved jackknife technique for estimating this variance is suggested, and the jackknife estimator proposed by Sitter (1997: Journal of the American Statistical Association, pp. 780-787) is shown to be a special case of the proposed strategy. The improved strategies are based on the estimation techniques suggested by Isaki (1983: Journal of the American Statistical Association, pp. 117-123) for estimating the finite population variance. An empirical study has been carried out to show the performance of the proposed strategies relative to the Sitter estimators.

Keywords

Two-phase sampling

Jackknife

Regression estimator

Variance estimation 

View Abstract 2489

Co-Author

Sarjinder Singh, Texas A&M University-Kingsville

First Author

Lane Christiansen

Presenting Author

Lane Christiansen

Incorporating Inconclusive Outcomes in Error Rate Estimation with Applications in Forensic Science

Binary decision-making occurs in many areas of science and policy; e.g., medicine (tumor present or absent), forensics (ID or exclusion), finance (good or bad credit risk), and agriculture (healthy or diseased plant). Lab or field studies may be conducted to assess the error rates in such binary decision-making processes (e.g., proficiency tests for radiologists or latent print examiners). In such tests, a true outcome is known (e.g., latent print and file print did or did not come from the same source), but study outcomes allow three responses (e.g., "same," "different," "inconclusive"). Many forensic science articles report such studies' results by completely ignoring inconclusive decisions, which can artificially increase the apparent error rate. In this talk, we propose a weighting scheme to incorporate inconclusive decisions into error rates stratified by latent print quality. Additionally, we propose that standardization can be used to compare error rates across labs and studies.
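To make concrete how the treatment of inconclusive decisions changes a reported error rate, the sketch below compares three simple conventions for a single stratum. The abstract does not specify the authors' weighting scheme, so the function `error_rates`, the parameter `inconclusive_weight`, and the counts are hypothetical stand-ins, not the proposed method.

```python
def error_rates(n_correct, n_error, n_inconclusive, inconclusive_weight=0.5):
    """Compare error-rate conventions for one stratum (illustrative only).

    inconclusive_weight is a hypothetical value in [0, 1]: the fraction of an
    error that each inconclusive decision is counted as.
    """
    conclusive = n_correct + n_error
    total = conclusive + n_inconclusive
    return {
        # Inconclusives dropped from both numerator and denominator.
        "ignoring_inconclusives": n_error / conclusive,
        # Inconclusives kept in the denominator but never counted as errors.
        "inconclusive_as_correct": n_error / total,
        # Inconclusives counted as partial errors with the assumed weight.
        "weighted": (n_error + inconclusive_weight * n_inconclusive) / total,
    }


# Made-up counts: 80 correct, 5 erroneous, 15 inconclusive decisions.
rates = error_rates(80, 5, 15, inconclusive_weight=0.5)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

Applying such a function separately within each print-quality stratum would give stratum-level rates; how those are combined or standardized across labs is left to the talk.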

Keywords

error rates

inconclusive decisions

standardization

small sample size

quality

forensic science 

View Abstract 2737

Co-Author(s)

Karen Kafadar, University of Virginia
Jordan Rodu, University of Virginia

First Author

Sydney Campbell, University of Virginia

Presenting Author

Sydney Campbell, University of Virginia

Jackknife Variance Estimation for Web Panel Health Survey Estimates Based on a Propensity-Score Meth

Taking advantage of web-based technology to develop and implement web surveys can be an efficient way of conducting surveys. The development of probability panels for administering web surveys has increased their usefulness. However, in addition to possible mode effects, differences remain between these and large national population surveys, which generally have lower sampling and non-sampling errors.
To improve the consistency of web survey estimates, it is common to adjust the estimates using a higher-quality survey as the reference (benchmark) survey. One statistical method is a propensity score strategy. By concatenating the web survey and reference survey and applying a propensity score model to the combined data, the odds of being in the web survey are estimated by conditioning on selected covariates. For the variance estimation of adjusted estimates, typical Taylor-series or jackknife variance estimators based only on the web survey underestimate the variance, since they ignore the variance components due to sampling variation in the reference survey.
To consider the sampling variation in the reference survey, we develop a Jackknife variance estimator for ad 

Keywords

Variance

Complex Sample

Jackknife 

View Abstract 2921

First Author

Hee-Choon Shin, National Center for Health Statistics

Presenting Author

Hee-Choon Shin, National Center for Health Statistics

Survey data integration with applications to hypertension among US children and adolescents

Probability sampling has served as the major approach for finite population inference for decades. In the era of big data, nonprobability samples have become popular for their feasibility and cost-effectiveness. However, without a known inclusion mechanism, nonprobability samples fail to represent the target population unless appropriate adjustments are made. To leverage the strengths of both sources, we develop a data integration method for probability and nonprobability samples when the variable of interest is observed in both samples. The proposed optimal estimator is more efficient than estimators from either sample alone. The method also accommodates informative selection of the nonprobability sample and ignorable nonresponse within the probability sample. We apply the method to blood pressure data of US children and adolescents from the National Health and Nutrition Examination Survey (NHANES) and well-child visits throughout the Geisinger Health System. A replication method is used in variance estimation to account for the complex probability survey design of NHANES.

Keywords

Nonprobability sample

Probability sample

Informative sampling

Missing at random

Variance estimation

NHANES 

View Abstract 3751

Co-Author(s)

Emily Berg, Iowa State University
Zhengyuan Zhu, Iowa State University

First Author

Chengpeng Zeng

Presenting Author

Chengpeng Zeng

The Effects of Measurement Error on Health Estimates in Web vs Face-to-Face National Health Surveys

This report explores the differences in seven national health estimates between a web-based survey, the third round of the Research and Development Survey (RANDS 3, n=2,616), and an in-person survey, the 2019 National Health Interview Survey (2019 NHIS, n=31,997). The five physical health variables are ever having been diagnosed by a physician or other medical professional with asthma, diabetes, high blood pressure or hypertension, high cholesterol, and chronic obstructive pulmonary disease (COPD). The two mental health variables are major depressive disorder (depression) and generalized anxiety disorder (GAD). The statistical analysis included two main components: 1) comparing weighted estimates by data source and conducting Rao-Scott significance testing to detect initial evidence of significant differences by data source, and 2) building logistic regression models for each health outcome and conducting Wald tests to determine the statistical significance of interaction terms. The results show that the estimates from the web survey are consistently higher than those from the in-person survey. One possible explanation is that the web survey is less subject to social desirability bias.

Keywords

web survey

face-to-face survey

total survey error

secondary data analysis

significance testing 

View Abstract 2168

First Author

Leanna Moron, Westat

Presenting Author

Leanna Moron, Westat