Tuesday, Aug 6: 10:30 AM - 12:20 PM
5110
Contributed Papers
Oregon Convention Center
Room: CC-G132
Main Sponsor
Survey Research Methods Section
Presentations
Many surveys estimate variances with the balanced repeated replication (BRR) variance estimator. For self-representing (SR) primary sampling units (PSUs), surveys sometimes split each PSU into parts that are paired into pseudo strata, and BRR is then applied to the pseudo strata. However, there is little guidance on how many pseudo strata the SR strata should be split into, or on how (or whether) the sort order should be used to split the sample when it was selected with systematic random sampling. Our research considered twelve different applications of the BRR variance estimator that varied by the number of pseudo strata formed and by how the sort order of a systematic random sample was used to split the PSU. We also included variations of the delete-a-group jackknife and successive difference replication variance estimators. Using simulations involving data from the Consumer Expenditures Survey, we found that the BRR variance estimator that split the sample of the SR PSUs into the most replicates possible and split the sample using the sort order was the best overall variance estimator for both national-level estimates and individual PSU-level estimates.
Keywords
Variance estimation
Self-representing strata
Balanced-repeated replication
Delete-a-group jackknife
Successive difference replication
First Author
Stephen Ash, Bureau of Labor Statistics
Presenting Author
Stephen Ash, Bureau of Labor Statistics
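To make the BRR idea in the abstract above concrete, here is a minimal sketch, not the authors' implementation. It assumes a weighted total as the estimate, exactly two half-samples per pseudo stratum, and a Sylvester-type Hadamard matrix for balance. The helper `pseudo_strata_from_sort_order` is hypothetical: it cuts the sorted sample of one SR PSU into contiguous blocks and alternates units between the two half-samples, which is only one of many ways the sort order could be used.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def pseudo_strata_from_sort_order(n_units, n_pseudo):
    """Hypothetical split: contiguous blocks in sort order become pseudo strata,
    and units alternate between half-sample 0 and 1.
    Assumes n_units >= 2 * n_pseudo so every pseudo stratum has both halves."""
    pos = np.arange(n_units)
    stratum = pos * n_pseudo // n_units
    half = pos % 2
    return stratum, half

def brr_variance(y, w, stratum, half):
    """BRR variance estimate for the weighted total sum(w * y)."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    stratum, half = np.asarray(stratum), np.asarray(half)
    strata = np.unique(stratum)
    n_strata = len(strata)
    n_reps = 1
    while n_reps <= n_strata:          # smallest power of two above n_strata
        n_reps *= 2
    # Drop the all-ones first column; the remaining columns are orthogonal.
    H = hadamard(n_reps)[:, 1:n_strata + 1]
    idx = np.searchsorted(strata, stratum)
    full = np.sum(w * y)
    reps = np.empty(n_reps)
    for r in range(n_reps):
        # Keep half 1 where the Hadamard entry is +1, half 0 where it is -1,
        # and double the weights of the kept units.
        keep = (H[r, idx] == 1) == (half == 1)
        reps[r] = np.sum(2.0 * w * y * keep)
    return full, np.mean((reps - full) ** 2)
```

For a linear estimate such as a total, a fully balanced set of replicates reproduces the usual paired-difference variance estimate, which is a convenient check on the code.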
In this paper, a few interesting estimators of the variance of the regression estimator in two-phase sampling are considered. An improved jackknife technique for estimating the variance of the regression estimator in two-phase sampling is suggested. The jackknife estimator proposed by Sitter (1997: Journal of the American Statistical Association, pp. 780-787) is shown to be a special case of the proposed strategy. The improved strategies are based on the estimation techniques suggested by Isaki (1983: Journal of the American Statistical Association, pp. 117-123) for estimating the finite population variance. An empirical study has been carried out to compare the performance of the proposed strategies with that of the Sitter estimators.
Keywords
Two-phase sampling
Jackknife
Regression estimator
Variance estimation
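As a rough illustration of the setting in the abstract above, the sketch below computes a two-phase regression estimator of a mean and a plain delete-one jackknife variance over first-phase units. It assumes simple random sampling at both phases and ignores finite population corrections. It is not the improved estimator proposed in the talk, and it is not claimed to reproduce Sitter's (1997) estimator exactly. All function names are illustrative.

```python
import numpy as np

def regression_estimate(x1, x2, y2):
    """Two-phase regression estimator of the mean of y.
    x1: x for all first-phase units; x2, y2: values in the second-phase subsample."""
    b = np.cov(x2, y2, ddof=1)[0, 1] / np.var(x2, ddof=1)
    return y2.mean() + b * (x1.mean() - x2.mean())

def jackknife_variance(x, y, in_phase2):
    """Delete-one jackknife over first-phase units.
    y may be NaN for units that were not selected into the second phase."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    in2 = np.asarray(in_phase2, bool)
    n1 = len(x)
    theta = regression_estimate(x, x[in2], y[in2])
    reps = np.empty(n1)
    for j in range(n1):
        keep = np.ones(n1, bool)
        keep[j] = False                # drop unit j from the first phase
        k2 = keep & in2                # and from the second phase if it was there
        reps[j] = regression_estimate(x[keep], x[k2], y[k2])
    return theta, (n1 - 1) / n1 * np.sum((reps - theta) ** 2)
```

When y is an exact linear function of x, the regression estimator depends only on the first-phase mean of x, so the jackknife variance reduces to the squared slope times the usual variance estimate of that mean.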
Binary decision-making occurs in many areas of science and policy; e.g., medicine (tumor present or absent), forensics (ID or exclusion), finance (good or bad credit risk), and agriculture (healthy or diseased plant). Lab or field studies may be conducted to assess the error rates in such binary decision-making processes (e.g., proficiency tests for radiologists or latent print examiners). In such tests, a true outcome is known (e.g., latent print and file print did or did not come from the same source), but study outcomes allow three responses (e.g., "same," "different," "inconclusive"). Many forensic science articles report such studies' results by completely ignoring inconclusive decisions, which can artificially increase the apparent error rate. In this talk, we propose a weighting scheme to incorporate inconclusive decisions into error rates stratified by latent print quality. Additionally, we propose that standardization can be used to compare error rates across labs and studies.
Keywords
error rates
inconclusive decisions
standardization
small sample size
quality
forensic science
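The standardization idea in the abstract above can be illustrated with direct standardization, sketched below under stated assumptions. Error rates are computed within quality strata and then averaged using a common reference distribution over strata. The stratum labels, counts, and reference weights are invented for illustration, and how inconclusive decisions enter the counts is left to the caller, since the talk's weighting scheme is not described here.

```python
def standardized_error_rate(errors, trials, standard_weights):
    """Directly standardized error rate.
    errors[q], trials[q]: counts in quality stratum q for one lab or study.
    standard_weights[q]: reference share of stratum q (normalized internally)."""
    total = sum(standard_weights.values())
    rate = 0.0
    for q, weight in standard_weights.items():
        rate += (weight / total) * errors[q] / trials[q]
    return rate

# Two hypothetical labs with identical stratum-specific rates but different case mixes.
lab_a = {"errors": {"good": 2, "poor": 10}, "trials": {"good": 200, "poor": 50}}
lab_b = {"errors": {"good": 1, "poor": 40}, "trials": {"good": 100, "poor": 200}}
reference = {"good": 0.5, "poor": 0.5}

rate_a = standardized_error_rate(lab_a["errors"], lab_a["trials"], reference)
rate_b = standardized_error_rate(lab_b["errors"], lab_b["trials"], reference)
```

The crude rates here differ (12/250 versus 41/300) only because lab B saw more poor-quality prints; the standardized rates coincide because the stratum-specific rates are the same.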
Taking advantage of web-based technology to develop and implement web surveys can be an efficient way of conducting surveys. The development of probability panels for administering web surveys has increased their usefulness. However, in addition to possible mode effects, differences remain between these and large national population surveys, which generally have lower sampling and non-sampling errors.
To improve the consistency of web survey estimates, it is common to adjust the estimates using a higher quality survey as the reference (benchmark) survey. One statistical method is a propensity score strategy. By concatenating the web survey and reference survey and applying a propensity score model to the combined data, the odds of being in the web survey are estimated by conditioning on selected covariates. For the variance estimation of adjusted estimates, typical Taylor-series or jackknife variance estimators, based only on the web survey, underestimate the variance since the estimators ignore the variance components due to sampling variation in the reference survey.
To consider the sampling variation in the reference survey, we develop a Jackknife variance estimator for ad
Keywords
Variance
Complex Sample
Jackknife
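The propensity score adjustment described in the abstract above can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the two samples are stacked, a weighted logistic regression is fitted by Newton-Raphson, and each web weight is divided by the estimated odds of being in the web survey (one common variant of the adjustment). The function names are hypothetical, and the sketch covers only the point adjustment, not the jackknife variance estimator the talk develops.

```python
import numpy as np

def fit_logistic(X, z, w, n_iter=50):
    """Weighted logistic regression by Newton-Raphson; X must include an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (z - p))
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def inverse_odds_weights(x_web, w_web, x_ref, w_ref):
    """Adjust web weights by the estimated odds of web membership.
    x_web, x_ref: 2-D covariate arrays (rows are respondents)."""
    x_web, x_ref = np.asarray(x_web, float), np.asarray(x_ref, float)
    w_web, w_ref = np.asarray(w_web, float), np.asarray(w_ref, float)
    n_web = len(x_web)
    covariates = np.vstack([x_web, x_ref])
    X = np.column_stack([np.ones(len(covariates)), covariates])
    z = np.concatenate([np.ones(n_web), np.zeros(len(x_ref))])  # 1 = web survey
    w = np.concatenate([w_web, w_ref])
    beta = fit_logistic(X, z, w)
    odds = np.exp(X[:n_web] @ beta)   # estimated odds of being in the web survey
    return w_web / odds
```

With a single binary covariate the model is saturated, so the adjusted web weights in each covariate cell sum to the weighted reference count in that cell, which gives a simple way to check the code.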
Probability sampling has served as the major approach for finite population inference for decades. In the era of big data, nonprobability samples have become popular for their feasibility and cost-effectiveness. However, without a known inclusion mechanism, nonprobability samples fail to represent the target population unless appropriate adjustments are made. To leverage the strengths of both sources, we develop a data integration method for probability and nonprobability samples when the variable of interest is observed in both samples. The proposed optimal estimator is more efficient than estimators from either sample alone. The method also accommodates informative selection of the nonprobability sample and ignorable nonresponse within the probability sample. We implement the method to analyze blood pressure data of US children and adolescents from the National Health and Nutrition Examination Survey (NHANES) and well-child visits throughout the Geisinger Health System. A replication method is used for variance estimation to account for the complex probability survey design of NHANES.
Keywords
Nonprobability sample
Probability sample
Informative sampling
Missing at random
Variance estimation
NHANES
This report explores the differences for seven national health estimates from a web-based survey, the third round of the Research and Development Survey (RANDS 3, n=2,616), and an in-person survey, the 2019 National Health Interview Survey (2019 NHIS, n=31,997). The five physical health variables include ever diagnosed by a physician or other medical professional with asthma, diabetes, high blood pressure or hypertension, high cholesterol, and chronic obstructive pulmonary disease (COPD). The two mental health variables are major depressive disorder (depression) and generalized anxiety disorder (GAD). The statistical analysis included two main components: 1) comparing weighted estimates by data source and conducting Rao-Scott significance testing to detect initial evidence of significant differences by data source, and 2) building logistic regression models for each health outcome and conducting Wald tests to determine the statistical significance of interaction terms. The results show that the estimates from the web survey are consistently higher than those from the in-person survey. One possible explanation is that the web survey is less subject to social desirability bias.
Keywords
web survey
face-to-face survey
total survey error
secondary data analysis
significance testing