Survey Data Collection, Estimation, and Disclosure Limitation Methods

David Kinyon Chair
Energy Information Administration
 
Thursday, Aug 8: 10:30 AM - 12:20 PM
5203 
Contributed Papers 
Oregon Convention Center 
Room: CC-D138 

Main Sponsor

Survey Research Methods Section

Presentations

Effective data visualizations for large meta-analyses: Evidence from a randomized survey experiment

Meta-analysis, a statistical method for synthesizing effect sizes across studies, aims to provide a robust summary of evidence that can facilitate better evidence-based decision-making by policy-makers or practitioners. However, common meta-analytic visualizations such as forest plots rely on statistical conventions that may be unfamiliar to decision-makers. We argue that incorporating empirical evidence from cognitive science into the design of meta-analytic visualizations will improve lay persons' comprehension of the evidence. Prior work showed that an alternative design - the Meta-Analytic Rain Cloud (MARC) plot - is more effective than existing visualizations for communicating to lay audiences, at least for small meta-analyses (k = 5 studies). The present research proposes adjustments to the MARC plot and conducts a statistical cognition experiment to examine whether the advantages of the MARC plot persist in larger meta-analyses (k = 10, 20, 50, 100). Of all visualization types, the adjusted MARC plot had the best performance, offering a 1.03 sd improvement over a bar plot and a 0.36 sd improvement over a forest plot (p < 0.05, adjusting for multiple comparisons). 

Keywords

meta-analysis

data visualization

survey experiment

statistical cognition 

View Abstract 3190

Co-Author(s)

Kaitlyn Fitzgerald, Azusa Pacific University
Avery Charles, Azusa Pacific University

First Author

David Khella

Presenting Author

David Khella

Design-Based Methods for State-Level Survey Estimation under Three-year Data Pooling

Household surveys administered by the U.S. Census Bureau, including the National Health Interview Survey, are stratified by state yet do not provide reliable survey estimates for many individual states. Bayesian hierarchical models can stabilize small area estimates using auxiliary data, yet they remain subject to model bias and misspecification. Thus, it is desirable to employ design-based methods that improve both the underlying sampling variance and its corresponding estimators.

The lack of adequate sample sizes and PSUs for general state-level estimation stems, in part, from the high cost of in-person recruitment of sampled households. To mitigate these costs, PSUs are fixed for ten-year periods. A by-product of this design is that combining three years of cross-sectional survey data, ostensibly to improve precision by tripling state sample sizes, yields smaller inferential benefits than independent annual PSU samples would. Furthermore, spatial clustering reduces interviewer travel in exchange for a higher design effect. We show potential gains in reliability under three-year pooled survey data with relatively cost-effective changes to the sample design.
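The trade-off between clustering and precision can be sketched with the textbook Kish approximation, deff = 1 + (m - 1) * rho, where m is the number of sampled households per PSU and rho is the intraclass correlation. This is not the authors' method. The PSU counts, cluster sizes, and rho below are hypothetical, and treating three pooled years in the same PSUs as one larger cluster assumes the within-PSU correlation persists across years.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Nominal sample size divided by the approximate design effect."""
    nominal_n = n_clusters * cluster_size
    return nominal_n / design_effect(cluster_size, icc)


# Hypothetical state: 10 PSUs, 50 households per PSU per year, rho = 0.05.
ICC = 0.05
one_year = effective_sample_size(n_clusters=10, cluster_size=50, icc=ICC)

# Three pooled years in the SAME 10 PSUs: cluster size triples.
pooled_fixed_psus = effective_sample_size(n_clusters=10, cluster_size=150, icc=ICC)

# Three pooled years with fresh PSUs each year: the number of clusters triples.
pooled_fresh_psus = effective_sample_size(n_clusters=30, cluster_size=50, icc=ICC)

print(f"one year:           {one_year:.1f}")
print(f"pooled, fixed PSUs: {pooled_fixed_psus:.1f}")
print(f"pooled, fresh PSUs: {pooled_fresh_psus:.1f}")
```

Under these assumed values, pooling within fixed PSUs raises the effective sample size by much less than a factor of three, while independent annual PSU samples triple it.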

Keywords

Survey Methods, Cross-sectional data.

Design effect, Variance Estimation, Taylor-Series

Sample Design, Clustering, Intraclass Correlation

Small Area Estimation

Design-based methods

Markov Chain methods 

View Abstract 2379

First Author

William Waldron, National Center for Health Statistics

Presenting Author

William Waldron, National Center for Health Statistics

Unbiased Survey Estimation with Population Auxiliary Variables

In many applications, population auxiliary variables and predictive models can be used to increase the precision and accuracy of survey estimates. We propose a new model-assisted approach that makes it possible to incorporate model predictions into survey estimation to improve precision while maintaining the unbiasedness property of the Horvitz-Thompson estimator. Our method allows any prediction function or machine learning algorithm to be used to predict the response for out-of-sample observations. The unbiasedness property remains fully design-based and does not require the validity of the prediction model.
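The idea of combining predictions with a design-unbiased correction can be illustrated with the classical difference estimator, in which a prediction rule fixed in advance is applied to every population unit and Horvitz-Thompson-weighted residuals from the sample correct it. This is a minimal sketch of that textbook estimator, not the authors' proposed method. The toy population, the prediction rule, and the simple random sampling design are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance


def horvitz_thompson_total(y, sample_idx, incl_prob):
    """Horvitz-Thompson estimate of the population total of y."""
    return sum(y[i] / incl_prob for i in sample_idx)


def difference_estimator_total(y, preds, sample_idx, incl_prob):
    """Sum of predictions over the population plus HT-weighted sample residuals."""
    prediction_total = sum(preds)
    residual_correction = sum((y[i] - preds[i]) / incl_prob for i in sample_idx)
    return prediction_total + residual_correction


# Hypothetical population of N = 6 units with one auxiliary variable x.
y = [3.0, 7.0, 4.0, 9.0, 6.0, 11.0]
x = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
preds = [2.0 * xi for xi in x]  # a fixed, deliberately imperfect prediction rule

N, n = len(y), 3
incl_prob = n / N  # simple random sampling without replacement

# Enumerate every possible sample to evaluate the design expectation exactly.
all_samples = list(combinations(range(N), n))
ht_estimates = [horvitz_thompson_total(y, s, incl_prob) for s in all_samples]
diff_estimates = [difference_estimator_total(y, preds, s, incl_prob) for s in all_samples]

true_total = sum(y)
print("true total:", true_total)
print("HT mean, variance:        ", mean(ht_estimates), pvariance(ht_estimates))
print("difference mean, variance:", mean(diff_estimates), pvariance(diff_estimates))
```

Averaged over all possible samples, both estimators recover the true total even though the prediction rule is wrong. In this toy population the difference estimator has the smaller variance because the residuals vary less than the responses do.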

Keywords

model-assisted inference

survey estimation

auxiliary data

finite population inference

machine learning

regression 

View Abstract 3571

Co-Author(s)

Johann Gagnon-Bartsch
Jaylin Lowe, University of Michigan
James Green, Westat

First Author

Robyn Ferg, Westat

Presenting Author

Robyn Ferg, Westat

How to Collect Data on Agricultural Nutrient Management Practices: Survey Results from Iowa

In response to the 2008 Gulf Hypoxia Action Plan, the Iowa Department of Agriculture partnered with Iowa State University to create the Iowa Nutrient Reduction Strategy (INRS) to assess and reduce nutrient loadings in Iowa waters and the Gulf of Mexico. In this presentation, we will discuss how survey sampling methodologies can be used efficiently to reach the goals of such nutrient reduction plans, based on the finite-sample data collected in Iowa. The two-stage procedure begins by randomly selecting 150 of 580 agricultural retailers across all eight Major Land Resource Areas (MLRAs) using the Local Pivotal Method (LPM) to ensure a balanced sample. Once data are collected, the averages and standard errors of the percentage of total land on which different categories of nutrients are used are extrapolated to the entire state using stratified sampling, to help policymakers draw up their nutrient-reduction plans for the succeeding crop years. To wrap up, we will present some of the findings on land use for different nutrients for all these years, along with their uncertainty estimates, for visualization.
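The extrapolation step can be illustrated with the textbook stratified estimator of a mean and its standard error under simple random sampling within strata. This is only a sketch: the strata sizes and values below are hypothetical, and variance estimation under the Local Pivotal Method used in the study would differ from this formula.

```python
import math
from statistics import mean, variance


def stratified_mean_and_se(strata):
    """Stratified mean and standard error.

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
    population units in stratum h and sample_values are the sampled responses.
    Assumes simple random sampling without replacement within each stratum.
    """
    population_size = sum(N_h for N_h, _ in strata)
    estimate = 0.0
    var_estimate = 0.0
    for N_h, values in strata:
        n_h = len(values)
        weight = N_h / population_size
        fpc = 1.0 - n_h / N_h  # finite population correction
        estimate += weight * mean(values)
        var_estimate += weight**2 * fpc * variance(values) / n_h
    return estimate, math.sqrt(var_estimate)


# Hypothetical example: percent of land receiving a nutrient category,
# reported by sampled retailers in two strata.
example_strata = [
    (100, [10.0, 12.0, 14.0]),
    (300, [20.0, 22.0, 24.0, 26.0]),
]
est, se = stratified_mean_and_se(example_strata)
print(f"estimated statewide mean: {est:.2f}, standard error: {se:.3f}")
```

Each stratum contributes in proportion to its share of the population, so a large stratum with few respondents dominates the standard error.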

Keywords

Nutrient reduction plans

Balanced sampling

Local Pivotal Method

Stratified sampling 

View Abstract 2295

Co-Author(s)

Kunal Das, Iowa State University
Rob Davis, Iowa State University
Matthew Helmers, Iowa State University
Ben Gleason, Iowa Nutrient Research and Education Council
Isenhert Thomas, Iowa State University

First Author

Zhengyuan Zhu, Iowa State University

Presenting Author

Kunal Das, Iowa State University

Primary Sampling Unit Stratification for the Current Population Survey 2020 Sample Redesign

The Current Population Survey (CPS) uses a two-stage sample design, in which groups of counties called primary sampling units (PSUs) are selected in the first stage and housing units from the selected PSUs are selected in the second stage. After every decennial census, the CPS redesigns its sample to make it more relevant to the current decade. PSUs are classified as either self-representing (SR) or non-self-representing (NSR). SR PSUs are included in the sample with certainty, while NSR PSUs are clustered into strata with one PSU sampled per stratum. In this redesign, we sought an NSR PSU stratification that improves the variance of auxiliary measures of childhood poverty and American Indian and Alaska Native (AIAN) people, with minimal impact on PSU workload variance and unemployment variance. This talk describes the process and some summary results.

Keywords

Sample Design

Stratification 

View Abstract 2355

Co-Author(s)

Yarissa Gonzalez, US Census Bureau
Timothy Trudell

First Author

Brian Shaffer

Presenting Author

Brian Shaffer

Evaluating the Disclosure Risk and Analytic Utility of Synthetic Data in a Municipal Health Survey

Releasing public-use micro-level data files from health surveys holds immense value for science and health policy. However, even after removing personally identifying information, the privacy of survey respondents may still be compromised. Using a large NYC population-representative health survey (n=10,271), we identified high-risk observations based on population estimates through a combination of key variables. We compared three solutions to mitigate the risk of re-identification (suppression, synthesis using Classification and Regression Trees, and synthesis via Bayesian models) and assessed their impact on both the risk and the loss of utility of the resulting protected data. While both synthesis methods resulted in slightly higher disclosure risks than the suppression method, the synthetic datasets preserved a higher level of utility. We will discuss our proposed solutions to avoid over-protecting and potentially obscuring estimates for underserved and vulnerable groups, and share our experiences with data curators in advancing disclosure risk controls and data sharing in public health.

Keywords

Health Surveys

Data Privacy Risk

Synthetic Data

Survey Research Methods

Government Statistics 

View Abstract 3545

Co-Author(s)

Wen Qin Deng, NYC Department of Health and Mental Hygiene
Jingchen Hu, Vassar College
Tashema Bholanath, NYC Department of Health and Mental Hygiene
Fangtao He, NYC Department of Health and Mental Hygiene
Nneka Lundy De La Cruz, NYC Department of Health and Mental Hygiene

First Author

Stephen Immerwahr, NYC Department of Health and Mental Hygiene

Presenting Author

Stephen Immerwahr, NYC Department of Health and Mental Hygiene

Small Area Modeling for Differentially Private Counts

The Census Bureau adopted differential privacy (DP), as implemented through the TopDown Algorithm (TDA), for the 2020 Decennial Census in order to protect respondent confidentiality. Though the variances of the additive DP noise are publicly available, the impacts of the postprocessing applied in the TDA to ensure that various quality metrics, such as hierarchical consistency and non-negativity, are met are less easily quantified, as the unprotected counts are not publicly available for 2020 data. In this work, we investigate the use of a small area estimation approach to strengthen estimates of variability obtained using the 2010 demonstration products, as compared to the official 2010 redistricting file. We propose using a grouping of similar geographies to obtain estimates of variance from the 2010 data, and incorporating these updated variance estimates to improve the estimates for 2020.
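A small simulation can suggest why postprocessing is harder to characterize than the noise itself. The sketch below adds zero-mean noise with a known standard deviation to a small count and then clamps negative values to zero. It is a deliberately simplified stand-in, not the TopDown Algorithm: the continuous Gaussian noise, the clamping rule, the true count, and the noise scale are all assumptions chosen for illustration.

```python
import random


def noisy_count(true_count: float, sigma: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise with known standard deviation sigma."""
    return true_count + rng.gauss(0.0, sigma)


def clamp_non_negative(value: float) -> float:
    """Toy postprocessing step: replace negative counts with zero."""
    return max(0.0, value)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
true_count, sigma, draws = 1, 3.0, 20000  # hypothetical small-area count

raw = [noisy_count(true_count, sigma, rng) for _ in range(draws)]
clamped = [clamp_non_negative(v) for v in raw]

raw_mean = sum(raw) / draws
clamped_mean = sum(clamped) / draws

print(f"true count:            {true_count}")
print(f"mean of noisy counts:   {raw_mean:.3f}")
print(f"mean of clamped counts: {clamped_mean:.3f}")
```

The noisy counts average close to the true value, while the clamped counts are biased upward for small counts. The size of that bias depends on the unknown true count, which is one reason the published noise variances alone do not describe the error in the released data.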

Keywords

Small Area Estimation

Differential Privacy

Generalized variance function 

First Author

Kyle Irimata

Presenting Author

Kyle Irimata