Thursday, Aug 8: 10:30 AM - 12:20 PM
5203
Contributed Papers
Oregon Convention Center
Room: CC-D138
Main Sponsor
Survey Research Methods Section
Presentations
Meta-analysis, a statistical method for synthesizing effect sizes across studies, aims to provide a robust summary of evidence that can facilitate better evidence-based decision-making by policy-makers or practitioners. However, common meta-analytic visualizations such as forest plots rely on statistical conventions that may be unfamiliar to decision-makers. We argue that incorporating empirical evidence from cognitive science into the design of meta-analytic visualizations will improve lay persons' comprehension of the evidence. Prior work showed that an alternative design - the Meta-Analytic Rain Cloud (MARC) plot - is more effective than existing visualizations for communicating to lay audiences, at least for small meta-analyses (k = 5 studies). The present research proposes adjustments to the MARC plot and conducts a statistical cognition experiment to examine whether the advantages of the MARC plot persist in larger meta-analyses (k = 10, 20, 50, 100). Of all visualization types, the adjusted MARC plot had the best performance, offering a 1.03 sd improvement over a bar plot and a 0.36 sd improvement over a forest plot (p < 0.05, adjusting for multiple comparisons).
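As background on the quantity that forest plots and MARC plots summarize, the following is a minimal fixed-effect sketch of inverse-variance pooling. The effect sizes and standard errors are illustrative placeholders; the experiment above concerns how such a summary is displayed, not how it is computed.

```python
# Minimal sketch of the summary a forest plot (or MARC plot) communicates:
# an inverse-variance-weighted meta-analytic effect with a 95% CI.
# Effect sizes and standard errors are made-up illustration values.
import numpy as np

effects = np.array([0.21, 0.35, 0.10, 0.44, 0.28])   # per-study effect sizes (k = 5)
ses     = np.array([0.12, 0.20, 0.15, 0.25, 0.10])   # per-study standard errors

weights = 1.0 / ses**2                                # fixed-effect (inverse-variance) weights
summary = np.sum(weights * effects) / np.sum(weights)
se_summary = np.sqrt(1.0 / np.sum(weights))

lo, hi = summary - 1.96 * se_summary, summary + 1.96 * se_summary
print(f"summary effect = {summary:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```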
Keywords
meta-analysis
data visualization
survey experiment
statistical cognition
Household surveys administered by the U.S. Census Bureau, including the National Health Interview Survey, are stratified by state yet do not provide reliable survey estimates for many individual states. Bayesian hierarchical models can stabilize small area estimates using auxiliary data, yet they remain subject to model bias and misspecification. It is therefore desirable to employ design-based methods that improve both the underlying sampling variance and the corresponding variance estimators.
The lack of adequate sample sizes and PSUs for general state-level estimation stems, in part, from the high cost of in-person recruitment of sampled households. To mitigate these costs, PSUs are fixed for ten-year periods. A by-product of this design is that pooling three years of cross-sectional survey data, which ostensibly improves precision by tripling state sample sizes, yields smaller inferential gains than independent annual PSU samples would. Furthermore, spatial clustering reduces interviewer travel in exchange for a higher design effect. We show potential gains in reliability under three-year pooled survey data with relatively cost-effective changes to the sample design.
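To see why pooling years collected within the same fixed PSUs buys less than tripling independent samples, here is a back-of-the-envelope sketch using the Kish design-effect approximation. The intraclass correlation, cluster size, and annual sample size are hypothetical values, not the NHIS design parameters.

```python
# Back-of-the-envelope illustration (assumed values) of why pooling three years
# of data collected in the SAME PSUs gains less than three independent samples.
# Kish approximation: deff = 1 + (m_bar - 1) * rho, where m_bar is the number of
# interviews per PSU and rho is the intraclass correlation of the outcome.
rho, m_bar, n_year = 0.05, 20, 1000   # hypothetical state-level values

def eff_n(n, m_bar, rho):
    deff = 1 + (m_bar - 1) * rho
    return n / deff

one_year     = eff_n(n_year, m_bar, rho)             # single year
pooled_same  = eff_n(3 * n_year, 3 * m_bar, rho)     # 3 years, same PSUs: cluster sizes triple
pooled_indep = eff_n(3 * n_year, m_bar, rho)         # 3 independent annual PSU samples

print(f"effective n: one year {one_year:.0f}, "
      f"3 yrs same PSUs {pooled_same:.0f}, 3 yrs independent PSUs {pooled_indep:.0f}")
```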
Keywords
Survey methods
Cross-sectional data
Design effect
Variance estimation
Taylor series
Sample design
Clustering
Intraclass correlation
Small area estimation
Design-based methods
Markov chain methods
In many applications, population auxiliary variables and predictive models can be used to increase the precision and accuracy of survey estimates. We propose a new model-assisted approach that makes it possible to incorporate model predictions into survey estimation to improve precision, while maintaining the unbiasedness property of the Horvitz-Thompson estimator. Our method allows any prediction function or machine learning algorithm to be used to predict the response for out-of-sample observations. The unbiasedness property is fully design-based and does not rely on the validity of the prediction model.
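A minimal sketch of a difference-type model-assisted total is given below. It is consistent with the description above but is not necessarily the authors' exact estimator; the prediction function and data objects are placeholders.

```python
# Difference-type model-assisted estimator of a population total:
#   T_hat = sum_{i in U} m(x_i) + sum_{i in s} (y_i - m(x_i)) / pi_i
# Any prediction function m() can be plugged in; the Horvitz-Thompson-weighted
# residual correction keeps the estimator design-unbiased even if m() is poor.
import numpy as np

def model_assisted_total(x_pop, x_samp, y_samp, pi_samp, predict):
    """predict: any fitted prediction function mapping covariates to predicted y."""
    synthetic = predict(x_pop).sum()                           # model total over the frame
    correction = ((y_samp - predict(x_samp)) / pi_samp).sum()  # HT-weighted residuals
    return synthetic + correction

# Example with a hypothetical fitted learner (e.g., from scikit-learn):
# total_hat = model_assisted_total(X_frame, X_sample, y_sample, pi_sample, model.predict)
```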
Keywords
model-assisted inference
survey estimation
auxiliary data
finite population inference
machine learning
regression
In response to the 2008 Gulf Hypoxia Action Plan, the Iowa Department of Agriculture partnered with Iowa State University to create the Iowa Nutrient Reduction Strategy (INRS) to assess and reduce nutrient loadings in Iowa waters and the Gulf of Mexico. In this presentation we will discuss how survey sampling methodologies can be used efficiently to support the goals of nutrient reduction plans based on the finite sample data collected in Iowa. The two-stage procedure begins by randomly selecting 150 of 580 agricultural retailers across all eight MLRAs (Major Land Resource Areas) using the Local Pivotal Method (LPM) to ensure a balanced sampling scheme. Once data are collected, the averages and standard errors for the percentage of total land on which different categories of nutrients are used are extrapolated to the entire state using stratified sampling, helping policymakers chalk out their nutrient-reduction plans for the succeeding crop years. To conclude, we will present some of the findings regarding land use for different nutrients across these years, together with their uncertainty estimates and visualizations.
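For illustration, here is a simplified Local Pivotal Method sampler (close in spirit to LPM2). Production work typically uses the R BalancedSampling package; the coordinates and inclusion probabilities below are placeholders rather than the Iowa retailer frame.

```python
# Simplified sketch of the Local Pivotal Method for spatially balanced sampling
# with equal or unequal inclusion probabilities.
import numpy as np

def lpm_sample(prob, coords, rng=None, tol=1e-9):
    rng = np.random.default_rng(rng)
    p = np.asarray(prob, dtype=float).copy()
    while True:
        undecided = np.flatnonzero((p > tol) & (p < 1 - tol))
        if undecided.size == 0:
            break
        if undecided.size == 1:                       # last unit: plain Bernoulli draw
            i = undecided[0]
            p[i] = 1.0 if rng.random() < p[i] else 0.0
            break
        i = rng.choice(undecided)                     # random undecided unit
        others = undecided[undecided != i]
        d = np.linalg.norm(coords[others] - coords[i], axis=1)
        j = others[np.argmin(d)]                      # its nearest undecided neighbour
        a, b = p[i], p[j]
        if a + b < 1:                                 # pivotal update: push one unit to 0
            if rng.random() < b / (a + b):
                p[i], p[j] = 0.0, a + b
            else:
                p[i], p[j] = a + b, 0.0
        else:                                         # or push one unit to 1
            if rng.random() < (1 - b) / (2 - a - b):
                p[i], p[j] = 1.0, a + b - 1
            else:
                p[i], p[j] = a + b - 1, 1.0
    return np.flatnonzero(p > 1 - tol)

# e.g. 150 of 580 retailers with equal probabilities at locations xy (580 x 2 array):
# sample_idx = lpm_sample(np.full(580, 150 / 580), xy)
```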
Keywords
Nutrient reduction plans
Balanced sampling
Local Pivotal Method
Stratified sampling
The Current Population Survey (CPS) uses a two-stage sample design, in which groups of counties called primary sampling units (PSUs) are selected in the first stage and housing units within the selected PSUs are selected in the second stage. After every decennial census, the CPS redesigns its sample to make it more relevant to the current decade. PSUs are classified as either self-representing (SR) or non-self-representing (NSR). SR PSUs are included in the sample with certainty, while NSR PSUs are clustered into strata with one PSU sampled per stratum. In this redesign, we sought an NSR PSU stratification that improves the variances of auxiliary measures of childhood poverty and of American Indian and Alaska Native (AIAN) people, with minimal impact on PSU workload variance and unemployment variance. This talk describes the process and some summary results.
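Purely as an illustration of the kind of stratification search involved, the sketch below clusters NSR PSUs on standardized auxiliary measures. The actual CPS redesign procedure also controls stratum sizes and interviewer workload, and the column names here are hypothetical.

```python
# Illustrative-only sketch: form NSR PSU strata by clustering on standardized
# auxiliary measures, so that PSUs within a stratum are similar (which is what
# drives variance gains when one PSU is sampled per stratum).
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def stratify_psus(psu_df, vars_, n_strata, weights=None, seed=0):
    X = psu_df[vars_].to_numpy(dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each auxiliary measure
    if weights is not None:
        X = X * np.asarray(weights)                     # emphasize, e.g., poverty / AIAN measures
    km = KMeans(n_clusters=n_strata, n_init=10, random_state=seed).fit(X)
    return km.labels_                                   # stratum label per NSR PSU

# labels = stratify_psus(nsr_psus, ["child_poverty", "aian_pop", "unemployment"], n_strata=40)
```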
Keywords
Sample Design
Stratification
Releasing public-use micro-level data files from health surveys holds immense value for science and health policy. However, even after removing personally identifying information, the privacy of survey respondents may still be compromised. Using a large NYC population-representative health survey (n=10,271), we identified high-risk observations based on population estimates through a combination of key variables. We compared three different solutions to mitigate the risk of re-identification – suppression, synthesis using Classification and Regression Trees (CART), and synthesis via Bayesian models – and assessed their impact on both the risk and the loss of utility of the resulting protected data. While both synthesis methods resulted in slightly higher disclosure risks than the suppression method, the synthetic datasets preserved a higher level of utility. We will discuss our proposed solutions to avoid over-protecting, and potentially obscuring, estimates for underserved and vulnerable groups, and share our experiences with data curators in advancing disclosure risk controls and data sharing in public health.
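A minimal sketch of CART-based synthesis for a single categorical variable is shown below, in the spirit of tools such as the R synthpop package. The variable names are placeholders, not the survey's actual key variables.

```python
# CART-based synthesis sketch: fit a tree on the confidential data, then replace each
# value with a draw from the observed values in the same terminal node.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def cart_synthesize(df, target, predictors, rng=None, min_leaf=20):
    rng = np.random.default_rng(rng)
    X = pd.get_dummies(df[predictors])                 # simple encoding of predictors
    y = df[target]
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf).fit(X, y)
    leaves = tree.apply(X)                             # terminal node of each record
    synthetic = y.copy()
    for leaf in np.unique(leaves):
        idx = np.flatnonzero(leaves == leaf)
        synthetic.iloc[idx] = rng.choice(y.iloc[idx].to_numpy(), size=idx.size, replace=True)
    return synthetic

# synthetic_income = cart_synthesize(survey_df, "income_cat", ["age_grp", "borough", "sex"])
```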
Keywords
Health Surveys
Data Privacy Risk
Synthetic Data
Survey Research Methods
Government Statistics
The Census Bureau adopted differential privacy (DP), implemented through the TopDown Algorithm (TDA), for the 2020 Decennial Census in order to protect respondent confidentiality. Although the variances of the additive DP noise are publicly available, the impacts of postprocessing in the TDA to ensure that various quality criteria, such as hierarchical consistency and non-negativity, are met are less easily quantified, because the unprotected counts are not publicly available for the 2020 data. In this work, we investigate the use of a small area estimation approach to strengthen estimates of variability obtained from the 2010 demonstration products, as compared with the official 2010 redistricting file. We propose grouping similar geographies to obtain estimates of variance from the 2010 data, and incorporating these updated variance estimates to improve the estimates for 2020.
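A rough sketch of the pooling-plus-generalized-variance-function idea follows, with hypothetical column names; the authors' small area estimation model may differ substantially.

```python
# Pool 2010 demonstration-product errors within groups of similar geographies, then
# smooth the pooled variances with a simple generalized variance function (GVF).
import numpy as np
import pandas as pd

def pooled_gvf_variances(df, n_groups=20):
    # df columns assumed: 'official_2010', 'demo_2010' (same geography level)
    df = df.copy()
    df["err2"] = (df["demo_2010"] - df["official_2010"]) ** 2
    df["group"] = pd.qcut(df["official_2010"], q=n_groups, duplicates="drop")
    pooled = df.groupby("group", observed=True)["err2"].mean()          # pooled variance per group
    size = df.groupby("group", observed=True)["official_2010"].mean()   # average count per group
    mask = (pooled > 0) & (size > 0)
    b, a = np.polyfit(np.log(size[mask]), np.log(pooled[mask]), deg=1)  # log-linear GVF fit
    df["var_gvf"] = np.exp(a + b * np.log(df["official_2010"].clip(lower=1)))
    return df

# The fitted GVF could then be evaluated at published 2020 counts for comparable geographies.
```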
Keywords
Small Area Estimation
Differential Privacy
Generalized variance function