SRMS/GSS/SSS Student Paper Competition Winners

Kristen Olson, Chair
University of Nebraska-Lincoln

Kristen Olson, Organizer
University of Nebraska-Lincoln
 
Wednesday, Aug 6: 8:30 AM - 10:20 AM
0599 
Topic-Contributed Paper Session 
Music City Center 
Room: CC-202B 
This session contains the papers from the five winners of the joint SRMS/GSS/SSS Student Paper Competition. These five papers cover a wide range of topics relevant to survey, government, and social statistics.

Keywords

Student Paper Competition Winners

Small area estimation

Weighting adjustments

Causal inference

Selection models 


Main Sponsor

Survey Research Methods Section

Co-Sponsors

Government Statistics Section
Social Statistics Section

Presentations

Causal inference and racial bias in policing: New estimands and the importance of mobility data

Studying racial bias in policing is a critically important problem, but one that comes with a number of inherent difficulties due to the nature of the available data. In this manuscript we tackle multiple key issues in the causal analysis of racial bias in policing. First, we formalize race and place policing, the idea that individuals of one race are policed differently when they are in neighborhoods primarily made up of individuals of other races. We develop an estimand to study this question rigorously, show the assumptions necessary for causal identification, and develop sensitivity analyses to assess robustness to violations of key assumptions. Additionally, we investigate difficulties with existing estimands targeting racial bias in policing. We show that, for both these estimands and those developed in this manuscript, estimation can benefit from incorporating mobility data into analyses. We apply these ideas to a study in New York City, where we find substantial racial bias, as well as race and place policing, and we show that these findings are robust to large violations of untestable assumptions. We additionally show that mobility data can have a substantial impact on the resulting estimates, suggesting it should be used whenever possible in subsequent studies.

Keywords

Causal inference

Mobility data

Racial discrimination

Race and place

Sensitivity analysis 

Co-Author(s)

Brenden Beck, School of Criminal Justice, Rutgers University
Joseph Antonelli, University of Florida

Speaker

Zhuochao Huang, University of Florida

Echo State Networks for Spatio-Temporal Area-Level Data

Spatio-temporal area-level datasets play a critical role in official statistics, providing valuable insights for policy-making and regional planning. Accurate modeling and forecasting of these datasets can be extremely useful for policymakers to develop informed strategies for future planning. Echo State Networks (ESNs) are efficient methods for capturing nonlinear temporal dynamics and generating forecasts. However, ESNs lack a direct mechanism to account for the neighborhood structure inherent in area-level data. Ignoring these spatial relationships can significantly compromise the accuracy and utility of forecasts. In this paper, we incorporate approximate graph spectral filters at the input stage of the ESN, thereby improving forecast accuracy while preserving the model's computational efficiency during training. We demonstrate the effectiveness of our approach using Eurostat's tourism occupancy dataset and show how it can support more informed decision-making in policy and planning contexts. 
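The input-stage idea can be illustrated in a few lines: filter the area-level inputs with a low-order polynomial of a graph shift operator, then drive an ordinary leaky ESN reservoir with the filtered series. This is a hypothetical toy sketch (all sizes, names, and the first-order filter are illustrative assumptions); the paper's actual filter construction may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: T time steps of observations over n areas linked by adjacency A.
n, T = 5, 40
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
Y = rng.normal(size=(T, n))             # stand-in for area-level series

# Approximate graph spectral filter at the input stage: a first-order
# polynomial of the row-normalized adjacency (a common GCN-style surrogate).
D_inv = np.diag(1.0 / A.sum(axis=1))
S = D_inv @ A                           # graph shift operator
alpha = 0.5
X = Y + alpha * (Y @ S.T)               # spatially smoothed inputs

# Standard leaky ESN reservoir driven by the filtered inputs.
n_res = 50
W_in = rng.uniform(-0.1, 0.1, size=(n_res, n))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius
leak = 0.3
h = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    h = (1 - leak) * h + leak * np.tanh(W_in @ X[t] + W @ h)
    states[t] = h

# Ridge-regression readout for one-step-ahead forecasting of all areas.
lam = 1e-2
H, Y_next = states[:-1], Y[1:]
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_res), H.T @ Y_next)
forecast = states[-1] @ beta            # forecast for time T + 1
```

Because only the readout is trained (a single ridge regression), training cost stays low even after the graph filter is added, which is the computational point the abstract emphasizes.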

Keywords

Areal data

Echo State Network

Graph Convolutional Network

Survey 

Co-Author(s)

Christopher Wikle, University of Missouri
Scott Holan, University of Missouri/U.S. Census Bureau

Speaker

Zhenhua Wang

Formulating the Proxy Pattern-Mixture Model as a Selection Model to Assist with Sensitivity Analysis

Proxy pattern-mixture models (PPMM) have previously been proposed as a model-based framework for assessing the potential for nonignorable nonresponse in sample surveys and nonignorable selection in nonprobability samples. One defining feature of the PPMM is the single sensitivity parameter, φ, that ranges from 0 to 1 and governs the degree of departure from ignorability. While this sensitivity parameter is attractive in its simplicity, it may also be of interest to describe departures from ignorability in terms of how the odds of response (or selection) depend on the outcome being measured. In this paper, we re-express the PPMM as a selection model, using the known relationship between pattern-mixture models and selection models, in order to better understand the underlying assumptions of the PPMM and the implied effect of the outcome on nonresponse. The selection model that corresponds to the PPMM is a quadratic function of the survey outcome and proxy variable, and the magnitude of the effect depends on the value of the sensitivity parameter, φ (missingness/selection mechanism), the differences in the proxy means and standard deviations for the respondent and nonrespondent populations, and the strength of the proxy, ρ. Large values of φ (beyond 0.5) often result in unrealistic selection mechanisms, and the corresponding selection model can be used to establish more realistic bounds on φ. We illustrate the results using data from the U.S. Census Household Pulse Survey. 
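The single-parameter structure lends itself to a compact sensitivity analysis. The sketch below traces how the PPMM-adjusted mean moves as the sensitivity parameter runs from 0 (ignorable) to 1 (selection depends only on the outcome). The formula follows the standard proxy pattern-mixture mean identity from the PPMM literature; the data and function name are hypothetical, and this is not code from the paper.

```python
import numpy as np

def ppmm_mean(y_r, x_r, x_nr, phi):
    """PPMM-adjusted estimate of the overall mean of Y for sensitivity
    parameter phi in [0, 1].  y_r, x_r: outcome and proxy for respondents;
    x_nr: proxy for nonrespondents.  Illustrative sketch only."""
    n_r, n_nr = len(y_r), len(x_nr)
    rho = np.corrcoef(y_r, x_r)[0, 1]            # strength of the proxy
    s_y = np.std(y_r, ddof=1)
    s_x = np.std(x_r, ddof=1)
    xbar_all = (n_r * np.mean(x_r) + n_nr * np.mean(x_nr)) / (n_r + n_nr)
    # phi = 0 gives slope rho (regression estimator); phi = 1 gives 1 / rho.
    slope = (phi + (1 - phi) * rho) / (phi * rho + (1 - phi))
    return np.mean(y_r) + slope * (s_y / s_x) * (xbar_all - np.mean(x_r))
```

Sweeping `phi` over a grid of values and plotting the resulting estimates is the usual way to present the sensitivity analysis; the selection-model re-expression in the paper is what justifies truncating that grid to realistic values.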

Keywords

nonignorable nonresponse

nonignorable selection

Proxy pattern-mixture models

nonprobability samples

sensitivity analysis

nonresponse and selection bias 

Co-Author

Rebecca Andridge, The Ohio State University

Speaker

Seth Adarkwah Yiadom, The Ohio State University

Impact of existence and nonexistence of pivot on the coverage of empirical best linear prediction intervals for small areas

We advance the theory of the parametric bootstrap in constructing highly efficient empirical best (EB) prediction intervals for small area means. The coverage error of such a prediction interval is of the order O(m^{-3/2}), where m is the number of small areas to be pooled using a linear mixed normal model. In the context of an area-level model where the random effects follow a non-normal known distribution, except possibly for unknown hyperparameters, we analytically show that the order of coverage error of the empirical best linear (EBL) prediction interval remains the same even if we relax the normality of the random effects, provided a pivot exists for the suitably standardized random effects when the hyperparameters are known. Recognizing the challenge of showing the existence of a pivot, we develop a simple moment-based method for establishing the non-existence of a pivot. We show that the existing parametric bootstrap EBL prediction interval fails to achieve the desired order of coverage error, i.e., O(m^{-3/2}), in the absence of a pivot. We obtain a surprising result: the O(m^{-1}) term is always positive under certain conditions, indicating possible overcoverage by the existing parametric bootstrap EBL prediction interval. In general, we analytically show for the first time that the coverage problem can be corrected by adopting a suitably devised double parametric bootstrap. Our Monte Carlo simulations show that our proposed single bootstrap method performs reasonably well when compared to rival methods. 
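For readers unfamiliar with the setup, a single parametric bootstrap calibration of an EB prediction interval under a toy Fay-Herriot area-level model can be sketched as follows. All names, the simple moment estimator of the variance component, and the calibration for one area are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy Fay-Herriot model: y_i = theta_i + e_i, theta_i = mu + u_i,
# u_i ~ N(0, A), e_i ~ N(0, D_i) with D_i known.
m = 30
D = rng.uniform(0.5, 1.5, m)
A_true, mu_true = 1.0, 5.0
theta = mu_true + rng.normal(0, np.sqrt(A_true), m)
y = theta + rng.normal(0, np.sqrt(D), m)

def fit(y, D):
    """Simple moment estimate of A and the corresponding GLS mean."""
    A = max(0.0, np.var(y, ddof=1) - D.mean())
    w = 1.0 / (A + D)
    return A, (w * y).sum() / w.sum()

A_hat, mu_hat = fit(y, D)
gamma = A_hat / (A_hat + D)
eb = gamma * y + (1 - gamma) * mu_hat        # EB predictor of theta_i

# Single parametric bootstrap: simulate from the fitted model, collect the
# standardized prediction errors, and use their quantiles to calibrate the
# interval for one area (area 0 here).
B, i = 500, 0
t = np.empty(B)
for b in range(B):
    th_b = mu_hat + rng.normal(0, np.sqrt(A_hat), m)
    y_b = th_b + rng.normal(0, np.sqrt(D), m)
    A_b, mu_b = fit(y_b, D)
    g_b = A_b / (A_b + D[i])
    eb_b = g_b * y_b[i] + (1 - g_b) * mu_b
    se_b = np.sqrt(max(A_b * D[i] / (A_b + D[i]), 1e-12))
    t[b] = (th_b[i] - eb_b) / se_b

lo_q, hi_q = np.quantile(t, [0.025, 0.975])
se = np.sqrt(A_hat * D[i] / (A_hat + D[i]))
interval = (eb[i] + lo_q * se, eb[i] + hi_q * se)
```

The abstract's point is that when the random effects are non-normal and no pivot exists, the single-bootstrap quantiles above no longer deliver O(m^{-3/2}) coverage error, motivating the double parametric bootstrap correction.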

Keywords

Small area estimation

empirical Bayes

linear mixed model

best linear predictor 

Co-Author(s)

Yuting Chen, University of Maryland College Park
Masayo Hirose, Kyushu University, Institute of Mathematics for Industry
Partha Lahiri, University of Maryland-College Park

Speaker

Yuting Chen, University of Maryland College Park

Synthetic Sampling Weights for Volunteer-Based National Biobanks: A Case Study with the All of Us Research Program

While national biobanks are essential for advancing medical research, their nonprobability sampling designs limit their representativeness of the target population. This paper proposes a method that leverages high-quality national surveys to create synthetic sampling weights for non-probabilistic cohort studies, aiming to improve representativeness. Specifically, we focus on deriving more accurate base weights, which enhance calibration by meeting population constraints, and on automating data-supported selection of cross-tabulations for calibration. This approach combines a pseudo-design-based model with a novel Last-In-First-Out criterion, enhancing both the accuracy and stability of estimates. Extensive simulations demonstrate that our method, named nps-lifo-rake, reduces bias, improves efficiency, and strengthens inference compared to existing approaches. We apply the proposed method to the All of Us Research Program, leveraging data from the National Health Interview Survey 2020 and American Community Survey 2022, and compare the resulting prevalence estimates for common phenotypes against national benchmarks. The results underscore our method's ability to effectively reduce selection bias in non-probability samples, offering a valuable tool for enhancing biobank representativeness. Using the developed sampling weights for the All of Us Research Program, we can estimate the United States population prevalence for phenotypes and genotypes not captured by national probability studies. 
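The calibration step the abstract builds on is classical raking (iterative proportional fitting). A minimal sketch with hypothetical margins is below; the paper's nps-lifo-rake procedure additionally derives non-uniform base weights and automates the selection of cross-tabulations, neither of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample of n units with two categorical margins.
n = 1000
sex = rng.integers(0, 2, n)             # two sex groups
age = rng.integers(0, 3, n)             # three age groups
w = np.ones(n)                          # uniform base weights as a start

# Known population counts for each margin (illustrative numbers).
pop_sex = np.array([5100.0, 4900.0])
pop_age = np.array([3000.0, 4000.0, 3000.0])

# Raking: repeatedly rescale the weights so each margin matches its
# population total; iterating the two adjustments converges quickly.
for _ in range(50):
    for g in range(2):
        m = sex == g
        w[m] *= pop_sex[g] / w[m].sum()
    for g in range(3):
        m = age == g
        w[m] *= pop_age[g] / w[m].sum()
```

After convergence the weighted counts match both margins simultaneously, so weighted sample prevalences can be read as population-level estimates, which is how the biobank weights are used against national benchmarks.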

Keywords

Calibration Weighting

Generalized Raking

Nested Propensity Score

Non-Probability

Prevalence

Sampling Design 

Co-Author(s)

Andrew Guide, Vanderbilt University Medical Center
Lina Sulieman, Department of Biomedical Informatics, Vanderbilt University
Robert Cronin, Department of Internal Medicine, The Ohio State University
Thomas Lumley, University of Auckland
Qingxia Chen, Vanderbilt University Medical Center

Speaker

Huiding Chen