Non-probability samples, administrative records, and data fusion

Cynthia Bland, Chair
RTI International
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
4011 
Contributed Papers 
Music City Center 
Room: CC-210 

Main Sponsor

Survey Research Methods Section

Presentations

Doubly Robust Quantile Estimation for Finite Populations with Non-Probability Samples

The growing use of non-probability samples in survey research highlights the need for robust methods to control selection bias. Quantiles capture distributional characteristics that mean-based analyses often overlook, yet existing methods primarily focus on means or totals, leaving a gap in rigorous quantile estimation. Current bias mitigation approaches rely on model-based frameworks, whose performance can degrade when the model is misspecified. This paper introduces a doubly robust quantile estimator that remains asymptotically unbiased when either the outcome model or the selection model is misspecified. Our method constructs a robust distribution function and is evaluated through simulations and an application to the Korean Household Financial Welfare Survey. Unlike existing approaches, this is a new doubly robust estimator designed for distribution functions. Resampling techniques are employed to construct confidence intervals. Results confirm its effectiveness for quantile estimation in non-probability sample surveys.
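As a rough illustration of the idea in this abstract, the sketch below builds a doubly robust distribution function and inverts it to obtain a quantile. It is not the authors' estimator: it assumes the standard doubly robust form for means (an inverse-probability-weighted residual term plus a design-weighted prediction term from a reference probability sample), applied to the indicator I(y <= t). The function names `dr_cdf` and `quantile_from_cdf` and all argument names are hypothetical, and the fitted propensities and fitted outcome probabilities are taken as given inputs.

```python
import numpy as np

def dr_cdf(t_grid, y_np, pi_np, g_np, g_ref, d_ref):
    """Doubly robust estimate of F(t) on a grid (illustrative sketch).

    t_grid : increasing grid of thresholds t, shape (T,)
    y_np   : outcomes in the non-probability sample, shape (n,)
    pi_np  : estimated participation probabilities, shape (n,)
    g_np   : fitted P(Y <= t | x) for non-probability units, shape (n, T)
    g_ref  : fitted P(Y <= t | x) for reference-sample units, shape (m, T)
    d_ref  : design weights of the reference probability sample, shape (m,)
    """
    # Indicator I(y_i <= t) for every unit and every threshold.
    ind = (y_np[:, None] <= t_grid[None, :]).astype(float)
    w = 1.0 / pi_np
    # Inverse-probability-weighted residuals of the outcome model.
    correction = (w[:, None] * (ind - g_np)).sum(axis=0) / w.sum()
    # Design-weighted model predictions over the reference sample.
    prediction = (d_ref[:, None] * g_ref).sum(axis=0) / d_ref.sum()
    F = correction + prediction
    # Force the estimate to be a valid CDF: monotone and within [0, 1].
    return np.clip(np.maximum.accumulate(F), 0.0, 1.0)

def quantile_from_cdf(t_grid, F, p):
    """Smallest grid point t with F(t) >= p (last grid point if none)."""
    idx = np.searchsorted(F, p, side="left")
    return t_grid[min(idx, len(t_grid) - 1)]
```

With a trivial outcome model (all fitted probabilities zero) and equal participation probabilities, the estimate reduces to the empirical distribution function of the non-probability sample, which is a convenient sanity check.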

Keywords

non-probability sample

quantile

doubly robust estimator

difference estimator

data integration 

Co-Author(s)

Dongmin Jang, University of Seoul
Kyu-Seong Kim, University of Seoul

First Author

Soonpil Kwon, Statistics Korea & University of Seoul

Presenting Author

Dongmin Jang, University of Seoul

Prevalence Estimation for Nonprobability Samples: Link-Tracing and Respondent-Driven Samplings

Link-Tracing Sampling (LTS) methods are commonly used to estimate the prevalence of hidden populations. Respondent-Driven Sampling (RDS) is a variant of LTS that utilizes unique inference procedures for prevalence estimation. Challenges have been found with RDS in practice, such as long recruitment periods, homophily of samples, and violation of assumptions. The Vincent Link Tracing Sampling (VLTS), another variant of LTS, has been adopted as an alternative. However, little literature exists to guide prevalence estimation for VLTS, and in practice the RDS prevalence estimation procedure is applied to it crudely. Drawing on a study conducted in gold mining areas of the Kédougou district in Senegal that used RDS and VLTS methods to estimate the prevalence of sex trafficking in 2021 and 2024, respectively, we study and compare the two methods. A survey of 561 respondents guided by RDS indicated that 19% of women who engaged in commercial sex had experienced sex trafficking. Endline surveys of 850 respondents showed a prevalence of 51%, a notable rise from baseline. We present reliable prevalence estimates for both methods, providing evidence for addressing the problem.
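For readers unfamiliar with RDS inference, the sketch below shows one widely used RDS prevalence estimator, the Volz-Heckathorn (RDS-II) estimator, which weights each respondent by the inverse of their reported network degree. The abstract does not say which estimator the study used, so this is an illustration only; the function name `rds_ii_prevalence` and its arguments are hypothetical.

```python
import numpy as np

def rds_ii_prevalence(outcome, degree):
    """Volz-Heckathorn (RDS-II) prevalence estimate (illustrative sketch).

    outcome : 0/1 indicator of the trait for each respondent
    degree  : self-reported network size (degree) of each respondent
    """
    outcome = np.asarray(outcome, dtype=float)
    degree = np.asarray(degree, dtype=float)
    if np.any(degree <= 0):
        raise ValueError("degrees must be positive")
    # Respondents with larger networks are more likely to be recruited,
    # so each one is down-weighted by the inverse of their degree.
    w = 1.0 / degree
    return float((w * outcome).sum() / w.sum())
```

When every respondent reports the same degree, the estimate equals the plain sample proportion; differences between the two indicate how strongly the degree weighting affects the result.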

Keywords

Link-Tracing Sampling

Respondent-Driven Sampling

Prevalence Estimation

Senegal

Nonprobability Samples

Hidden Population 

Co-Author(s)

David Okech, University of Georgia
Jody Clay-Warner, University of Georgia
Anne Waswa, Center on Human Trafficking Research and Outreach
Pedro Goulart

First Author

Hui Yi, University of Georgia

Presenting Author

Hui Yi, University of Georgia

Using the Directed Seed Method to Survey Venezuelan Refugees and Migrants in Colombia

Respondent-driven sampling (RDS) has been adopted worldwide as an important method for sampling vulnerable populations to enable vital decisions about resource allocation and program planning. However, RDS relies on many assumptions about the population and sample dynamics, and continued innovation is needed. We introduce the directed seed method as a modification to RDS where seeds work with an interviewer to enumerate their potential recruits across important characteristics using a diversity recruitment grid and are then instructed on which of these people to recruit. The directed seed method shows promise for enhancing the recruitment of additional diverse individuals and overcoming potential bottlenecks in the population network. We provide an example from surveys conducted among Venezuelan refugees and migrants in Colombia in 2020. We assess the method using existing and novel diagnostic tools, including visual and metric-based techniques like all-points plots, recruitment homophily, expected gain in wave 1 diverse participants, and threshold standard deviation of seed-based characteristic estimates. Finally, we discuss best practices for implementation and inference. 

Keywords

respondent-driven sampling

hard-to-reach population

migrants

social network analysis

hidden population

adaptive sampling 

First Author

Katherine McLaughlin, Oregon State University

Presenting Author

Katherine McLaughlin, Oregon State University

Data Fusion: Calculation of Feasible Correlations

I have described a method (2001, 2003, 2004, 2009, 2010) for merging two independent samples using data fusion (also known as statistical matching). One sample contains (X,Z) and the other contains (X,Y), both drawn from a common nonsingular normal (X,Y,Z) distribution. Following Kadane (1978) and Rubin (1986), I employ regression in my approach. I assess the uncertainty introduced during the merge that is due to the unobserved (Y,Z) relationship by repetition over a range of (Y,Z) values that are consistent with the observed data. An essential part of my algorithm is to add random residuals to the regression estimates. My initial approach for estimating the residual variance could fail (be negative) because it used subtraction of estimates from both files. An innovation due to Raessler and Kiesl (2009) gives improved results for estimating the residual variance, solving one of the two open problems in the paradigm. The remaining open problem was determining the area of feasible correlations between Y and Z when both Y and Z are multivariate. Building on the foundation described by Kiesl and Raessler (2006), the solution to this problem is now known.
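To make "feasible correlations" concrete, the sketch below covers only the simplest case, where X, Y, and Z are each univariate. It is not the multivariate solution the abstract refers to. It relies on the standard fact that the 3x3 correlation matrix must be positive semidefinite, which bounds the unobserved corr(Y,Z) by r_xy * r_xz plus or minus sqrt((1 - r_xy^2)(1 - r_xz^2)). The function name `feasible_yz_range` is hypothetical.

```python
import math

def feasible_yz_range(r_xy, r_xz):
    """Feasible interval for corr(Y, Z) given corr(X, Y) and corr(X, Z).

    Univariate X, Y, Z only. The bounds follow from requiring the 3x3
    correlation matrix of (X, Y, Z) to be positive semidefinite.
    """
    for r in (r_xy, r_xz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    center = r_xy * r_xz
    half_width = math.sqrt((1.0 - r_xy**2) * (1.0 - r_xz**2))
    return center - half_width, center + half_width
```

For example, with corr(X,Y) = 0.6 and corr(X,Z) = 0.8 the interval is approximately (0.0, 0.96), so the observed data rule out any negative (Y,Z) correlation. When X is uncorrelated with both Y and Z, the interval is the full range (-1, 1) and the data carry no information about the (Y,Z) relationship.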

Keywords

statistical matching 

First Author

Chris Moriarity

Presenting Author

Chris Moriarity