Sample Design and Non-response Modeling

Hans Kiesl, Chair
Regensburg University of Applied Sciences
 
Tuesday, Aug 5: 2:00 PM - 3:50 PM
4136 
Contributed Papers 
Music City Center 
Room: CC-205C 

Main Sponsor

Survey Research Methods Section

Presentations

WITHDRAWN: Modeling Non-Response in the National Agricultural Classification Survey

The USDA's National Agricultural Statistics Service (NASS) is committed to ensuring comprehensive coverage and representation of farms across the United States by maintaining a list frame of all known and potential U.S. farms. This list frame serves as the foundation for data collection in agricultural surveys and censuses. A key tool for updating and refining the list frame is the National Agricultural Classification Survey (NACS), which is conducted in four phases leading up to the quinquennial Census of Agriculture (COA), held in years ending in 2 and 7. NACS evaluates whether operations have agricultural activity, and NASS adds eligible operations to the Census Mailing List (CML). However, budget constraints and rising nonresponse rates challenge the accuracy and representativeness of the NASS list frame. This study addresses these challenges by analyzing the integration of data from the most recent phase of NACS with both administrative and auxiliary data sources from the American Community Survey (ACS). The findings are intended to inform strategies to enhance the list frame, improve sampling efficiency, and optimize resource allocation.

Keywords

Machine Learning

Non-response

USDA 

Co-Author(s)

Robert Emmet
Darcy Miller, USDA/NASS

First Author

Kenzhane Pantin

Multivariate Bernoulli-Based Sampling Method for Multi-label Data with Application to Meta-Research

In real-world applications, datasets may contain observations with multiple labels that are not necessarily mutually exclusive. Sampling methods must therefore account for label dependencies. We propose a novel sampling algorithm designed for multi-label datasets. Our algorithm uses the observed label frequencies to estimate the parameters of a multivariate Bernoulli distribution. Using optimization constrained to the target distribution, we calculate a weight for each combination of labels. This approach ensures that, after weighted sampling, the sub-sample acquires the characteristics of the target distribution while accounting for the label dependencies. Our use case included a broad sample of research articles from Scopus labeled with 66 biomedical topic categories, with an imbalanced distribution typical of multi-label data. We needed to sample from the literature in a way that 1) preserved category frequency order, 2) decreased the difference in frequency between the most and least frequent categories, and 3) accounted for the category dependencies. With this approach, we produced a more balanced sub-sample, thereby enhancing the representation of minority categories.
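As a rough illustration of weighting label combinations so that a weighted sub-sample matches target label frequencies, the sketch below uses iterative proportional fitting (raking) on a toy dataset. Raking is a stand-in chosen for brevity and is not the constrained optimization described in the abstract. The function names, the three labels, the toy counts, and the target frequencies are all hypothetical, and sampling with replacement is an assumption.

```python
import random
from collections import Counter


def rake_combination_weights(rows, target, sweeps=500):
    """Weight each observed label combination so that the weighted label
    marginals match `target` (assumed feasible for the combinations present)."""
    counts = Counter(rows)
    n = len(rows)
    # Start from the observed share of each label combination.
    weights = {combo: k / n for combo, k in counts.items()}
    for _ in range(sweeps):
        for label, t in target.items():
            current = sum(w for combo, w in weights.items() if label in combo)
            for combo in weights:
                # Rescale combinations with and without the label separately,
                # which keeps the weights summing to one.
                if label in combo:
                    weights[combo] *= t / current
                else:
                    weights[combo] *= (1 - t) / (1 - current)
    return weights


def marginal(weights, label):
    """Weighted frequency of a single label."""
    return sum(w for combo, w in weights.items() if label in combo)


# Hypothetical toy data: observed label frequencies are A=0.80, B=0.35, C=0.10.
rows = ([frozenset("A")] * 60 + [frozenset("AB")] * 20 + [frozenset("B")] * 10
        + [frozenset("BC")] * 5 + [frozenset("C")] * 5)

# Hypothetical target: same frequency order, smaller gap between A and C.
target = {"A": 0.6, "B": 0.4, "C": 0.2}
weights = rake_combination_weights(rows, target)

# Spread each combination's weight evenly over the rows that carry it,
# then draw a weighted sub-sample (with replacement, as an assumption).
counts = Counter(rows)
row_weights = [weights[r] / counts[r] for r in rows]
random.seed(0)
subsample = random.choices(rows, weights=row_weights, k=40)
```

Because the weights attach to whole label combinations rather than to individual labels, co-occurrence patterns present in the data carry over into the sub-sample.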

Keywords

Multivariate Bernoulli Distribution

Constrained optimization

Weighted Sampling 

Co-Author(s)

Colby Vorland
Donna Maney, Emory University, Dept. Psychology
Andrew Brown, University of Arkansas for Medical Sciences

First Author

Simon Chung, University of Arkansas for Medical Sciences, Department of Biostatistics

Presenting Author

Simon Chung, University of Arkansas for Medical Sciences, Department of Biostatistics

WITHDRAWN: Optimizing Sample Coordination with Multiple Measures of Size

We consider model-based optimal sampling designs for multipurpose surveys with multiple measures of size when coordinating samples among multiple surveys. The problem is motivated by crop surveys conducted by the United States National Agricultural Statistics Service (NASS), in which estimates of interest include planted and harvested acres of different crops as well as crop yields, and historical acreages are available on the frame as measures of size. Further, there is a need to coordinate three disjoint samples to minimize respondent burden. We use a subframe design to coordinate samples paired with convex optimization to find the inclusion probabilities that minimize expected sample size subject to target precision requirements for different study variables, along with other inequality constraints to select disjoint samples for multiple surveys. The precision requirements are computed as anticipated coefficients of variation under models relating study variables to frame measures of size. 
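To make the optimization concrete, the sketch below solves a much simpler special case: one study variable, Poisson sampling, and a single anticipated-variance constraint. In that case the inclusion probabilities that minimize expected sample size are proportional to the model standard deviations, capped at 1. The subframe coordination, the multiple precision constraints, and the disjointness constraints from the abstract are not represented. The function name, the variance model, and the toy measures of size are hypothetical.

```python
import math


def optimal_inclusion_probs(v, var_target):
    """Minimize sum(pi) subject to sum((1/pi_i - 1) * v_i) <= var_target
    and 0 < pi_i <= 1, where v_i > 0 is the model variance of unit i."""
    n = len(v)
    pi = [0.0] * n
    free = set(range(n))                  # units not yet capped at pi = 1
    budget = var_target + sum(v)          # bound on sum(v_i / pi_i)
    while True:
        # Units capped at 1 contribute v_i each to sum(v_i / pi_i).
        capped_cost = sum(v[i] for i in range(n) if i not in free)
        root_sum = sum(math.sqrt(v[i]) for i in free)
        # Lagrangian solution on the free units: pi_i = scale * sqrt(v_i).
        scale = root_sum / (budget - capped_cost)
        over = {i for i in free if math.sqrt(v[i]) * scale >= 1.0}
        if not over:
            for i in free:
                pi[i] = math.sqrt(v[i]) * scale
            return pi
        for i in over:
            pi[i] = 1.0                   # take these units with certainty
        free -= over


# Hypothetical frame: historical acreage x_i as the measure of size, with an
# assumed model y_i = x_i + error and Var(error_i) = x_i.
x = [1.0, 4.0, 9.0, 16.0, 400.0]
v = list(x)
cv_target = 0.02
var_target = (cv_target * sum(x)) ** 2    # anticipated CV target as a variance
pi = optimal_inclusion_probs(v, var_target)
expected_sample_size = sum(pi)
```

With several study variables the closed form no longer applies, which is where a general convex solver of the kind the abstract describes becomes necessary.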

Keywords

Sample Coordination

Optimal Sample Designs

Balanced Sampling

Establishment Surveys 

Co-Author

F. Jay Breidt, NORC at The University of Chicago

First Author

Benjamin Reist, NORC at The University of Chicago

Presenting Author

Benjamin Reist, NORC at The University of Chicago

Optimizing Data Collection Interventions to Balance Cost and Quality in a Sequential Multimode Survey

Responsive and adaptive designs have emerged as a framework for targeting and reallocating resources during the data collection period in order to improve survey data collection efficiency. Here, we report on the implementation and evaluation of a responsive design experiment in the National Survey of College Graduates that optimizes the cost-quality tradeoff by minimizing a function of data collection costs and the root mean squared error (RMSE) of a key survey measure, self-reported salary. At three points during the data collection process, we predict outcomes and costs for the remaining non-respondents and combine these predictions with data from respondents to optimize effort on the remaining cases with respect to cost and the RMSE of mean self-reported salary. This process allowed us to reduce data collection costs by nearly 10%, without a statistically or practically significant increase in the RMSE of mean salary or decrease in the unweighted response rate. This experiment demonstrates the potential for these types of designs to target data collection resources more effectively in order to reach survey quality goals.

Keywords

Responsive design

National Survey of College Graduates

Posterior predictive distribution 

Co-Author

Stephanie Coffey, US Census Bureau

First Author

Michael Elliott, University of Michigan

Presenting Author

Michael Elliott, University of Michigan