Wednesday, Aug 7: 8:30 AM - 10:20 AM
1780
Topic-Contributed Paper Session
Oregon Convention Center
Room: CC-253
Applied Session: Yes
Main Sponsor
Survey Research Methods Section
Co-Sponsors
Government Statistics Section
Section on Statistical Learning and Data Science
Presentations
The June Area Survey (JAS) is an annual survey conducted by the United States (U.S.) Department of Agriculture's National Agricultural Statistics Service (NASS) to estimate crop acreages and to measure the coverage of the NASS list frame. The JAS is based on an area frame that offers complete coverage of the contiguous U.S. The design of the survey requires complete reports for all sampled tracts, so the inevitable nonresponse must be addressed through observation of sampled areas or imputation. Time spent on these efforts is costly, and the resulting data are less reliable than data obtained from full responses. Researchers at NASS have developed a new approach that integrates administrative data, geospatial data, and machine learning forecasting techniques to begin addressing nonresponse in the JAS with an automated imputation process. In this paper, the new automated imputation process is described, and its predicted impact on survey data quality is explored. Study results indicate that the automated imputation process produces estimates comparable to those produced using traditional methods.
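The abstract does not specify the forecasting model or covariates behind the automated imputation. As a minimal sketch of the train-on-respondents, predict-for-nonrespondents pattern it describes, the R code below fits a random forest to reporting tracts and imputes acreage for nonrespondents; the predictor names (admin_acres, cdl_corn_pct, prior_year_acres) are hypothetical stand-ins for the administrative and geospatial data sources mentioned.

```r
library(randomForest)

set.seed(42)

# Simulated JAS-like tract data; in practice these columns would come
# from administrative records, geospatial layers, and prior surveys
n <- 500
tracts <- data.frame(
  admin_acres      = runif(n, 0, 1000),
  cdl_corn_pct     = runif(n, 0, 1),
  prior_year_acres = runif(n, 0, 1000)
)
tracts$reported_acres <- with(tracts,
  0.6 * admin_acres + 300 * cdl_corn_pct +
  0.3 * prior_year_acres + rnorm(n, sd = 25))
tracts$reported_acres[sample(n, 100)] <- NA  # simulated nonresponse

respondents    <- subset(tracts, !is.na(reported_acres))
nonrespondents <- subset(tracts,  is.na(reported_acres))

# Train on complete reports, then predict (impute) the missing tracts
fit <- randomForest(reported_acres ~ ., data = respondents)
nonrespondents$imputed_acres <- predict(fit, newdata = nonrespondents)
```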
Beginning in 2024, the Economic Directorate of the U.S. Census Bureau will launch the Annual Integrated Economic Survey (AIES), an economy-wide survey that replaces a suite of seven independently designed ongoing surveys. The AIES requirements are informed by the user community's longstanding data needs (e.g., national and subnational tabulations), as well as by extensive respondent research on data collection. This presentation provides a detailed overview of the nearest neighbor imputation methodology used for the establishment-level collection of the survey. Throughout, I will highlight specific challenges of developing a viable imputation procedure for a new multi-purpose business survey whose collection covers a wide range of economic sectors.
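The abstract does not detail the AIES matching variables or distance metric. As a rough sketch of donor-based nearest neighbor imputation for establishment data, the R code below fills in a missing revenue item from the closest responding donor (by payroll) within the same sector; the fields and the within-sector matching rule are illustrative assumptions.

```r
set.seed(7)

# Toy establishment frame: two sectors, a size measure, and an item
# (revenue) subject to nonresponse
est <- data.frame(
  sector  = rep(c("manufacturing", "retail"), each = 50),
  payroll = rlnorm(100, meanlog = 12, sdlog = 1)
)
est$revenue <- 4.5 * est$payroll * exp(rnorm(100, sd = 0.2))
est$revenue[sample(100, 20)] <- NA  # simulated item nonresponse

# For each recipient, copy revenue from the responding establishment
# with the closest payroll (the donor)
impute_nn <- function(df) {
  donors     <- which(!is.na(df$revenue))
  recipients <- which(is.na(df$revenue))
  for (i in recipients) {
    j <- donors[which.min(abs(df$payroll[donors] - df$payroll[i]))]
    df$revenue[i] <- df$revenue[j]
  }
  df
}

# Match within sector so donors and recipients share an industry
est_imputed <- do.call(rbind, lapply(split(est, est$sector), impute_nn))
```

Matching within sector before measuring distance is one common way to keep donors economically similar to recipients; whether the AIES stratifies this way is not stated in the abstract.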
The National Assessment of Educational Progress (NAEP) is a congressionally mandated series of surveys measuring the proficiency of American students in a variety of academic subjects. Cooperating schools electronically submit lists of students in the target grade, from which sampled students are drawn. Because schools store their data in different ways, incoming student lists must be standardized for use in NAEP: each list submitter maps the columns in their file to specific NAEP fields and the values in each column to specific NAEP values. To ensure the quality of the student lists and the students' demographic data, data checks are run on each list after the mapping is complete. The fields subject to these checks are student name, gender, student disability status, English learner status, race/ethnicity, school lunch eligibility status, grade, and month and year of birth. Some checks are straightforward, while others are more complex or involve statistical tests. This paper describes the types of data checks performed on the 11,500 student lists submitted for NAEP 2022 and presents results, including the number of data check failures and false-positive rates by check type.
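To make the contrast between straightforward and statistical checks concrete, here is a minimal R sketch with made-up fields, counts, and thresholds; the actual NAEP check specifications are not given in the abstract.

```r
set.seed(1)

# Toy student list: grade 8 is the target grade, month 14 is invalid
n <- 200
students <- data.frame(
  grade       = sample(c(8, 8, 8, 8, 4), n, replace = TRUE),
  birth_month = sample(c(1:12, 14), n, replace = TRUE),
  gender      = sample(c("M", "F"), n, replace = TRUE, prob = c(0.7, 0.3))
)

# Straightforward checks: values must fall within valid ranges
grade_failures <- sum(students$grade != 8)
month_failures <- sum(!students$birth_month %in% 1:12)

# Statistical check: flag the list if its gender split departs sharply
# from a roughly even expectation (a real check would calibrate the
# expectation and significance threshold against historical lists)
gender_test  <- chisq.test(table(students$gender), p = c(0.5, 0.5))
list_flagged <- gender_test$p.value < 0.01

data.frame(check      = c("grade", "birth_month", "gender_balance"),
           n_failures = c(grade_failures, month_failures,
                          as.integer(list_flagged)))
```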
Each year the U.S. Department of Agriculture's National Agricultural Statistics Service (NASS) conducts more than one hundred surveys to understand and enumerate agriculture in the United States. The quality of survey responses varies with survey and respondent, and ensuring that responses are valid, reliable, and internally consistent is vital to publishing accurate official statistics. NASS is undertaking modernization efforts to detect and edit survey responses through rule validation. These innovations include (1) reviewing and reconciling documented (e.g., written in business rules) and undocumented (e.g., appearing only in programming code) validation specifications, (2) distinguishing validation rules whose errors might be correctable with programming code or numeric methods, (3) using numeric methods, such as the Fellegi-Holt algorithm, and R software packages to automate response-level validation checks and error corrections, and (4) flagging instances of automated correction or validation errors for NASS analysts. This paper describes the processes and procedures used for each step and highlights challenges and solutions to commonly encountered issues.
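The abstract names the Fellegi-Holt algorithm but not the specific R packages. One widely used open-source pair that implements this workflow is validate (rule definition and checking) and errorlocate (Fellegi-Holt error localization); the sketch below uses hypothetical acreage fields and rules.

```r
library(validate)
library(errorlocate)

# Example edit rules: acreage values must be nonnegative and internally
# consistent (harvested acres cannot exceed planted acres)
rules <- validator(
  planted_acres >= 0,
  harvested_acres >= 0,
  harvested_acres <= planted_acres
)

responses <- data.frame(
  planted_acres   = c(100, 80, -5),
  harvested_acres = c( 90, 95, 10)  # record 2 harvests more than planted
)

# Record-level checks: which rules fail for which responses
print(summary(confront(responses, rules)))

# Fellegi-Holt error localization: for each failing record, find the
# smallest set of fields that must change for all rules to hold, and
# blank those fields (NA) so they can be corrected downstream
cleaned <- replace_errors(responses, rules)
print(cleaned)
```

Under Fellegi-Holt, the fields identified as erroneous are blanked rather than guessed at, so a downstream correction or imputation step, or an analyst as in step (4), supplies the replacement values.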