Cleaning Products for Your Data: Four Studies in Editing and Imputation
Abstract Number:
1780
Submission Type:
Topic-Contributed Paper Session
Participants:
Darcy Miller (1), Luca Sartore (2), Megan Lipke (1), Leslie Wallace (3), Arthur Rosales (4), Katherine Thompson (5), Gunnar Ingle (6)
Institutions:
(1) USDA/NASS, (2) National Institute of Statistical Sciences, (3) Westat, (4) N/A, (5) US Census Bureau, (6) Summit
Chair:
Co-Organizer:
Discussant:
Session Organizer:
Speaker(s):
Session Description:
GIGO (garbage in, garbage out) is a popular saying in statistical practice. Handling dirty (missing or erroneous) data is now considered more an art than a science: established theory provides basic tools, but these must be engineered for each case according to the data and operational constraints. In this session, four statisticians from varied backgrounds will present their approaches to dealing with dirty data. Two federal statisticians will discuss updates being considered for establishment surveys: one will present results from a nearest-neighbor application to a survey of business establishments, and the other will present results from a machine learning approach that leverages remotely sensed data for imputation in an area-frame survey of farming operations. Two statisticians from the private sector will present methods used to identify and correct errors in education assessment and agricultural survey settings: one will present findings from developing a generalized system for editing and imputation of complex surveys, and the other will present strategies for identifying and correcting errors during the data collection process. A federal statistician heavily involved in current editing and imputation projects will serve as discussant. All speakers will show the blueprints and tools they considered for cleaning their dirty data.
Sponsors:
Government Statistics Section 2
Section on Statistical Learning and Data Science 3
Survey Research Methods Section 1
Theme:
Statistics and Data Science: Informing Policy and Countering Misinformation
Yes
Applied
Yes
Estimated Audience Size
Medium (80-150)
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.
I understand