Cleaning Products for Your Data: Four Studies in Editing and Imputation

Abstract Number:

1780 

Submission Type:

Topic-Contributed Paper Session 

Participants:

Darcy Miller (1), Luca Sartore (2), Megan Lipke (1), Leslie Wallace (3), Arthur Rosales (4), Katherine Thompson (5), Gunnar Ingle (6)

Institutions:

(1) USDA/NASS, (2) National Institute of Statistical Sciences, (3) Westat, (4) N/A, (5) US Census Bureau, (6) Summit

Chair:

Luca Sartore  
National Institute of Statistical Sciences

Co-Organizer:

Luca Sartore  
National Institute of Statistical Sciences

Discussant:

Megan Lipke  
USDA/NASS

Session Organizer:

Darcy Miller  
USDA/NASS

Speaker(s):

Leslie Wallace  
Westat
Arthur Rosales  
N/A
Katherine Thompson  
US Census Bureau
Gunnar Ingle  
Summit

Session Description:

GIGO (garbage in, garbage out) is a popular saying in statistical practice. Handling dirty (missing or erroneous) data is now considered more an art than a science: established theory provides basic tools, but these must be engineered for each case according to the data and operational constraints. In this session, four statisticians from varied backgrounds will present their approaches to dealing with dirty data. Two federal statisticians will discuss updates being considered for establishment surveys. The first will present results from a nearest neighbor application to a survey of business establishments. The second will present results from a machine learning approach that uses remotely sensed data to impute an area frame survey of farming operations. Two statisticians from the private sector will present methods used to identify and correct errors in education assessment and agricultural survey data. One will present findings from developing a generalized system for editing and imputation of complex surveys; the other will present strategies for identifying and correcting errors during data collection. A federal statistician heavily involved in current editing and imputation projects will serve as discussant. All speakers will share the blueprints and tools they considered for cleaning their dirty data.

Sponsors:

Government Statistics Section 2
Section on Statistical Learning and Data Science 3
Survey Research Methods Section 1

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Medium (80-150)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand