WITHDRAWN - Determining Edit Limits for an Agricultural Survey Using Outlier Detection Methods

Megan Lipke Co-Author
USDA/NASS
 
Darcy Miller Co-Author
USDA/NASS
 
Luca Sartore Co-Author
National Institute of Statistical Sciences
 
Kay Lee Turner Co-Author
USDA/National Agricultural Statistics Service
 
Denise Abreu Co-Author
USDA/NASS
 
Yumiko Siegfried First Author
USDA/NASS
 
Thursday, Aug 7: 11:55 AM - 12:05 PM
1233 
Contributed Papers 
Music City Center 
The United States Department of Agriculture's (USDA's) National Agricultural Statistics Service (NASS) conducts hundreds of surveys each year. Many of these surveys rely on pre-assigned upper and lower limits to identify questionable reported values that may require editing. The limits are currently assigned manually by subject matter experts, and values outside the limits are flagged for editing. NASS has developed a system that minimizes manual editing by automating most editing and imputation actions. Recent research evaluates several outlier detection methods for deriving hard edit limits from the data rather than from manual assignment. The resulting limits must identify extreme anomalies so that they can subsequently be corrected by automated edits. This paper evaluates four outlier detection methods: Cook's Distance, local outlier probability (LoOP), isolation forest (IF), and FuzzyHRT (historical, relational, and tail anomalies). We explore their possible application to determining edit limits through a case study from the Agricultural Production Survey. We summarize the characteristics of each method, review the current edit limits, present application results, and discuss next steps.
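
As a rough illustration of how an outlier score might be turned into edit limits, the sketch below uses Cook's Distance, one of the four methods named in the abstract. The abstract gives no implementation details, so everything here is an assumption: the simple one-predictor linear model, the 4/n rule-of-thumb cutoff, the choice to take the smallest and largest retained values as the limits, the hypothetical function names `cooks_distances` and `edit_limits`, and the made-up sample data. It is not the procedure used by NASS.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's Distance for each point of a simple linear fit y = a + b*x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


def edit_limits(x, y, cutoff=None):
    """Return (lower, upper) limits from values not flagged as influential.

    Assumption: points with distance above the cutoff are treated as
    anomalies, and the limits are the range of the remaining y values.
    The default cutoff 4/n is a common rule of thumb, not a NASS rule.
    """
    distances = cooks_distances(x, y)
    if cutoff is None:
        cutoff = 4 / len(x)
    kept = [yi for yi, di in zip(y, distances) if di <= cutoff]
    return min(kept), max(kept)


# Made-up data: x is a size measure, y a reported value; one y is extreme.
size = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
reported = [20, 41, 59, 80, 101, 119, 140, 161, 900, 200]

scores = cooks_distances(size, reported)
lower, upper = edit_limits(size, reported)
```

In this toy data set the value 900 receives by far the largest distance and is excluded, so the limits are taken from the remaining reports. A single extreme value also tilts the fitted line and inflates the distances of its neighbors, which is one reason a comparison across several detection methods is informative.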

Keywords

Outlier detection

Automatic editing

Imputation

Machine learning

Survey modernization 

Main Sponsor

Survey Research Methods Section