
WITHDRAWN - Determining Edit Limits for an Agricultural Survey Using Outlier Detection Methods

Presented During: Missing Data, Outlier Detection, and Confidentiality

Megan Lipke Co-Author
USDA/NASS

Darcy Miller Co-Author
USDA/NASS

Luca Sartore Co-Author
National Institute of Statistical Sciences

Kay Lee Turner Co-Author
USDA/National Agricultural Stats Service

Denise Abreu Co-Author
USDA/NASS

Yumiko Siegfried First Author
USDA/NASS

Thursday, Aug 7: 11:55 AM - 12:05 PM
1233
Contributed Papers

Music City Center

The United States Department of Agriculture's (USDA's) National Agricultural Statistics Service (NASS) conducts hundreds of surveys each year. Many of these surveys rely on pre-assigned upper and lower limits to identify questionable reported values that may require editing. Subject matter experts currently assign the limits manually, and values outside the limits are flagged for editing. NASS has developed a system that minimizes manual editing by automating most editing and imputation actions. Recent research evaluates several outlier detection methods for deriving hard edit limits from the data. The resulting limits must identify extreme anomalies that are subsequently corrected in automated edits. This paper evaluates four outlier detection methods: Cook's Distance, local outlier probability (LoOP), isolation forest (IF), and FuzzyHRT (historical, relational, and tail anomalies). We explore their possible application in determining edit limits through a case study from the Agricultural Production Survey. We summarize the characteristics of each method, review the current edit limits, present application results, and discuss next steps.
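The abstract does not describe how any of the four methods is applied, so the following is only a minimal sketch of one of them, Cook's Distance, for a simple linear regression of one reported item on another. The data values, the variable names (`acres`, `production`), the helper names, and the 4/n cutoff (a common rule of thumb) are all assumptions made for illustration; none of them comes from the NASS study.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's Distance for each point in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * s2)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

def flag_outliers(x, y, cutoff=None):
    """Flag points whose Cook's Distance exceeds the cutoff (default 4/n, an assumed rule of thumb)."""
    distances = cooks_distance(x, y)
    limit = 4 / len(x) if cutoff is None else cutoff
    return [d > limit for d in distances]

# Hypothetical records: reported acres and reported production.
acres = [10, 20, 30, 40, 50, 60, 70, 80]
production = [21, 39, 62, 80, 99, 400, 141, 160]

flags = flag_outliers(acres, production)
flagged_records = [i for i, is_flagged in enumerate(flags) if is_flagged]
```

In this toy data only the record with 60 acres and a production of 400 is flagged. How flagged records would translate into upper and lower edit limits is a separate design decision that the abstract leaves to the paper.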

Keywords

Outlier detection

Automatic editing

Imputation

Machine learning

Survey modernization

Main Sponsor

Survey Research Methods Section