False Discovery Rate in Large-Scale Data Error Localization
Abstract Number:
2395
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Paper
Participants:
Chin-Fang Weng (1), Eric Slud (2)
Institutions:
(1) U.S. Census Bureau, Washington, D.C., (2) U.S. Census Bureau, MD
Co-Author:
First Author:
Presenting Author:
Abstract Text:
Statistical data editing identifies potential response errors in the data. Editing is itself subject to two types of error: labeling a correct observation as erroneous (a false positive) and failing to identify an incorrect value (a false negative). There is no established statistical criterion for deciding how many observations should be edited, and over-editing can introduce new errors, degrade data quality, change the data structure, and increase costs. Error localization consists of a separate test on each observation, where the null hypothesis states that the observation is error-free and the alternative states that it is erroneous. The False Discovery Rate (FDR) is the expected fraction of false positives among the observations deemed erroneous. Because FDR control is tied to the number of edited observations, imposing an FDR requirement determines how many outliers are edited and thereby limits over-editing. In this presentation we apply FDR theory to error localization and verify the theory on simulated data and on a real-world data set.
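The abstract does not say which FDR-controlling procedure is used. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, one widely used choice, to per-observation p-values and returns the observations to flag for editing. The function name `benjamini_hochberg`, the level `q`, and the p-values are hypothetical and are not taken from the presentation.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Return indices of observations flagged as erroneous at FDR level q.

    Assumption: each p-value tests the null hypothesis that the
    corresponding observation is error-free.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order observation indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    # Flag the k observations with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values for ten observations (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flagged = benjamini_hochberg(p_values, q=0.05)
print(flagged)  # indices of observations selected for editing
```

In this sketch the FDR level `q` fixes how many observations are flagged, which is the sense in which an FDR requirement bounds the amount of editing.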
Keywords:
data editing | response errors | over-editing | multiple hypothesis tests | periodic surveys
Sponsors:
Survey Research Methods Section
Tracks:
Data Analysis/Modeling
Can this be considered for alternate subtype?
No
Are you interested in volunteering to serve as a session chair?
No
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.
I understand