False Discovery Rate in Large-Scale Data Error Localization
Paul Smith
Co-Author
University of Maryland (retired)
Tuesday, Aug 6: 9:20 AM - 9:35 AM
2395
Contributed Papers
Oregon Convention Center
Statistical data editing means identifying potential response errors in the data. Data editing is subject to two types of errors: labeling a correct observation as erroneous and failing to identify an incorrect value. There is no statistical criterion for deciding how many observations should be edited. Over-editing can increase data errors, degrade data quality, change the data structure and increase costs. Error localization consists of separate tests on each observation, where the null hypothesis states that the observation is error-free and the alternative states that the observation is erroneous. The False Discovery Rate (FDR) is the expected fraction of false positives among the observations deemed erroneous. Because FDR control is tied to the number of edited observations, imposing an FDR requirement determines the number of outliers to be edited, thereby controlling over-editing. In this presentation we apply FDR theory to error localization and verify the theory on simulated data.
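As an illustration of how an FDR requirement can fix the number of edited observations, the sketch below applies the Benjamini-Hochberg step-up procedure to per-observation p-values. The abstract does not name a specific FDR procedure, so Benjamini-Hochberg is an assumption here, as are the function names (benjamini_hochberg, two_sided_p, simulate), the standard-normal error-free model, the shift of 4 for erroneous values, and the FDR level q = 0.05. It is a minimal sketch, not the authors' method.

```python
import math
import random


def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of observations flagged as erroneous at FDR level q.

    Benjamini-Hochberg step-up rule: find the largest rank k such that the
    k-th smallest p-value is <= q * k / m, then flag the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:k])


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(seed=0, n_correct=950, n_errors=50, shift=4.0, q=0.05):
    """Simulate one data set (hypothetical setup) and edit it under FDR control.

    Error-free observations are N(0, 1); erroneous ones are N(shift, 1).
    Returns (number flagged, false discovery proportion among flagged).
    """
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n_correct)]
    values += [rng.gauss(shift, 1.0) for _ in range(n_errors)]
    p_values = [two_sided_p(v) for v in values]
    flagged = benjamini_hochberg(p_values, q)
    # Indices below n_correct are error-free, so flagging them is a false discovery.
    false_hits = sum(1 for i in flagged if i < n_correct)
    fdp = false_hits / len(flagged) if flagged else 0.0
    return len(flagged), fdp


if __name__ == "__main__":
    n_flagged, fdp = simulate()
    print(f"flagged for editing: {n_flagged}, false discovery proportion: {fdp:.3f}")
```

The number of flagged observations is an output of the procedure rather than a preset quota: once q is chosen, the data determine how many observations are sent to editing, which is the sense in which an FDR requirement limits over-editing.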
data editing
response errors
over-editing
multiple hypothesis tests
periodic surveys
Main Sponsor
Survey Research Methods Section