False Discovery Rate in Large-Scale Data Error Localization

Abstract Number:

2395 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Paper 

Participants:

Chin-Fang Weng (1), Eric Slud (2)

Institutions:

(1) US Census Bureau, Washington D.C., (2) U. S. Census Bureau, MD

Co-Author:

Eric Slud  
U. S. Census Bureau

First Author:

Chin-Fang Weng  
US Census Bureau

Presenting Author:

Chin-Fang Weng  
US Census Bureau

Abstract Text:

Statistical data editing is the process of identifying potential response errors in data. It is subject to two types of error: labeling a correct observation as erroneous (a false positive) and failing to identify an incorrect value (a false negative). There is no statistical criterion for deciding how many observations should be edited. Over-editing can increase data errors, degrade data quality, change the data structure, and increase costs. Error localization consists of a separate test on each observation, where the null hypothesis states that the observation is error free and the alternative states that it is erroneous. The False Discovery Rate (FDR) is the expected fraction of false positives among the observations flagged as erroneous. Because FDR control is tied to the number of edited observations, imposing an FDR requirement determines the number of outliers to be edited, thereby controlling over-editing. In this presentation we apply FDR theory to error localization and verify the theory on simulated data and on a real-world data set.
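
The abstract does not say which FDR-controlling procedure is used. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure and assuming each observation already has a p-value from its own test of the null hypothesis "this observation is error free", the link between an FDR level and the number of observations to edit could look as follows. The function name, the level q, and the p-values are hypothetical and serve only as illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[int]:
    """Return indices of observations flagged as erroneous at FDR level q.

    Assumption: p_values[i] comes from a per-observation test whose null
    hypothesis is "observation i is error free".
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort observation indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Flag the k_max observations with the smallest p-values.
    return sorted(order[:k_max])


# Hypothetical p-values for eight reported values (illustrative only).
p_vals = [0.001, 0.20, 0.012, 0.74, 0.03, 0.004, 0.55, 0.09]
flagged = benjamini_hochberg(p_vals, q=0.10)
print(flagged)  # indices of the observations selected for editing
```

In this sketch the chosen level q fixes how many observations are flagged, so the number of edits follows from the FDR requirement rather than from an ad hoc cutoff.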

Keywords:

data editing | response errors | over-editing | multiple hypothesis tests | periodic surveys

Sponsors:

Survey Research Methods Section

Tracks:

Data Analysis/Modeling
