WITHDRAWN: Sub-Sampling as Data Protection: A Case Study of Pew Research Center's Asian American Survey
Tuesday, Aug 5: 8:50 AM - 9:05 AM
2558
Contributed Papers
Music City Center
An anonymization challenge faced by many surveys is that, before or even after anonymization, organizations may release publications containing tables built from raw data. These publications can undo the protections offered by Statistical Disclosure Limitation techniques such as local suppression, because the published tables can be used in subtraction attacks. We present a case study using Pew Research Center's Asian American Survey. Before releasing a Public Use File (PUF), Pew created many publications using raw Asian American Survey data. To create the PUF, we used further sub-sampling as our primary form of disclosure protection, since it protects the PUF from subtraction attacks. Sub-sampling is effective because a data attacker cannot expect a table computed from a sub-sampled PUF to have exactly the same counts as the same table computed from the original data. We designed an experiment in which we drew 70 subsamples from the original responding sample, varying the sample size and the sampling strategy. We then tested the subsamples for both disclosure risk and data utility to find the one with the best risk-utility profile.
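The following is a minimal Python sketch of the idea described in the abstract, not the authors' actual procedure. It first shows how a subtraction attack can recover a locally suppressed cell when a table from the raw data has been published, then draws simple random subsamples of several sizes and scores each with a toy risk proxy and a toy utility proxy. The data, the sample sizes, the number of draws, and the helper names (`subtraction_attack`, `matching_cell_share`, `utility`) are all assumptions made for illustration; the abstract does not specify the risk or utility measures used.

```python
import random
from collections import Counter

# Hypothetical raw responses: one categorical variable, 100 respondents.
raw = ["A"] * 50 + ["B"] * 30 + ["C"] * 17 + ["D"] * 3
raw_table = Counter(raw)  # what a publication built from raw data would show


def subtraction_attack(published_total, visible_cells):
    """Recover one suppressed cell as the published total minus the visible cells."""
    return published_total - sum(visible_cells.values())


def matching_cell_share(sample, published):
    """Toy risk proxy: share of published cells whose count the sample reproduces exactly."""
    sample_table = Counter(sample)
    hits = sum(1 for cat, n in published.items() if sample_table.get(cat, 0) == n)
    return hits / len(published)


def utility(sample, published):
    """Toy utility proxy: 1 minus total variation distance between category shares."""
    sample_table = Counter(sample)
    n_sample, n_pub = len(sample), sum(published.values())
    tvd = 0.5 * sum(
        abs(sample_table.get(cat, 0) / n_sample - n / n_pub)
        for cat, n in published.items()
    )
    return 1.0 - tvd


# Local suppression alone: hiding cell "D" fails once the raw table is public.
visible = {cat: n for cat, n in raw_table.items() if cat != "D"}
recovered = subtraction_attack(len(raw), visible)  # equals the true count of "D"

# Sub-sampling: draw several simple random subsamples per size and compare.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
results = {}
for size in (60, 75, 90):  # illustrative sizes only
    draws = [rng.sample(raw, size) for _ in range(10)]
    results[size] = (
        sum(matching_cell_share(d, raw_table) for d in draws) / len(draws),
        sum(utility(d, raw_table) for d in draws) / len(draws),
    )

for size, (risk, util) in results.items():
    print(f"n={size}: mean matching-cell share={risk:.2f}, mean utility={util:.2f}")
```

In a sketch like this, smaller subsamples would be expected to reproduce fewer published counts exactly, at some cost in how closely they track the original distribution. A real evaluation would likely use multi-way tables, survey weights, and other sampling strategies such as stratified designs.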
Statistical Disclosure Limitation
Data Privacy
Sampling
Main Sponsor
Privacy and Confidentiality Interest Group