WITHDRAWN: Sub-Sampling as Data Protection: A Case Study of Pew Research Center's Asian American Survey

Jennifer Taub First Author
NORC at The University of Chicago
 
Tuesday, Aug 5: 8:50 AM - 9:05 AM
2558 
Contributed Papers 
Music City Center 
An anonymization challenge faced by many surveys is that, prior to or even after anonymization, organizations may release publications containing tables built from raw data. These publications can undo the data protections offered by Statistical Disclosure Limitation techniques such as local suppression, since the tables can be used in subtraction attacks. We present a case study using Pew Research Center's Asian American Survey. Prior to releasing a Public Use File (PUF), Pew created many publications using raw Asian American Survey data. To create the PUF, we used further sub-sampling as our primary form of disclosure protection, since it would protect the PUF from subtraction attacks. This is effective because a data attacker would not expect a table derived from a sub-sampled PUF to have exactly the same counts as the original data. We devised an experiment in which we drew 70 subsamples from the original responding sample, varying the sample size and the sampling strategy. We then tested the samples for both disclosure risk and data utility to find the sample with the best risk-utility profile.
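
To make the subtraction attack concrete, here is a minimal sketch on invented toy data. It is not the authors' code or experiment. The category labels, the counts, the sub-sample size of 60, and the helper names `table` and `subtraction_attack` are all assumptions made for illustration. It shows how a table published from raw data exposes a locally suppressed value, and why the same subtraction is uninformative against a sub-sampled file.

```python
import random
from collections import Counter

def table(records):
    """Count records per category, skipping suppressed (None) values."""
    return Counter(r for r in records if r is not None)

def subtraction_attack(published, released):
    """Hypothetical attacker step: published count minus released count,
    keeping only the categories where the released file falls short."""
    return {
        k: published[k] - released.get(k, 0)
        for k in published
        if published[k] > released.get(k, 0)
    }

# Invented raw responses: one respondent is unique in category "C".
raw = ["A"] * 50 + ["B"] * 30 + ["C"] * 1
published = table(raw)  # a table released from the raw data

# Protection 1: local suppression only. The rare value is blanked.
suppressed = [None if v == "C" else v for v in raw]
gap_suppressed = subtraction_attack(published, table(suppressed))

# Protection 2: sub-sampling. Release a simple random subset instead.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
subsample = rng.sample(raw, 60)
gap_subsample = subtraction_attack(published, table(subsample))

print("gap after suppression:", gap_suppressed)
print("gap after sub-sampling:", gap_subsample)
```

Under suppression the only gap is the single missing "C" record, so the attacker learns exactly what was hidden. Under sub-sampling the gaps add up to the 21 records that were not drawn and are spread across categories, so a mismatch with the published table no longer points to a specific suppressed value.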

Keywords

Statistical Disclosure Limitation

Data Privacy

Sampling 

Main Sponsor

Privacy and Confidentiality Interest Group