Balancing Privacy and Precision: The Impact of Data Perturbation Methods on Small Area Estimation

Trivellore Raghunathan Co-Author
Institute for Social Research
 
Chendi Zhao First Author, Presenting Author
 
Thursday, Aug 7: 11:35 AM - 11:50 AM
1781 
Contributed Papers 
Music City Center 
Microdata pose privacy risks, especially in small geographic areas. Perturbation reduces these risks, but balancing privacy against utility remains challenging, particularly in Small Area Estimation (SAE). This study examines how data perturbation affects the accuracy of SAE, with the aim of optimizing the trade-off between privacy protection and data utility. Using data from the 2018-2022 American Community Survey Public Use Microdata Sample, we estimate income and poverty at the state and Public Use Microdata Area (PUMA) levels. Six covariates (age, gender, race/ethnicity, education, occupation, and health insurance) are used for prediction and are perturbed. Records are first classified into three privacy levels. Random Swapping, the Post Randomization Method, and Multiple Imputation are then applied at the national, state, and PUMA levels. For each perturbation scenario, we generate small area estimates at the state and PUMA levels using the Fay-Herriot model and evaluate outcomes within the Risk-Utility (R-U) framework. We hypothesize that greater privacy protection and smaller geographic areas reduce utility, leading to less accurate estimates.
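
To make the estimation step concrete, the following is a minimal sketch of an area-level Fay-Herriot estimator, not the authors' implementation. It assumes direct estimates `y` with known sampling variances `D`, an area-level design matrix `X`, and a Prasad-Rao moment estimator for the model variance; the abstract does not state which variance estimator is used, and the function name `fay_herriot_eblup` is hypothetical.

```python
import numpy as np


def fay_herriot_eblup(y, X, D):
    """Illustrative Fay-Herriot EBLUP (assumed setup, not the authors' code).

    Model: y_i = x_i' beta + v_i + e_i, with v_i ~ (0, A) and e_i ~ (0, D_i),
    where the sampling variances D_i are treated as known and positive.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    m, p = X.shape

    # Ordinary least squares fit, used only for the moment estimator of A.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    resid = y - X @ beta_ols
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_ii = x_i'(X'X)^-1 x_i

    # Prasad-Rao moment estimator of the model variance A, truncated at zero.
    A_hat = max(0.0, (resid @ resid - np.sum(D * (1.0 - leverage))) / (m - p))

    # Weighted least squares for beta with weights 1 / (A + D_i).
    w = 1.0 / (A_hat + D)
    beta_hat = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Shrinkage factor: areas with noisier direct estimates lean on the model.
    gamma = A_hat / (A_hat + D)
    theta_hat = gamma * y + (1.0 - gamma) * (X @ beta_hat)
    return theta_hat, beta_hat, A_hat, gamma
```

In this sketch, each small area estimate is a weighted average of the direct estimate and the regression prediction. Perturbing the covariates changes `X` (and, if the outcome is affected, `y`), so rerunning the function on perturbed inputs and comparing against the unperturbed estimates gives one possible utility measure for a risk-utility comparison.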

Keywords

Data Privacy

Data Perturbation

Small Area Estimation

American Community Survey 

Main Sponsor

Survey Research Methods Section