Uncertainty Assessment of a Triple-System Estimation Based on Privatized Microdata
Valbona Bejleri
Co-Author
United States Department of Agriculture – National Agricultural Statistics Service
Luca Sartore
Speaker
National Institute of Statistical Sciences
Tuesday, Aug 5: 9:20 AM - 9:35 AM
Invited Paper Session
Music City Center
Synthetic data generation is a statistical tool that alters data to enhance the privacy of record-level information while maintaining the distributional properties of the original population. After processing the data, statistical agencies typically apply privacy-enhancing methods to generate the privatized summaries that appear in official publications. These official summaries are often produced with model-based adjustments that account for undercoverage, nonresponse, and misclassification with respect to the population of interest. Such adjustments can be obtained from dual-system estimation (DSE) or triple-system estimation (TSE) models. Calibration procedures then further adjust the weights so that the estimates meet known population benchmarks. Although the study of total error variability is well developed for these standard statistical processes, it often disregards privacy mechanisms and the related concerns over disclosure risk. This paper proposes the use of an algorithm that generates protected microdata to study the uncertainty of a census under a novel definition of differential privacy. To better understand the properties of the proposed algorithm, real confidential microdata are replaced with altered microdata before the typical estimation procedures are performed. The approach is tested on data from the 2022 US Census of Agriculture (with the June Area Survey and FSA administrative data as second and third lists), using the accuracy, precision, utility, and disclosure risk of the final statistical summaries as metrics for comparing the data with and without privatization mechanisms.
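For readers unfamiliar with dual-system estimation, the simplest textbook form is the Lincoln-Petersen capture-recapture estimator, which a triple-system model extends to three lists. The sketch below is only an illustration under the assumption that the two lists are independent; it is not the model used in this paper, and the function names and the counts are hypothetical.

```python
def dual_system_estimate(n1: int, n2: int, matched: int) -> float:
    """Lincoln-Petersen estimate of population size from two lists.

    n1, n2  : number of records on list 1 and list 2
    matched : number of records found on both lists
    Assumes the lists are independent and matching is error-free.
    """
    if matched <= 0:
        raise ValueError("at least one matched record is required")
    return n1 * n2 / matched


def chapman_estimate(n1: int, n2: int, matched: int) -> float:
    """Chapman's bias-corrected variant, defined even when matched == 0."""
    return (n1 + 1) * (n2 + 1) / (matched + 1) - 1


# Hypothetical counts: 800 records on the first list, 500 on the second,
# 400 appearing on both.
n_hat = dual_system_estimate(800, 500, 400)
n_hat_chapman = chapman_estimate(800, 500, 400)
print(n_hat, round(n_hat_chapman, 2))
```

With these made-up counts, the second list captures 400 of the 800 first-list records, so the implied coverage rate of the second list is one half and the estimated population is twice its size.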
Capture-recapture models
Census of Agriculture
Disclosure risk
Neural networks
Triple-system estimation
Variance estimation