Utility for Aggregate Queries of Formally Private Establishment Data
Wednesday, Aug 6: 11:05 AM - 11:35 AM
Invited Paper Session
Music City Center
The Quarterly Census of Employment and Wages (QCEW) is a critical dataset on U.S. establishments, widely used for economic analysis and policy-making, including politically sensitive decisions. The Bureau of Labor Statistics (BLS) releases QCEW data as aggregate tables grouped by public attributes such as industry or county. We introduce a new formal privacy framework and two mechanisms for sharing establishment-level data by adapting Gaussian Differential Privacy (GDP). These mechanisms are designed to address the specific privacy and accuracy challenges posed by the QCEW. We define a novel concept of neighboring datasets, called sqrt-neighbors, and build upon it to propose the Establishment Gaussian Differential Privacy (EGDP) framework, which our proposed mechanisms satisfy. To generate tabulations by specific public attributes, the QCEW employs group-by queries, which take the data as input and output values for each group within a chosen attribute. For example, a group-by-state query would return values for each U.S. state and territory. Our first mechanism scales noise by a group's summation value, while the second scales noise by the sanitized maximum value within a group. These mechanisms can be integrated into a weighted least squares post-processing procedure to produce privacy-preserving microdata (PPM). Using the PPM, aggregate tables can be released in a way that maintains internal consistency across attributes; for example, county totals sum to the corresponding state totals. We evaluate the utility and privacy properties of both mechanisms through a series of experiments on synthetic data, varying the privacy parameters and query types used in PPM creation. The synthetic dataset is constructed from public data sources to mimic the structure of the QCEW and is used to measure performance in terms of relative and absolute error between sanitized and original values.
These results help illustrate the strengths and ideal use cases for each proposed mechanism.
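As a rough illustration of the ideas in the abstract, the Python sketch below shows a group-by sum query, Gaussian noise whose standard deviation is proportional to each group's sum, and a weighted least squares step that reconciles noisy county totals with a noisy state total. It is not the authors' mechanism and carries no privacy guarantee: the function names, the `rel_scale` parameter, and the county-level (rather than establishment-level) reconciliation are all assumptions made for this example, and the calibration required by EGDP is not reproduced here.

```python
import numpy as np

def group_by_sum(values, groups):
    """Group-by query: total of `values` for each distinct label in `groups`."""
    totals = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + float(v)
    return totals

def noisy_group_sums(totals, rel_scale, rng):
    """Add Gaussian noise with std = rel_scale * |group total|.

    Hypothetical calibration for illustration only; it does not by
    itself satisfy any formal privacy definition.
    """
    return {g: t + rng.normal(0.0, rel_scale * abs(t)) for g, t in totals.items()}

def reconcile(county_noisy, county_sd, state_noisy, state_sd):
    """Weighted least squares estimate of county totals.

    Minimizes sum_c ((x_c - y_c) / sd_c)^2 + ((sum(x) - y_state) / sd_state)^2.
    The released state total is sum(x), so counties add up to the state
    by construction.
    """
    k = len(county_noisy)
    # One row per county measurement, plus one row for the state measurement.
    A = np.vstack([np.eye(k), np.ones((1, k))])
    y = np.append(np.asarray(county_noisy, dtype=float), state_noisy)
    # Weight each row by the inverse of its noise standard deviation.
    w = 1.0 / np.append(np.asarray(county_sd, dtype=float), state_sd)
    x, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    return x, float(x.sum())
```

In this sketch, measurements with a smaller standard deviation pull the reconciled estimate more strongly, and both the county and state tables are derived from the same estimate `x`, which is what keeps them mutually consistent.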
Research in collaboration with Daniel Kifer, Aleksandra Slavkovic, and Daniell Toth