Novel Extensions and Applications of Differential Privacy
Wednesday, Aug 6: 10:30 AM - 12:20 PM
0303
Invited Paper Session
Music City Center
Room: CC-106C
Applied: Yes
Main Sponsor: Survey Research Methods Section
Co-Sponsors: Government Statistics Section
Presentations
Synthetic control is a widely used causal inference method for evaluating the effectiveness of government policies, such as new tariffs and tax increases. Traditionally, synthetic control is applied to aggregate-level datasets, but more recent studies have explored its application to disaggregated datasets, such as individual health records and targeted marketing analyses. However, individual-level datasets exhibit different distributional properties than aggregate-level data. Our work reexamines the implicit assumptions of traditional synthetic control approaches and proposes theoretically grounded algorithms for synthetic control in individual-level analyses. I will address privacy concerns related to individual-level data analyses and present our work on Differentially Private Synthetic Control (DPSC). I will also discuss Cluster Synthetic Control, a synthetic control approach that incorporates a donor selection step, which in turn benefits DPSC. Together, these methods give synthetic control provable accuracy improvements and privacy guarantees.
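The abstract does not spell out the DPSC or Cluster Synthetic Control algorithms, so the following is only a minimal sketch of the classic, non-private synthetic control step they build on: choose nonnegative donor weights that sum to 1 so that a weighted combination of donor units tracks the treated unit's pre-treatment outcomes. The function names are hypothetical, and projected gradient descent onto the probability simplex is just one convenient solver, not necessarily the one used in this work.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex {w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - cssv / idx > 0][-1]         # number of entries that stay positive
    theta = cssv[rho - 1] / rho               # shift that makes the kept entries sum to 1
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(y_treated_pre, Y_donors_pre, n_iter=5000):
    """Convex donor weights w minimizing ||y_treated_pre - Y_donors_pre @ w||^2.

    y_treated_pre: shape (T0,), pre-treatment outcomes of the treated unit.
    Y_donors_pre:  shape (T0, J), pre-treatment outcomes of the J donor units.
    """
    J = Y_donors_pre.shape[1]
    w = np.full(J, 1.0 / J)                   # start from equal weights
    # Step size 1/L, with L the Lipschitz constant of the gradient (2 * sigma_max^2).
    L = max(2.0 * np.linalg.norm(Y_donors_pre, 2) ** 2, 1e-12)
    for _ in range(n_iter):
        grad = 2.0 * Y_donors_pre.T @ (Y_donors_pre @ w - y_treated_pre)
        w = project_to_simplex(w - grad / L)
    return w


# Usage on simulated data: the treated unit is an exact 0.3/0.7 mix of donors 0 and 1.
rng = np.random.default_rng(0)
Y_pre = rng.normal(size=(20, 3))              # 20 pre-treatment periods, 3 donors
y_pre = Y_pre @ np.array([0.3, 0.7, 0.0])
w = synthetic_control_weights(y_pre, Y_pre)
print(np.round(w, 3))
```

On this simulated data the fitted weights should come out close to 0.3, 0.7, and 0. The estimated policy effect in a post-treatment period t is then the treated unit's outcome minus the weighted donor outcome, `y_treated[t] - Y_donors[t] @ w`. A private variant would additionally need to limit how much any single individual's record can shift these weights, which this sketch does not attempt.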
The Quarterly Census of Employment and Wages (QCEW) is a critical dataset on U.S. establishments, widely used for economic analysis and policy-making, including politically sensitive decisions. The Bureau of Labor Statistics (BLS) releases QCEW data as aggregate tables grouped by public attributes such as industry or county. We introduce a new formal privacy framework and two mechanisms for sharing establishment-level data by adapting Gaussian Differential Privacy (GDP). These mechanisms are designed to address the specific privacy and accuracy challenges posed by the QCEW. We define a novel concept of neighboring datasets, called sqrt-neighbors, and build on it to propose the Establishment Gaussian Differential Privacy (EGDP) framework, which both mechanisms satisfy. To generate tabulations by specific public attributes, the QCEW uses group-by queries, which take the data as input and output a value for each group within a chosen attribute. For example, a group-by-state query returns a value for each U.S. state and territory. Our first mechanism scales noise by a group's summation value, while the second scales noise by the sanitized maximum value within the group. Either mechanism can be integrated into a weighted least squares post-processing procedure to produce privacy-preserving microdata (PPM). Aggregate tables computed from the PPM are internally consistent across attributes; for example, county totals sum to the corresponding state totals. We evaluate the utility and privacy properties of both mechanisms through a series of experiments on synthetic data, varying the privacy parameters and the query types used in PPM creation. The synthetic dataset is constructed from public data sources to mimic the structure of the QCEW, and performance is measured by the relative and absolute error between sanitized and original values. These results illustrate the strengths and ideal use cases of each mechanism.
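The abstract does not define sqrt-neighbors or the EGDP noise calibration, so the following is only a loose, table-level sketch of three ideas it names: a group-by query, noise whose scale grows with a group's summation value (the first mechanism), and a weighted least squares step that makes county totals add up to their state total. All function names, the parameter `noise_scale`, and the data are hypothetical. The sketch reconciles noisy tables directly rather than producing privacy-preserving microdata, and it carries no formal privacy guarantee.

```python
import numpy as np


def group_by_sum(values, groups):
    """Group-by query: return the total of `values` for each distinct label in `groups`."""
    totals = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
    return totals


def sum_scaled_gaussian(totals, noise_scale, rng):
    """Hypothetical sum-scaled noise: each group total gets Gaussian noise with
    standard deviation noise_scale * |total|. This only mimics scaling noise by a
    group's summation value; it is not the EGDP calibration from the talk."""
    return {g: t + rng.normal(0.0, noise_scale * abs(t)) for g, t in totals.items()}


def wls_consistent(noisy_parts, noisy_total, part_vars, total_var):
    """Weighted least squares reconciliation of noisy part totals (e.g. counties)
    with a noisy overall total (e.g. their state). Minimizes
        sum_i (x_i - y_i)^2 / v_i + (sum_i x_i - y_total)^2 / v_total
    and returns x, so the consistent state total is x.sum() by construction."""
    n = len(noisy_parts)
    y = np.append(np.asarray(noisy_parts, dtype=float), noisy_total)
    A = np.vstack([np.eye(n), np.ones((1, n))])   # one row per county, then the state sum
    sqrt_w = 1.0 / np.sqrt(np.append(np.asarray(part_vars, dtype=float), total_var))
    x, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return x


# Usage with made-up establishment values in two counties of one state.
rng = np.random.default_rng(0)
payroll = [120.0, 80.0, 40.0, 60.0]
county = ["c1", "c1", "c2", "c2"]
county_totals = group_by_sum(payroll, county)
noisy_c = sum_scaled_gaussian(county_totals, 0.05, rng)
noisy_s = sum_scaled_gaussian({"s": sum(payroll)}, 0.05, rng)["s"]

# Variances are estimated from the noisy values, so this step uses only sanitized output.
parts = [noisy_c["c1"], noisy_c["c2"]]
x = wls_consistent(parts, noisy_s, [(0.05 * p) ** 2 for p in parts], (0.05 * noisy_s) ** 2)
print(x, x.sum())
```

Weighting each equation by the inverse of its estimated noise variance lets the more precise totals move less during reconciliation, and releasing `x.sum()` as the state total guarantees that the county figures add up to it.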
Research in collaboration with Daniel Kifer, Aleksandra Slavkovic, and Daniell Toth