Novel Extensions and Applications of Differential Privacy

Daniel Yang Chair
Bureau of Labor Statistics
 
Ruobin Gong Discussant
Rutgers University
 
Daniell Toth Organizer
US Bureau of Labor Statistics
 
Wednesday, Aug 6: 10:30 AM - 12:20 PM
0303 
Invited Paper Session 
Music City Center 
Room: CC-106C 

Applied

Yes

Main Sponsor

Survey Research Methods Section

Co-Sponsors

Government Statistics Section

Presentations

Towards Differentially Private Causal Inference Using Synthetic Controls

Synthetic control is a widely used causal inference method for evaluating the effectiveness of government policies, such as new tariffs and tax increases. Traditionally, synthetic control is applied to aggregate-level datasets, but more recent studies have explored its application to disaggregated datasets, such as individual health records and targeted marketing analyses. However, individual-level datasets have distributional properties that differ from those of aggregate-level data. Our work reexamines the implicit assumptions of traditional synthetic control approaches and proposes theoretically grounded algorithms for synthetic control in individual-level analyses. I will address privacy concerns related to individual-level data analyses and present our work on Differentially Private Synthetic Control (DPSC). I will also discuss Cluster Synthetic Control, a synthetic control approach that incorporates a donor selection step, which in turn benefits DPSC. These methods provide synthetic control approaches with provable accuracy improvements and privacy guarantees.
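For readers new to the method, the following is a minimal sketch of classic (non-private) synthetic control, not of DPSC or Cluster Synthetic Control. It assumes the usual formulation in which the treated unit's pre-treatment outcomes are approximated by a weighted average of donor units, with nonnegative weights that sum to one. The function names, the toy data, and the choice of projected gradient descent as the solver are illustrative assumptions, not taken from the talk.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # the last j passing the check sets the shift
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(donors, treated, iterations=5000):
    """Fit donor weights on pre-treatment data by projected gradient descent.

    donors:  list of J donor series, each of length T (hypothetical layout)
    treated: pre-treatment series of the treated unit, length T
    """
    n_donors = len(donors)
    n_periods = len(treated)
    # Conservative step size: 1 / (2 * squared Frobenius norm of the donor matrix).
    frob_sq = sum(x * x for series in donors for x in series)
    step = 1.0 / (2.0 * frob_sq)
    w = [1.0 / n_donors] * n_donors  # start from equal weights
    for _ in range(iterations):
        residual = [
            sum(w[j] * donors[j][t] for j in range(n_donors)) - treated[t]
            for t in range(n_periods)
        ]
        grad = [
            2.0 * sum(donors[j][t] * residual[t] for t in range(n_periods))
            for j in range(n_donors)
        ]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


# Toy data (made up): the treated unit is an equal mix of the first two donors.
donors = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 1.0], [0.0, 1.0, 4.0, 2.0]]
treated = [1.5, 1.5, 1.5, 2.5]
weights = synthetic_control_weights(donors, treated)
print([round(x, 3) for x in weights])
```

On this toy input the fitted weights should concentrate on the first two donors. The estimated policy effect would then be the gap between the treated unit's post-treatment outcomes and the same weighted average of the donors' post-treatment outcomes.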

Speaker

Saeyoung Rho

Utility for Aggregate Queries of Formally Private Establishment Data

The Quarterly Census of Employment and Wages (QCEW) is a critical dataset on U.S. establishments, widely used for economic analysis and policy-making, including politically sensitive decisions. The Bureau of Labor Statistics (BLS) releases QCEW data as aggregate tables grouped by public attributes such as industry or county. We introduce a new formal privacy framework and two mechanisms for sharing establishment-level data by adapting Gaussian Differential Privacy (GDP). These mechanisms are designed to address the specific privacy and accuracy challenges posed by the QCEW. We define a novel concept of neighboring datasets, called sqrt-neighbors, and build upon it to propose the Establishment Gaussian Differential Privacy (EGDP) framework, which our proposed mechanisms satisfy. To generate tabulations by specific public attributes, the QCEW employs group-by queries, which take the data as input and output values for each group within a chosen attribute. For example, a group-by-state query would return values for each U.S. state and territory. Our first mechanism scales noise by a group's summation value, while the second scales noise by the sanitized maximum value within a group. These mechanisms can be integrated into a weighted least squares post-processing procedure to produce privacy-preserving microdata (PPM). Using the PPM, aggregate tables can be released in a way that maintains internal consistency across attributes; for example, county totals sum to the corresponding state totals. We evaluate the utility and privacy properties of both mechanisms through a series of experiments on synthetic data, varying the privacy parameters and query types used in PPM creation. The synthetic dataset is constructed from public data sources to mimic the structure of the QCEW and is used to measure performance in terms of relative and absolute error between sanitized and original values. These results help illustrate the strengths and ideal use cases for each proposed mechanism.
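To make the consistency idea concrete, here is a small sketch of weighted least squares reconciliation for a single state and its counties. It is not the authors' PPM procedure: it assumes the noisy county totals, the noisy state total, and their noise variances are already given, and it reconciles the tables directly rather than producing microdata. The function name and the numbers are hypothetical.

```python
def reconcile_counties_with_state(county_values, county_variances,
                                  state_value, state_variance):
    """Weighted least squares adjustment so county totals add up to the state total.

    Minimizes  sum_i (x_i - y_i)^2 / v_i  +  (sum_i x_i - s)^2 / v_s
    over adjusted county totals x_i, where y_i are the noisy county values,
    s is the noisy state value, and v_i, v_s are their noise variances.
    The closed-form solution spreads the discrepancy in proportion to variance.
    """
    discrepancy = state_value - sum(county_values)
    total_variance = state_variance + sum(county_variances)
    adjusted = [
        y + v * discrepancy / total_variance
        for y, v in zip(county_values, county_variances)
    ]
    # The reconciled state total is defined as the sum of the adjusted counties.
    return adjusted, sum(adjusted)


# Made-up noisy inputs: the counties sum to 60 but the noisy state total is 64.
noisy_counties = [10.0, 20.0, 30.0]
county_noise_variances = [1.0, 1.0, 2.0]
adjusted_counties, adjusted_state = reconcile_counties_with_state(
    noisy_counties, county_noise_variances, state_value=64.0, state_variance=4.0
)
print(adjusted_counties, adjusted_state)  # [10.5, 20.5, 31.0] 62.0
```

Noisier inputs absorb more of the discrepancy, so the third county (variance 2) moves twice as far as the others. Because the adjustment uses only the already-sanitized values and public variances, it is post-processing and does not consume additional privacy budget.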

Research in collaboration with Daniel Kifer, Aleksandra Slavkovic, and Daniell Toth 

Speaker

Kaitlyn Webb