Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees
Shurong Lin
Presenting Author
Pennsylvania State University
Tuesday, Aug 5: 9:05 AM - 9:20 AM
1423
Contributed Papers
Music City Center
In social sciences, where small- to medium-scale datasets are common, canonical tasks such as linear regression are ubiquitous. In privacy-aware settings, substantial work has been done on differentially private (DP) linear regression. However, most existing methods focus primarily on point estimation, with limited consideration of uncertainty quantification. At the same time, synthetic data generation (SDG) is gaining importance as a tool to allow replication studies in privacy-aware settings. Yet, current DP linear regression approaches do not readily support SDG. Furthermore, mainstream SDG methods, usually based on machine learning and deep learning models, often require large datasets to train effectively. This limits their applicability to smaller data regimes typical of social science research.
To address these challenges, we propose a novel Gaussian DP linear regression method that enables statistically valid inference by accounting for the noise introduced by the privacy mechanism. We derive a DP bias-corrected regression estimator and its asymptotic confidence interval. We also introduce a synthetic data generation procedure for which running linear regression on the synthetic data is equivalent to the proposed DP linear regression. Our approach is built upon a binning-aggregation strategy, leveraging existing DP binning techniques. It is designed to operate effectively in low-dimensional settings, where the dimension $d$ is small. Experimental results demonstrate that our method achieves statistical accuracy comparable to or better than existing DP linear regression techniques, with particularly notable improvements over those capable of statistical inference.
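To make the binning-aggregation idea concrete, the following is a minimal Python sketch of a generic Gaussian-mechanism perturbed histogram, followed by naive synthetic data generation and ordinary least squares. It is not the estimator proposed in the talk: the bias correction and the confidence interval are not reproduced, and the function names, the bin grid, the assumption that the data are bounded in $[0, 1]$, and the calibration $\sigma = \sqrt{2}/\mu$ (replace-one adjacency, $\mu$-Gaussian DP) are all illustrative assumptions.

```python
import numpy as np


def gaussian_perturbed_histogram(x, y, edges_x, edges_y, sigma, rng):
    """Return 2-D histogram counts with i.i.d. N(0, sigma^2) noise added.

    Hypothetical helper: a generic Gaussian-mechanism histogram,
    not the specific DP binning technique used in the talk.
    """
    counts, _, _ = np.histogram2d(x, y, bins=[edges_x, edges_y])
    return counts + rng.normal(0.0, sigma, size=counts.shape)


def synthetic_from_histogram(noisy, edges_x, edges_y):
    """Round and clip noisy counts, then repeat each bin center that many times.

    This is post-processing of the noisy counts, so it uses no extra
    privacy budget. The rounding and clipping introduce bias that this
    naive sketch does NOT correct.
    """
    cx = 0.5 * (edges_x[:-1] + edges_x[1:])
    cy = 0.5 * (edges_y[:-1] + edges_y[1:])
    n = np.clip(np.rint(noisy), 0, None).astype(int)
    gx, gy = np.meshgrid(cx, cy, indexing="ij")
    reps = n.ravel()
    return np.repeat(gx.ravel(), reps), np.repeat(gy.ravel(), reps)


# Illustrative usage on simulated data assumed to lie in [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)
y = np.clip(0.2 + 0.5 * x + rng.normal(0.0, 0.05, 500), 0.0, 1.0)

edges = np.linspace(0.0, 1.0, 11)   # assumed 10 x 10 grid
mu = 1.0                            # assumed Gaussian DP parameter
sigma = np.sqrt(2.0) / mu           # assumed L2 sensitivity sqrt(2) (replace-one)

noisy = gaussian_perturbed_histogram(x, y, edges, edges, sigma, rng)
sx, sy = synthetic_from_histogram(noisy, edges, edges)

# Naive OLS on the synthetic data (no bias correction).
slope, intercept = np.polyfit(sx, sy, 1)
print(slope, intercept)
```

The sketch shows why a correction matters: noise and clipping in sparsely populated bins distort the fitted line, which is the effect a bias-corrected estimator is meant to account for.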
Differential Privacy
Linear Regression
Synthetic Data
Gaussian Mechanism
Perturbed Histogram
Main Sponsor
Privacy and Confidentiality Interest Group