Differentially Private Linear Regression with Linked Data

Elliot Paquette Co-Author
McGill University
 
Eric Kolaczyk Co-Author
McGill University
 
Shurong Lin Speaker
Boston University
 
Thursday, Aug 8: 11:00 AM - 11:25 AM
Invited Paper Session 
Oregon Convention Center 
There has been increasing demand for privacy-preserving methodologies in modern statistics and machine learning. Differential privacy, a mathematical notion originating in computer science, has emerged as a leading tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, typically without incorporating nontrivial upstream pre-processing. An important example is record linkage performed prior to downstream modeling. Record linkage is the statistical task of linking two or more data sets on the same group of entities in the absence of a unique identifier. This probabilistic procedure introduces additional uncertainty into the subsequent task. In this talk, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for estimating the regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, and we also discuss their variances.
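To make the two approaches concrete, here is a minimal sketch of standard versions of both primitives: sufficient statistics perturbation (add calibrated Gaussian noise to X^T X and X^T y, then solve) and a noisy gradient method (clip per-example gradients, add Gaussian noise each step). This is a generic illustration under simplifying assumptions (bounded rows, Gaussian mechanism, naive privacy-budget composition), not the authors' exact algorithms, which additionally account for linkage uncertainty; all function names and parameters here are hypothetical.

```python
import numpy as np


def dp_linreg_ssp(X, y, epsilon, delta, x_bound=1.0, y_bound=1.0, rng=None):
    """Sufficient statistics perturbation (illustrative sketch).

    Assumes each row of X has L2 norm <= x_bound and |y_i| <= y_bound,
    so the sufficient statistics X^T X and X^T y change boundedly when
    one record changes.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Crude sensitivity bound for the stacked statistics under a
    # one-record change (assumption: bounds above hold).
    sens = x_bound**2 + x_bound * y_bound
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    # Perturb X^T X with a symmetrized Gaussian noise matrix.
    noise = rng.normal(scale=sigma, size=(d, d))
    xtx = X.T @ X + (noise + noise.T) / np.sqrt(2)
    xty = X.T @ y + rng.normal(scale=sigma, size=d)
    # Small ridge term keeps the perturbed Gram matrix invertible.
    return np.linalg.solve(xtx + np.eye(d), xty)


def dp_linreg_noisy_gd(X, y, epsilon, delta, iters=50, lr=0.1,
                       clip=1.0, rng=None):
    """Noisy (projected) gradient descent for squared loss (sketch)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Naive composition: split the budget evenly across iterations.
    eps_t, delta_t = epsilon / iters, delta / iters
    sigma = clip * np.sqrt(2 * np.log(1.25 / delta_t)) / eps_t
    beta = np.zeros(d)
    for _ in range(iters):
        # Per-example gradients of the squared loss, clipped to norm <= clip.
        grads = (X @ beta - y)[:, None] * X
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)
        g = grads.sum(axis=0) + rng.normal(scale=sigma, size=d)
        beta -= lr * g / n
    return beta
```

The tradeoff sketched in the abstract is visible here: smaller epsilon forces larger sigma in both routines, inflating the noise added to the sufficient statistics or to each gradient step and hence the error of the resulting estimator.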