Bridge the Gap: Differential Privacy and Statistical Analysis
Thursday, Aug 8: 10:30 AM - 12:20 PM
1048
Invited Paper Session
Oregon Convention Center
Room: CC-B110
Applied: Yes
Main Sponsor
SSC (Statistical Society of Canada)
Co Sponsors
Caucus for Women in Statistics
WNAR
Presentations
While several results in the literature (e.g., Dimitrakakis et al., 2017; Zhang and Zhang, 2023) demonstrate that Bayesian inference approximated by MCMC output can achieve differential privacy with little or no impact on the ensuing posterior, we reassess this perspective via an alternate "exact" MCMC perturbation inspired by Nicholls et al. (2012) within a federated learning setting. We conclude that the ensuing privacy is mostly attributable to a slowing-down of MCMC convergence rather than to a generic gain in protecting data privacy.
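To make the kind of perturbation concrete, here is a minimal sketch (not the authors' construction) of a Metropolis-Hastings step whose log acceptance ratio is perturbed with Gaussian noise in the spirit of the Nicholls et al. (2012) penalty method; the toy Gaussian target, the noise scale noise_sd, and all names are illustrative assumptions. The bias correction of -noise_sd**2/2 keeps the chain "exact", while the added noise lowers the acceptance rate and hence slows convergence.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta, data):
    # Toy target: Gaussian likelihood with unit variance, flat prior.
    return -0.5 * np.sum((data - theta) ** 2)

def noisy_mh(data, n_iter=5000, step=0.5, noise_sd=1.0):
    """Metropolis-Hastings with the log acceptance ratio perturbed by
    N(-noise_sd**2 / 2, noise_sd**2) noise, in the style of the penalty
    method; the mean shift compensates the noise so the chain still
    targets the correct posterior."""
    theta = 0.0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal()
        log_ratio = log_target(prop, data) - log_target(theta, data)
        # Penalty correction: Gaussian noise with mean -variance/2.
        log_ratio += rng.normal(-0.5 * noise_sd**2, noise_sd)
        if np.log(rng.uniform()) < log_ratio:
            theta = prop
        chain[t] = theta
    return chain

data = rng.normal(1.0, 1.0, size=50)
chain = noisy_mh(data)
print("posterior mean estimate:", chain[2000:].mean())
```

Increasing noise_sd in this sketch does not change the stationary distribution, but it does shrink the acceptance probability, which is one way to see the slowdown-versus-privacy tradeoff the talk examines.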
There has been increasing demand for privacy-preserving methodologies in modern statistics and machine learning. Differential privacy, a mathematical notion originating in computer science, has emerged as a tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, typically without incorporating nontrivial upstream pre-processing. An important example is when record linkage is performed prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets on the same group of entities in the absence of a unique identifier. This probabilistic procedure introduces additional uncertainty into the subsequent task. In this talk, we present two differentially private algorithms for linear regression with linked data: a noisy gradient method and a sufficient statistics perturbation approach for estimating the regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, and we also discuss their variances.
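As a hedged sketch of one of the two ingredients, the following illustrates sufficient statistics perturbation for ordinary least squares under the Gaussian mechanism, with the record-linkage step omitted; the clipping bound, noise calibration, and ridge term are illustrative assumptions, not the talk's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_ols_ssp(X, y, epsilon, delta, clip=1.0):
    """Sufficient statistics perturbation for OLS: add Gaussian noise
    to X^T X and X^T y, then solve the noisy normal equations. Rows of
    (X, y) are rescaled to norm <= clip so one record changes the
    released statistics by a bounded amount."""
    n, d = X.shape
    # Clip each (x_i, y_i) row to bound the L2 sensitivity by clip**2.
    norms = np.maximum(np.linalg.norm(np.c_[X, y], axis=1) / clip, 1.0)
    Xc, yc = X / norms[:, None], y / norms
    # Standard Gaussian-mechanism calibration for (epsilon, delta)-DP.
    sigma = clip**2 * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    XtX = Xc.T @ Xc + rng.normal(0, sigma, (d, d))
    XtX = (XtX + XtX.T) / 2          # symmetrize the noisy Gram matrix
    Xty = Xc.T @ yc + rng.normal(0, sigma, d)
    # Light ridge term in case the noise makes XtX ill-conditioned.
    return np.linalg.solve(XtX + 1e-3 * np.eye(d), Xty)

X = rng.normal(size=(2000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(2000)
print(dp_ols_ssp(X, y, epsilon=1.0, delta=1e-5))
```

The noisy gradient alternative mentioned in the abstract would instead clip and perturb per-iteration gradients of the least-squares loss; the privacy-accuracy tradeoff in both cases shows up through the noise scale's dependence on epsilon and the sample size.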
Recommender systems have advanced significantly in the past decade, impacting billions of people worldwide. However, these systems often collect vast amounts of personal data, raising privacy concerns. To address these issues, federated methods have emerged, allowing models to be trained without sharing users' personal data with a central server. Despite these advancements, existing federated methods face challenges related to centralized bottlenecks and model aggregation across users. In this study, we present a fully decentralized federated learning approach in which each user's model is optimized using their own data together with gradients transferred from neighboring models. This ensures that personal data remain distributed and eliminates the need for central server-side aggregation or model-merging steps. Empirical experiments demonstrate that our approach achieves a significant improvement in accuracy over other decentralized methods across various network structures.
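A minimal sketch of the decentralized idea under toy assumptions (a fixed ring network, linear least-squares "models" standing in for a recommender, and an illustrative learning rate; none of this reproduces the paper's architecture): each user computes a gradient on its own private data and applies it together with gradients received from neighbors, with no central aggregation or model merging.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, d, lr = 5, 4, 0.05

# Fixed ring topology: each user exchanges gradients with two neighbors.
neighbors = {i: [(i - 1) % n_users, (i + 1) % n_users]
             for i in range(n_users)}

# Per-user private data; a shared ground-truth model w_true is used here
# only to make the toy example measurable.
w_true = rng.normal(size=d)
Xs = [rng.normal(size=(40, d)) for _ in range(n_users)]
ys = [X @ w_true + 0.1 * rng.standard_normal(40) for X in Xs]

models = [np.zeros(d) for _ in range(n_users)]

def local_grad(w, X, y):
    # Least-squares gradient computed on one user's private data only.
    return 2.0 * X.T @ (X @ w - y) / len(y)

for _ in range(200):
    # Each user computes a gradient at its own model on its own data...
    grads = [local_grad(models[i], Xs[i], ys[i]) for i in range(n_users)]
    # ...then applies its own gradient plus those sent by its neighbors;
    # raw data never leave the user and no server aggregates the models.
    for i in range(n_users):
        g = grads[i] + sum(grads[j] for j in neighbors[i])
        models[i] -= lr * g / (1 + len(neighbors[i]))

print("mean distance to w_true:",
      np.mean([np.linalg.norm(w - w_true) for w in models]))
```

Swapping the ring for a different adjacency structure in neighbors is one way to mimic the "various network structures" the abstract evaluates.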