Bridge the Gap: Differential Privacy and Statistical Analysis

Chair: Bei Jiang, University of Alberta
Discussant: Naisyin Wang, University of Michigan
Organizer: Bei Jiang, University of Alberta
 
Thursday, Aug 8: 10:30 AM - 12:20 PM
Session 1048: Invited Paper Session
Oregon Convention Center, Room: CC-B110

Applied: Yes

Main Sponsor

SSC (Statistical Society of Canada)

Co-Sponsors

Caucus for Women in Statistics
WNAR

Presentations

The Paradox of Exact and Differentially Private Bayesian Inference

While several results in the literature (e.g., Dimitrakakis et al., 2017; Zhang and Zhang, 2023) demonstrate that Bayesian inference approximated by MCMC output can achieve differential privacy with zero or limited impact on the ensuing posterior, we reassess this perspective via an alternate "exact" MCMC perturbation, inspired by Nicholls et al. (2012), within a federated learning setting. Our conclusion is that the ensuing privacy stems mostly from a slowing-down of MCMC convergence rather than from a genuine gain in protecting data privacy.
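
As a rough illustration of the tension the abstract describes, the sketch below runs a random-walk Metropolis-Hastings chain whose log acceptance ratio is perturbed by Gaussian noise. This is a hypothetical toy, not the construction of Nicholls et al. (2012) nor the authors' federated setup: it omits the correction needed to keep the chain exact, and the function name and noise_sd parameter are invented for illustration. Its only point is that injecting noise into the accept/reject step slows mixing.

```python
import numpy as np

def perturbed_mh(log_target, proposal_sd, n_iter, noise_sd, x0=0.0, rng=None):
    """Metropolis-Hastings with Gaussian noise added to the log
    acceptance ratio -- a toy stand-in (hypothetical, uncorrected)
    for a perturbed MCMC kernel."""
    rng = np.random.default_rng() if rng is None else rng
    x, chain = x0, []
    for _ in range(n_iter):
        y = x + rng.normal(0.0, proposal_sd)          # random-walk proposal
        log_ratio = log_target(y) - log_target(x)
        # Perturb the accept/reject decision; a larger noise_sd injects
        # more randomness but also yields a slower-mixing chain.
        if np.log(rng.uniform()) < log_ratio + rng.normal(0.0, noise_sd):
            x = y
        chain.append(x)
    return np.asarray(chain)

# Example: standard normal target; compare mixing for noise_sd in {0, 2}.
samples = perturbed_mh(lambda t: -0.5 * t**2, proposal_sd=1.0,
                       n_iter=5000, noise_sd=2.0)
```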

Speaker

Christian Robert, Université Paris Dauphine

Differentially Private Linear Regression with Linked Data

There has been increasing demand for privacy-preserving methodologies in modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, has become a prominent tool offering rigorous privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, typically without incorporating nontrivial upstream pre-processing. An important example is when record linkage is performed prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets on the same group of entities in the absence of a unique identifier. This probabilistic procedure introduces additional uncertainty into the subsequent task. In this talk, we present two differentially private algorithms for linear regression with linked data: a noisy gradient method and a sufficient statistics perturbation approach for estimating the regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, and we also discuss their variances.
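
For readers unfamiliar with sufficient statistics perturbation, here is a minimal generic sketch of that mechanism for ordinary linear regression: perturb X'X and X'y with noise and solve the resulting normal equations. It is not the authors' algorithm; it ignores the record-linkage step entirely, and the noise_scale parameter is a placeholder that a real implementation would calibrate to a privacy budget and to bounds on the data.

```python
import numpy as np

def dp_linreg_ssp(X, y, noise_scale, rng=None):
    """Sufficient statistics perturbation for linear regression:
    add symmetric Gaussian noise to X'X and noise to X'y, then solve
    the perturbed normal equations. Illustrative only: noise_scale
    would be calibrated to a target (epsilon, delta) and to data
    bounds, which this sketch does not do."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    # Perturb the sufficient statistics X'X (kept symmetric) and X'y.
    E = rng.normal(0.0, noise_scale, size=(d, d))
    XtX = X.T @ X + (E + E.T) / 2.0
    Xty = X.T @ y + rng.normal(0.0, noise_scale, size=d)
    # Solve the noisy normal equations for the coefficient estimate.
    return np.linalg.solve(XtX, Xty)

# Example with synthetic data (no record-linkage uncertainty modeled).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_hat = dp_linreg_ssp(X, X @ np.array([1.0, -2.0, 0.5]), noise_scale=1.0)
```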

Co-Author(s)

Elliot Paquette, McGill University
Eric Kolaczyk, McGill University

Speaker

Shurong Lin, Boston University

A Network-Based Decentralization Scheme for Recommender Systems

Recommender systems have witnessed significant advancements in the past decade, impacting billions of people worldwide. However, these systems often collect vast amounts of personal data, raising privacy concerns. To address these issues, federated methods have emerged that allow models to be trained without sharing users' personal data with a central server. Despite these advancements, existing federated methods face challenges related to centralized bottlenecks and model aggregation across users. In this study, we present a fully decentralized federated learning approach in which each user's model is optimized using their own data together with gradients transferred from neighboring models. This keeps personal data distributed and eliminates the need for central server-side aggregation or model merging steps. Empirical experiments demonstrate that our approach achieves a significant improvement in accuracy over other decentralized methods across various network structures.
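
As a generic illustration of neighbor-based gradient sharing (a sketch under assumptions, not the authors' recommender model), the snippet below performs one decentralized update round over a user network: each user applies their own gradient blended with the average of their neighbors' gradients, so raw data never leaves a user. The decentralized_step name, the mix weight, and the plain-averaging rule are all illustrative choices.

```python
import numpy as np

def decentralized_step(params, grads, adjacency, lr=0.1, mix=0.5):
    """One round of a fully decentralized update: each user i takes a
    gradient step using a blend of their own gradient and the average
    of their network neighbors' gradients. Hypothetical sketch; the
    blending rule and `mix` weight are assumptions for illustration."""
    new_params = []
    for i, (w, g) in enumerate(zip(params, grads)):
        nbrs = np.nonzero(adjacency[i])[0]
        # Average neighbors' gradients; no raw data is exchanged.
        nbr_grad = np.mean([grads[j] for j in nbrs], axis=0) if len(nbrs) else 0.0
        new_params.append(w - lr * ((1 - mix) * g + mix * nbr_grad))
    return new_params

# Example: 4 users on a ring network; only gradients cross edges.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
params = [np.zeros(2) for _ in range(4)]
grads = [np.ones(2) * (i + 1) for i in range(4)]
params = decentralized_step(params, grads, A)
```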

Co-Author(s)

James Lee, University of Virginia
Tao Li, Emory University
Xiwei Tang, University of Virginia

Speaker

Xuan Bi