Statistical Foundations for Federated and Collaborative Learning

Pengkun Yang Chair
Tsinghua University
 
Tony Cai Discussant
University of Pennsylvania
 
Jiaming Xu Organizer
Duke University
 
Pengkun Yang Organizer
Tsinghua University
 
Lili Su Organizer
Northeastern University
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
0458 
Invited Paper Session 
Music City Center 
Room: CC-209B 

Applied

No

Main Sponsor

IMS

Co Sponsors

Institute for Operations Research and the Management Sciences
International Chinese Statistical Association

Presentations

Adaptive and robust multi-task learning

We study the multi-task learning problem, which aims to simultaneously analyze multiple data sets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real data sets demonstrate the efficacy of our new methods.

Keywords

multi-task learning

adaptivity

robustness

model mis-specification

clustering

low-rank model 

Speaker

Yaqi Duan

Collaborative Learning Amidst Heterogeneity

In this talk, we explore statistical challenges and opportunities in collaborative learning under extreme heterogeneity.

First, we study how collaborative learning can be used for causal inference beyond meta-analysis and introduce a novel collaborative inverse propensity score weighting estimator. Our approach demonstrates significant improvements over existing methods, especially as heterogeneity increases.

Then, we re-examine optimal experiment design from a multi-agent perspective, formulating the tension between a global federated learning platform and local data contributors as a game. We show that this perspective sheds new light on the classical question of which optimality criterion we should use.

Keywords

federated learning

causal inference

experiment design

mechanism design

collaborative learning

meta-analysis

Speaker

Sai Praneeth Karimireddy, USC

Federated Learning for Nonparametric Function Estimation: Framework and Optimality

We consider statistical optimality for federated learning in the context of nonparametric regression and density estimation. The setting we study is heterogeneous, encompassing varying sample sizes and differential privacy constraints across different servers. Within this framework, both global and pointwise estimation are considered, and optimal rates of convergence over Besov spaces are established.

We propose distributed, privacy-preserving estimation procedures and analyze their theoretical properties. The findings reveal intriguing phase transition phenomena, illustrating the trade-off between statistical accuracy and privacy. The results characterize how privacy budgets, server count, and sample size impact accuracy, highlighting the compromises in a distributed privacy framework. 

Keywords

differential privacy

distributed inference

optimal rate 

Speaker

Tony Cai, University of Pennsylvania