Wednesday, Aug 6: 2:00 PM - 3:50 PM
0458
Invited Paper Session
Music City Center
Room: CC-209B
Applied
No
Main Sponsor
IMS
Co Sponsors
Institute for Operations Research and the Management Sciences
International Chinese Statistical Association
Presentations
We study the multitask learning problem that aims to simultaneously analyze multiple data sets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real data sets demonstrate the efficacy of our new methods.
Keywords
multi-task learning
adaptivity
robustness
model mis-specification
clustering
low-rank model
In this talk, we explore statistical challenges and opportunities in collaborative learning under extreme heterogeneity.
First, we study how collaborative learning can be used for causal inference beyond meta-analysis and introduce a novel collaborative inverse propensity score weighting estimator. Our approach demonstrates significant improvements over existing methods, especially as heterogeneity increases.
Then, we re-examine optimal experiment design from a multi-agent perspective, formulating the tension between a global federated learning platform and local data contributors as a game. We show that this perspective sheds new light on the classical question of which optimality criterion to use.
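The abstract does not spell out the collaborative estimator. As background only, the classical single-site inverse propensity score weighting (IPW) estimator of an average treatment effect can be sketched as below. The function names, the simulated data, and the assumption that propensities are known are all illustrative and are not taken from the talk.

```python
import random


def ipw_ate(outcomes, treatments, propensities):
    """Classical IPW estimate of the average treatment effect.

    Each treated unit is weighted by 1/e and each control unit by 1/(1-e),
    where e is that unit's probability of receiving treatment.
    """
    n = len(outcomes)
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n


def simulate(n, seed=0, true_effect=2.0):
    """Hypothetical data: one covariate x drives both treatment and outcome."""
    rng = random.Random(seed)
    outcomes, treatments, propensities = [], [], []
    for _ in range(n):
        x = rng.random()
        e = 0.2 + 0.6 * x  # assumed known propensity, kept inside [0.2, 0.8]
        t = 1 if rng.random() < e else 0
        y = x + true_effect * t + rng.gauss(0.0, 1.0)
        outcomes.append(y)
        treatments.append(t)
        propensities.append(e)
    return outcomes, treatments, propensities


ys, ts, es = simulate(20000)
print(round(ipw_ate(ys, ts, es), 2))  # should land near the simulated effect of 2.0
```

A naive difference of group means would be biased in this simulation because x raises both the chance of treatment and the outcome; the weights correct for that imbalance.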
Keywords
federated learning
causal inference
experiment design
mechanism design
collaborative learning
meta-analysis
We consider statistical optimality for federated learning in the context of nonparametric regression and density estimation. The setting we study is heterogeneous, encompassing varying sample sizes and differential privacy constraints across different servers. Within this framework, both global and pointwise estimation are considered, and optimal rates of convergence over the Besov spaces are established.
We propose distributed, privacy-preserving estimation procedures and analyze their theoretical properties. The findings reveal intriguing phase transition phenomena, illustrating the trade-off between statistical accuracy and privacy. The results characterize how privacy budgets, server count, and sample size impact accuracy, highlighting the compromises in a distributed privacy framework.
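To make the roles of privacy budget, server count, and sample size concrete, here is a minimal sketch of a much simpler problem than the one in the talk: estimating a mean of values in [0, 1] from several servers, each releasing a Laplace-noised local mean under its own privacy budget. This is a textbook baseline, not the speaker's procedure; the function names, the inverse-variance weighting rule, and the simulated data are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw, as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, rng):
    """Epsilon-differentially-private mean of values clipped to [0, 1].

    Changing one record moves the mean by at most 1/n, so the Laplace
    mechanism uses noise scale 1/(n * epsilon).
    """
    n = len(values)
    clipped = [min(1.0, max(0.0, v)) for v in values]
    return sum(clipped) / n + laplace_noise(1.0 / (n * epsilon), rng)


def variance_bound(n, epsilon):
    """Upper bound on the variance: sampling term plus privacy-noise term."""
    return 0.25 / n + 2.0 / (n * epsilon) ** 2


def combine(server_data, epsilons, rng):
    """Inverse-variance weighted average of the private local means (an assumed rule)."""
    estimates = [private_mean(d, e, rng) for d, e in zip(server_data, epsilons)]
    weights = [1.0 / variance_bound(len(d), e) for d, e in zip(server_data, epsilons)]
    return sum(w * m for w, m in zip(weights, estimates)) / sum(weights)


rng = random.Random(0)
sizes = [2000, 500, 5000]     # hypothetical per-server sample sizes
epsilons = [1.0, 0.1, 0.5]    # hypothetical per-server privacy budgets
data = [[rng.random() for _ in range(n)] for n in sizes]
print(round(combine(data, epsilons, rng), 3))  # true mean of the simulated data is 0.5
```

In `variance_bound`, the sampling term shrinks like 1/n while the privacy term shrinks like 1/(n * epsilon)^2, so which term dominates depends on whether n * epsilon^2 is large or small. That switch between regimes is a toy analogue of the accuracy and privacy trade-off described above.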
Keywords
differential privacy
distributed inference
optimal rate
Speaker
Tony Cai, University of Pennsylvania