28 Communication-efficient distributed estimation of causal effects with high-dimensional data

Jiayi Tong Co-Author
 
Sida Peng Co-Author
Microsoft Research
 
Yong Chen Co-Author
University of Pennsylvania, Perelman School of Medicine
 
Yang Ning Co-Author
Cornell University
 
Xiaohan Wang First Author
Cornell University
 
Xiaohan Wang Presenting Author
Cornell University
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
3227 
Contributed Posters 
Oregon Convention Center 
We propose a communication-efficient algorithm to estimate the average treatment effect (ATE) when the data are distributed across multiple sites and the number of covariates is possibly much larger than the sample size at each site. Our main idea is to calibrate the estimates of the propensity score and outcome models using suitable surrogate loss functions so that they approximately attain the desired covariate balancing property. We show that, under possible model misspecification, our distributed covariate balancing propensity score estimator (disthdCBPS) can approximate the global estimator, obtained by pooling together the data from multiple sites, at a fast rate. As a result, our estimator remains consistent and asymptotically normal. In addition, when both the propensity score and the outcome models are correctly specified, the proposed estimator attains the semiparametric efficiency bound. We illustrate the performance of the proposed method in both simulation and empirical studies.
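To make the idea of combining site-level information concrete, the following is a minimal sketch of a generic doubly robust (augmented inverse probability weighting, AIPW) estimate of the ATE assembled from per-site summaries. It is not the disthdCBPS procedure described in the abstract: it assumes the propensity scores and outcome predictions have already been fitted by some means, and it shows only how each site could send two numbers (a score total and a sample size) instead of raw data. The function names `site_summary` and `combine_sites` and all inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def site_summary(
    y: Sequence[float],
    t: Sequence[int],
    e: Sequence[float],
    m1: Sequence[float],
    m0: Sequence[float],
) -> Tuple[float, int]:
    """Return (sum of AIPW scores, sample size) for one site.

    y: observed outcomes; t: 0/1 treatment indicators;
    e: fitted propensity scores (assumed strictly between 0 and 1);
    m1, m0: fitted outcome predictions under treatment and control.
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1i + ti * (yi - m1i) / ei
        control = m0i + (1 - ti) * (yi - m0i) / (1.0 - ei)
        total += treated - control
    return total, len(y)


def combine_sites(summaries: List[Tuple[float, int]]) -> float:
    """Combine per-site summaries into one ATE estimate (hypothetical coordinator step)."""
    score_total = sum(s for s, _ in summaries)
    n_total = sum(n for _, n in summaries)
    return score_total / n_total


# Toy inputs for two sites; these numbers are illustrative only.
site_a = site_summary([3.0, 1.0], [1, 0], [0.5, 0.5], [2.0, 2.0], [1.0, 1.0])
site_b = site_summary([2.0], [1], [0.5], [2.0], [1.0])
ate_hat = combine_sites([site_a, site_b])
```

Given fixed nuisance fits, this aggregation reproduces the pooled AIPW average exactly, so the final step needs only one round of communication. The harder problem addressed in the abstract is fitting the high-dimensional propensity score and outcome models themselves without pooling the data, which this sketch does not attempt.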

Keywords

Causal Inference

High-dimensional Statistics

Double robustness

Distributed inference

Communication efficiency

Likelihood approximation 

Main Sponsor

Section on Statistics in Epidemiology