013 - Small Weights for Big Data: Scalable Kernel-Based Stable Balancing

Conference: International Conference on Health Policy Statistics 2023
01/10/2023: 7:30 PM - 8:30 PM MST
Posters 

Description

Weighting is a general and widely used method for statistical adjustment. In observational studies and sample survey settings, one objective of weighting is to balance covariate distributions. An additional objective is that the weights be "small" in the sense that they have minimal dispersion and therefore produce a more stable estimator. There are two broad approaches to weighting: a modeling approach that targets these objectives by maximizing the fit of a propensity score model, and a balancing approach that directly optimizes the weights toward these two objectives. While the balancing approach tends to exhibit better performance in practice, at present it is not feasible to implement in the increasingly common setting of very large observational studies when investigators wish to balance broad classes of functions of the covariates. Here, we propose a novel algorithm for scalable kernel-based stable balancing. We focus on a particular form of the balancing approach to weighting, which poses a quadratic programming problem to solve for the weights of minimum variance that approximately balance the covariates. To choose what to balance, we use the kernel balancing approach, which allows us to assume that the outcome regression functions lie in a large, flexible function space associated with a kernel, thus offering an effective way to minimize the bias caused by covariate imbalance. Based on the Nyström method, the corresponding kernel-based imbalance metrics are constructed in linear time and space and incorporated into our quadratic program as linear constraints. We then show that our balancing estimator can be efficiently computed by solving the quadratic program using a specialized first-order alternating direction method of multipliers (ADMM) algorithm.
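The quadratic program described above can be sketched in miniature. The following is an illustrative stand-in, not the authors' implementation: it uses a toy biased sample, a Gaussian kernel with an arbitrary bandwidth, a basic Nyström feature construction, and SciPy's general-purpose SLSQP solver in place of the specialized first-order ADMM algorithm; all names, sizes, and tolerances are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy population with a biased "sampled" subset that we reweight
# toward the full-population kernel-feature means.
n, d, m = 300, 2, 15                      # population size, covariates, landmarks
X = rng.normal(size=(n, d))
p_sel = 1.0 / (1.0 + np.exp(-X[:, 0]))    # selection favors large X[:, 0]
S = rng.random(n) < p_sel                 # boolean mask of sampled units

def kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Nystrom features: K ~= Phi @ Phi.T, built from m landmark points
# in O(n * m) time and space rather than O(n^2).
L = X[rng.choice(n, m, replace=False)]
W = kernel(L, L)
evals, evecs = np.linalg.eigh(W)
evals = np.clip(evals, 1e-10, None)
Phi = kernel(X, L) @ evecs @ np.diag(evals ** -0.5) @ evecs.T

mu = Phi.mean(axis=0)                     # target: population feature means
scale = Phi.std(axis=0) + 1e-12           # per-feature scale for the tolerance
Phi_S = Phi[S]
n_S = Phi_S.shape[0]
delta = 0.1                               # approximate-balance tolerance

# QP: minimize weight dispersion ||w||^2 subject to
#   |Phi_S.T @ w - mu| <= delta * scale (balance, as linear constraints),
#   w >= 0, and sum(w) = 1.
cons = [
    {"type": "eq", "fun": lambda w: w.sum() - 1.0},
    {"type": "ineq", "fun": lambda w: np.concatenate([
        delta * scale - (Phi_S.T @ w - mu),
        (Phi_S.T @ w - mu) + delta * scale,
    ])},
]
w0 = np.full(n_S, 1.0 / n_S)              # uniform weights as starting point
res = minimize(lambda w: w @ w, w0, jac=lambda w: 2 * w,
               method="SLSQP", bounds=[(0.0, None)] * n_S, constraints=cons)
w = res.x

# Worst per-feature imbalance after weighting, in feature-scale units.
imbalance = float(np.max(np.abs(Phi_S.T @ w - mu) / scale))
```

At the scale described in the abstract (millions of observations), a generic solver like SLSQP would not suffice; the linear structure of the balance constraints is what makes a first-order ADMM approach efficient there.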
In extensive simulation studies reflecting a variety of data settings, we show that our proposed approach can handle large datasets containing millions of observations in seconds without sacrificing estimator accuracy. We apply our methods in a national study of heart attack treatment and outcomes by hospital profit status with 1.27 million patients. After weighting, we observe that for-profit hospitals perform percutaneous coronary intervention at similar rates as other hospitals; however, their patients have slightly worse mortality and higher readmission rates.

Keywords

Causal Inference

Observational Studies

Weighting Methods

Propensity Scores

Convex Optimization

Kernel Balancing 

Speaker

Bijan Niknam, Harvard University

First Author

Kwangho Kim, Harvard Medical School

CoAuthor(s)

Bijan Niknam, Harvard University
Jose Zubizarreta