Accessible Causal Inference

Elizabeth Stuart, Chair
Johns Hopkins University, Bloomberg School of Public Health
 
Michael Baiocchi, Discussant
 
Booil Jo, Organizer
Stanford University
 
Monday, Aug 4: 8:30 AM - 10:20 AM
0369 
Invited Paper Session 
Music City Center 
Room: CC-210 

Applied: Yes

Main Sponsor

Health Policy Statistics Section

Co-Sponsors

Mental Health Statistics Section
Section on Statistics in Epidemiology

Presentations

Democratizing Methods

The past few decades have seen an explosion in the development of freely available software implementing statistical methods and algorithms that help us explore and analyze data. However, researchers tend to assume that once they have made their software available (e.g., through CRAN or on GitHub), their job is done. Typically, very little attention is paid to ensuring that the software is easy to use or that it is likely to be used correctly. Even less attention is paid to helping researchers new to a method understand its underlying assumptions or how to appropriately interpret its output. In this talk I will describe a new software tool for causal inference that scaffolds the user experience to maximize the probability that researchers can use the tool appropriately and understand the foundational ideas. Moreover, I'll describe a randomized experiment we performed to test whether this tool actually accomplishes these goals relative to traditional software. I conclude with calls to action for those who develop methods.

Keywords

software

accessibility

causal inference

machine learning

BART


Speaker

Jennifer Hill

A Diagnostic Approach to Causal Inference

This work is motivated by the lack of accessible methods even for relatively simple and common causal inference problems in practice (e.g., treatment noncompliance). This presentation revisits the use of Gaussian mixtures as a potentially accessible means of model identification in the context of principal stratification. Relying on such parametric conditions for causal identification has mostly been considered risky, without clear paths to conducting sensitivity analysis. Turning this situation around, the proposed diagnostic approach provides a means of assessing the quality of causal effect estimates obtained under parametric identification. Our strategy for constructing diagnostic measures is unique: we observe how parametric estimation responds to varying degrees of nonparametrically identifying restrictions. In other words, parametric and nonparametric identification methods are used jointly to generate diagnostic indices that tell us how good, or how biased, the parametrically identified causal effect estimates are. The presentation will highlight potential benefits of this under-explored causal approach, such as ease of implementation, quality control, and automation.
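The parametric building block described above, a two-component Gaussian mixture, can be sketched with a minimal EM routine. This is an illustrative sketch only, not the method from the talk: the function name and the simulated strata are assumptions for demonstration.

```python
# Illustrative sketch only: a minimal EM fit of a two-component Gaussian
# mixture, the parametric ingredient discussed in the abstract. The function
# name and simulated strata are hypothetical, not from the talk.
import numpy as np

def em_two_normals(y, n_iter=200):
    """Estimate weights, means, and SDs of a two-component normal mixture."""
    mu = np.quantile(y, [0.25, 0.75])          # crude initialization
    sd = np.array([y.std(), y.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component
        dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of the mixture parameters
        nk = resp.sum(axis=0)
        pi = nk / len(y)
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd

# Two latent principal strata (e.g., compliers vs. never-takers) observed
# only as a mixture within a single trial arm:
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
pi, mu, sd = em_two_normals(y)
print(np.sort(mu))  # stratum-specific means, close to the true values 0 and 3
```

The diagnostic question the talk raises is precisely how trustworthy such parametrically recovered stratum parameters are when the mixture components overlap more than in this well-separated toy example.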

Keywords

Principal causal effects

Gaussian mixtures

Parametric identification

Nonparametric identification

Diagnostic indices

Automation 

Speaker

Booil Jo, Stanford University

Power and Sample Size Calculations for Propensity Score Analysis of Observational Studies

We develop theoretically justified analytical formulas for sample size and power calculations in propensity score analyses of causal effects with observational data. By analyzing the variance of the inverse probability weighting (IPW) estimator of the average treatment effect (ATE), we clarify the three key components of the sample size calculation: the propensity score distribution, the potential outcome distribution, and their correlation. We devise analytical procedures to identify these components from commonly available and interpretable summary statistics. We elucidate the critical role of covariate overlap between treatment groups in determining the sample size. In particular, we propose using the Bhattacharyya coefficient as a measure of covariate overlap, which, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. The proposed method applies to both continuous and binary outcomes. We show that the standard two-sample z-test and variance inflation factor methods often lead to inaccurate, sometimes vastly inaccurate, sample size estimates, especially under limited overlap. We also derive formulas for the average treatment effect on the treated (ATT) and the overlap population (ATO) estimands. We provide simulated and real-data examples to illustrate the proposed method, along with an associated R package, PSpower.
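As a rough illustration of the overlap measure named above: the Bhattacharyya coefficient has a simple closed form when each group's covariate (or linearized propensity score) distribution is taken to be normal. The sketch below is a hypothetical illustration under that normality assumption, not code from the PSpower package.

```python
# Hypothetical sketch, not PSpower code: closed-form Bhattacharyya
# coefficient between two normal densities, assumed here as a stand-in for
# the two groups' propensity score distributions. BC = 1 means complete
# overlap; BC near 0 means near-complete separation.
import math

def bhattacharyya_normal(mu1, sd1, mu2, sd2):
    """Bhattacharyya coefficient between N(mu1, sd1^2) and N(mu2, sd2^2)."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return (math.sqrt(2 * sd1 * sd2 / (v1 + v2))
            * math.exp(-((mu1 - mu2) ** 2) / (4 * (v1 + v2))))

print(bhattacharyya_normal(0, 1, 0, 1))            # identical groups -> 1.0
print(round(bhattacharyya_normal(0, 1, 2, 1), 3))  # limited overlap -> 0.607
```

The abstract's point that limited overlap inflates the required sample size corresponds to this coefficient shrinking toward zero as the two group distributions separate.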

Keywords

causal inference

factorial design

trial emulation

survival outcome

software 

Co-Author

Fan Li, Duke University

Speaker

Xiaoxiao Zhou, Duke University