Monday, Aug 4: 8:30 AM - 10:20 AM
0369
Invited Paper Session
Music City Center
Room: CC-210
Applied
Yes
Main Sponsor
Health Policy Statistics Section
Co Sponsors
Mental Health Statistics Section
Section on Statistics in Epidemiology
Presentations
The past few decades have seen an explosion in the development of freely available software that implements statistical methods and algorithms to help us explore and analyze data. However, researchers tend to assume that once they have made their software available (e.g., through CRAN or on GitHub), their job is done. Typically, very little attention is paid to ensuring that the software is easy to use or that it is likely to be used correctly. Even less attention is paid to helping researchers new to a method understand the underlying assumptions or how to appropriately interpret the output. In this talk I will describe a new software tool for causal inference that scaffolds the user experience to maximize the probability that researchers can use the tool appropriately and understand the foundational ideas. Moreover, I'll describe a randomized experiment we performed to understand whether this tool actually accomplishes these goals relative to traditional software. I conclude with calls to action for those who develop methods.
Keywords
software
accessibility
causal inference
machine learning
BART
R
This work has been motivated by the lack of accessible methods even for relatively simple and common causal inference problems in practice (e.g., treatment noncompliance). This presentation revisits the use of Gaussian mixtures as a potentially accessible method of model identification in the context of principal stratification. Relying on such parametric conditions for causal identification has mostly been considered risky, with no clear path to conducting sensitivity analysis. Turning this situation around, the proposed diagnostic approach provides the means for assessing the quality of causal effect estimates obtained from parametric identification. Our strategy for constructing diagnostic measures is unique: we observe how parametric estimation responds to varying degrees of nonparametrically identifying restrictions. In other words, parametric and nonparametric identification methods are jointly used to generate diagnostic indices, which tell us how good or biased the parametrically identified causal effect estimates are. The presentation will highlight potential benefits of this under-explored causal approach, such as ease of implementation, quality control, and automation.
Keywords
Principal causal effects
Gaussian mixtures
Parametric identification
Nonparametric identification
Diagnostic indices
Automation
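To make the mixture-based identification idea concrete, the sketch below fits a two-component Gaussian mixture to outcomes drawn from two latent principal strata whose membership is unobserved. This is an illustrative toy example with simulated data and assumed parameter values, not the speaker's implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate outcomes from two latent principal strata (e.g., compliers
# vs. never-takers) whose stratum membership is not directly observed.
rng = np.random.default_rng(0)
y = np.concatenate([
    rng.normal(0.0, 1.0, 600),   # stratum 1 outcomes, mean 0
    rng.normal(3.0, 1.0, 400),   # stratum 2 outcomes, mean 3
]).reshape(-1, 1)

# Parametric identification: a Gaussian mixture recovers the stratum
# means and mixing proportions from the observed (mixed) outcomes.
gm = GaussianMixture(n_components=2, random_state=0).fit(y)
means = np.sort(gm.means_.ravel())
print(means)  # estimated stratum means, close to 0 and 3
```

In a principal stratification analysis the recovered component parameters would feed into stratum-specific causal effect estimates; the diagnostic indices described in the abstract would then probe how sensitive such estimates are to the normality assumption.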
We develop theoretically justified analytical formulas for sample size and power calculation in the propensity score analysis of causal inference using observational data. By analyzing the variance of the inverse probability weighting estimator of the average treatment effect (ATE), we clarify the three key components for sample size calculations: the propensity score distribution, the potential outcome distribution, and their correlation. We devise analytical procedures to identify these components based on commonly available and interpretable summary statistics. We elucidate the critical role of covariate overlap between treatment groups in determining the sample size. In particular, we propose to use the Bhattacharyya coefficient as a measure of covariate overlap, which, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. The proposed method is applicable to both continuous and binary outcomes. We show that the standard two-sample z-test and variance inflation factor methods often lead to, sometimes vastly, inaccurate sample size estimates, especially under limited overlap. We also derive formulas for the average treatment effect on the treated (ATT) and the overlap population (ATO) estimands. We provide simulated and real examples to illustrate the proposed method. We develop an associated R package, PSpower.
Keywords
causal inference
factorial design
trial emulation
survival outcome
software
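The Bhattacharyya coefficient's role as an overlap measure can be sketched as follows: it sums sqrt(p_i * q_i) over a common binning of the two groups' propensity score distributions, giving 1 for identical distributions and approaching 0 as overlap vanishes. This is an illustrative computation on simulated propensity scores, not the internals of the PSpower package.

```python
import numpy as np

def bhattacharyya(ps_treated, ps_control, bins=50):
    """Bhattacharyya coefficient between two propensity score samples:
    sum over bins of sqrt(p_i * q_i), where p and q are the binned
    relative frequencies. 1 = perfect overlap, 0 = no overlap."""
    grid = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(ps_treated, bins=grid)
    q, _ = np.histogram(ps_control, bins=grid)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

rng = np.random.default_rng(1)
# Well-overlapping groups: both drawn from Beta(4, 4).
good = bhattacharyya(rng.beta(4, 4, 2000), rng.beta(4, 4, 2000))
# Poorly overlapping groups: Beta(8, 2) vs. Beta(2, 8).
poor = bhattacharyya(rng.beta(8, 2, 2000), rng.beta(2, 8, 2000))
print(good, poor)  # good is near 1; poor is much smaller
```

In the sample size formulas described above, a smaller coefficient (limited overlap) inflates the variance of the IPW estimator and hence the required sample size, which is where the naive two-sample z-test calculation goes wrong.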