Advances in Causal Inference Methodology

Xindi Lin, Chair
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
4001 
Contributed Papers 
Music City Center 
Room: CC-202C 

Main Sponsor

Health Policy Statistics Section

Presentations

2sCOPE-GAM: Instrument-Free Copula Approach for Endogeneity in Additive Models

While causal inference is central to many empirical analyses, endogeneity often complicates it. Classical instrumental variable (IV) methods for addressing endogeneity require valid instruments and often assume linear relationships between the outcome and both endogenous and exogenous regressors. However, these linearity assumptions can be restrictive in many real-world applications, leading to biased estimates when relationships are inherently nonlinear. Copula-based approaches, such as copula endogeneity correction, offer instrument-free solutions but are similarly limited by their linearity assumptions. This paper introduces a flexible two-stage Copula Generalized Additive Model (2sCOPE-GAM) to address endogeneity without instruments while accommodating nonlinear relationships. Specifically, 2sCOPE-GAM assumes a linear relationship between the outcome and the endogenous regressor while allowing nonlinear relationships between the outcome and the exogenous regressors through a GAM. Through theoretical development, simulation, and empirical data, we demonstrate the efficacy of our approach in accurately estimating causal effects in the presence of endogeneity and nonlinear relationships.
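As a rough illustration of the instrument-free idea mentioned above, the following sketch applies a basic single-regressor Gaussian copula correction to simulated data. It is not the 2sCOPE-GAM estimator: there are no exogenous regressors and no GAM component. The data-generating process, the helper names `copula_term` and `ols`, and all parameter values are assumptions made for this example. The sketch relies on the endogenous regressor being non-normal, which is why a uniform regressor is simulated.

```python
import numpy as np
from statistics import NormalDist

def copula_term(p):
    """Map a regressor to normal scores: Phi^{-1}(ECDF(p)). Assumes no ties."""
    n = len(p)
    ranks = np.argsort(np.argsort(p)) + 1           # ranks 1..n
    inv = NormalDist().inv_cdf
    return np.array([inv(r / (n + 1)) for r in ranks])

def ols(columns, y):
    """OLS with an intercept; returns coefficients with the intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (illustrative): the regressor p and the error e are linked
# through a Gaussian copula with correlation rho, so p is endogenous.
rng = np.random.default_rng(0)
n, rho, beta = 20000, 0.5, 2.0
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
cdf = NormalDist().cdf
p = np.array([cdf(v) for v in z[:, 0]])             # uniform, hence non-normal
e = z[:, 1]
y = 1.0 + beta * p + e

naive = ols([p], y)[1]                              # ignores endogeneity
corrected = ols([p, copula_term(p)], y)[1]          # adds the copula control term
print(f"true={beta:.2f} naive={naive:.2f} corrected={corrected:.2f}")
```

In this simulation the naive slope is biased upward, while adding the normal-score term as an extra regressor moves the slope estimate back toward the true value.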

Keywords

Endogeneity

Copula

Two-stage estimation

Generalized additive model 

Co-Author(s)

Yi Qian, University of British Columbia
Hui Xie, Simon Fraser University

First Author

Kai Zhao

Presenting Author

Kai Zhao

High-dimensional mediation analysis with survival outcomes

It is of substantial scientific interest to detect mediators that lie in the causal pathway from an exposure to a survival outcome. However, with high-dimensional mediators, as often encountered in modern genomic data settings, there is a lack of powerful methods that can provide valid post-selection inference for the identified marginal mediation effect. To resolve this challenge, we develop a post-selection inference procedure for the maximally selected natural indirect effect using a semiparametric efficient influence function approach. To this end, we establish the asymptotic normality of a stabilized one-step estimator that takes the selection of the mediator into account. Simulation studies show that our proposed method has good empirical performance. We further apply our proposed approach to a lung cancer dataset and find multiple DNA methylation CpG sites that might mediate the effect of cigarette smoking on lung cancer survival. 

Keywords

causal inference

mediation analysis

right-censored data

post-selection inference

non-standard asymptotics

multiple testing 

Co-Author(s)

Zhonghua Liu, Columbia University
Ian McKeague, Department of Biostatistics, Columbia University

First Author

Tzu-Jung Huang, Fred Hutchinson Cancer Center

Presenting Author

Tzu-Jung Huang, Fred Hutchinson Cancer Center

Hybrid Frequentist-Bayesian Estimation of Causal Effects in the Primary Care First Evaluation

Multiplicity is a key problem in applied research. Estimating causal effects across many time periods, subgroups, or outcomes increases the risk of spurious findings. Bayesian methods address this issue by borrowing information across estimates from the same study or past studies to rein in extreme estimates. However, these methods are computationally intensive and can be difficult to align with frequentist approaches. As a solution, we pioneered a hybrid frequentist-Bayesian approach in the evaluation of Primary Care First (PCF), a primary care model from the Center for Medicare & Medicaid Innovation. In this approach, we fit a Bayesian meta-regression (Lipman et al. 2022) to frequentist difference-in-differences effect estimates from PCF's first three performance years. We found similar probabilities that PCF increased or reduced acute hospitalizations (51 and 49 percent, respectively), and a 72 percent probability that PCF increased Medicare expenditures by at least 1 percent. For subgroups, hybrid frequentist-Bayesian results were more moderate than frequentist estimates. These results supplement frequentist estimates to clarify model impacts across groups and over time. 
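The borrowing of information described above can be illustrated in its simplest form: a normal prior combined with a frequentist point estimate and standard error. This is only a sketch under stated assumptions, not the meta-regression used in the evaluation. The prior mean and standard deviation are fixed by hand here rather than estimated across studies, and the function names and numbers are hypothetical.

```python
from statistics import NormalDist

def shrink(estimate, se, prior_mean, prior_sd):
    """Posterior mean and sd for a normal likelihood with a normal prior."""
    w = prior_sd**2 / (prior_sd**2 + se**2)     # weight on the frequentist estimate
    post_mean = w * estimate + (1 - w) * prior_mean
    post_sd = (w * se**2) ** 0.5                # sqrt of 1 / (1/se^2 + 1/prior_sd^2)
    return post_mean, post_sd

def prob_above(threshold, post_mean, post_sd):
    """Posterior probability that the effect exceeds a threshold."""
    return 1 - NormalDist(post_mean, post_sd).cdf(threshold)

# Hypothetical subgroup estimate: a noisy effect of 4.0 with standard error 2.0,
# combined with a prior centered at 0 with standard deviation 1.
post_mean, post_sd = shrink(4.0, 2.0, prior_mean=0.0, prior_sd=1.0)
p_positive = prob_above(0.0, post_mean, post_sd)
print(post_mean, post_sd, p_positive)
```

The noisier the frequentist estimate relative to the prior, the more it is pulled toward the prior mean, which is the sense in which extreme subgroup estimates become more moderate.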

Keywords

Causal inference

Bayesian statistics

Difference-in-differences

Health policy evaluation

Multiplicity correction

Heterogeneous treatment effects 

Co-Author(s)

Honoka Suzuki, Mathematica
Nadia Bell, Mathematica
Daniel Thal, Mathematica
Lauren Forrow, Mathematica Policy Research

First Author

Rachael Aikens, Mathematica

Presenting Author

Rachael Aikens, Mathematica

Multilevel interrupted time series allowing non-linear interruption effects

We propose a new interrupted time series method for causal inference on multivariate time series data subject to an interruption. The method can incorporate multiple response streams, with or without a control, and estimate non-linear interruption effects across groups. We specify a latent time-varying mean model together with a multilevel interruption effect and a post-intervention generalized additive model, which behaves like a flexible structured random effect and allows for non-linear interruption effects.

We show through simulation that our model formulation a) has good coverage, b) effectively predicts the counterfactual trend, and c) effectively estimates the interruption effect across groups. In our first application, which has no control time series, we estimate the effect of the COVID-19 pandemic on hospital care utilization for acute myocardial infarction (AMI, or heart attack) among Medicare beneficiaries in 2018-2021. Our second application, which includes a control, concerns the effect of the introduction of the prostate-specific antigen test in 1986 on prostate cancer incidence, using SEER data from 1975-2000 with colon and lung cancer in men as a control.
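The basic logic of predicting a counterfactual trend and reading off a non-linear interruption effect can be sketched for a single series without a control. This is a deliberately simplified stand-in, not the multilevel latent-mean model described above: the trend is fit by a straight line on pre-interruption data and projected forward, and the data are simulated without noise. The interruption time `t0` and the shape of the effect are assumptions chosen for illustration.

```python
import numpy as np

t = np.arange(40, dtype=float)
t0 = 24                                         # hypothetical interruption time
pre = t < t0

baseline = 10.0 + 0.5 * t                       # noiseless linear trend (illustrative)
true_effect = np.where(pre, 0.0, -3.0 * np.exp(-0.3 * (t - t0)))
y = baseline + true_effect                      # observed series

# Fit the trend on pre-interruption data only, then project it forward.
slope, intercept = np.polyfit(t[pre], y[pre], deg=1)
counterfactual = intercept + slope * t

# Estimated interruption effect: observed minus projected counterfactual.
effect = (y - counterfactual)[~pre]
print(np.round(effect[:5], 3))
```

Because the effect is computed period by period, it is free to decay or grow over time rather than being restricted to a level shift or a change in slope.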

Keywords

causal inference

time series

econometrics

health policy

semiparametric

Bayesian

First Author

RJ Waken

Presenting Author

RJ Waken

Unifying regression-based and design-based causal inference in time series experiments

Time series experiments, sometimes called switchback experiments in modern digital platforms, are a fundamental experimental design in practice. In this paper, we examine the design-based properties of regression-based methods for estimating treatment effects in such settings. We demonstrate that the treatment effect of interest can be consistently estimated using ordinary least squares (OLS) with an appropriately specified working model. Our analysis extends to estimating a diverging number of treatment effects simultaneously, and we establish the asymptotic properties of the resulting estimators. Additionally, we show that the heteroskedasticity and autocorrelation consistent (HAC) estimator provides a conservative estimate of the variance. Importantly, while our approach relies on OLS regression, our theoretical framework accommodates misspecification of the regression model. 
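A minimal sketch of the regression-based analysis described above: OLS of the outcome on a per-period treatment indicator, with a Newey-West (Bartlett kernel) HAC variance computed by hand. The working model, the lag choice, the simulated AR(1) noise, and the function name `ols_with_hac` are assumptions for illustration. The sketch makes no claim to reproduce the paper's estimands or theory.

```python
import numpy as np

def ols_with_hac(y, d, lags):
    """OLS of y on an intercept and indicator d, with a Newey-West covariance."""
    X = np.column_stack([np.ones(len(y)), d])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    u = (y - X @ coef)[:, None] * X             # score contributions x_t * residual_t
    S = u.T @ u                                 # lag-0 term
    for k in range(1, lags + 1):
        w = 1 - k / (lags + 1)                  # Bartlett kernel weight
        G = u[k:].T @ u[:-k]                    # lag-k cross-products
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv                 # sandwich form
    return coef, cov

# Simulated switchback-style data (illustrative): random on/off assignment
# per period and autocorrelated noise.
rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, size=n)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.6 * noise[t - 1] + rng.normal()
y = 1.0 + 0.5 * d + noise

coef, cov = ols_with_hac(y, d, lags=4)
effect, se = coef[1], np.sqrt(cov[1, 1])
print(f"effect={effect:.3f} HAC se={se:.3f}")
```

With only an intercept and a binary indicator, the OLS coefficient equals the difference in mean outcomes between treated and untreated periods, so the regression and the simple contrast give the same point estimate.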

Keywords

time series experiments

potential outcome

randomization inference

robust standard error 

Co-Author

Peng Ding, University of California-Berkeley

First Author

Zhexiao Lin, UC Berkeley

Presenting Author

Zhexiao Lin, UC Berkeley