Sunday, Aug 3: 2:00 PM - 3:50 PM
4001
Contributed Papers
Music City Center
Room: CC-202C
Main Sponsor
Health Policy Statistics Section
Presentations
While causal inference is central to many empirical analyses, endogeneity often complicates it. Classical instrumental variable (IV) methods for addressing endogeneity require valid instruments and often assume linear relationships between the outcome and both endogenous and exogenous regressors. However, these linearity assumptions can be restrictive for many real-world applications, leading to biased estimates when relationships are inherently nonlinear. Copula-based approaches, such as copula endogeneity correction, offer instrument-free solutions but are similarly limited by their linearity assumptions. This paper introduces a flexible two-stage Copula Generalized Additive Model (2sCOPE-GAM) to address endogeneity without instruments while accommodating nonlinear relationships. Specifically, 2sCOPE-GAM assumes a linear relationship between the outcome and the endogenous regressor while allowing nonlinear relationships between the outcome and the exogenous regressors through a GAM. Through theoretical development, simulation, and empirical data, we demonstrate the efficacy of our approach in accurately estimating causal effects in the presence of endogeneity and nonlinear relationships.
Keywords
Endogeneity
Copula
Two-stage estimation
Generalized additive model
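As a rough illustration of the copula idea in the abstract above: Gaussian-copula corrections are usually described as working with normal scores of the regressors, obtained by passing each value through an empirical CDF and then the inverse standard normal CDF. The sketch below shows only that transformation step. The function name `gaussian_copula_transform`, the `rank / (n + 1)` scaling, and the data are assumptions for illustration; the abstract does not specify how 2sCOPE-GAM implements this step.

```python
from statistics import NormalDist


def gaussian_copula_transform(values):
    """Map each value to Phi^{-1}(rank / (n + 1)), a rank-based normal score."""
    n = len(values)
    # Rank the observations (1 = smallest); ties are broken by position.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()
    # Dividing by n + 1 (an assumed convention) keeps the empirical CDF
    # strictly inside (0, 1), so the inverse normal CDF stays finite.
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Hypothetical, skewed regressor values used only for illustration.
p = [0.2, 1.5, 0.7, 9.0, 3.1]
p_star = gaussian_copula_transform(p)
```

The transformed values keep the ordering of the original regressor but are approximately standard normal, which is what lets a copula-based correction model the dependence between the regressor and the error term without an instrument.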
It is of substantial scientific interest to detect mediators that lie in the causal pathway from an exposure to a survival outcome. However, with high-dimensional mediators, as often encountered in modern genomic data settings, there is a lack of powerful methods that can provide valid post-selection inference for the identified marginal mediation effect. To resolve this challenge, we develop a post-selection inference procedure for the maximally selected natural indirect effect using a semiparametric efficient influence function approach. To this end, we establish the asymptotic normality of a stabilized one-step estimator that takes the selection of the mediator into account. Simulation studies show that our proposed method has good empirical performance. We further apply our proposed approach to a lung cancer dataset and find multiple DNA methylation CpG sites that might mediate the effect of cigarette smoking on lung cancer survival.
Keywords
causal inference
mediation analysis
right-censored data
post-selection inference
non-standard asymptotics
multiple testing
Multiplicity is a key problem in applied research. Estimating causal effects across many time periods, subgroups, or outcomes increases the risk of spurious findings. Bayesian methods address this issue by borrowing information across estimates from the same study or past studies to rein in extreme estimates. However, these methods are computationally intensive and can be difficult to align with frequentist approaches. As a solution, we pioneered a hybrid frequentist-Bayesian approach in the evaluation of Primary Care First (PCF), a primary care model from the Center for Medicare & Medicaid Innovation. In this approach, we fit a Bayesian meta-regression (Lipman et al. 2022) to frequentist difference-in-differences effect estimates from PCF's first three performance years. We found similar probabilities that PCF increased or reduced acute hospitalizations (51 and 49 percent, respectively), and a 72 percent probability that PCF increased Medicare expenditures by at least 1 percent. For subgroups, hybrid frequentist-Bayesian results were more moderate than frequentist estimates. These results supplement frequentist estimates to clarify model impacts across groups and over time.
Keywords
Causal inference
Bayesian statistics
Difference-in-differences
Health policy evaluation
Multiplicity correction
Heterogeneous treatment effects
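The shrinkage behavior described in the abstract above can be sketched with the simplest possible case: a single frequentist estimate with a known standard error, combined with a normal prior. This is a deliberately reduced stand-in, not the Bayesian meta-regression used in the evaluation; the function names, the prior, and the numbers are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def shrink(estimate, std_error, prior_mean, prior_sd):
    """Posterior mean and sd under a normal likelihood and a normal prior."""
    w_data = 1.0 / std_error ** 2    # precision of the frequentist estimate
    w_prior = 1.0 / prior_sd ** 2    # precision of the prior
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * estimate + w_prior * prior_mean)
    return post_mean, sqrt(post_var)


def prob_positive(post_mean, post_sd):
    """Posterior probability that the effect is greater than zero."""
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


# Made-up subgroup estimate and standard error, for illustration only.
est, se = 4.0, 2.0
post_mean, post_sd = shrink(est, se, prior_mean=0.0, prior_sd=2.0)
p_increase = prob_positive(post_mean, post_sd)
```

With equal data and prior precision, the posterior mean sits halfway between the estimate and the prior mean, which is the "more moderate" pattern the abstract reports for subgroups. Probability statements such as "a 72 percent probability of an increase" come from the same kind of posterior tail calculation.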
We propose a new interrupted time series method for causal inference with multivariate time series data subject to an interruption. This method can incorporate multiple response streams, with or without a control, and estimate nonlinear interruption effects across groups. We specify a latent time-varying mean model as well as a multilevel interruption effect and a post-intervention generalized additive model, which behaves like a flexible structured random effect, allowing for nonlinear interruption effects.
We show through simulation that our model formulation a) has good coverage, b) effectively predicts the counterfactual trend, and c) effectively estimates the interruption effect across groups. In our first application, we apply our modeling strategy without a control time series to estimate the effect of the COVID-19 pandemic on hospital care utilization for acute myocardial infarction (AMI, or heart attacks) among Medicare beneficiaries from 2018 to 2021. Our application with a control concerns the effect of the introduction of the prostate-specific antigen test in 1986 on prostate cancer incidence using SEER data from 1975 to 2000, with colon and lung cancer in men as controls.
Keywords
causal inference
time series
econometrics
health policy
semiparametric
Bayesian
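The counterfactual logic behind the interrupted time series abstract above can be sketched in its most basic form: fit a trend to the pre-interruption period, project it forward, and read the interruption effect as observed minus projected. This is a single-series, linear-trend simplification with no control series and no multilevel or latent structure, so it is not the proposed model; the function names and the synthetic series are assumptions.

```python
def fit_line(ts, ys):
    """Ordinary least squares fit of y = intercept + slope * t."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def interruption_effects(series, interruption_index):
    """Observed minus projected pre-period trend, for each post-period point."""
    pre_t = list(range(interruption_index))
    intercept, slope = fit_line(pre_t, series[:interruption_index])
    effects = []
    for t in range(interruption_index, len(series)):
        counterfactual = intercept + slope * t  # projected no-interruption value
        effects.append(series[t] - counterfactual)
    return effects


# Synthetic series: linear trend with a level shift of 3 starting at t = 20.
series = [10 + 0.5 * t + (3.0 if t >= 20 else 0.0) for t in range(30)]
effects = interruption_effects(series, 20)
```

On this noise-free series the recovered effect is the level shift at every post-interruption point. Real data would need uncertainty quantification and a more flexible trend, which is the gap the latent mean model and post-intervention smooth terms in the abstract are meant to fill.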
Time series experiments, sometimes called switchback experiments in modern digital platforms, are a fundamental experimental design in practice. In this paper, we examine the design-based properties of regression-based methods for estimating treatment effects in such settings. We demonstrate that the treatment effect of interest can be consistently estimated using ordinary least squares (OLS) with an appropriately specified working model. Our analysis extends to estimating a diverging number of treatment effects simultaneously, and we establish the asymptotic properties of the resulting estimators. Additionally, we show that the heteroskedasticity and autocorrelation consistent (HAC) estimator provides a conservative estimate of the variance. Importantly, while our approach relies on OLS regression, our theoretical framework accommodates misspecification of the regression model.
Keywords
time series experiments
potential outcome
randomization inference
robust standard error
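A minimal sketch of the regression-based analysis described in the abstract above, under strong simplifying assumptions that the abstract does not state: one treatment effect, no carryover between periods, Bernoulli(0.5) assignment in each period, a working model with only an intercept and the treatment indicator, and a Bartlett-kernel (Newey-West style) HAC variance. The function name, lag choice, and simulated data are hypothetical.

```python
import random


def ols_slope_with_hac(w, y, max_lag):
    """OLS slope of y on w (with intercept) and its Bartlett-kernel HAC variance."""
    n = len(y)
    w_bar = sum(w) / n
    y_bar = sum(y) / n
    x = [wi - w_bar for wi in w]                 # treatment, intercept partialled out
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * w_bar
    resid = [yi - intercept - slope * wi for wi, yi in zip(w, y)]
    g = [xi * ui for xi, ui in zip(x, resid)]    # per-period score contributions
    meat = sum(gi * gi for gi in g)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)       # Bartlett kernel weight
        meat += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    return slope, meat / sxx ** 2


# Simulated switchback-style data with AR(1) noise, for illustration only.
rng = random.Random(0)
n, true_effect, rho = 2000, 2.0, 0.5
w = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
y, noise = [], 0.0
for t in range(n):
    noise = rho * noise + rng.gauss(0.0, 1.0)
    y.append(1.0 + true_effect * w[t] + noise)

tau_hat, hac_var = ols_slope_with_hac(w, y, max_lag=5)
```

Here the randomness that justifies the estimate comes from the treatment assignment, and the working regression serves only as a device for computing it. That is why the abstract can allow the regression model to be misspecified.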