Wednesday, Aug 6: 2:00 PM - 3:50 PM
4191
Contributed Papers
Music City Center
Room: CC-101A
Main Sponsor
Section on Statistics in Epidemiology
Presentations
Estimating the causal effect of a treatment with observational data can be challenging due to imbalance of, and lack of overlap between, the treated and control covariate distributions. In the presence of limited overlap, researchers choose between 1) methods that target traditional estimands (e.g., the ATE) but whose estimators are at risk of considerable bias and variance; and 2) methods (e.g., overlap weighting) that imply a different estimand, modifying the target population in order to reduce variance. We propose a framework for navigating the tradeoffs between the variance and bias due to imbalance and lack of overlap and the targeting of the estimand of scientific interest. We introduce a bias decomposition that encapsulates bias due to 1) the statistical bias of the estimator; and 2) estimand mismatch, i.e., deviation from the population of interest. We propose two design-based metrics and an estimand selection procedure that illustrate the tradeoffs between these sources of bias and the variance of the resulting estimators. We demonstrate how to select an estimand based on preferences among these characteristics with an application to right heart catheterization data.
Keywords
average treatment effect
causal inference
inverse probability weighting
propensity score
target population
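As a minimal illustration of the tradeoff described in the abstract above, the sketch below contrasts classical ATE inverse-probability weights with overlap weights and compares the effective sample size of each (a crude variance proxy). The simulated data and the ESS comparison are illustrative assumptions, not the authors' proposed metrics or selection procedure.

```python
# A minimal sketch (not the authors' metrics) contrasting ATE inverse-probability
# weights with overlap weights, given estimated propensity scores e(X).
import numpy as np

def ate_weights(ps, treated):
    """Classical IPW weights targeting the ATE; explode as e(X) -> 0 or 1."""
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

def overlap_weights(ps, treated):
    """Overlap weights targeting the overlap population; bounded on (0, 1)."""
    return np.where(treated, 1.0 - ps, ps)

rng = np.random.default_rng(0)
ps = rng.beta(0.5, 0.5, size=1000)            # poor overlap: mass near 0 and 1
treated = rng.binomial(1, ps).astype(bool)

for name, w in [("ATE/IPW", ate_weights(ps, treated)),
                ("overlap", overlap_weights(ps, treated))]:
    # Effective sample size as a crude variance proxy: (sum w)^2 / sum w^2.
    n_eff = w.sum() ** 2 / (w ** 2).sum()
    print(f"{name:8s} effective sample size: {n_eff:7.1f}")
```

Under limited overlap the IPW effective sample size collapses while the overlap-weighted one stays large, which is the variance side of the variance/estimand-mismatch tradeoff the abstract formalizes.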
The classical best-subset selection problem is known to be NP-hard and thus presents computational challenges. It can now be solved via modern mixed integer optimization (MIO) algorithms for linear regression. We extend this methodology to linear instrumental variable (IV) regression and propose the best-subset instrumental variable (BSIV) method, which incorporates the MIO procedure. Classical IV estimation methods assume that IVs do not directly affect the outcome variable and are uncorrelated with unmeasured variables. In practice, however, IVs are likely to be invalid, and existing methods can produce bias that is large relative to standard errors in certain situations. The proposed BSIV estimator is robust for estimating causal effects when IV validity is unknown. Through Monte Carlo simulations, we demonstrate that BSIV using MIO algorithms outperforms two-stage least squares, Lasso-type IV methods, and two-sample analyses (median and mode estimators) in terms of bias and relative efficiency. We analyze two datasets, one involving a health-related quality of life index and proximity, and the other the education–wage relationship.
Keywords
Causal inference
Instrumental variables
Mendelian randomization
Best-subset selection
Mixed integer programming
Variable selection
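To make the subset-selection idea in the abstract above concrete, here is a brute-force sketch that enumerates which candidate instruments to treat as valid, runs 2SLS with the remainder entering the outcome equation directly, and scores each configuration with a BIC-style penalty. The enumeration is a stand-in for the paper's mixed integer optimization step (feasible only for a handful of candidates), and the scoring rule is an assumption, not the BSIV criterion.

```python
# A brute-force stand-in (enumeration instead of mixed integer optimization) for
# best-subset instrument selection with possibly invalid instruments.
from itertools import combinations
import numpy as np

def two_sls(y, d, Z_iv, W):
    """2SLS of y on endogenous d and exogenous W, instrumenting d with Z_iv."""
    X1 = np.column_stack([Z_iv, W])                      # first-stage regressors
    d_hat = X1 @ np.linalg.lstsq(X1, d, rcond=None)[0]   # fitted endogenous var
    X2 = np.column_stack([d_hat, W])
    coef = np.linalg.lstsq(X2, y, rcond=None)[0]
    resid = y - np.column_stack([d, W]) @ coef           # structural residuals
    return coef[0], resid

def best_subset_iv(y, d, Z, min_valid=2):
    """Enumerate which instruments to treat as valid; score by a BIC-type rule."""
    n, p = Z.shape
    best = None
    for k in range(min_valid, p + 1):
        for valid in combinations(range(p), k):
            invalid = [j for j in range(p) if j not in valid]
            # Instruments deemed invalid enter the outcome equation directly.
            W = np.column_stack([Z[:, invalid], np.ones(n)])
            beta, resid = two_sls(y, d, Z[:, list(valid)], W)
            score = n * np.log(resid @ resid / n) + np.log(n) * len(invalid)
            if best is None or score < best[0]:
                best = (score, valid, beta)
    return best   # (score, indices of valid IVs, causal-effect estimate)
```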
In placebo-controlled randomized clinical trials (RCTs), the placebo response significantly modifies treatment effects and diminishes the intention-to-treat (ITT) treatment effect, Δ_ITT. This study presents a novel two-stage design to estimate the standardized causal treatment effect, Δ_STD, among ITT subjects, had they exhibited lower levels of placebo response when using the active drug at home. Stage one involves an open-label placebo lead-in phase designed to estimate the expected placebo response had the active treatment been self-administered in daily use. In stage two, a double-blinded randomized phase is employed to estimate the conditional average treatment effect (CATE) as a function of placebo response levels and other effect modifiers. To simplify CATE estimation, prognostic scores serve as placebo responses at both stages. The causal estimand Δ_STD is computed by averaging the CATE across the placebo response levels from stage one. We derive theoretical values of Δ_ITT − Δ_STD under normality and parametric assumptions to quantify the bias attributable to the placebo response. The validity of the proposed estimand is further evaluated through a series of simulations.
Keywords
Placebo Response
Causal Treatment Effect
Placebo-controlled Randomized Clinical Trials (RCTs)
Conditional Average Treatment Effect (CATE)
Prognostic Score
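A toy sketch of the two-stage averaging step described in the abstract above, assuming a CATE model has already been fitted to the randomized stage-two data. The linear cate() function, the simulated placebo-response distribution, and the mean-shift counterfactual are all hypothetical placeholders, not the paper's estimator.

```python
# A toy sketch of averaging a stage-two CATE over stage-one placebo-response
# levels; every quantity below is a hypothetical placeholder.
import numpy as np

rng = np.random.default_rng(1)

# Stage one: open-label placebo lead-in yields placebo-response levels
# (proxied here by a prognostic score) for the ITT population.
placebo_response = rng.normal(loc=0.8, scale=0.3, size=500)

# Stage two: randomized phase yields a CATE estimate as a function of the
# placebo-response level p (and, in general, other effect modifiers).
def cate(p):
    return 1.5 - 0.9 * p   # toy model: effect shrinks as placebo response rises

# Delta_ITT analogue: CATE averaged over the observed placebo-response levels.
delta_itt = cate(placebo_response).mean()
# Delta_STD analogue: CATE averaged over a lowered (here, mean-centered)
# placebo-response distribution -- an assumed counterfactual shift.
delta_std = cate(placebo_response - placebo_response.mean()).mean()
print(f"Delta_ITT = {delta_itt:.3f}, Delta_STD = {delta_std:.3f}")
```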
In recent years, valid instrumental variable selection methods have attracted attention across different fields of study, including biostatistics, econometrics, and epidemiology. Under the plurality rule, valid instruments form the largest cluster of the ratios between the coefficients from regressing the outcome variable on the candidate instruments and covariates and the corresponding coefficients from regressing the endogenous variable(s) on the same variables; exploration of this method has been extended to handle multiple endogenous variables and heterogeneous treatment effects. However, the state-of-the-art agglomerative hierarchical clustering method, which groups instruments for multiple endogenous variables based on their Euclidean distances from one another, can be computationally complex and does not provide well-defined explanations for treatment effects that are heterogeneous in non-categorical variables. In this study, we propose the Lasso under the control function approach to handle multiple endogenous regressors, endogenous variables with non-normal distributions, and heterogeneous treatment effects in interaction and higher-order terms.
Keywords
Lasso
instrumental variable
control function approach
multiple endogenous variables
heterogeneous treatment effect
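A minimal sketch of the control-function idea with a Lasso second stage, using scikit-learn; the simulated data, the LassoCV choice, and the degree-2 polynomial expansion are illustrative assumptions rather than the proposed procedure.

```python
# A minimal sketch of the control function approach with a Lasso second stage.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=(n, 3))                                   # instruments
u = rng.normal(size=n)                                        # unobserved confounder
d = Z @ np.array([1.0, 0.5, 0.0]) + u + rng.normal(size=n)    # endogenous regressor
y = 2.0 * d + 0.5 * d ** 2 + u + rng.normal(size=n)           # higher-order effect

# Stage 1: first-stage residuals serve as the control function v_hat.
Z1 = np.column_stack([Z, np.ones(n)])
v_hat = d - Z1 @ np.linalg.lstsq(Z1, d, rcond=None)[0]

# Stage 2: Lasso on d, v_hat, and their interaction/higher-order terms;
# conditioning on v_hat absorbs the endogeneity of d.
X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(
        np.column_stack([d, v_hat]))
fit = LassoCV(cv=5).fit(X, y)
print(dict(zip(["d", "v", "d^2", "d*v", "v^2"], fit.coef_.round(2))))
```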
In many observational studies, researchers are interested in studying the effects of multiple exposures on a single outcome. Standard approaches for high-dimensional data, such as the lasso, assume the associations between the exposures and the outcome are sparse. These methods, however, do not estimate the causal effects in the presence of unmeasured confounding. In this paper, we consider an alternative approach that assumes the causal effects themselves are sparse. We show that, with sparse causation, the causal effects are identifiable even with unmeasured confounding. At the core of our proposal is a novel device, called the synthetic instrument, that, in contrast to standard instrumental variables, can be constructed directly from the observed exposures. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an ℓ0-penalization problem and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset.
Keywords
Causal inference
Multivariate analysis
Unmeasured confounding
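The ℓ0-penalization step can be written generically as below; this brute-force support enumeration is a stand-in for the off-the-shelf solvers the abstract refers to, and it does not reproduce the synthetic-instrument construction itself. It is feasible only for small numbers of exposures.

```python
# A generic l0-penalized least-squares sketch via support enumeration,
# as a stand-in for the solvers referenced in the abstract above.
from itertools import combinations
import numpy as np

def l0_penalized_ls(X, y, lam):
    """argmin_beta ||y - X beta||^2 + lam * ||beta||_0, by enumerating supports."""
    n, p = X.shape
    best_obj, best_beta = y @ y, np.zeros(p)          # empty-support baseline
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            cols = list(S)
            b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ b
            obj = resid @ resid + lam * k             # fit + sparsity penalty
            if obj < best_obj:
                best_obj = obj
                best_beta = np.zeros(p)
                best_beta[cols] = b
    return best_beta
```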
This talk addresses the problem of insufficient statistical power when a limited number of representative individuals volunteer as experimental participants. We highlight an inefficiency of randomized treatment assignment for causal inference: if observational data are available and transportable, then it may be redundant to assign preferred treatments within an experiment. There are situations where experimenters should determine individual preferences and eliminate the possibility of assigning preferred treatments, which can cut the number of required participants in half. When only two treatments are under consideration, each participant receives the opposite of their preference, and randomization is not needed, demonstrating the primacy of experimental control over randomization. This talk shares ideas for how data fusion can enhance adaptive experimentation, demonstrated with an example application of causal reinforcement learning to the problem of selecting an optimal intervention policy to treat Crohn's disease.
Keywords
Experimental control
Randomization
Causal inference
Treatment preferences
Data fusion
Reinforcement learning
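A toy numerical illustration of the preference-based design in the abstract above, assuming transportability of the observational outcome under the preferred treatment; every quantity below is simulated, and the stratum means are arbitrary.

```python
# A toy illustration of fusing observational data (preferred arm) with an
# experiment that assigns only the non-preferred treatment.
import numpy as np

rng = np.random.default_rng(3)
n = 200
prefers_a = rng.random(n) < 0.5          # elicited treatment preferences

# Observational data (assumed transportable): mean outcome under the
# PREFERRED treatment is already known within each preference stratum.
obs_mean_a = 1.0   # mean outcome under A among those who prefer and take A
obs_mean_b = 0.4   # mean outcome under B among those who prefer and take B

# Experiment: assign every participant the OPPOSITE of their preference,
# so no randomization is needed and no observation is redundant.
y_b_given_prefers_a = rng.normal(0.4, 1.0, size=prefers_a.sum())      # A-preferrers get B
y_a_given_prefers_b = rng.normal(1.0, 1.0, size=(~prefers_a).sum())   # B-preferrers get A

# Fuse: each stratum's effect contrasts the experimental (non-preferred)
# arm with the observational (preferred) arm, then average over strata.
effect_prefers_a = obs_mean_a - y_b_given_prefers_a.mean()
effect_prefers_b = y_a_given_prefers_b.mean() - obs_mean_b
ate = np.average([effect_prefers_a, effect_prefers_b],
                 weights=[prefers_a.mean(), 1 - prefers_a.mean()])
print(f"fused ATE estimate: {ate:.3f}")
```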