Wednesday, Aug 6: 10:30 AM - 12:20 PM
0496
Invited Paper Session
Music City Center
Room: CC-106A
Applied: No
Main Sponsor: IMS
Co-Sponsors: Biometrics Section; Business and Economic Statistics Section
Presentations
The synthetic control method is often applied to problems with one treated unit and a small number of control units. A common inferential task in this setting is to test null hypotheses regarding the average treatment effect on the treated. Inference procedures that are justified asymptotically are often unsatisfactory because (1) small sample sizes render large-sample approximations fragile and (2) the asymptotic justification typically covers only simplified versions of the estimation procedure used in practice. An alternative is permutation inference, which is related to a common diagnostic called the placebo test. It has provable Type-I error guarantees in finite samples without simplification of the method, when the treatment is uniformly assigned. Despite this robustness, the placebo test suffers from low resolution since the null distribution is constructed from only N reference estimates, where N is the sample size. This creates a barrier for statistical inference at a conventional level such as α=0.05, especially when N is small. We propose a novel leave-two-out procedure that bypasses this issue, while still maintaining the same finite-sample Type-I error guarantee under uniform assignment for a wide range of N. Unlike the placebo test, whose Type-I error always equals the theoretical upper bound, our procedure often achieves a lower unconditional Type-I error than theory suggests; this enables useful inference in the challenging regime when α<1/N. Empirically, our procedure achieves higher power when the effect size is reasonably large and comparable power otherwise. We generalize our procedure to non-uniform assignments and show how to conduct sensitivity analysis. From a methodological perspective, our procedure can be viewed as a new type of randomization inference, distinct from permutation or rank-based inference, that is particularly effective in small samples.
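As a point of reference for the resolution issue described above, the following minimal sketch (not the proposed leave-two-out procedure) computes a standard placebo-test p-value on simulated data: with ten units in total, the p-value is always a multiple of 1/10, so no level below that granularity is attainable. A plain least-squares fit stands in for the full synthetic control estimator, and all names, sizes, and data are illustrative.

import numpy as np

def sc_gap(Y, treated, T0):
    # Post-period gap between unit `treated` and a synthetic counterpart
    # built from the other units by least squares on the pre-treatment periods.
    donors = [i for i in range(Y.shape[0]) if i != treated]
    w, *_ = np.linalg.lstsq(Y[donors, :T0].T, Y[treated, :T0], rcond=None)
    return np.mean(Y[treated, T0:] - w @ Y[donors, T0:])

def placebo_pvalue(Y, T0, treated=0):
    # Placebo test: re-estimate the gap pretending each unit was the treated one,
    # then rank the actual treated unit's gap among these reference estimates.
    gaps = np.array([sc_gap(Y, i, T0) for i in range(Y.shape[0])])
    return np.mean(np.abs(gaps) >= np.abs(gaps[treated]))

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 30))       # 1 treated unit + 9 controls, 30 periods
print(placebo_pvalue(Y, T0=20))     # always a multiple of 1/10; cannot fall below it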
Keywords
panel data
missing not-at-random
factor model
interactive fixed effects
event study
bipartite network data
The empirical risk minimization approach to data-driven decision making requires access to training data drawn under the same conditions as those that will be faced when the decision rule is deployed. However, in a number of settings, we may be concerned that our training sample is biased in the sense that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population; and in this setting empirical risk minimization over the training set may fail to yield rules that perform well at deployment. We propose a model of sampling bias called conditional $\Gamma$-biased sampling, where observed covariates can affect the probability of sample selection arbitrarily much but the amount of unexplained variation in the probability of sample selection is bounded by a constant factor. Applying the distributionally robust optimization framework, we propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions that can generate the training distribution under $\Gamma$-biased sampling. We apply a result of Rockafellar and Uryasev to show that this problem is equivalent to an augmented convex risk minimization problem. We give statistical guarantees for learning a model that is robust to sampling bias via the method of sieves, and propose a deep learning algorithm whose loss function captures our robust learning target. We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data and a case study on ICU length of stay prediction.
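The Rockafellar-Uryasev step referred to above rests on the variational identity CVaR_a(L) = min_eta { eta + E[(L - eta)_+] / (1 - a) }, which turns a worst-case tail expectation into a convex augmented objective. The snippet below is a generic numerical check of that identity on simulated losses, not the paper's augmented loss; the tail level a, the loss distribution, and the link to $\Gamma$ are placeholders for illustration only.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
losses = rng.lognormal(size=10_000)   # stand-in per-example losses of a fitted rule
a = 0.9                               # illustrative tail level; in Gamma-biased sampling models it is tied to Gamma

def ru_objective(eta):
    # Rockafellar-Uryasev augmented objective for the level-a tail expectation.
    return eta + np.mean(np.maximum(losses - eta, 0.0)) / (1.0 - a)

cvar_variational = minimize_scalar(ru_objective).fun
cvar_direct = losses[losses >= np.quantile(losses, a)].mean()  # average of the worst 10% of losses
print(cvar_variational, cvar_direct)  # the two agree up to sampling and discretization error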
Keywords
distributionally robust optimization
The control function approach allows the researcher to identify various causal effects of interest. While powerful, it requires a strong invertibility assumption in the selection process, which limits its applicability. This paper expands the scope of the nonparametric control function approach by allowing the control function to be set-valued, and derives sharp bounds on structural parameters. The proposed generalization accommodates a wide range of selection processes involving discrete endogenous variables, random coefficients, treatment selections with interference, and dynamic treatment selections. The framework also applies to partially observed or identified controls that are directly motivated by economic models.
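For readers less familiar with the baseline being generalized, the sketch below runs the classical two-stage control function (control variable) estimator in the textbook case of a continuous endogenous regressor with an invertible first stage, on simulated data; the set-valued extension described above relaxes exactly this invertibility requirement. All coefficients and distributions are illustrative.

import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)         # endogenous treatment, invertible in its error
y = 1.5 * d + 2.0 * u + rng.normal(size=n)   # outcome; OLS of y on d alone is biased upward

# Stage 1: the control variable is the residual from regressing d on z.
Z1 = np.column_stack([np.ones(n), z])
v_hat = d - Z1 @ np.linalg.lstsq(Z1, d, rcond=None)[0]

# Stage 2: include the control variable; the coefficient on d is then consistent.
X = np.column_stack([np.ones(n), d, v_hat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta[1])   # close to the true effect of 1.5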
Keywords
control function
control variable
partial identification
This article studies experimental design in settings where the experimental units are large aggregate entities (e.g., markets), and only one or a small number of units can be exposed to the treatment. In such settings, randomization of the treatment may result in treated and control groups with very different characteristics at baseline, inducing biases. We propose a variety of experimental, non-randomized synthetic control designs (Abadie, Diamond and Hainmueller, 2010; Abadie and Gardeazabal, 2003) that select the units to be treated, as well as the untreated units to be used as a control group. Average potential outcomes are estimated as weighted averages of the outcomes of treated units for potential outcomes with treatment, and weighted averages of the outcomes of control units for potential outcomes without treatment. We analyze the properties of estimators based on synthetic control designs and propose new inferential techniques. We show that in experimental settings with aggregate units, synthetic control designs can substantially reduce estimation biases in comparison to randomization of the treatment.
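To fix ideas, the snippet below carries out the weighting step common to such designs: simplex-constrained least squares that matches a treated unit's pre-treatment path with a weighted average of control units, followed by the post-treatment difference. It is a generic simulation without covariate matching or the design's unit-selection step; all dimensions and effect sizes are illustrative.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
T0, T1, J = 20, 10, 8                        # pre-periods, post-periods, control units
controls = rng.normal(size=(J, T0 + T1)).cumsum(axis=1)
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = true_w @ controls + rng.normal(scale=0.1, size=T0 + T1)
treated[T0:] += 2.0                          # treatment effect of 2 in the post-period

def pre_fit(w):
    # Pre-treatment squared discrepancy between treated and synthetic paths.
    return np.sum((treated[:T0] - w @ controls[:, :T0]) ** 2)

res = minimize(pre_fit, np.full(J, 1.0 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
w_hat = res.x
print(np.mean(treated[T0:] - w_hat @ controls[:, T0:]))   # close to the true effect of 2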