Tuesday, Aug 5: 8:30 AM - 10:20 AM
0294
Invited Paper Session
Music City Center
Room: CC-103C
Causal inference
Randomized experiments
Policy evaluation
Applied: Yes
Main Sponsor
WNAR
Co-Sponsors
Health Policy Statistics Section
Social Statistics Section
Presentations
Investigating interference or spillover effects among units is a central task in many social science problems. Network experiments, which avoid endogeneity by randomly assigning treatments to units over a network, are powerful tools for this task. However, analyzing network experiments properly without imposing strong modeling assumptions is non-trivial. Many researchers have previously proposed sophisticated point estimators and standard errors for causal effects under network experiments. We further show that regression-based point estimators and standard errors can have strong theoretical guarantees if the regression functions and robust standard errors are carefully specified to accommodate the interference patterns under network experiments. We first recall a well-known result that the Hajek estimator is numerically identical to the coefficient from a weighted-least-squares fit with weights given by the inverse probabilities of the exposure mapping. Moreover, we demonstrate that the regression-based approach offers three notable advantages: ease of implementation, the ability to derive standard errors through the same weighted-least-squares fit, and the capacity to integrate covariates into the analysis, thereby enhancing estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator to improve empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of the network experiment and allows for arbitrary misspecification of the regression models.
Keywords
Covariate adjustment
exposure mapping
interference
model misspecification
network-robust standard error
weighted least squares
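As a minimal sketch of the Hajek-as-WLS identity recalled in this abstract, the following Python snippet simulates a hypothetical two-level exposure mapping with known design probabilities (the data-generating process and all names are illustrative, not the authors' setup) and checks that the Hajek estimator coincides with the slope from a weighted-least-squares fit with inverse exposure-probability weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical two-level exposure mapping: each unit's exposure D is 1
# or 0, with unit-specific design probabilities implied by the network
# randomization (here simply drawn at random for illustration).
pi1 = rng.uniform(0.2, 0.8, n)            # P(exposure = 1) for each unit
D = rng.binomial(1, pi1)                  # realized exposure
pi_obs = np.where(D == 1, pi1, 1 - pi1)   # prob. of the observed exposure
Y = 1.0 + 2.0 * D + rng.normal(0, 1, n)   # placeholder outcomes

# Hajek estimator: difference of inverse-probability-weighted means.
w = 1.0 / pi_obs
hajek = (np.sum(w * Y * D) / np.sum(w * D)
         - np.sum(w * Y * (1 - D)) / np.sum(w * (1 - D)))

# Weighted least squares of Y on an intercept and D, same weights.
X = np.column_stack([np.ones(n), D])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))

print(hajek, beta[1])  # the two numbers agree exactly
```

The identity holds because WLS with an intercept and a binary regressor returns the difference of within-group weighted means, which is exactly the Hajek estimator.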
Across a wide array of disciplines, researchers use machine learning (ML) algorithms to identify a subgroup of individuals who are most likely to benefit from a treatment ("exceptional responders") or those who are harmed by it. A common approach to this subgroup identification problem consists of two steps. First, researchers estimate the conditional average treatment effect (CATE) using an ML algorithm. Next, they use the estimated CATE to select the individuals who are predicted to be most affected by the treatment, either positively or negatively. Unfortunately, CATE estimates are often biased and noisy. In addition, using the same data both to identify a subgroup and to estimate its group average treatment effect creates a multiple testing problem. To address these challenges, we develop uniform confidence bands for the group average treatment effects sorted by a generic ML algorithm (GATES). Using these uniform confidence bands, researchers can identify, with a statistical guarantee, a subgroup whose GATES exceeds a given effect size, regardless of how that effect size is chosen. The validity of the proposed methodology relies solely on the randomization of treatment and random sampling of units. Importantly, our method requires no modeling assumptions and avoids computationally intensive resampling procedures. A simulation study shows that the proposed uniform confidence bands are reasonably informative and attain appropriate empirical coverage even when the sample size is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.
Keywords
causal inference
exceptional responders
heterogeneous treatment effects
uniform confidence bands
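The two-step procedure described in this abstract can be sketched generically. The snippet below uses a hypothetical T-learner with random forests for the CATE step and reports quintile-level GATES with pointwise intervals; it only illustrates the sorting-and-estimation pipeline, not the talk's uniform confidence bands, which would enlarge the critical value to cover all groups simultaneously. All names and the simulated data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, n)                  # randomized treatment
Y = X[:, 1] + np.maximum(X[:, 0], 0) * T + rng.normal(0, 1, n)

# Step 1: estimate the CATE on an auxiliary split (T-learner sketch).
tr, ev = np.arange(n // 2), np.arange(n // 2, n)
m1 = RandomForestRegressor(random_state=0).fit(X[tr][T[tr] == 1], Y[tr][T[tr] == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[tr][T[tr] == 0], Y[tr][T[tr] == 0])
cate = m1.predict(X[ev]) - m0.predict(X[ev])

# Step 2: sort the held-out sample into quintiles of the estimated CATE
# and estimate the group average treatment effect (GATES) in each.
groups = np.digitize(cate, np.quantile(cate, [0.2, 0.4, 0.6, 0.8]))
for g in range(5):
    idx = ev[groups == g]
    y1, y0 = Y[idx][T[idx] == 1], Y[idx][T[idx] == 0]
    gate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    # Pointwise 95% interval; the talk's uniform bands would widen the
    # critical value so that all five groups are covered at once.
    print(f"group {g}: GATES = {gate:.2f} +/- {1.96 * se:.2f}")
```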
Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A common approach is to use an instrumental variable (IV) analysis to estimate impacts for compliers, here those actually contacted by the investigators. Unfortunately, popular IV estimators can be unstable in studies with a small fraction of compliers. This talk explores post-stratifying the data on variables that predict complier status (and, potentially, the outcome) to mitigate this instability. The benefits of post-stratification in terms of bias, variance, and improved standard error estimates will be presented, along with a finite-sample asymptotic variance formula. The performance of different IV approaches will be compared, with discussion of the advantages of our design-based post-stratification approach over incorporating compliance-predictive covariates into the two-stage least squares estimator. The benefits of our approach will be demonstrated in two GOTV applications.
Keywords
Causal inference
Post-stratification
Instrumental variables
Blocking
Compliance
Randomization Inference
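A minimal sketch of post-stratified IV estimation, assuming a hypothetical GOTV-style design in which a discrete covariate predicts compliance; all variable names and data-generating choices are illustrative. It computes the ratio of post-stratified intention-to-treat (ITT) effects, which is equivalent to a complier-share-weighted average of stratum-level Wald estimators; it does not implement the variance formulas discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Hypothetical GOTV-style data: Z = random assignment to a contact
# attempt, S = stratum of a covariate predictive of compliance (e.g.
# past vote history), D = actual contact, Y = turnout.
S = rng.integers(0, 4, n)
Z = rng.binomial(1, 0.5, n)
C = rng.binomial(1, np.array([0.05, 0.15, 0.35, 0.60])[S])  # latent complier
D = Z * C                                      # contacted only if assigned & complier
Y = rng.binomial(1, 0.2 + 0.05 * S + 0.1 * D)  # complier effect of 0.1

# Post-stratified IV: ratio of post-stratified ITT effects, i.e. a
# complier-share-weighted average of stratum-level Wald estimators.
num = den = 0.0
for s in range(4):
    m = S == s
    itt_y = Y[m][Z[m] == 1].mean() - Y[m][Z[m] == 0].mean()
    itt_d = D[m][Z[m] == 1].mean() - D[m][Z[m] == 0].mean()
    num += m.mean() * itt_y                # post-stratified ITT on turnout
    den += m.mean() * itt_d                # post-stratified ITT on contact
print("post-stratified IV estimate:", num / den)
```

Because strata with few compliers contribute little weight to both the numerator and the denominator, the combined estimate avoids the instability of unstable stratum-level ratios.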
This talk presents an ongoing large field experiment that randomizes tax audit notices (the treatment) to firms connected through a large network of VAT transactions. While the ultimate goal is to optimize tax audit policy, the short-term goal is to estimate the causal effects of tax audit notices on firm behavior. Of particular interest are spillovers, that is, the responses of firms that are not treated themselves but are connected to firms that are. First, I will discuss why currently popular approaches to experimenting on networks are limited by the realities of inter-firm networks, such as their size, high interconnectivity, and heavy-tailed degree distributions. I will then describe an approach to experimentation that leverages subtle sub-structures in the network. This approach is specifically designed to allow the application of Fisherian-style permutation tests of causal effects. These testing procedures are computationally efficient and finite-sample valid, qualities that are important for robustly testing the parameters of structural economic models.
Keywords
randomization
spillovers
tax policies
interfirm networks
big data
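To illustrate the Fisherian logic referenced in this abstract, here is a textbook randomization test of the global sharp null of no effect on a hypothetical firm network. The talk's sub-structure-based procedures target more refined spillover-specific nulls than this sketch does, and the network, assignment design, and all names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical inter-firm network: sparse symmetric adjacency matrix.
A = (rng.random((n, n)) < 0.02).astype(int)
A = np.triu(A, 1)
A = A + A.T

Z_obs = rng.binomial(1, 0.3, n)            # observed audit-notice assignment
Y = rng.normal(0, 1, n)                    # placeholder firm outcomes

def spillover_stat(Z):
    """Mean outcome gap among untreated firms with vs. without at
    least one treated trading partner."""
    exposed = (A @ Z > 0) & (Z == 0)
    pure = (A @ Z == 0) & (Z == 0)
    return Y[exposed].mean() - Y[pure].mean()

# Fisher randomization test of the global sharp null (notices have no
# effect, direct or spillover): Y is then fixed, so re-draw assignments
# from the design and recompute the statistic.
obs = spillover_stat(Z_obs)
draws = np.array([spillover_stat(rng.binomial(1, 0.3, n)) for _ in range(2000)])
p_value = np.mean(np.abs(draws) >= np.abs(obs))
print("permutation p-value:", p_value)
```

Because the reference distribution is generated by the known design itself, the resulting p-value is exact in finite samples, the property the abstract highlights.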