Novel Causal Inference Methods for Analyzing Modern Experimental Data in Health and Social Sciences

Chair: Siyu Heng, New York University
 
Organizer: Siyu Heng, New York University
 
Tuesday, Aug 5: 8:30 AM - 10:20 AM
Session 0294
Invited Paper Session
Music City Center
Room: CC-103C

Keywords

Causal inference

Randomized experiments

Policy evaluation 

Applied

Yes

Main Sponsor

WNAR

Co-Sponsors

Health Policy Statistics Section
Social Statistics Section

Presentations

Causal inference in network experiments: regression-based analysis and design-based properties

Investigating interference, or spillover effects, among units is a central task in many social science problems. Network experiments are powerful tools for this task: they avoid endogeneity by randomly assigning treatments to units over networks. However, it is non-trivial to analyze network experiments properly without imposing strong modeling assumptions. Many researchers have previously proposed sophisticated point estimators and standard errors for causal effects under network experiments. We show that regression-based point estimators and standard errors can also have strong theoretical guarantees, provided the regression functions and robust standard errors are carefully specified to accommodate the interference patterns under network experiments. We first recall the well-known result that the Hajek estimator is numerically identical to the coefficient from a weighted-least-squares fit weighted by the inverse probability of the exposure mapping. We then demonstrate that the regression-based approach offers three notable advantages: ease of implementation, the ability to derive standard errors from the same weighted-least-squares fit, and the capacity to incorporate covariates into the analysis, thereby improving estimation efficiency. Furthermore, we analyze the asymptotic bias of the regression-based network-robust standard errors. Recognizing that the covariance estimator can be anti-conservative, we propose an adjusted covariance estimator that improves empirical coverage rates. Although we focus on regression-based point estimators and standard errors, our theory holds under the design-based framework, which assumes that the randomness comes solely from the design of the network experiment and allows for arbitrary misspecification of the regression models.
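As a minimal numerical sketch of the Hajek/weighted-least-squares identity recalled above, the toy example below uses a binary exposure with known, unit-specific exposure probabilities; the exposure mapping, design, and variable names are simplified illustrations rather than the paper's general setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Illustrative design: each unit i has a binary exposure D_i (e.g., "has a
# treated neighbor") with known probability pi1[i] determined by the design.
pi1 = rng.uniform(0.2, 0.8, size=n)           # P(D_i = 1), known from the design
D = rng.binomial(1, pi1)                      # realized exposures
Y = 1.0 + 2.0 * D + rng.normal(size=n)        # outcomes (any data-generating process)

# Inverse-probability weight for each unit's realized exposure.
w = np.where(D == 1, 1.0 / pi1, 1.0 / (1.0 - pi1))

# Hajek estimator: difference of weight-normalized means across exposure arms.
hajek = (np.sum(w * Y * (D == 1)) / np.sum(w * (D == 1))
         - np.sum(w * Y * (D == 0)) / np.sum(w * (D == 0)))

# Weighted least squares of Y on an intercept and D with the same weights:
# solve (X' W X) beta = X' W y; the slope coefficient reproduces the Hajek
# estimator exactly, which is the identity the abstract recalls.
X = np.column_stack([np.ones(n), D])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))

print(hajek, beta[1])   # identical up to floating-point error
```

The same weighted-least-squares fit is the starting point for the network-robust standard errors and covariate adjustment discussed in the talk.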

Keywords

Covariate adjustment

exposure mapping

interference

model misspecification

network-robust standard error

weighted least squares 

Co-Author

Peng Ding, University of California-Berkeley

Speaker

Mengsi Gao, UC Berkeley

Statistical performance guarantee for selecting those predicted to benefit most from treatment

Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals who are likely to benefit most from a treatment ("exceptional responders") or those who are harmed by it. A common approach to this subgroup identification problem consists of two steps. First, researchers estimate the conditional average treatment effect (CATE) using an ML algorithm. Next, they use the estimated CATE to select the individuals who are predicted to be most affected by the treatment, either positively or negatively. Unfortunately, CATE estimates are often biased and noisy. In addition, using the same data both to identify a subgroup and to estimate its group average treatment effect creates a multiple testing problem. To address these challenges, we develop uniform confidence bands for the group average treatment effects sorted by a generic ML algorithm (GATES). Using these uniform confidence bands, researchers can identify, with a statistical guarantee, a subgroup whose GATES exceeds a given effect size, regardless of how that effect size is chosen. The validity of the proposed methodology relies solely on the randomization of treatment and random sampling of units. Importantly, our method requires no modeling assumptions and avoids computationally intensive resampling procedures. A simulation study shows that the proposed uniform confidence bands are reasonably informative and have appropriate empirical coverage even when the sample size is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders.
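The following sketch illustrates the two-step workflow the abstract describes, with a T-learner and random forests standing in for "a generic ML algorithm" and all data simulated. It reports only naive pointwise standard errors for the sorted group effects, not the uniform confidence bands (or their simultaneous guarantee) developed in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
T = rng.binomial(1, 0.5, size=n)              # randomized treatment
tau = np.maximum(X[:, 0], 0.0)                # true heterogeneous effect
Y = X[:, 1] + tau * T + rng.normal(size=n)

# Step 1: estimate the CATE with an ML algorithm on one half of the sample
# (a T-learner: separate outcome models for treated and control units).
train = np.arange(n) < n // 2
evaln = ~train
m1 = RandomForestRegressor(n_estimators=200, random_state=0)
m0 = RandomForestRegressor(n_estimators=200, random_state=0)
m1.fit(X[train & (T == 1)], Y[train & (T == 1)])
m0.fit(X[train & (T == 0)], Y[train & (T == 0)])
cate_hat = m1.predict(X[evaln]) - m0.predict(X[evaln])

# Step 2: sort the held-out units into groups by predicted effect and estimate
# each group's average treatment effect by a difference in means (GATES).
K = 5
cuts = np.quantile(cate_hat, np.linspace(0, 1, K + 1)[1:-1])
groups = np.digitize(cate_hat, cuts)
Ye, Te = Y[evaln], T[evaln]
for k in range(K):
    g = groups == k
    est = Ye[g & (Te == 1)].mean() - Ye[g & (Te == 0)].mean()
    se = np.sqrt(Ye[g & (Te == 1)].var(ddof=1) / (g & (Te == 1)).sum()
                 + Ye[g & (Te == 0)].var(ddof=1) / (g & (Te == 0)).sum())
    print(f"group {k}: GATES estimate {est:.2f} (pointwise SE {se:.2f})")
```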

Keywords

causal inference

exceptional responders

heterogeneous treatment effects

uniform confidence bands 

Speaker

Michael Lingzhi Li, Harvard Business School

Improving instrumental variable estimators with post-stratification, with applications to experiments studying get-out-the-vote (GOTV) efforts

Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A common approach is to use an instrumental variable (IV) analysis to estimate impacts for compliers, here those actually contacted by the investigators. Unfortunately, popular IV estimators can be unstable in studies with a small fraction of compliers. This talk explores post-stratifying the data on variables that predict complier status (and, potentially, the outcome) to mitigate this instability. The benefits of post-stratification in terms of bias, variance, and improved standard error estimates will be presented, along with a finite-sample asymptotic variance formula. The performance of different IV approaches will be compared, with discussion of the advantages of our design-based post-stratification approach over incorporating compliance-predictive covariates into the two-stage least squares estimator. The benefits of our approach will be demonstrated in two GOTV applications.
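As a rough illustration of the idea, the sketch below compares the standard IV (Wald) estimator with one natural post-stratified version, namely the ratio of stratum-size-weighted intention-to-treat estimates for the outcome and for contact. The simulated data, the single stratifying covariate, and this particular combination rule are illustrative assumptions; the talk's variance formula and standard-error estimators are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Illustrative GOTV-style data: Z is the randomized assignment to be contacted,
# D indicates actual contact (compliance), Y is turnout, and S is a covariate
# that predicts complier status.
S = rng.binomial(1, 0.5, size=n)
p_comply = np.where(S == 1, 0.4, 0.05)        # compliance is rare in one stratum
Z = rng.binomial(1, 0.5, size=n)
complier = rng.binomial(1, p_comply)
D = Z * complier                              # one-sided noncompliance
Y = rng.binomial(1, 0.3 + 0.1 * D)            # turnout, with an effect of contact

def itt(v, z):
    """Difference in means of v between assigned and unassigned units."""
    return v[z == 1].mean() - v[z == 0].mean()

# Standard IV (Wald) estimator: ITT effect on turnout over ITT effect on contact.
iv_plain = itt(Y, Z) / itt(D, Z)

# Post-stratified version: estimate both ITTs within strata of S, average the
# stratum estimates with weights proportional to stratum size, then take the ratio.
strata = (0, 1)
wts = np.array([np.mean(S == s) for s in strata])
itt_y = np.array([itt(Y[S == s], Z[S == s]) for s in strata])
itt_d = np.array([itt(D[S == s], Z[S == s]) for s in strata])
iv_post = (wts @ itt_y) / (wts @ itt_d)

print(iv_plain, iv_post)
```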

Keywords

Causal inference

Post-stratification

Instrumental variables

Blocking

Compliance

Randomization inference

Co-Author(s)

Luke Miratrix, Harvard University
Luke Keele, University of Pennsylvania

Speaker

Nicole Pashley, Rutgers University

Evaluating tax audit policies on large interfirm networks: a field experiment

This talk presents an ongoing large-scale field experiment that randomizes tax audit notices (the treatment) to firms connected through a large network of VAT transactions. While the ultimate goal is to optimize tax audit policy, the short-term goal is to estimate the causal effects of tax audit notices on firm behavior. Of particular interest are spillovers, that is, the responses of firms that are not themselves treated but are connected to treated firms. First, I will discuss why currently popular approaches to experimenting on networks are limited by the realities of interfirm networks, such as their size, high interconnectivity, and heavy-tailed degree distributions. I will then describe an approach to experimentation that leverages subtle sub-structures in the network. This approach is specifically designed to allow the application of Fisherian-style permutation tests of causal effects. These testing procedures are computationally efficient and finite-sample valid, qualities that are important for robustly testing the parameters of structural economic models.
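For context, the sketch below runs a basic Fisher randomization (permutation) test on a simulated network: it tests the global sharp null of no effect of the audit notices whatsoever, direct or spillover, by redrawing assignments from the known design. The adjacency matrix, assignment mechanism, and test statistic are illustrative stand-ins; testing spillover-specific hypotheses on realistic interfirm networks requires the conditional, sub-structure-based procedures described in the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Illustrative firm network: a sparse symmetric adjacency matrix standing in
# for the VAT transaction network; A[i, j] = 1 if firms i and j transact.
A = (rng.random((n, n)) < 0.02).astype(int)
A = np.triu(A, 1)
A = A + A.T

Y = rng.normal(size=n)                        # observed firm outcomes

def statistic(z):
    """Difference in mean outcomes between firms with and without a treated neighbor."""
    exposed = (A @ z) > 0
    return Y[exposed].mean() - Y[~exposed].mean()

# Fisher randomization test of the sharp null of no effects at all: under that
# null, outcomes are fixed across re-randomizations, so the null distribution
# of the statistic is obtained by redrawing assignments from the known design.
def assign():
    return rng.binomial(1, 0.1, size=n)       # design: each firm notified w.p. 0.1

z_obs = assign()                              # observed assignment of audit notices
t_obs = statistic(z_obs)
draws = np.array([statistic(assign()) for _ in range(2000)])
pval = (1 + np.sum(np.abs(draws) >= np.abs(t_obs))) / (1 + len(draws))
print(pval)                                   # finite-sample-valid Monte Carlo p-value
```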

Keywords

randomization

spillovers

tax policies

interfirm networks

big data 

Speaker

Panagiotis Toulis, The University of Chicago, Booth School of Business