Sunday, Aug 3: 2:00 PM - 3:50 PM
0671
Topic-Contributed Paper Session
Music City Center
Room: CC-201B
Social Networks
Causal Inference
Network Sampling
Off-Policy Estimation
Applied
No
Main Sponsor
Survey Research Methods Section
Co Sponsors
Health Policy Statistics Section
Section on Statistical Learning and Data Science
Presentations
Estimating causal effects under interference is pertinent to many real-world settings. Recent work with low-order potential outcomes models uses a rollout design to obtain unbiased estimators that require no interference network information. However, the required extrapolation can lead to prohibitively high variance. To address this, we propose a two-stage experiment that selects a sub-population in the first stage and restricts treatment rollout to this sub-population in the second stage. We explore the role of clustering in the first stage by analyzing the bias and variance of a polynomial interpolation-style estimator under this experimental design. Bias increases with the number of edges cut in the clustering of the interference network, but variance depends on qualities of the clustering that relate to homophily and covariate balance. There is a tension between clustering objectives that minimize the number of cut edges versus those that maximize covariate balance across clusters. Through simulations, we explore the bias-variance trade-off and compare the performance of the estimator under different clustering strategies.
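As a hedged illustration of the extrapolation step this abstract refers to (a sketch, not the authors' implementation): under a low-order potential outcomes model, mean outcomes observed at small rollout fractions can be fit with a low-degree polynomial and extrapolated to full treatment. The quadratic outcome model and the rollout fractions below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical ground truth: mean outcome is a quadratic in the treated
# fraction p, as in a low-order potential outcomes model (illustration only).
def mean_outcome(p):
    return 1.0 + 2.0 * p + 0.5 * p ** 2

# Rollout design: outcomes are observed only at small treatment fractions.
rollout_fracs = np.array([0.0, 0.25, 0.5])
observed = np.array([mean_outcome(p) for p in rollout_fracs])

# Polynomial interpolation-style estimator: fit a degree-2 polynomial to the
# rollout observations, then extrapolate outside the observed range.
coefs = np.polynomial.polynomial.polyfit(rollout_fracs, observed, deg=2)
f = np.polynomial.polynomial.Polynomial(coefs)

# Estimated total treatment effect: extrapolated f(1) minus f(0).
tte_hat = f(1.0) - f(0.0)
```

With noiseless observations and a correctly specified degree, the fit is exact; the variance problem the abstract addresses arises because real observations are noisy and extrapolation amplifies that noise.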
Keywords
Rollout experimental design
In the setting of no interference between experimental units, wherein the treatment of one unit cannot influence the outcome of another, the average treatment effect is the main causal estimand of interest, and its value does not depend on the experiment design policy. However, in social networks where units are connected, this assumption is often violated, and relaxing it yields a profusion of causal estimands whose values depend on the design. For example, the expected average direct and indirect effects capture the average effect of flipping a unit's treatment on its own outcome and on another unit's outcome, respectively, marginalized over the experiment design. The nontrivial dependence of these estimands on the design implies an off-policy estimation challenge: can we estimate arbitrary causal estimands under a design policy different from the one the data were collected under? Viewing causal estimands as Boolean functions, we describe unbiased estimators for off-policy estimation in full generality and show precisely how interference assumptions interact with the sparsity of the interference network to determine the variance of these estimators. For any causal estimator, including the proposed off-policy estimator(s), we show that its variance is generally nonidentifiable, but there are unbiased estimators of a conservative bound on the variance, and this bound can be tightened when one restricts interference. Notably, viewing causal estimands as Boolean functions allows us to see the profusion of causal estimands as providing higher-order corrections to a Taylor expansion of the expected average outcome curve around the actual design policy, which suggests promising directions for optimal design to estimate the complete off-policy curve.
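The off-policy idea can be sketched with a standard importance-weighting estimator (a hedged illustration, not the estimators proposed in the talk): data collected under a Bernoulli logging design are reweighted to estimate the mean outcome under a different Bernoulli design. The ring interference network and outcome model below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ring interference network: each unit's outcome depends on its
# own treatment and its two neighbors' treatments (illustration only).
n = 2000
nbrs = [((i - 1) % n, (i + 1) % n) for i in range(n)]

def outcome(z, i):
    a, b = nbrs[i]
    return 1.0 + 2.0 * z[i] + 0.5 * (z[a] + z[b])  # assumed outcome model

p_log, p_target = 0.5, 0.2
z = rng.binomial(1, p_log, size=n)            # data under the logging design
y = np.array([outcome(z, i) for i in range(n)])

def weight(i):
    # Importance weight: probability ratio, under target vs. logging design,
    # of the unit's "effective treatment" -- its own and neighbors' assignments.
    a, b = nbrs[i]
    w = 1.0
    for zj in (z[i], z[a], z[b]):
        w *= (p_target / p_log) if zj == 1 else ((1 - p_target) / (1 - p_log))
    return w

mu_hat = np.mean([weight(i) * y[i] for i in range(n)])
# Under the target design, E[Y] = 1 + 2*0.2 + 0.5*(0.2 + 0.2) = 1.6
```

Note how the weight only involves the unit's neighborhood: this is where interference assumptions and network sparsity enter the variance, since denser neighborhoods mean products of more probability ratios and hence heavier-tailed weights.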
Keywords
social networks
network interference
off-policy estimation
experiment design
Co-Author
Dean Eckles, Massachusetts Institute of Technology
Speaker
Sahil Loomba, Massachusetts Institute of Technology
Populations in the greatest need of health interventions are often the hardest to reach with conventional health policies. From people who are unhoused to people who inject drugs to people who are undocumented, establishing reliable methods of accessing and assisting marginalized communities is important for achieving holistic public health objectives. Respondent-Driven Sampling (RDS) is a network sampling method widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. We present reinforcement learning (RL) for respondent-driven sampling, an adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient and cost-effective, and can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference, which is complicated by the adaptive sampling induced by RL as well as the complex dependencies among subjects due to latent (unobserved) social network structure.
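One minimal instance of adaptively tailoring incentives is a bandit over incentive levels (a generic epsilon-greedy sketch, not the authors' RL method): each recruitment wave, an incentive level is chosen to balance exploring untried levels against exploiting the one with the best empirical recruitment rate. The incentive amounts and recruitment probabilities below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: three coupon-incentive levels with assumed (unknown to
# the algorithm) recruitment probabilities; utility = successful recruitments.
incentive_levels = [5, 10, 20]        # dollar amounts (hypothetical)
true_recruit_prob = [0.2, 0.5, 0.6]   # assumed response rates (hypothetical)
counts = np.zeros(3)
successes = np.zeros(3)
epsilon = 0.1

for wave in range(5000):
    if rng.random() < epsilon or counts.min() == 0:
        arm = int(rng.integers(3))                 # explore an incentive level
    else:
        arm = int(np.argmax(successes / counts))   # exploit best empirical rate
    counts[arm] += 1
    successes[arm] += rng.random() < true_recruit_prob[arm]

best = int(np.argmax(successes / np.maximum(counts, 1)))
```

The post-study inference challenge the abstract mentions shows up here directly: because arm choices depend on earlier outcomes, the collected data are not i.i.d., so naive standard errors are invalid.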
Keywords
Reinforcement Learning
Respondent Driven Sampling
From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been growing interest in causal inference under interference, where treatment given to one unit can affect the outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence is relatively under-explored. Not only are the optimal rates of estimation for different causal effects under interference an open question, but previously proposed designs have been constructed in an ad hoc fashion.
In this talk, we present the Conflict Graph Design, a general approach for constructing experimental designs to estimate causal effects under interference. Given a particular causal estimand (e.g., total treatment effect, direct effect, spillover effect), we construct a so-called "conflict graph" which captures the fundamental unobservability associated with the estimand on the underlying network. The Conflict Graph Design randomly assigns treatment by first assigning "desired" exposures and then resolving conflicts among these desired exposures according to an algorithmically constructed importance ordering. In this way, the proposed experimental design depends on both the underlying network and the causal estimand under investigation. We show that a modified Horvitz--Thompson estimator attains a variance of $O( \lambda / n )$ under the design, where $\lambda$ is the largest eigenvalue of the adjacency matrix of the conflict graph, a global measure of connectivity. These rates improve upon the best known rates for a variety of estimands (e.g., total treatment effects and direct effects), and we conjecture that this rate is optimal. Finally, we provide consistent variance estimators and asymptotically valid confidence intervals, which facilitate inference on the causal effect under investigation.
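For readers unfamiliar with the estimator class involved, the sketch below shows a basic Horvitz-Thompson estimator of the total treatment effect under full-neighborhood exposures and an independent Bernoulli design. This is a hedged illustration of the estimator family only, not the Conflict Graph Design or its modified estimator; the ring network and outcome model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ring network: unit i's closed neighborhood is {i-1, i, i+1}.
n = 5000
nbrs = [((i - 1) % n, (i + 1) % n) for i in range(n)]

z = rng.binomial(1, 0.5, size=n)  # independent Bernoulli(1/2) design
y = np.array([1.0 + 2.0 * z[i] + 0.5 * (z[a] + z[b])
              for i, (a, b) in enumerate(nbrs)])  # assumed outcome model

# Exposure indicators: unit i is exposed to treatment if its closed
# neighborhood is fully treated, to control if fully untreated.
exp_t = np.array([z[i] * z[a] * z[b] for i, (a, b) in enumerate(nbrs)])
exp_c = np.array([(1 - z[i]) * (1 - z[a]) * (1 - z[b])
                  for i, (a, b) in enumerate(nbrs)])

# Horvitz-Thompson: inverse-weight each observed exposure by its probability.
pi = 0.5 ** 3  # P(closed neighborhood of size 3 all treated) = P(all control)
tte_hat = np.mean(y * exp_t / pi - y * exp_c / pi)
# True total treatment effect in this model: (1 + 2 + 1) - 1 = 3
```

The rare-exposure problem is visible here: with exposure probability $1/8$, most units contribute nothing and the survivors carry weight $8$, which is exactly the variance bottleneck that designs tailored to the estimand, such as the one presented in this talk, aim to relieve.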
Keywords
causal effects under interference