New Methods in Causal Inference and Reinforcement Learning for Personalized Decision-Making

Chair: Ziping Xu
Organizer: Yongyi Guo

Wednesday, Aug 7: 10:30 AM - 12:20 PM
Session 1201: Invited Paper Session 
Oregon Convention Center 
Room: CC-D136 

Applied

Yes

Main Sponsor

ENAR

Co Sponsors

IMS
Society for Medical Decision Making

Presentations

The promises of multiple outcomes

A key challenge in causal inference from observational studies is the identification and estimation of causal effects in the presence of unmeasured confounding. In this paper, we introduce a novel approach to causal inference that leverages information in multiple outcomes to deal with unmeasured confounding. The key assumption in our approach is conditional independence among the multiple outcomes. In contrast to existing proposals in the literature, the roles of the multiple outcomes in our key identification assumption are symmetric, hence the name parallel outcomes. We show nonparametric identifiability with at least three parallel outcomes and provide parametric estimation tools under a set of linear structural equation models. We evaluate our proposal through synthetic and real data analyses.
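
For intuition only, the sketch below simulates the kind of linear structural equation model referenced in the abstract: a latent confounder U drives both the treatment A and three parallel outcomes that are conditionally independent given (A, U). It shows the bias of naive outcome-on-treatment regression and the cross-outcome residual structure that the parallel-outcomes assumption exploits; it is not the authors' identification or estimation procedure, and all coefficients and sample sizes are illustrative assumptions.

```python
# Illustrative linear SEM with an unmeasured confounder U and three
# parallel outcomes Y1, Y2, Y3 that are conditionally independent given
# (A, U).  This demonstrates the setup and the bias of naive regression;
# it is NOT the identification/estimation procedure proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

u = rng.normal(size=n)                       # unmeasured confounder
a = 0.8 * u + rng.normal(size=n)             # treatment affected by U
beta = np.array([1.0, 0.5, -0.3])            # true causal effects of A on Y1..Y3
lam = np.array([0.9, 0.7, 1.2])              # confounding loadings of U on Y1..Y3
eps = rng.normal(size=(n, 3))                # independent outcome noise
y = a[:, None] * beta + u[:, None] * lam + eps

# Naive OLS of each outcome on A is biased because U is unmeasured.
naive = np.array([np.polyfit(a, y[:, k], 1)[0] for k in range(3)])
print("true effects:", beta)
print("naive OLS   :", naive.round(3))       # systematically off from beta

# After regressing each outcome on A, the cross-outcome residual
# covariances are driven only by the shared latent U -- the structure
# that the parallel-outcomes assumption makes available for identification.
resid = y - a[:, None] * naive
print("residual covariance:\n", np.cov(resid, rowvar=False).round(3))
```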

Co-Author(s)

Ying Zhou, University of Connecticut
Dingke Tang, University of Toronto
Dehan Kong, University of Toronto

Speaker

Linbo Wang

Balancing Personalization and Pooling: Decision-making and Statistical Inference with Limited Time Horizons

In contrast to traditional clinical trials, digital health interventions deliver adaptive, personalized treatments in near real time to manage health risks and promote healthy behaviors. Integrating reinforcement learning (RL) algorithms into mobile health (mHealth) studies presents numerous challenges; a critical one is the constrained time horizon, which leads to data scarcity and affects both decision quality and the autonomy and stability of RL algorithms in practical applications.

To address this challenge, we propose a solution for both online decision-making and post-study statistical inference. Leveraging a mixed-effects reward model within Thompson sampling, we use pooled user data efficiently to expedite informed decision-making. The online algorithm, however, invalidates traditional statistical analyses of the treatment effect: user histories are not independent even if the potential outcomes are assumed i.i.d., because the RL algorithm makes decisions using pooled information across users in addition to each user's state variables. We provide valid asymptotic confidence intervals for the average causal excursion effect by decomposing the policy into 'population statistics' and decisions based on '(expanded) user states'. As an example, I will present the MiWaves clinical trial, an AI-based mobile health intervention to reduce cannabis use among emerging adults.
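
As a concrete illustration of pooling through a mixed-effects reward model, below is a minimal Thompson sampling sketch under simplifying assumptions (Gaussian rewards, known noise and random-intercept variances, a conjugate Gaussian posterior). It is a toy stand-in, not the MiWaves algorithm or the inference procedure described above; the feature map and all parameter values are hypothetical.

```python
# Toy Thompson sampling with a Gaussian mixed-effects reward model:
#   reward = phi(state, action) @ beta + b_user + noise,  b_user ~ N(0, tau^2).
# Known variances and a conjugate Gaussian posterior are simplifying
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_rounds, sigma, tau = 5, 200, 1.0, 0.5

def phi(state, action):
    # fixed-effect features: intercept, state, action, state-action interaction
    return np.array([1.0, state, action, state * action])

d = 4                                   # number of fixed-effect features
p = d + n_users                         # fixed effects plus one intercept per user
prior_prec = np.diag([1.0] * d + [1.0 / tau**2] * n_users)

def design(user, state, action):
    # full design vector: fixed effects plus this user's intercept indicator
    z = np.zeros(p)
    z[:d] = phi(state, action)
    z[d + user] = 1.0
    return z

XtX = np.zeros((p, p))                  # streaming sufficient statistics
Xtr = np.zeros(p)
true_beta = np.array([0.2, 0.5, 1.0, -0.8])      # unknown to the algorithm
true_b = rng.normal(scale=tau, size=n_users)     # true user random intercepts

for t in range(n_rounds):
    user, state = t % n_users, rng.normal()

    # Thompson sampling: draw parameters from the pooled posterior ...
    post_cov = np.linalg.inv(prior_prec + XtX / sigma**2)
    post_mean = post_cov @ (Xtr / sigma**2)
    draw = rng.multivariate_normal(post_mean, post_cov)

    # ... and pick the action whose sampled expected reward is larger.
    action = int(design(user, state, 1) @ draw > design(user, state, 0) @ draw)

    # Observe a reward from the (unknown) true model and update the statistics.
    z = design(user, state, action)
    reward = phi(state, action) @ true_beta + true_b[user] + rng.normal(scale=sigma)
    XtX += np.outer(z, z)
    Xtr += z * reward

post_mean = np.linalg.inv(prior_prec + XtX / sigma**2) @ (Xtr / sigma**2)
print("posterior mean of fixed effects:", np.round(post_mean[:d], 2))
print("true fixed effects:            ", true_beta)
```

The shared fixed effects and the random-intercept prior are what let one user's data inform decisions for other users, which is the kind of pooling that mitigates the limited per-user time horizon described above.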

Speaker

Yongyi Guo

Did We Personalize? Assessing Personalization by an Online Reinforcement Learning Algorithm Using Resampling

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem, as it learns from each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of its stochasticity, and we illustrate the methodology via a mobile health case study.
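
The following toy sketch conveys the flavor of such a resampling check, under assumptions that are not the authors': a Beta-Bernoulli Thompson sampling bandit with two contexts, a personalization statistic defined as the between-context difference in treatment frequencies over the second half of the study, and a reference distribution obtained by re-running the algorithm under a no-personalization reward model.

```python
# Toy resampling check of "did the algorithm personalize?": re-run a
# contextual Thompson sampling bandit many times under a reward model
# with NO context effect, and compare the observed between-context
# difference in treatment frequencies to that reference distribution.
# Everything here is an illustrative stand-in for the working definition
# and resampling scheme in the talk.
import numpy as np

T = 400

def run_bandit(p_reward, rng):
    """Beta-Bernoulli Thompson sampling with one arm per (context, action)."""
    alpha, beta = np.ones((2, 2)), np.ones((2, 2))
    ones, counts = np.zeros(2), np.zeros(2)
    for t in range(T):
        ctx = rng.integers(2)
        a = int(np.argmax(rng.beta(alpha[ctx], beta[ctx])))
        r = rng.binomial(1, p_reward[ctx, a])
        alpha[ctx, a] += r
        beta[ctx, a] += 1 - r
        if t >= T // 2:                      # measure behaviour in the later half
            ones[ctx] += a
            counts[ctx] += 1
    return abs(ones[0] / counts[0] - ones[1] / counts[1])

# "Observed" study: context truly matters (different best action per context).
observed = run_bandit(np.array([[0.3, 0.7], [0.7, 0.3]]), np.random.default_rng(2))

# Resampling reference: re-run the algorithm under a no-personalization
# model (identical reward probabilities in both contexts).
null_p = np.full((2, 2), 0.5)
null_stats = np.array([run_bandit(null_p, np.random.default_rng(s)) for s in range(500)])

print("observed personalization statistic :", round(observed, 3))
print("fraction of null re-runs as extreme:", np.mean(null_stats >= observed))
```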

Speaker

Raaz Dwivedi, UC Berkeley

Further Results on Target Trials and Structural Nested Models: Emulating RCTs using Observational Longitudinal Data

Target trials are RCTs one would like to conduct but cannot for ethical, financial, and/or logistical reasons. Consequently, we must emulate such trials from observational data. A novel aspect of target trial methodology is that, for purposes of data analysis, each subject in the observational study is 'enrolled' in every target trial for which the subject is eligible, rather than in a single trial. I will discuss new results related to target trials and structural nested models, including the handling of k incompatible treatment arms, treatment arms that are initially the same, failure to meet eligibility criteria in one or more trials, and optimal treatment regime estimation.
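
To illustrate the 'enrolled in all eligible trials' device, here is a small sketch of sequential target-trial emulation under assumed column names (id, month, treated, event) and a deliberately simplified eligibility rule (no treatment and no event before the trial's start month); the actual specification in the talk may differ.

```python
# Sketch of sequential target-trial emulation: one emulated trial starts
# at each follow-up month, and every subject who is still untreated and
# event-free before that month is "enrolled" in that trial.  Column names
# and the eligibility rule are illustrative assumptions.
import pandas as pd

# Toy longitudinal data: one row per subject-month.
obs = pd.DataFrame({
    "id":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "month":   [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "treated": [0, 0, 1, 0, 1, 1, 0, 0, 0],   # 1 once treatment has started
    "event":   [0, 0, 0, 0, 0, 1, 0, 0, 0],   # outcome event indicator
})

emulated = []
for start in sorted(obs["month"].unique()):
    baseline = obs[obs["month"] == start]
    # History strictly before the trial's start month.
    prior = obs[obs["month"] < start].groupby("id")[["treated", "event"]].max()
    # Eligible: no prior treatment and no prior event (everyone at month 0).
    ineligible = set(prior[(prior["treated"] == 1) | (prior["event"] == 1)].index)
    trial = baseline[~baseline["id"].isin(ineligible)].copy()
    trial["trial_start"] = start
    trial["arm"] = trial["treated"]           # initiate treatment at start vs. not
    emulated.append(trial)

# Each subject contributes one row to every emulated trial for which they
# are eligible, rather than to a single trial.
expanded = pd.concat(emulated, ignore_index=True)
print(expanded[["id", "trial_start", "arm", "event"]])
```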

Speaker

James Robins, Harvard School of Public Health

Presentation

Speaker

Anish Agarwal, Columbia University