Risk-inclusive Contextual Bandits for Early Phase Clinical Trials

Rohit Kanrar (First Author, Presenting Author)
Chunlin Li (Co-Author), Iowa State University
Margaret Gamalo (Co-Author), Pfizer
Zara Ghodsi (Co-Author), Pfizer

Wednesday, Aug 6: 11:50 AM - 12:05 PM
2370
Contributed Papers
Music City Center
Early-phase clinical trials face the challenge of selecting drug doses that balance safety and efficacy under uncertain dose-response relationships and heterogeneous participant characteristics. Traditional randomized dose allocation ignores individual covariates and therefore often exposes participants to sub-optimal doses, inflating sample sizes and prolonging trials. This paper introduces a risk-inclusive contextual bandit algorithm that leverages multi-armed bandit (MAB) strategies to optimize dosing using participant-specific data. The algorithm improves the balance of dose allocation by integrating separate Thompson samplers for efficacy and safety. Effect sizes are estimated robustly with a generalized version of the asymptotic confidence sequence (AsympCS) method (Waudby-Smith et al., 2024), ensuring time-uniform coverage for effect sizes. The validity of AsympCS is also established within the MAB framework. Empirical results show that the method outperforms both randomized allocation and an efficacy-only Thompson sampler, and a real-data application to a Phase IIb study yields conclusions consistent with the trial's actual findings.
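The sketch below is a minimal, illustrative rendering of the two ideas named in the abstract: separate Thompson samplers for efficacy and safety, and an anytime-valid interval in the spirit of AsympCS. It is not the authors' implementation; participant covariates (the contextual part) are omitted for brevity, and the Beta-Bernoulli outcome model, the toxicity gate `max_tox`, the tuning parameter `rho`, the boundary formula used in `asympcs`, and all simulation settings are assumptions made for illustration only (see Waudby-Smith et al., 2024, for the exact AsympCS statement).

```python
# Minimal sketch: risk-inclusive Thompson sampling for dose allocation with
# binary efficacy and toxicity outcomes, plus an anytime-valid interval for
# each dose's efficacy rate. All modeling choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)


def select_dose(eff_a, eff_b, tox_a, tox_b, max_tox=0.3):
    """Draw once from the efficacy and safety posteriors of each dose and pick
    the dose with the highest sampled efficacy among doses whose sampled
    toxicity is below `max_tox`; fall back to the least-toxic dose."""
    eff_draw = rng.beta(eff_a, eff_b)          # one posterior draw per arm
    tox_draw = rng.beta(tox_a, tox_b)
    admissible = np.where(tox_draw < max_tox)[0]
    if admissible.size == 0:
        return int(np.argmin(tox_draw))
    return int(admissible[np.argmax(eff_draw[admissible])])


def asympcs(y, alpha=0.05, rho=1.0):
    """Two-sided anytime-valid interval for the running mean of `y`, using a
    Gaussian-mixture boundary of the form commonly associated with AsympCS
    (assumed form, not a quotation of the paper's result)."""
    t = len(y)
    mean = np.mean(y)
    sd = np.std(y, ddof=1) if t > 1 else 1.0
    radius = sd * np.sqrt(
        2 * (t * rho**2 + 1) / (t**2 * rho**2)
        * np.log(np.sqrt(t * rho**2 + 1) / alpha)
    )
    return mean - radius, mean + radius


# Toy simulation: 3 doses with assumed true efficacy/toxicity probabilities.
true_eff, true_tox = np.array([0.3, 0.5, 0.7]), np.array([0.1, 0.2, 0.45])
K = len(true_eff)
eff_a, eff_b = np.ones(K), np.ones(K)          # Beta(1, 1) priors, efficacy
tox_a, tox_b = np.ones(K), np.ones(K)          # Beta(1, 1) priors, toxicity
outcomes = [[] for _ in range(K)]

for _ in range(200):                           # sequentially enrolled participants
    k = select_dose(eff_a, eff_b, tox_a, tox_b)
    eff = rng.random() < true_eff[k]
    tox = rng.random() < true_tox[k]
    eff_a[k] += eff; eff_b[k] += 1 - eff       # conjugate posterior updates
    tox_a[k] += tox; tox_b[k] += 1 - tox
    outcomes[k].append(float(eff))

for k in range(K):
    if len(outcomes[k]) > 1:
        lo, hi = asympcs(np.array(outcomes[k]))
        print(f"dose {k}: n={len(outcomes[k])}, efficacy CS = ({lo:.2f}, {hi:.2f})")
```

In this toy setup the safety sampler acts as a gate on the efficacy sampler, which is one simple way to make allocation "risk-inclusive"; the paper's algorithm additionally conditions both samplers on participant covariates and uses a generalized AsympCS for effect-size estimation.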

Keywords

Anytime-valid policy evaluation

Dose-ranging studies

Efficacy and safety

Model-assisted inference

Sequential causal inference 

Main Sponsor

Biopharmaceutical Section