Semi-Parametric Batched Global Multi-Armed Bandits with Covariates

Hyebin Song Co-Author
Penn State
 
Sakshi Arya First Author
Case Western Reserve University
 
Sakshi Arya Presenting Author
Case Western Reserve University
 
Monday, Aug 4: 3:35 PM - 3:50 PM
1795 
Contributed Papers 
Music City Center 
In applications such as clinical trials, treatment decisions are usually made in phases, or batches, where information from the previous batch is used to determine the treatments allocated in the upcoming batch. Such scenarios fall naturally within the batched bandit framework. While batched bandits have been studied in parametric and nonparametric regression settings, we propose a novel semi-parametric approach that promotes interpretability and dimension reduction in nonparametric batched bandits. We assume that the reward-covariate relationship can be modelled in a reduced one-dimensional central subspace under the single-index regression framework. We adopt an adaptive binning and successive elimination algorithm and provide optimal regret guarantees for it. We also illustrate the performance of the algorithm on simulated and real datasets.
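To make the binning and elimination idea concrete, the following is a minimal Python sketch and not the authors' algorithm. It rests on several assumptions that the abstract does not state: the index direction `beta` is known and shared across arms (in practice it would have to be estimated), the bins over the one-dimensional index are fixed rather than adaptively refined, and the confidence width is a generic Hoeffding-style bound. The function name `batched_successive_elimination`, its parameters, and the `reward_fn` callback are hypothetical.

```python
import numpy as np

def batched_successive_elimination(contexts, reward_fn, beta, n_arms=2,
                                   n_bins=4, n_batches=4, delta=0.05, seed=0):
    """Illustrative sketch: successive elimination on a 1-D index with fixed bins.

    Assumptions (not from the abstract): beta is known, bins are static,
    and the confidence width is a generic Hoeffding-style bound.
    """
    rng = np.random.default_rng(seed)

    # Project covariates onto the index direction and rescale to [0, 1].
    z = contexts @ beta
    z = (z - z.min()) / (z.max() - z.min() + 1e-12)
    bins = np.minimum((z * n_bins).astype(int), n_bins - 1)

    active = np.ones((n_bins, n_arms), dtype=bool)  # arms still in play per bin
    sums = np.zeros((n_bins, n_arms))               # cumulative reward per bin/arm
    counts = np.zeros((n_bins, n_arms))             # number of pulls per bin/arm

    for batch in np.array_split(np.arange(len(z)), n_batches):
        # Within a batch, the active sets are frozen: no updates are made.
        for t in batch:
            b = bins[t]
            arm = rng.choice(np.flatnonzero(active[b]))
            sums[b, arm] += reward_fn(contexts[t], arm, rng)
            counts[b, arm] += 1

        # At the end of the batch, eliminate clearly suboptimal arms per bin.
        for b in range(n_bins):
            idx = np.flatnonzero(active[b])
            if np.any(counts[b, idx] == 0):
                continue  # not enough data in this bin yet
            means = sums[b, idx] / counts[b, idx]
            width = np.sqrt(
                2 * np.log(2 * n_arms * n_bins * n_batches / delta) / counts[b, idx]
            )
            best_lower = np.max(means - width)
            # Keep an arm only if its upper bound reaches the best lower bound.
            active[b, idx] = means + width >= best_lower

    return active
```

The returned boolean array has one row per bin and one column per arm, marking the arms that survive in each bin. An adaptive scheme, as described in the abstract, would additionally split bins between batches where arms cannot yet be separated; the fixed grid here is a simplification chosen to keep the sketch short.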

Keywords

multi-armed bandits

semi-parametric

single-index regression

dynamic binning

successive elimination

regret bounds 

Main Sponsor

Section on Statistical Learning and Data Science