Non-parametric Adaptive Estimation of Transition Kernels of Controlled Markov Chains

Imon Banerjee First Author
Purdue University
 
Imon Banerjee Presenting Author
Purdue University
 
Tuesday, Aug 6: 2:35 PM - 2:50 PM
3474 
Contributed Papers 
Oregon Convention Center 
A controlled Markov chain (CMC) is a paired process consisting of a Markovian state and a non-Markovian control. The control is a random variable that selects a transition kernel, and the state then transitions according to that kernel. The recent popularity of model-based offline reinforcement learning has made learning this transition kernel (a.k.a. the "model") an important open question. This talk addresses that question through the lens of an adaptive, non-parametric estimator. In particular, we pose the estimator as the solution to a constrained minimax optimisation problem and explore its finite-sample risk bounds. We also connect it to recent developments in the theory of model selection. Finally, we discuss some examples that illustrate the applicability of our setup to downstream estimation tasks.
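
To make the setup above concrete, here is a minimal Python sketch of a finite-state CMC in which a history-dependent control picks the transition kernel at each step. The kernel values, the control rule, and the names `TRUE_KERNEL`, `simulate_cmc`, and `estimate_kernel` are all assumptions made for illustration. The estimator shown is a plain empirical-frequency (count-based) estimate, not the adaptive non-parametric estimator discussed in the talk.

```python
import random

# Hypothetical kernels: TRUE_KERNEL[a][s] is the next-state distribution
# when the control is a and the current state is s.
TRUE_KERNEL = [
    [[0.9, 0.1], [0.5, 0.5]],  # control 0
    [[0.2, 0.8], [0.3, 0.7]],  # control 1
]


def simulate_cmc(kernel, n_steps, seed=0):
    """Simulate (state, control, next_state) triples from a finite CMC.

    The control rule is an illustrative assumption: it depends on the whole
    state history (so it is non-Markovian), with 20% random exploration.
    """
    rng = random.Random(seed)
    n_controls, n_states = len(kernel), len(kernel[0])
    state, visits_to_zero, path = 0, 0, []
    for t in range(n_steps):
        visits_to_zero += (state == 0)
        if rng.random() < 0.2:
            control = rng.randrange(n_controls)
        else:
            # Pick control 1 if state 0 was occupied more than half the time.
            control = 1 if 2 * visits_to_zero > t + 1 else 0
        # The chosen control selects the kernel; the state moves under it.
        next_state = rng.choices(range(n_states), weights=kernel[control][state])[0]
        path.append((state, control, next_state))
        state = next_state
    return path


def estimate_kernel(path, n_states, n_controls):
    """Count-based estimate of P(next_state | state, control)."""
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_controls)]
    for s, a, s_next in path:
        counts[a][s][s_next] += 1
    estimate = []
    for a in range(n_controls):
        rows = []
        for s in range(n_states):
            total = sum(counts[a][s])
            # Leave unvisited (state, control) pairs undefined.
            rows.append([c / total for c in counts[a][s]] if total else None)
        estimate.append(rows)
    return estimate


path = simulate_cmc(TRUE_KERNEL, n_steps=50000)
estimate = estimate_kernel(path, n_states=2, n_controls=2)
```

Comparing `estimate` with `TRUE_KERNEL` entry by entry gives a rough sense of the estimation error for each state-control pair, which is the quantity that finite-sample risk bounds aim to control.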

Keywords

Markov chain

Controlled Markov Chain

Non-parametric estimation

Adaptive estimation

Besov classes

optimisation 

Main Sponsor

IMS