Off-policy Evaluation with Irregular, Outcome-dependent Observation Process

Wenbin Lu, Speaker
North Carolina State University
 
Wednesday, Aug 7: 2:30 PM - 2:55 PM
Invited Paper Session 
Oregon Convention Center 

Description

Off-policy evaluation (OPE) aims to estimate the value of a target policy using offline data collected under a potentially different policy. In this talk, we consider OPE in an infinite-horizon, irregular-observation setting, where the number of decision points diverges to infinity and their occurrence times are irregular and possibly informative. We develop a new framework for OPE in this relatively general setting and construct confidence intervals for a given policy's value using reinforcement learning techniques. Because the irregular decision-making times may be informative, we further propose a Cox-type renewal process model for the occurrence times of the decision points and, based on this model, an adaptive estimation procedure that yields more efficient estimates of the policy value. Simulation studies and a real data application illustrate the proposed methods.
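
To make the setting concrete, here is a minimal, hypothetical sketch (not the speaker's estimator): offline trajectories are generated with decision times arriving from a state-dependent, Cox-type renewal process, and a simple importance-sampling estimate of a target policy's value is computed from data logged under a different behavior policy. The intensity form, reward model, policies, and continuous-time discounting are all illustrative assumptions.

```python
# Hypothetical sketch: outcome-dependent observation times plus a basic
# importance-sampling OPE estimate. All modeling choices are assumptions
# for illustration only, not the method presented in the talk.
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.95      # per-unit-time discount factor (assumed)
HORIZON = 50.0    # follow-up time per trajectory (assumed)
N_TRAJ = 2000     # number of offline trajectories (assumed)


def behavior_policy(state):
    """Probability of taking action 1 under the behavior (logging) policy."""
    return 1.0 / (1.0 + np.exp(-state))


def target_policy(state):
    """Probability of taking action 1 under the target policy being evaluated."""
    return 1.0 / (1.0 + np.exp(-2.0 * state))


def simulate_trajectory():
    """Simulate one trajectory with outcome-dependent decision times.

    The gap to the next decision point is exponential with rate
    exp(0.5 * state), a Cox-type (proportional intensity) renewal
    mechanism, so trajectories in higher states are observed more often.
    """
    t, state = 0.0, rng.normal()
    records = []  # (time, state, action, reward)
    while True:
        gap = rng.exponential(1.0 / np.exp(0.5 * state))
        t += gap
        if t > HORIZON:
            break
        action = rng.binomial(1, behavior_policy(state))
        reward = state + action + rng.normal(scale=0.5)
        records.append((t, state, action, reward))
        state = 0.8 * state + 0.5 * action + rng.normal(scale=0.3)
    return records


def is_value_estimate(trajectories):
    """Average per-trajectory importance-sampling estimate of the target value."""
    values = []
    for records in trajectories:
        weight, total = 1.0, 0.0
        for t, state, action, reward in records:
            p_tgt = target_policy(state) if action == 1 else 1.0 - target_policy(state)
            p_beh = behavior_policy(state) if action == 1 else 1.0 - behavior_policy(state)
            weight *= p_tgt / p_beh                 # cumulative importance weight
            total += (GAMMA ** t) * weight * reward  # discount by continuous time
        values.append(total)
    return float(np.mean(values))


trajectories = [simulate_trajectory() for _ in range(N_TRAJ)]
print("IS estimate of target policy value:", round(is_value_estimate(trajectories), 3))
```

This sketch ignores the informativeness of the observation times when estimating the value; the talk's adaptive procedure models those times explicitly (via the Cox-type renewal model) to obtain more efficient estimates.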