Off-policy Evaluation with Irregular, Outcome-dependent Observation Process
Wenbin Lu
Speaker
North Carolina State University
Wednesday, Aug 7: 2:30 PM - 2:55 PM
Invited Paper Session
Oregon Convention Center
Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected under a potentially different policy. In this talk, we consider OPE in an infinite-horizon setting with irregular observations, where the number of decision points diverges to infinity and the occurrence times of the decision points are irregular and possibly informative. We develop a new framework for OPE in this relatively general setting and use reinforcement learning techniques to construct confidence intervals for the value of a given policy. Because the irregular decision-making times may be informative, we further posit a Cox-type renewal process model for the occurrence times of the decision points and, based on this model, develop an adaptive estimation procedure that yields more efficient estimates of the policy value. Simulation studies and a real data application illustrate the proposed methods.
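For readers unfamiliar with the setup, the following is a minimal sketch of the kinds of quantities such a formulation typically involves; the specific forms below (a continuous-time discounted value and a proportional-intensity renewal model, with discount factor gamma, baseline intensity lambda_0, and covariate process Z) are standard textbook choices assumed here for illustration, not the exact model presented in the talk.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Value of a target policy $\pi$ with discount factor $\gamma \in (0,1)$,
% where rewards $R_{T_k}$ are received at irregular decision times
% $T_1 < T_2 < \cdots$ (infinite horizon, so the sum does not truncate):
\[
  V(\pi) \;=\; \mathbb{E}^{\pi}\!\Bigl[\, \sum_{k=1}^{\infty} \gamma^{T_k}\, R_{T_k} \Bigr].
\]

% A generic Cox-type (proportional-intensity) renewal model for the
% decision times: the intensity depends on the time elapsed since the
% most recent decision point and on outcome-related covariates $Z(t)$,
% which is what makes the observation process potentially informative.
\[
  \lambda\bigl(t \mid \mathcal{H}_{t^-}\bigr)
  \;=\; \lambda_0\bigl(t - T_{N(t^-)}\bigr)\,
        \exp\bigl\{\beta^{\top} Z(t)\bigr\},
\]
% where $N(t^-)$ counts decision points strictly before time $t$, so
% $T_{N(t^-)}$ is the latest decision time and the clock renews after
% each visit; $\lambda_0$ is an unspecified baseline intensity.

\end{document}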