A Minimax Approach for Optimal Incentive Policy Learning

Yue Liu Co-Author
School of Statistics, Renmin University of China
 
Hao Mei Co-Author
School of Statistics, Renmin University of China
 
Chenyang Li First Author
School of Statistics, Renmin University of China
 
Hao Mei Presenting Author
School of Statistics, Renmin University of China
 
Sunday, Aug 3: 2:05 PM - 2:20 PM
1767 
Contributed Papers 
Music City Center 
While strategic incentive policies are essential in personalized services, variations in user responses and revenue potentials create significant challenges in identifying optimal incentive policies. Existing literature typically falls short of fully identifying all user types and often assumes a uniform conversion revenue, leading to inefficient incentive allocation. In this study, we propose a minimax policy learning approach within a counterfactual principal strata framework. A value function, accommodating varying rewards across six potentially non-identifiable principal strata, is designed to minimize the worst-case value loss relative to three alternative policies: never-treat, always-treat, and oracle. To learn an optimal policy, we introduce three estimators: Principal Outcome Regression (P-OR), Principal Inverse Propensity Scoring (P-IPS), and Principal Doubly Robust (P-DR), and provide theoretical guarantees for their unbiasedness, robustness, and regret upper bounds. Extensive numerical experiments validate the effectiveness and superiority of our proposed approach.
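To make the minimax criterion concrete, the following is a minimal, hypothetical sketch: it estimates policy values on synthetic logged data with a simple inverse-propensity-scoring (IPS) estimator and picks the threshold policy whose worst-case value loss against the never-treat and always-treat baselines is smallest. The data-generating process, the threshold policy class, and the `ips_value` helper are all illustrative assumptions; the paper's principal-strata value function and the P-OR, P-IPS, and P-DR estimators are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged bandit data (illustrative only): covariate x,
# logging propensity P(A=1|X), binary incentive a, observed revenue r.
# Treatment helps users with x > 0 and hurts those with x <= 0.
n = 5000
x = rng.normal(size=n)
prop = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, prop)
r = 1.0 + 0.5 * a * (x > 0) - 0.2 * a * (x <= 0) + rng.normal(scale=0.1, size=n)

def ips_value(policy, x, a, prop, r):
    """Inverse-propensity-scoring estimate of a policy's value."""
    pi = policy(x)  # P(treat | x) under the candidate policy
    w = np.where(a == 1, pi / prop, (1 - pi) / (1 - prop))
    return np.mean(w * r)

# Baseline policies: never-treat and always-treat.
never = lambda xx: np.zeros_like(xx)
always = lambda xx: np.ones_like(xx)

def worst_case_loss(policy):
    """Largest estimated value loss relative to the baseline policies."""
    v = ips_value(policy, x, a, prop, r)
    return max(ips_value(b, x, a, prop, r) - v for b in (never, always))

# Minimax selection over a grid of threshold policies: treat when x > t.
thresholds = np.linspace(-2, 2, 41)
best_t = min(thresholds,
             key=lambda t: worst_case_loss(lambda xx: (xx > t).astype(float)))
```

Because the true treatment effect changes sign at x = 0, the minimax threshold should land near zero, and the selected policy's worst-case loss against both baselines should be negative (i.e., it improves on each of them).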

Keywords

Policy learning

Minimax

Counterfactual principal strata

Partial identification

Causal inference 

Main Sponsor

International Chinese Statistical Association