Offline Multi-Dimensional Distributional Reinforcement Learning: A Hilbert Space Embedding Approach

Qi Zheng, Co-Author
University of Louisville

Ruoqing Zhu, Co-Author
University of Illinois Urbana-Champaign

Mehrdad Mohammadi, First Author and Presenting Author
University of Illinois Urbana-Champaign

Monday, Aug 4: 3:05 PM - 3:20 PM
1462: Contributed Papers
Music City Center 
We propose an offline distributional reinforcement learning framework that leverages Hilbert space embeddings to estimate the multi-dimensional value distribution under a given target policy. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with a novel integral probability metric. This enables efficient estimation in multi-dimensional state–action and reward settings, where direct computation of Wasserstein distances is computationally prohibitive. Theoretically, we establish contraction properties of the distributional Bellman operator under the proposed metric and provide uniform convergence guarantees. Empirically, our method achieves improved convergence rates and robust off-policy evaluation under mild assumptions, namely Lipschitz continuity and boundedness, for the Matérn family of kernels, highlighting the potential of embedding-based approaches in complex, real-world decision-making scenarios.
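
To make the core construction concrete, the sketch below (an illustration under stated assumptions, not the authors' implementation) embeds two empirical samples of multi-dimensional returns into an RKHS via kernel mean embeddings and compares them with the induced integral probability metric, here the standard kernel maximum mean discrepancy (MMD). The Matérn-3/2 kernel, the lengthscale parameter, and the toy samples are illustrative assumptions.

# Minimal sketch: kernel mean embeddings + the induced integral
# probability metric (MMD) for multi-dimensional return samples.
# The Matern-3/2 kernel and lengthscale are illustrative choices.
import numpy as np

def matern32_kernel(X, Y, lengthscale=1.0):
    # Matern kernel with smoothness nu = 3/2 on R^d.
    # Pairwise squared Euclidean distances between rows of X and Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    r = np.sqrt(np.maximum(d2, 0.0)) / lengthscale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def mmd(X, Y, lengthscale=1.0):
    # RKHS distance between the kernel mean embeddings of two empirical
    # measures (biased V-statistic), used in place of a Wasserstein metric.
    kxx = matern32_kernel(X, X, lengthscale).mean()
    kyy = matern32_kernel(Y, Y, lengthscale).mean()
    kxy = matern32_kernel(X, Y, lengthscale).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

# Toy usage: compare two samples of 2-dimensional returns.
rng = np.random.default_rng(0)
returns_a = rng.normal(0.0, 1.0, size=(500, 2))
returns_b = rng.normal(0.3, 1.0, size=(500, 2))
print(mmd(returns_a, returns_b))

Because the MMD is built entirely from pairwise kernel evaluations, its cost is quadratic in the sample size and only linear in the dimension of the returns, whereas empirical Wasserstein distances become statistically and computationally burdensome as the dimension grows, which is the practical motivation stated in the abstract.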

Keywords

Reinforcement Learning

Wasserstein Distance

Reproducing Kernel Hilbert Space

Non-parametric

Matérn Kernel 

Main Sponsor

Section on Statistical Learning and Data Science