Offline Multi-Dimensional Distributional Reinforcement Learning: A Hilbert Space Embedding Approach
Qi Zheng, Co-Author, University of Louisville
Ruoqing Zhu, Co-Author, University of Illinois Urbana-Champaign
Monday, Aug 4: 3:05 PM - 3:20 PM
1462
Contributed Papers
Music City Center
We propose an offline distributional reinforcement learning framework that leverages Hilbert space embeddings to estimate the multi-dimensional value distribution under a given target policy. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces the Wasserstein metric with a novel integral probability metric. This enables efficient estimation in settings with multi-dimensional state–action and reward spaces, where direct computation of Wasserstein distances is computationally prohibitive. Theoretically, we establish contraction of the distributional Bellman operator under the proposed metric and provide uniform convergence guarantees. Empirically, our method achieves improved convergence rates and robust off-policy evaluation under mild assumptions, namely Lipschitz continuity and boundedness of the Matérn family of kernels, highlighting the potential of our embedding-based approach in complex, real-world decision-making scenarios.
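For readers unfamiliar with the objects involved, the following is a standard schematic of the distributional Bellman operator and of a kernel-mean-embedding metric of the kind the abstract invokes; the notation (the pushforward form and the contraction modulus κ) is a common textbook formulation, not necessarily the authors' exact definitions.

```latex
% Distributional Bellman operator: \eta(s,a) is the law of the
% (possibly multi-dimensional) return from state-action pair (s,a);
% (b_{R,\gamma})_{\#} denotes the pushforward by z \mapsto R + \gamma z.
(\mathcal{T}^{\pi}\eta)(s,a)
  = \mathbb{E}\!\left[\,(b_{R,\gamma})_{\#}\,\eta(S',A') \,\middle|\, S=s,\, A=a\right],
\qquad b_{R,\gamma}(z) = R + \gamma z .

% Kernel-mean-embedding metric between return laws \mu and \nu,
% i.e. the RKHS distance between their mean embeddings:
d_{\mathcal{H}}(\mu,\nu)
  = \left\| \int k(\cdot,z)\,d\mu(z) - \int k(\cdot,z)\,d\nu(z) \right\|_{\mathcal{H}} .

% Contraction property of the kind claimed in the abstract, for some
% modulus \kappa < 1 under the stated Lipschitz and boundedness
% conditions on the Matern kernel:
\sup_{s,a}\, d_{\mathcal{H}}\!\big((\mathcal{T}^{\pi}\eta)(s,a),\,(\mathcal{T}^{\pi}\eta')(s,a)\big)
  \le \kappa \,\sup_{s,a}\, d_{\mathcal{H}}\!\big(\eta(s,a),\,\eta'(s,a)\big) .
```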
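To make the metric concrete, below is a minimal numerical sketch, not the authors' implementation, of the kernel-mean-embedding distance (the maximum mean discrepancy, one instance of the integral probability metric family) between two empirical multi-dimensional return distributions under a Matérn kernel. The choice of smoothness ν = 3/2, the length scale, the sample sizes, and the biased V-statistic estimator are all illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def matern32_kernel(X, Y, length_scale=1.0):
    """Matern kernel with smoothness nu = 3/2 between two sample sets.

    X: (n, d) array, Y: (m, d) array; returns the (n, m) Gram matrix
    k(x, y) = (1 + sqrt(3) r) * exp(-sqrt(3) r),  r = ||x - y|| / length_scale.
    """
    r = cdist(X, Y) / length_scale          # pairwise Euclidean distances
    s = np.sqrt(3.0) * r
    return (1.0 + s) * np.exp(-s)

def mmd2(X, Y, length_scale=1.0):
    """Squared kernel-mean-embedding distance (MMD^2) between the
    empirical measures of X and Y, via the biased V-statistic."""
    kxx = matern32_kernel(X, X, length_scale)
    kyy = matern32_kernel(Y, Y, length_scale)
    kxy = matern32_kernel(X, Y, length_scale)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Toy usage: distance between two hypothetical 3-dimensional return
# distributions, e.g. sampled returns under two different policies.
rng = np.random.default_rng(0)
returns_a = rng.normal(0.0, 1.0, size=(500, 3))
returns_b = rng.normal(0.3, 1.2, size=(500, 3))
print(mmd2(returns_a, returns_b))
```

Each Gram matrix costs O(nm·d) to form, so the distance remains tractable as the return dimension grows; by contrast, a multi-dimensional Wasserstein distance requires solving an optimal transport problem, which is the computational bottleneck the abstract's embedding approach is designed to avoid.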
Reinforcement Learning
Wasserstein Distance
Reproducing Kernel Hilbert Space
Non-parametric
Matérn Kernel
Main Sponsor
Section on Statistical Learning and Data Science