flow-based-models

#flow-based-models

Moment Matching Q-Learning

arXiv cs.LG · 2026-05-29

Moment Matching Q-Learning (MoMa QL) is an offline RL method that uses maximum mean discrepancy (MMD) to match all moment statistics, targeting convergence at the level of the full distribution. It is reported to be computationally efficient and to perform strongly on D4RL tasks.
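
For readers unfamiliar with maximum mean discrepancy, the following is a minimal sketch of a generic squared-MMD estimate between two sample sets. It is not the MoMa QL algorithm: the Gaussian (RBF) kernel, the fixed bandwidth, the biased estimator, and the names `rbf_kernel`, `mean_kernel`, and `mmd_squared` are all assumptions chosen for illustration, and the summary does not say which kernel or estimator the paper uses.

```python
import math
import random


def rbf_kernel(u, v, bandwidth=1.0):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample_gaussian(rng, mean, n, dim=2):
    """Draw n points from an isotropic unit-variance Gaussian (illustrative data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
same_a = sample_gaussian(rng, 0.0, 100)
same_b = sample_gaussian(rng, 0.0, 100)
shifted = sample_gaussian(rng, 2.0, 100)

# Samples from the same distribution should score lower than shifted samples.
mmd_same = mmd_squared(same_a, same_b)
mmd_shifted = mmd_squared(same_a, shifted)
print(f"same distribution: {mmd_same:.4f}")
print(f"shifted mean:      {mmd_shifted:.4f}")
```

With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide, which is why matching under MMD is often described as matching all moments rather than only the mean.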


