Tag
Moment Matching Q-Learning (MoMa QL) uses maximum mean discrepancy to match all moment statistics for distribution-level convergence in offline RL, achieving computational efficiency and strong performance on D4RL tasks.