Tag
Moment Matching Q-Learning (MoMa QL) uses the maximum mean discrepancy (MMD) to match all moment statistics, targeting distribution-level convergence in offline RL; it is reported to be computationally efficient and to perform strongly on D4RL tasks.
This paper develops a PAC-Bayesian framework for test-time adaptation that uses maximum mean discrepancy (MMD) balls as credal sets, providing formal generalization bounds and separating epistemic from aleatoric uncertainty under distribution shift.
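
Both summaries above build on the maximum mean discrepancy. As a rough illustration of that quantity only, and not of either paper's method, the sketch below computes a biased estimate of the squared MMD between two scalar samples with a Gaussian kernel. The function names, the bandwidth of 1.0, and the sample sizes are assumptions chosen for this example. With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide, which is the sense in which it compares all moments at once.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar samples."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Illustrative data (assumed, not from either paper): one reference sample,
# one sample from the same distribution, one from a shifted distribution.
rng = random.Random(0)
p = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
q_shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd_squared(p, q_same)
mmd_shifted = mmd_squared(p, q_shifted)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

The estimate for the shifted sample should come out clearly larger than the one for the matching sample. An MMD ball of radius r around a reference distribution, in this notation, would be the set of distributions whose MMD to the reference is at most r.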