behavior-regularization

Reinforcement Learning via Value Gradient Flow

Hugging Face Daily Papers · 2026-04-15

Value Gradient Flow (VGF) is a scalable approach to behavior-regularized reinforcement learning. It formulates the problem as optimal transport and solves it through a discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization and enables adaptive test-time scaling by controlling the transport budget.
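To make the idea concrete, here is a minimal toy sketch of moving behavior-sampled actions along a value gradient. It is not the paper's algorithm: the quadratic critic `q_value`, its gradient `q_grad`, the `transport_actions` helper, and the use of `step_size * num_steps` as a stand-in for a transport budget are all assumptions made for illustration only.

```python
import numpy as np

def q_value(actions, target):
    # Hypothetical toy critic: value is higher the closer an action is to `target`.
    return -np.sum((actions - target) ** 2, axis=-1)

def q_grad(actions, target):
    # Analytic gradient of the toy critic with respect to the actions.
    return -2.0 * (actions - target)

def transport_actions(actions, target, step_size, num_steps):
    # Move behavior-sampled actions along the value gradient in discrete steps.
    # No policy network is involved: the "policy" is the set of moved particles.
    # step_size * num_steps is a crude stand-in for a transport budget (assumption).
    moved = actions.copy()
    for _ in range(num_steps):
        moved = moved + step_size * q_grad(moved, target)
    return moved

rng = np.random.default_rng(0)
# Stand-in for actions drawn from a behavior policy (assumed Gaussian here).
behavior_actions = rng.normal(loc=0.0, scale=0.5, size=(256, 2))
target = np.array([1.0, 1.0])

small_budget = transport_actions(behavior_actions, target, step_size=0.05, num_steps=2)
large_budget = transport_actions(behavior_actions, target, step_size=0.05, num_steps=20)

def mean_shift(moved):
    # Average distance each particle has travelled from its behavior sample.
    return np.linalg.norm(moved - behavior_actions, axis=-1).mean()

print("mean Q (behavior):", q_value(behavior_actions, target).mean())
print("mean Q (small budget):", q_value(small_budget, target).mean())
print("mean Q (large budget):", q_value(large_budget, target).mean())
print("mean shift (small, large):", mean_shift(small_budget), mean_shift(large_budget))
```

In this toy setting, a larger step count raises the average value while moving the particles further from the behavior samples, which mirrors the trade-off that a transport budget is meant to control at test time.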
