Tag
This paper systematically identifies all qualitatively different extreme learning regimes for large weight-tied linear autoencoders, deriving explicit loss evolutions for five regimes associated with the faces of a triangular prism.
Value Gradient Flow (VGF) is a scalable approach to behavior-regularized reinforcement learning that formulates the problem as optimal transport and solves it through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization and enables adaptive test-time scaling by controlling the transport budget.