This paper introduces ACSAC, an adaptive chunk-size actor-critic reinforcement learning method that uses a causal Transformer Q-network to handle long-horizon, sparse-reward tasks. It demonstrates state-of-the-art performance on manipulation tasks by dynamically adjusting action chunk sizes to the demands of each state.
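A minimal PyTorch sketch of the two pieces the summary names; the class names (CausalChunkCritic, AdaptiveChunkActor) and every architectural choice below are hypothetical illustrations under my own assumptions, not the paper's implementation. The actor emits a maximum-length action chunk plus a state-dependent chunk size, and a causal Transformer critic scores each prefix of the chunk:

```python
import torch
import torch.nn as nn

class CausalChunkCritic(nn.Module):
    """Hypothetical causal Transformer critic: the state embedding is
    prepended to the action-chunk embeddings, and a causal mask makes the
    Q-value after action t depend only on actions 1..t."""
    def __init__(self, state_dim, action_dim, d_model=64):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, 1)

    def forward(self, state, action_chunk):
        # state: (B, state_dim); action_chunk: (B, H, action_dim)
        tokens = torch.cat([self.state_proj(state).unsqueeze(1),
                            self.action_proj(action_chunk)], dim=1)
        n = tokens.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        return self.q_head(h[:, 1:]).squeeze(-1)  # (B, H): Q of each prefix

class AdaptiveChunkActor(nn.Module):
    """Hypothetical actor: emits a max-length action chunk plus a
    state-dependent chunk size in 1..max_chunk."""
    def __init__(self, state_dim, action_dim, max_chunk=8):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.chunk_head = nn.Linear(64, max_chunk * action_dim)
        self.size_head = nn.Linear(64, max_chunk)
        self.max_chunk, self.action_dim = max_chunk, action_dim

    def forward(self, state):
        h = self.body(state)
        chunk = torch.tanh(self.chunk_head(h)).view(-1, self.max_chunk, self.action_dim)
        size = torch.distributions.Categorical(logits=self.size_head(h)).sample() + 1
        return chunk, size

# Toy usage: sample a chunk, read off the Q-value of the chosen prefix length.
actor, critic = AdaptiveChunkActor(4, 2), CausalChunkCritic(4, 2)
s = torch.randn(1, 4)
chunk, k = actor(s)
q = critic(s, chunk)
print(k.item(), q[0, k.item() - 1].item())
```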
This paper introduces Path-Coupled Bellman Flows (PCBF), a continuous-time distributional reinforcement learning method that uses flow matching to model return distributions without heuristic projections. It addresses boundary mismatch and high-variance issues in previous flow-based approaches by coupling current and successor return flows through shared base noise.
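A minimal sketch of the coupling the summary describes, assuming one-dimensional returns, a straight-line probability path, and Euler sampling; ReturnFlow, pcbf_loss, and all hyperparameters are hypothetical names, not the paper's code. The key point is that the same base noise z seeds both the Bellman target (via a frozen target flow at the successor pair) and the interpolation path for the current flow:

```python
import torch
import torch.nn as nn

class ReturnFlow(nn.Module):
    """Hypothetical velocity field v(x, t | s, a) for a 1-D return distribution."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t, s, a):
        return self.net(torch.cat([x, t, s, a], dim=-1))

    @torch.no_grad()
    def sample(self, z, s, a, steps=8):
        # Euler integration from base noise z at t=0 to a return sample at t=1.
        x = z
        for i in range(steps):
            t = torch.full_like(x, i / steps)
            x = x + self.forward(x, t, s, a) / steps
        return x

def pcbf_loss(flow, target_flow, s, a, r, s2, a2, gamma=0.99):
    """Sketch of a path-coupled flow-matching loss: the SAME base noise z
    seeds the Bellman target and the interpolation path, which is the
    coupling that avoids projections and cuts target variance."""
    z = torch.randn_like(r)                        # shared base noise
    y = r + gamma * target_flow.sample(z, s2, a2)  # distributional Bellman target
    t = torch.rand_like(r)
    x_t = (1 - t) * z + t * y                      # straight-line path
    v_target = y - z                               # conditional flow-matching target
    return ((flow(x_t, t, s, a) - v_target) ** 2).mean()

# Toy usage with random transitions.
flow, target_flow = ReturnFlow(4, 2), ReturnFlow(4, 2)
s, a = torch.randn(32, 4), torch.randn(32, 2)
s2, a2, r = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 1)
print(pcbf_loss(flow, target_flow, s, a, r, s2, a2).item())
```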
This paper introduces Adaptive Q-Chunking (AQC), a reinforcement learning method that dynamically selects action chunk sizes to balance reactive control against long-horizon planning. It achieves state-of-the-art results on OGBench and Robomimic and improves the performance of large-scale vision-language-action (VLA) models on robotics tasks.
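A minimal NumPy sketch of one way the reactivity/commitment trade-off could be implemented; select_chunk, the replanning penalty switch_cost, and the receding-horizon loop are all hypothetical stand-ins, not the paper's algorithm. Given Q-value estimates for each prefix of a proposed chunk, the agent commits only to the best-scoring prefix:

```python
import numpy as np

def select_chunk(q_prefix, switch_cost=0.01):
    """Hypothetical rule: q_prefix[k-1] estimates the value of committing
    to the first k actions of a proposed chunk. A small per-replan cost
    favors longer commitments unless a shorter, more reactive prefix is
    clearly better."""
    k_max = len(q_prefix)
    # Charge one replanning cost for each extra re-decision the choice implies.
    penalized = [q_prefix[k - 1] - switch_cost * (k_max / k - 1)
                 for k in range(1, k_max + 1)]
    return int(np.argmax(penalized)) + 1

def rollout(env_step, propose_chunk, q_values, state, horizon=100):
    """Receding-horizon loop: propose a chunk, commit to a prefix, repeat."""
    t = 0
    while t < horizon:
        chunk = propose_chunk(state)        # (k_max, action_dim) candidates
        k = select_chunk(q_values(state, chunk))
        for a in chunk[:k]:                 # execute only the chosen prefix
            state = env_step(state, a)
            t += 1
    return state

# Toy usage with stand-in dynamics, proposals, and value estimates.
rng = np.random.default_rng(0)
final = rollout(
    env_step=lambda s, a: s + 0.1 * a,
    propose_chunk=lambda s: rng.normal(size=(4, 2)),
    q_values=lambda s, chunk: rng.normal(size=len(chunk)),
    state=np.zeros(2),
)
print(final)
```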
This paper introduces Value Gradient Flow (VGF), a scalable approach to behavior-regularized reinforcement learning that formulates it as an optimal transport problem solved through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization while enabling adaptive test-time scaling through control of the transport budget.
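A minimal PyTorch sketch of the policy-free idea, assuming the discrete gradient flow amounts to moving behavior-sampled actions a few steps along grad_a Q(s, a); value_gradient_flow and its parameters are hypothetical, and the paper's actual transport formulation is surely richer. The step count plays the role of the transport budget: zero steps recovers pure behavior sampling, more steps trade regularization for value:

```python
import torch

def value_gradient_flow(q_fn, behavior_actions, state, steps=10, step_size=0.05):
    """Hypothetical sketch: transport behavior samples toward higher value
    by discrete gradient ascent on Q over the actions themselves, with no
    parametric policy. `steps` is the transport budget."""
    a = behavior_actions.clone().requires_grad_(True)
    for _ in range(steps):
        q = q_fn(state, a).sum()
        (grad,) = torch.autograd.grad(q, a)
        a = (a + step_size * grad).detach().requires_grad_(True)
    return a.detach()

# Toy usage: a quadratic Q with optimum at a = 1, scaled at "test time"
# by varying the transport budget.
q_fn = lambda s, a: -((a - 1.0) ** 2).sum(dim=-1)
s = torch.zeros(8, 4)
a0 = torch.randn(8, 2)                  # stand-in behavior-policy samples
for budget in (0, 5, 50):
    a = value_gradient_flow(q_fn, a0, s, steps=budget)
    print(budget, q_fn(s, a).mean().item())
```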