This paper introduces ACSAC, an adaptive chunk-size actor-critic reinforcement learning method that uses a causal Transformer Q-network to handle long-horizon, sparse-reward tasks. It demonstrates state-of-the-art performance on manipulation tasks by dynamically adjusting action chunk sizes to the demands of each state.
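A minimal PyTorch sketch of the two pieces the summary names; the class names (CausalChunkCritic, AdaptiveChunkActor) and every architectural choice below are hypothetical illustrations under my own assumptions, not the paper's implementation. The actor emits a maximum-length action chunk plus a state-dependent chunk size, and a causal Transformer critic scores each prefix of the chunk:

```python
import torch
import torch.nn as nn

class CausalChunkCritic(nn.Module):
    """Hypothetical causal Transformer critic: the state embedding is
    prepended to the action-chunk embeddings, and a causal mask makes the
    Q-value after action t depend only on actions 1..t."""
    def __init__(self, state_dim, action_dim, d_model=64):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, 1)

    def forward(self, state, action_chunk):
        # state: (B, state_dim); action_chunk: (B, H, action_dim)
        tokens = torch.cat([self.state_proj(state).unsqueeze(1),
                            self.action_proj(action_chunk)], dim=1)
        n = tokens.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        return self.q_head(h[:, 1:]).squeeze(-1)  # (B, H): Q of each prefix

class AdaptiveChunkActor(nn.Module):
    """Hypothetical actor: emits a max-length action chunk plus a
    state-dependent chunk size in 1..max_chunk."""
    def __init__(self, state_dim, action_dim, max_chunk=8):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.chunk_head = nn.Linear(64, max_chunk * action_dim)
        self.size_head = nn.Linear(64, max_chunk)
        self.max_chunk, self.action_dim = max_chunk, action_dim

    def forward(self, state):
        h = self.body(state)
        chunk = torch.tanh(self.chunk_head(h)).view(-1, self.max_chunk, self.action_dim)
        size = torch.distributions.Categorical(logits=self.size_head(h)).sample() + 1
        return chunk, size

# Toy usage: sample a chunk, read off the Q-value of the chosen prefix length.
actor, critic = AdaptiveChunkActor(4, 2), CausalChunkCritic(4, 2)
s = torch.randn(1, 4)
chunk, k = actor(s)
q = critic(s, chunk)
print(k.item(), q[0, k.item() - 1].item())
```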
This paper introduces Path-Coupled Bellman Flows (PCBF), a continuous-time distributional reinforcement learning method that uses flow matching to model return distributions without heuristic projections. It addresses boundary mismatch and high-variance issues in previous flow-based approaches by coupling current and successor return flows through shared base noise.
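A minimal sketch of the coupling the summary describes, assuming one-dimensional returns, a straight-line probability path, and Euler sampling; ReturnFlow, pcbf_loss, and all hyperparameters are hypothetical names, not the paper's code. The key point is that the same base noise z seeds both the Bellman target (via a frozen target flow at the successor pair) and the interpolation path for the current flow:

```python
import torch
import torch.nn as nn

class ReturnFlow(nn.Module):
    """Hypothetical velocity field v(x, t | s, a) for a 1-D return distribution."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, t, s, a):
        return self.net(torch.cat([x, t, s, a], dim=-1))

    @torch.no_grad()
    def sample(self, z, s, a, steps=8):
        # Euler integration from base noise z at t=0 to a return sample at t=1.
        x = z
        for i in range(steps):
            t = torch.full_like(x, i / steps)
            x = x + self.forward(x, t, s, a) / steps
        return x

def pcbf_loss(flow, target_flow, s, a, r, s2, a2, gamma=0.99):
    """Sketch of a path-coupled flow-matching loss: the SAME base noise z
    seeds the Bellman target and the interpolation path, which is the
    coupling that avoids projections and cuts target variance."""
    z = torch.randn_like(r)                        # shared base noise
    y = r + gamma * target_flow.sample(z, s2, a2)  # distributional Bellman target
    t = torch.rand_like(r)
    x_t = (1 - t) * z + t * y                      # straight-line path
    v_target = y - z                               # conditional flow-matching target
    return ((flow(x_t, t, s, a) - v_target) ** 2).mean()

# Toy usage with random transitions.
flow, target_flow = ReturnFlow(4, 2), ReturnFlow(4, 2)
s, a = torch.randn(32, 4), torch.randn(32, 2)
s2, a2, r = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 1)
print(pcbf_loss(flow, target_flow, s, a, r, s2, a2).item())
```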
This paper introduces Adaptive Q-Chunking (AQC), a reinforcement learning method that dynamically selects action chunk sizes to balance reactive control against long-horizon planning. It achieves state-of-the-art results on OGBench and Robomimic and improves the performance of large-scale vision-language-action (VLA) models on robotics tasks.
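A minimal NumPy sketch of one way the reactivity/commitment trade-off could be implemented; select_chunk, the replanning penalty switch_cost, and the receding-horizon loop are all hypothetical stand-ins, not the paper's algorithm. Given Q-value estimates for each prefix of a proposed chunk, the agent commits only to the best-scoring prefix:

```python
import numpy as np

def select_chunk(q_prefix, switch_cost=0.01):
    """Hypothetical rule: q_prefix[k-1] estimates the value of committing
    to the first k actions of a proposed chunk. A small per-replan cost
    favors longer commitments unless a shorter, more reactive prefix is
    clearly better."""
    k_max = len(q_prefix)
    # Charge one replanning cost for each extra re-decision the choice implies.
    penalized = [q_prefix[k - 1] - switch_cost * (k_max / k - 1)
                 for k in range(1, k_max + 1)]
    return int(np.argmax(penalized)) + 1

def rollout(env_step, propose_chunk, q_values, state, horizon=100):
    """Receding-horizon loop: propose a chunk, commit to a prefix, repeat."""
    t = 0
    while t < horizon:
        chunk = propose_chunk(state)        # (k_max, action_dim) candidates
        k = select_chunk(q_values(state, chunk))
        for a in chunk[:k]:                 # execute only the chosen prefix
            state = env_step(state, a)
            t += 1
    return state

# Toy usage with stand-in dynamics, proposals, and value estimates.
rng = np.random.default_rng(0)
final = rollout(
    env_step=lambda s, a: s + 0.1 * a,
    propose_chunk=lambda s: rng.normal(size=(4, 2)),
    q_values=lambda s, chunk: rng.normal(size=len(chunk)),
    state=np.zeros(2),
)
print(final)
```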
This paper introduces Value Gradient Flow (VGF), a scalable approach to behavior-regularized reinforcement learning that formulates it as an optimal transport problem solved through discrete gradient flow, achieving state-of-the-art results on offline RL and LLM RL benchmarks. The method eliminates explicit policy parameterization while enabling adaptive test-time scaling through control of the transport budget.
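A minimal PyTorch sketch of the policy-free idea, assuming the discrete gradient flow amounts to moving behavior-sampled actions a few steps along grad_a Q(s, a); value_gradient_flow and its parameters are hypothetical, and the paper's actual transport formulation is surely richer. The step count plays the role of the transport budget: zero steps recovers pure behavior sampling, more steps trade regularization for value:

```python
import torch

def value_gradient_flow(q_fn, behavior_actions, state, steps=10, step_size=0.05):
    """Hypothetical sketch: transport behavior samples toward higher value
    by discrete gradient ascent on Q over the actions themselves, with no
    parametric policy. `steps` is the transport budget."""
    a = behavior_actions.clone().requires_grad_(True)
    for _ in range(steps):
        q = q_fn(state, a).sum()
        (grad,) = torch.autograd.grad(q, a)
        a = (a + step_size * grad).detach().requires_grad_(True)
    return a.detach()

# Toy usage: a quadratic Q with optimum at a = 1, scaled at "test time"
# by varying the transport budget.
q_fn = lambda s, a: -((a - 1.0) ** 2).sum(dim=-1)
s = torch.zeros(8, 4)
a0 = torch.randn(8, 2)                  # stand-in behavior-policy samples
for budget in (0, 5, 50):
    a = value_gradient_flow(q_fn, a0, s, steps=budget)
    print(budget, q_fn(s, a).mean().item())
```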