AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Summary
AReaL is a fully asynchronous reinforcement learning system for LLM reasoning, achieving up to 2.57x training speedup over synchronous systems while maintaining or improving performance. It decouples generation and training to improve GPU utilization and includes optimizations like staleness-enhanced PPO.
Cached at: 07/02/26, 03:44 PM
Source: https://huggingface.co/papers/2505.24298
Published on May 30, 2025
Abstract
AReaL, a fully asynchronous reinforcement learning system, decouples generation and training to achieve higher GPU utilization and up to 2.57x training speedup for large language models on reasoning tasks.
Reinforcement learning (RL) has become a trending paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous by alternating generation and training in a batch setting, where the rollouts in each training batch are generated by the same (or latest) model. This stabilizes RL training but suffers from severe system-level inefficiency. Generation must wait until the longest output in the batch is completed before model update, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. AReaL also incorporates a collection of system-level optimizations, leading to substantially higher GPU utilization. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness, and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to 2.57x training speedup compared to the best synchronous systems with the same number of GPUs and matched or even improved final performance. The code of AReaL is available at https://github.com/inclusionAI/AReaL/.
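The decoupling of generation from training, and the idea of bounding data staleness, can be illustrated with a toy single-threaded simulation. This is a minimal sketch under stated assumptions, not AReaL's implementation: the names `Rollout`, `simulate`, `rollouts_per_tick`, and `max_staleness`, as well as the specific admission rule, are hypothetical and chosen only for illustration. Each generated sample is tagged with the policy version that produced it, and the rollout side pauses whenever a new sample would end up more than `max_staleness` versions behind the trainer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    sample_id: int
    policy_version: int  # version of the weights that generated this sample


def simulate(total_updates, batch_size, rollouts_per_tick, max_staleness):
    """Toy model of decoupled generation and training (illustrative only).

    Returns the staleness (trainer version minus generating version) of
    every sample the trainer consumed. Assumes rollouts_per_tick >= 1.
    """
    buffer = deque()   # samples waiting to be trained on
    version = 0        # number of model updates completed so far
    generated = 0      # total samples produced so far
    observed = []

    while version < total_updates:
        # Rollout side: keep generating without waiting for the trainer,
        # unless the staleness budget (a hypothetical rule) would be exceeded.
        for _ in range(rollouts_per_tick):
            target_batch = generated // batch_size  # batch this sample lands in
            if target_batch > version + max_staleness:
                break  # sample would be too stale by the time it is trained on
            buffer.append(Rollout(generated, version))
            generated += 1

        # Trainer side: update as soon as a full batch has been collected.
        if len(buffer) >= batch_size:
            batch = [buffer.popleft() for _ in range(batch_size)]
            observed.extend(version - r.policy_version for r in batch)
            version += 1

    return observed


if __name__ == "__main__":
    stale = simulate(total_updates=10, batch_size=4,
                     rollouts_per_tick=16, max_staleness=2)
    print("max staleness observed:", max(stale))
```

Setting `max_staleness=0` in this sketch forces every batch to be generated by the latest weights, which corresponds to the synchronous setting the abstract describes. Larger values let generation run ahead of training at the cost of training on older samples.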
Get this paper in your agent:
hf papers read 2505.24298
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 6
#### inclusionAI/AReaL-boba-2-8B Text Generation • Updated Jun 13, 2025 • 292 • 28
#### inclusionAI/AReaL-boba-2-14B Text Generation • Updated Jun 10, 2025 • 109 • 22
#### inclusionAI/AReaL-boba-2-8B-Open Text Generation • Updated Jun 4, 2025 • 93 • 20
#### inclusionAI/AReaL-boba-2-14B-Open Text Generation • Updated Jun 4, 2025 • 110 • 20
Datasets citing this paper: 1
#### inclusionAI/AReaL-tau2-data Preview • Updated Mar 2 • 474 • 13
Spaces citing this paper: 1
Collections including this paper: 5
Similar Articles
REAL: A Reasoning-Enhanced Graph Framework for Long-Term Memory Management of LLMs
REAL is a reasoning-enhanced graph framework for long-term memory management of LLMs that uses temporal and confidence-aware directed property graphs with non-destructive temporal updates and hybrid beam search retrieval, achieving an average improvement of 22.72%.
@jiqizhixin: Awesome blog! State of RL for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
A comprehensive blog post reviewing the state of reinforcement learning for reasoning LLMs, covering methods from REINFORCE and PPO to GRPO and beyond, with connections to key models like InstructGPT and DeepSeek-R1.
Adaptive Latent Agentic Reasoning
This paper introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework for LLM agents that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought for harder decisions, achieving up to 84.6% token reduction while maintaining task accuracy.
ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning
ARES proposes a framework for automatically constructing rubric-based RL data from pretraining documents, generating question-answer pairs and weighted rubrics to enable instance-level reward supervision for open-ended LLM responses, outperforming existing methods on multi-dimensional open-ended tasks.
LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models
LEAD dynamically adapts reasoning efficiency during training by using online calibration of correctness-efficiency trade-offs and adaptive problem-specific length targets, improving mathematical reasoning accuracy and reducing output length.