AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning


Summary

AReaL is a fully asynchronous reinforcement learning system for LLM reasoning that achieves up to 2.57x training speedup over synchronous systems while matching or improving final performance. It decouples generation from training to improve GPU utilization, controls data staleness, and uses a staleness-enhanced PPO variant to handle outdated training samples.
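The summary credits part of AReaL's training stability to a staleness-enhanced PPO variant. The sketch below shows one way such an objective can be written, assuming a decoupled form in which the clipping ratio is taken against a recent "proximal" policy π_prox rather than the possibly stale behavior policy π_behav that sampled the tokens, with an importance weight π_prox/π_behav correcting for the staleness. The function name, argument names, and the clip value of 0.2 are illustrative assumptions, not AReaL's API; the paper and repository give the exact objective.

```python
import math
from typing import Sequence


def decoupled_ppo_loss(
    logp_new: Sequence[float],    # log pi_theta(a|s): policy being optimized
    logp_prox: Sequence[float],   # log pi_prox(a|s): recent policy used as the trust-region anchor
    logp_behav: Sequence[float],  # log pi_behav(a|s): possibly stale policy that sampled the token
    advantages: Sequence[float],
    clip_eps: float = 0.2,        # assumed clipping range
) -> float:
    """Hypothetical sketch: negative decoupled clipped-PPO surrogate, averaged over tokens."""
    total = 0.0
    for lp_new, lp_prox, lp_behav, adv in zip(logp_new, logp_prox, logp_behav, advantages):
        behav_weight = math.exp(lp_prox - lp_behav)  # corrects for sampling from a stale policy
        ratio = math.exp(lp_new - lp_prox)           # clipped against pi_prox, not pi_behav
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += behav_weight * min(ratio * adv, clipped * adv)  # pessimistic PPO bound
    return -total / len(advantages)


if __name__ == "__main__":
    # The stale behavior policy gave this token probability 0.5; pi_prox and pi_theta give 0.25.
    loss = decoupled_ppo_loss([math.log(0.25)], [math.log(0.25)], [math.log(0.5)], [1.0])
    print(round(loss, 6))  # -0.5
```

When the behavior and proximal policies coincide (fresh, on-policy samples), the weight is 1 and the expression reduces to the standard clipped PPO surrogate. In an autograd framework the behavior weight would be treated as a constant with no gradient.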

Reinforcement learning (RL) has become a trending paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization, creating an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous: they alternate generation and training in a batch setting, so the rollouts in each training batch are all generated by the same (latest) model. This stabilizes RL training but causes severe system-level inefficiency: the model update cannot start until the longest output in the batch has been generated, leaving GPUs underutilized. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data has been collected. AReaL also incorporates a collection of system-level optimizations, leading to substantially higher GPU utilization. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness, and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to 2.57x training speedup over the best synchronous systems with the same number of GPUs, with matched or even improved final performance. The code of AReaL is available at https://github.com/inclusionAI/AReaL/.
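The abstract says AReaL controls data staleness by balancing the rollout and training workloads. A minimal sketch of one such mechanism follows, assuming a simple admission rule: with training batch size B, maximum staleness η, and current policy version i, the N-th rollout (counting from 1) may start only if ⌊(N − 1)/B⌋ ≤ i + η, so it is consumed by a training step at most η versions after the policy that generated it. The class and method names are hypothetical, and the rule is an illustration rather than code from the AReaL repository.

```python
from dataclasses import dataclass


@dataclass
class StalenessGate:
    """Hypothetical admission controller that bounds rollout staleness."""

    batch_size: int          # B: trajectories consumed per training step
    max_staleness: int       # eta: allowed version gap between generation and training
    num_admitted: int = 0    # rollouts admitted so far
    policy_version: int = 0  # i: number of completed training steps

    def try_admit(self) -> bool:
        """Admit the next rollout only if it will not exceed the staleness bound."""
        # The next rollout is number num_admitted + 1; it lands in training batch
        # floor(num_admitted / batch_size), counting batches from 0.
        target_batch = self.num_admitted // self.batch_size
        if target_batch > self.policy_version + self.max_staleness:
            return False  # too far ahead of training: the rollout worker must wait
        self.num_admitted += 1
        return True

    def on_train_step(self) -> None:
        """Called after each model update; advances the policy version."""
        self.policy_version += 1


if __name__ == "__main__":
    gate = StalenessGate(batch_size=4, max_staleness=1)
    print(sum(gate.try_admit() for _ in range(20)))  # 8: batches 0 and 1 may use version 0
    gate.on_train_step()
    print(sum(gate.try_admit() for _ in range(20)))  # 4: one more batch unlocked
```

Setting `max_staleness` to 0 means a batch can only be generated by the most recent policy, which roughly recovers synchronous behavior; larger values let rollout workers run further ahead at the cost of staler samples, which is the trade-off a staleness-aware objective is meant to absorb.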


Source: https://huggingface.co/papers/2505.24298 Published on May 30, 2025

Abstract

AReaL, a fully asynchronous reinforcement learning system, decouples generation and training to achieve higher GPU utilization and up to 2.57x training speedup for large language models on reasoning tasks.
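The utilization gain comes from removing the wait for the longest output in each batch. Below is a toy model of that bottleneck, assuming each generation slot decodes one sequence at one token per time unit; it ignores training time, weight updates, and staleness limits, and the output lengths are made up. It illustrates the mechanism only and does not reproduce the paper's 2.57x measurement.

```python
import heapq
from typing import List, Sequence


def synchronous_wall_time(lengths: Sequence[int], num_slots: int) -> int:
    """Requests run in batches of num_slots; each batch ends when its longest output ends."""
    total = 0
    for start in range(0, len(lengths), num_slots):
        total += max(lengths[start:start + num_slots])
    return total


def asynchronous_wall_time(lengths: Sequence[int], num_slots: int) -> int:
    """Each slot pulls the next request as soon as its current one finishes."""
    free_at: List[int] = [0] * num_slots  # min-heap of times at which each slot frees up
    for length in lengths:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + length)
    return max(free_at)


def utilization(lengths: Sequence[int], num_slots: int, wall_time: int) -> float:
    """Fraction of slot-time spent generating tokens."""
    return sum(lengths) / (num_slots * wall_time)


if __name__ == "__main__":
    lengths = [100, 900, 200, 300, 1000, 100, 200, 200]  # made-up output lengths (tokens)
    sync_t = synchronous_wall_time(lengths, num_slots=4)
    async_t = asynchronous_wall_time(lengths, num_slots=4)
    print(sync_t, async_t)  # 1900 1100
    print(f"{utilization(lengths, 4, sync_t):.2f} {utilization(lengths, 4, async_t):.2f}")  # 0.39 0.68
```

The asynchronous function is greedy list scheduling with a min-heap of slot free times: a slot that finishes a short output immediately starts the next request instead of idling behind the longest output in its batch.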




Models citing this paper: 6

- inclusionAI/AReaL-boba-2-8B • Text Generation • Updated Jun 13, 2025 • 292 • 28
- inclusionAI/AReaL-boba-2-14B • Text Generation • Updated Jun 10, 2025 • 109 • 22
- inclusionAI/AReaL-boba-2-8B-Open • Text Generation • Updated Jun 4, 2025 • 93 • 20
- inclusionAI/AReaL-boba-2-14B-Open • Text Generation • Updated Jun 4, 2025 • 110 • 20

Datasets citing this paper: 1

- inclusionAI/AReaL-tau2-data • Preview • Updated Mar 2 • 474 • 13

Spaces citing this paper: 1

Collections including this paper: 5

Similar Articles

Adaptive Latent Agentic Reasoning

arXiv cs.CL

This paper introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework for LLM agents that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought for harder decisions, achieving up to 84.6% token reduction while maintaining task accuracy.

ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning

arXiv cs.CL

ARES proposes a framework for automatically constructing rubric-based RL data from pretraining documents, generating question-answer pairs and weighted rubrics to enable instance-level reward supervision for open-ended LLM responses, outperforming existing methods on multi-dimensional open-ended tasks.