AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

Hugging Face Daily Papers 05/15/26, 12:00 AM Papers

reinforcement-learning agentic-llm dataflow multi-policy elastic-scaling training llm-agents

Summary

AstraFlow is a dataflow-oriented RL system that enables efficient multi-policy collaborative training and elastic scaling for agentic LLMs, achieving a 2.7x training speedup over existing systems.

Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires supporting complex workloads, including multi-policy collaborative training, while efficiently using elastic, heterogeneous, and cross-region compute resources. Existing LLM RL systems support some of these capabilities, but each new extension often requires dedicated system engineering. This burden arises from trainer-centered control architectures and the lack of principled abstractions for RL system components. To address these limitations, we propose AstraFlow, a dataflow-oriented RL system that replaces conventional trainer-centered control with principled component abstractions. In AstraFlow, rollout services, dataflow management, and training are decoupled into autonomous components, enabling the system to natively support complex multi-policy agentic RL workloads and efficiently exploit diverse compute resources. We evaluate AstraFlow across math, code, search, and AgentBench workloads, showing that the same system supports multi-policy training, elastic scaling, heterogeneous cross-region execution, and composable data algorithms without system-level code changes. In multi-policy collaborative training, AstraFlow achieves comparable or better accuracy than existing RL systems while speeding up training time by 2.7x.

Original Article

Cached at: 05/19/26, 06:33 PM

Paper page - AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

Source: https://huggingface.co/papers/2605.15565

Abstract

AstraFlow is a dataflow-oriented reinforcement learning system that enables efficient multi-policy collaborative training and elastic scaling across diverse compute resources for large language model agents.

Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires supporting complex workloads, including multi-policy collaborative training, while efficiently using elastic, heterogeneous, and cross-region compute resources. Existing LLM RL systems support some of these capabilities, but each new extension often requires dedicated system engineering. This burden arises from trainer-centered control architectures and the lack of principled abstractions for RL system components. To address these limitations, we propose AstraFlow, a dataflow-oriented RL system that replaces conventional trainer-centered control with principled component abstractions. In AstraFlow, rollout services, dataflow management, and training are decoupled into autonomous components, enabling the system to natively support complex multi-policy agentic RL workloads and efficiently exploit diverse compute resources. We evaluate AstraFlow across math, code, search, and AgentBench workloads, showing that the same system supports multi-policy training, elastic scaling, heterogeneous cross-region execution, and composable data algorithms without system-level code changes. In multi-policy collaborative training, AstraFlow achieves comparable or better accuracy than existing RL systems while speeding up training time by 2.7x.
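The decoupling described in the abstract can be pictured with a small toy model. The sketch below is not AstraFlow's code or API: every name in it (Trajectory, DataflowBuffer, RolloutService, Trainer, max_staleness) is hypothetical, and the staleness filter is only an assumed example of a data algorithm living in the dataflow layer. It shows one rollout service feeding a shared buffer while two policy trainers pull their own batches, with no trainer directing the rollout side.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Trajectory:
    policy_id: str        # policy that produced this rollout
    policy_version: int   # policy version at generation time
    reward: float


class DataflowBuffer:
    """Hypothetical dataflow layer: one FIFO queue per policy."""

    def __init__(self, max_staleness):
        self.max_staleness = max_staleness  # assumed knob, not from the paper
        self.queues = defaultdict(deque)

    def put(self, traj):
        self.queues[traj.policy_id].append(traj)

    def take_batch(self, policy_id, batch_size, current_version):
        """Pop up to batch_size trajectories, discarding overly stale ones."""
        queue = self.queues[policy_id]
        batch = []
        while queue and len(batch) < batch_size:
            traj = queue.popleft()
            if current_version - traj.policy_version <= self.max_staleness:
                batch.append(traj)
        return batch


class RolloutService:
    """Hypothetical rollout side: only writes to the buffer."""

    def __init__(self, buffer):
        self.buffer = buffer

    def generate(self, policy_id, version, rewards):
        # Stand-in for real agent rollouts: rewards are supplied directly.
        for reward in rewards:
            self.buffer.put(Trajectory(policy_id, version, reward))


class Trainer:
    """Hypothetical trainer: only reads from the buffer."""

    def __init__(self, policy_id, buffer, batch_size):
        self.policy_id = policy_id
        self.buffer = buffer
        self.batch_size = batch_size
        self.version = 0

    def step(self):
        batch = self.buffer.take_batch(self.policy_id, self.batch_size, self.version)
        if not batch:
            return None  # nothing to train on yet
        self.version += 1  # stand-in for a real gradient update
        return sum(t.reward for t in batch) / len(batch)


def demo():
    buffer = DataflowBuffer(max_staleness=1)
    rollouts = RolloutService(buffer)
    planner = Trainer("planner", buffer, batch_size=2)
    coder = Trainer("coder", buffer, batch_size=2)
    rollouts.generate("planner", 0, [1.0, 0.0])
    rollouts.generate("coder", 0, [0.5, 0.5, 1.0])
    return [planner.step(), coder.step(), coder.step(), planner.step()]


print(demo())  # [0.5, 0.5, 1.0, None]
```

In this toy setup, adding a third policy would mean constructing another Trainer with a new policy_id; the buffer and rollout classes stay unchanged, which is the kind of property the abstract attributes to component abstractions.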

Get this paper in your agent:

hf papers read 2605.15565

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

Learning Agentic Policy from Action Guidance

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering

SkillFlow: Flow-Driven Recursive Skill Evolution for Agentic Orchestration
