@ShaokunZhang1: Want to train your own Claude Code/Codex agent with your own model? We are excited to roll out ProRL Agent V2: Polar. A…

X AI KOLs Timeline 05/26/26, 05:39 PM Tools

agentic-rl reinforcement-learning open-source nvidia framework llm-agents training

Summary

NVIDIA releases Polar, an open-source infrastructure for black-box agentic reinforcement learning, enabling training of coding agents like Claude Code or Codex with any agent harness or framework.

Want to train your own Claude Code/Codex agent with your own model? We are excited to roll out ProRL Agent V2: Polar. An infrastructure for black-box agentic RL, Polar lets you train agents with any harness, whether it’s OpenClaw, Hermes, or a custom agent built with frameworks like LangChain, Autogen, AG2 and others. Check out here: Code: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server… Paper: https://arxiv.org/pdf/2605.24220 Welcome to the world of agentic RL, without opening the box.

NVIDIA-NeMo/ProRL-Agent-Server

Source: https://github.com/NVIDIA-NeMo/ProRL-Agent-Server

Polar is an RL rollout framework for real-world agent harnesses.

Harness as Environment. Bring your agent harnesses as RL-ready environments without code changes.
Smart Rollout Pipeline. Save GPU hours with Polar’s parallel Rollout Staging and Runtime Pooling.
Rollout as a Service. Server mode by design, scaling async RL with any training framework.
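
One way to picture "harness as environment" is a recording layer placed between an unmodified agent and the model it calls. The sketch below is only an illustration of that idea, not Polar's implementation: `RecordingProxy`, `Turn`, `toy_harness`, and `fake_model` are hypothetical names, and the "model" is a plain function instead of an inference server.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """One recorded model call: what the harness sent and what came back."""
    prompt: str
    completion: str


@dataclass
class RecordingProxy:
    """Hypothetical stand-in for a proxy between a harness and a model backend."""
    backend: Callable[[str], str]
    turns: List[Turn] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        # Forward the call unchanged, then keep a copy for trajectory building.
        completion = self.backend(prompt)
        self.turns.append(Turn(prompt, completion))
        return completion


def toy_harness(llm: Callable[[str], str]) -> str:
    """An 'agent' that only knows it can call an LLM; it is never modified."""
    plan = llm("plan: add 2 and 3")
    return llm(f"execute: {plan}")


def fake_model(prompt: str) -> str:
    """Placeholder model (assumption): echoes the prompt in upper case."""
    return prompt.upper()


proxy = RecordingProxy(backend=fake_model)
answer = toy_harness(proxy.complete)

for i, turn in enumerate(proxy.turns):
    print(i, repr(turn.prompt), "->", repr(turn.completion))
```

The harness code never learns that its calls are being captured, which is the sense in which the agent stays a black box while its turns still become training data.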

Architecture Overview

Polar rollout architecture

The Rollout Server manages client requests and dispatches them to distributed Gateway Nodes, which asynchronously prepare runtimes, execute agents, build trajectories, and evaluate them. Agent harnesses are listened to by a proxy that sits between the harness-agnostic agent execution processes and the inference servers.
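
The dispatch flow described above can be sketched with `asyncio`. This is a toy model under stated assumptions, not Polar's code: `run_on_gateway`, `dispatch`, and `RolloutResult` are hypothetical names, the round-robin assignment is an assumption (the README does not say how tasks are scheduled), and each stage is a placeholder that only yields control.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class RolloutResult:
    task_id: str
    node_id: str
    stages: List[str]
    reward: float


async def run_on_gateway(node_id: str, task_id: str) -> RolloutResult:
    """Walk one task through the four stages named in the text."""
    stages: List[str] = []
    for stage in ("prepare_runtime", "execute_agent", "build_trajectory"):
        await asyncio.sleep(0)  # yield to other rollouts, standing in for real I/O
        stages.append(stage)
    reward = 1.0  # placeholder score; a real evaluator would compute this
    stages.append("evaluate")
    return RolloutResult(task_id, node_id, stages, reward)


async def dispatch(task_ids: List[str], node_ids: List[str]) -> List[RolloutResult]:
    """Assign tasks to gateway nodes round-robin (assumption) and run them concurrently."""
    jobs = [
        run_on_gateway(node_ids[i % len(node_ids)], task_id)
        for i, task_id in enumerate(task_ids)
    ]
    # gather returns results in the same order as the submitted jobs.
    return await asyncio.gather(*jobs)


results = asyncio.run(dispatch(["t0", "t1", "t2"], ["node-a", "node-b"]))
for r in results:
    print(r.task_id, r.node_id, r.stages, r.reward)
```

Because every stage awaits, a slow runtime preparation on one task does not block agent execution on another, which is the property the asynchronous gateway design is after.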

Installation

🟩 Install the Rollout Server (Polar):

uv venv
uv pip install -e .

🟩 Install the Inference Server (SGLang):

uv pip install --prerelease=allow sglang==0.5.10
bash scripts/patch/patch_sglang.sh

The patch adds the required TITO and prompt token ID emission to the pinned SGLang version. We will remove it once upstream support lands. vLLM integration is on the way.

🟩 Polar is trainer-agnostic, so the choice of trainer and training backend is highly flexible given Polar’s server boundaries.

Currently, we provide a demo Slime integration; see the Slime bridge installation guide.

🟩 (Optional) For SWE-bench official evaluation harness:

uv pip install -e ".[swebench]"

🟩 (Optional) To enable the Polar dashboard UI, build the frontend once:

cd web && npm install && npm run build

Usage Guide

⭐ Choose your Agent Harness: pick a built-in harness, or use the generic shell harness with wrapped agents.
🚀 Trajectory Construction and Eval: see the builder and evaluator guides for registered strategies.
🔧 Deployment Topology: configure the Polar service.
▶️ Request for Rollout: client-side task submission via the rollout API.

CLI Interface

A typical local run uses five commands. Each takes the same topology.yaml.

polar serve_rollout   -c topology.yaml                            # central orchestrator (port 8080)
polar serve_gateway   -c topology.yaml --node-id <node>           # one per gateway node (port 8100+)
polar dashboard       -c topology.yaml [--port 8090]              # observability & monitoring dashboard
polar submit          <task.json|yaml> -c topology.yaml           # submit a task and tail it
polar status          -c topology.yaml                            # one-shot health / topology check
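
For scripted local runs, the five commands can be assembled programmatically. The sketch below only builds and prints the argument lists using the subcommands and flags shown above; it does not start anything. `polar_commands`, the node ID `node0`, and the file name `task.yaml` are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def polar_commands(
    topology: str,
    node_ids: List[str],
    task_file: str,
    dashboard_port: Optional[int] = None,
) -> List[List[str]]:
    """Build argv lists for a local run; every command shares one topology file."""
    cmds = [["polar", "serve_rollout", "-c", topology]]
    # One gateway process per node ID.
    for node in node_ids:
        cmds.append(["polar", "serve_gateway", "-c", topology, "--node-id", node])
    dashboard = ["polar", "dashboard", "-c", topology]
    if dashboard_port is not None:
        dashboard += ["--port", str(dashboard_port)]
    cmds.append(dashboard)
    cmds.append(["polar", "submit", task_file, "-c", topology])
    cmds.append(["polar", "status", "-c", topology])
    return cmds


commands = polar_commands("topology.yaml", ["node0"], "task.yaml", dashboard_port=8090)
for argv in commands:
    print(shlex.join(argv))
```

The serve and dashboard commands are long-running, so an actual launcher would need to run them in the background or in separate terminals before submitting a task.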

Examples

Calculator: minimal smoke test.
Count Stars: minimal test for VLM.
SWE-bench Verified: benchmark-style evaluation on SWE-bench Verified tasks.
SWE-Gym Slime GRPO: training path that connects Polar rollouts to Slime.

🟩 We are adding new examples for different tasks / models on diverse hardware setups. Contributions are welcome!

Roadmap

Our development goal for Polar is to stay low-intrusion and neutral, finding the lowest common ancestor that covers and supports diverse training and inference frameworks.

Initial release & tech report.
Slime bridge & RL example.
CUA (VLM / VLA) Support.
More built-in evaluators (e.g. self-distillation with textual feedback).
vLLM dual inference support.
More trainer bridges (NemoRL, VERL, etc.).

📖 Reference

If you find it useful, please consider citing our work:

@article{xu2026polar,
  title={Polar: Agentic RL on Any Harness at Scale},
  author={Xu, Binfeng and Zhang, Hao and Zhang, Shaokun and Han, Songyang and Liu, Mingjie and Hu, Jian and Diao, Shizhe and Jin, Zhenghui and Zou, Yunheng and Demoret, Michael and Kautz, Jan and Dong, Yi},
  journal={arXiv preprint arXiv:2605.24220},
  year={2026}
}

@article{zhang2026prorl,
  title={ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents},
  author={Zhang, Hao and Liu, Mingjie and Zhang, Shaokun and Han, Songyang and Hu, Jian and Jin, Zhenghui and Zhang, Yuchi and Diao, Shizhe and Lu, Ximing and Xu, Binfeng and others},
  journal={arXiv preprint arXiv:2603.18815},
  year={2026}
}

Binfeng Xu (@billxbf): Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 – Polar takes your harnesses directly as training environments without code change.

Find a problem, design the harness, and
