agentic-rl

#agentic-rl

@cwolferesearch: I just published a blog on agentic RL that covers 10+ recent frameworks in the space. Here are the key takeaways… Link …

X AI KOLs Timeline ↗ · yesterday Cached

A blog post summarizing more than ten recent agentic RL frameworks and their best practices, covering modular interfaces, trajectory structure, action masks, process rewards, advantage normalization, scalable rollouts, stability/exploration, and task curriculum.

#agentic-rl

@cwolferesearch: I've been reading a ton of agentic RL papers recently. Out of all the work, one of the only commonly-used tricks is act…

X AI KOLs Timeline ↗ · 4d ago Cached

Discussion of recent agentic RL papers, highlighting action masking as a common technique and its evolution with world modeling papers like ECHO and PaW.
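
A minimal sketch of how an action mask can enter a policy-gradient loss. It assumes "action masking" here means excluding tokens the policy did not generate (for example tool outputs) from the loss; the function name `masked_pg_loss` and all numbers are hypothetical and not taken from the post or from any cited paper.

```python
from typing import Sequence


def masked_pg_loss(
    logprobs: Sequence[float],
    advantages: Sequence[float],
    action_mask: Sequence[int],
) -> float:
    """Mean REINFORCE-style loss over policy-generated tokens only.

    action_mask[i] is 1 if token i was sampled by the policy and 0 if it
    came from the environment (assumed convention for this sketch).
    """
    if not (len(logprobs) == len(advantages) == len(action_mask)):
        raise ValueError("all inputs must have the same length")

    # Masked tokens contribute nothing to the sum.
    total = sum(
        -lp * adv * m for lp, adv, m in zip(logprobs, advantages, action_mask)
    )
    n_action_tokens = sum(action_mask)
    # Normalize by the number of policy tokens, not the full sequence length.
    return total / n_action_tokens if n_action_tokens else 0.0


# Four tokens; the third is a tool-output token and is masked out.
loss = masked_pg_loss(
    logprobs=[-0.5, -1.0, -2.0, -0.25],
    advantages=[1.0, 1.0, 1.0, 1.0],
    action_mask=[1, 1, 0, 1],
)
```

Dividing by the count of unmasked tokens keeps the loss scale independent of how much environment text a trajectory contains, which is one plausible design choice among several.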

#agentic-rl

APPO: Agentic Procedural Policy Optimization

Hugging Face Daily Papers ↗ · 2026-06-10 Cached

APPO improves multi-turn tool-use in LLM agents by refining branching decisions and credit assignment using fine-grained decision points and procedure-level advantage scaling, outperforming baselines by 4 points on 13 benchmarks.

#agentic-rl

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Hugging Face Daily Papers ↗ · 2026-06-09 Cached

TRACE is a unified rollout budget allocation framework that enhances reward contrast in multi-turn agentic reinforcement learning by dynamically distributing resources across tree-structured rollouts based on prefix-level informativeness. It improves efficiency and accuracy on agentic benchmarks like Multi-Hop QA.

#agentic-rl

The Open Source Community is backing OpenEnv for Agentic RL

Hugging Face Blog ↗ · 2026-06-08 Cached

OpenEnv, a library for creating agentic execution environments to train open source agents with reinforcement learning, is becoming more open with a new governance committee including Meta-PyTorch, Hugging Face, Nvidia, and others, aiming to provide a protocol layer that works across models and harnesses.

#agentic-rl

@SergioPaniego: frontier agents are this good partly because the model was trained inside the very harness it ships with great to see t…

X AI KOLs Timeline ↗ · 2026-06-05 Cached

Sergio Paniego highlights that frontier agents perform as well as they do partly because the models were trained inside the harness they ship with. The new work 'Polar: Agentic RL on Any Harness at Scale' by NVIDIA AI turns harnesses such as Codex, Claude Code, Qwen Code, or Pi into RL training environments without modifying their internals.

#agentic-rl

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

Hugging Face Daily Papers ↗ · 2026-06-05 Cached

StepPO introduces a step-centric paradigm for agentic reinforcement learning that aligns policy optimization with agent decision granularity, outperforming token-centric methods in multi-turn interaction tasks.

#agentic-rl

Agentic RL: Token-In, Token-Out Done Right (16 minute read)

TLDR AI ↗ · 2026-06-01 Cached

This article explains the 'Token-In, Token-Out' (TITO) invariant in reinforcement learning for LLMs, highlighting a common error when training multi-turn agents with tool calls. It presents two solutions: using per-model renderers or designing training to avoid re-encoding decoded tokens, emphasizing prefix-preserving chat templates.
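
A toy sketch of why re-encoding decoded tokens can break the invariant. The three-entry vocabulary, the greedy longest-match `encode`, and the helper `extend_trajectory` are all hypothetical and chosen only to make the mismatch visible; real tokenizers and chat templates are more involved, and this is not the article's code.

```python
# Toy vocabulary (hypothetical): token id is the list index.
VOCAB = ["a", "b", "ab"]


def decode(ids):
    """Concatenate the token strings for a list of token ids."""
    return "".join(VOCAB[i] for i in ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids, pos = [], 0
    while pos < len(text):
        # Pick the longest vocabulary entry that matches at this position.
        best = max((t for t in VOCAB if text.startswith(t, pos)), key=len)
        ids.append(VOCAB.index(best))
        pos += len(best)
    return ids


# The policy sampled two separate tokens, "a" then "b".
sampled_ids = [0, 1]

# Round-tripping through text merges them into the single token "ab",
# so log-probs computed on reencoded_ids describe a different sequence
# than the one that was actually sampled.
text = decode(sampled_ids)
reencoded_ids = encode(text)


def extend_trajectory(trajectory_ids, new_text):
    """Append newly encoded environment text without touching earlier ids."""
    return trajectory_ids + encode(new_text)


# Keeping the sampled ids and only encoding the new tool output
# preserves the sampled prefix exactly.
trajectory = extend_trajectory(sampled_ids, "b")
```

In this sketch the second approach from the summary corresponds to `extend_trajectory`: the sampled ids are stored as-is and only new text is tokenized, so the prefix is never re-encoded.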

#agentic-rl

@yuwen_lu_: I'm halfway through, damn why did no one ever tell me RL is this fun

X AI KOLs Timeline ↗ · 2026-05-30 Cached

Sanbu 散步 released Hands-On Modern RL, a code-first tutorial that runs from CartPole and PPO basics through LLM post-training (RLHF, DPO, GRPO) to agentic RL. An English version is coming soon.

#agentic-rl

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Hugging Face Daily Papers ↗ · 2026-05-27 Cached

Skill0.5 is a novel agentic reinforcement learning framework that combines general skill internalization with task-specific skill utilization via a dynamic difficulty-aware router, improving out-of-distribution generalization in complex task environments as demonstrated on ALFWorld and WebShop.

#agentic-rl

@ShaokunZhang1: Want to train your own Claude Code/Codex agent with your own model? We are excited to roll out ProRL Agent V2: Polar. A…

X AI KOLs Timeline ↗ · 2026-05-26 Cached

NVIDIA releases Polar, an open-source infrastructure for black-box agentic reinforcement learning, enabling training of coding agents like Claude Code or Codex with any agent harness or framework.

#agentic-rl

Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

Hugging Face Daily Papers ↗ · 2026-05-26 Cached

This paper proposes AKBE, an on-policy method for LLM agent reinforcement learning that dynamically identifies when tool use is needed and when internal knowledge suffices, improving average accuracy by 1.85 and reducing tool calls by 18% relative to standard agentic RL.

#agentic-rl

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL [R]

Reddit r/MachineLearning ↗ · 2026-05-21

This paper proposes using Masked Diffusion Language Models (MDLMs) as text-based world models for agentic reinforcement learning, showing that their any-order denoising objective avoids prefix mode collapse and leads to stronger performance than autoregressive baselines.

#agentic-rl

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Hugging Face Daily Papers ↗ · 2026-05-18 Cached

EnvFactory automates the creation of executable tool environments and natural multi-turn trajectories for training LLMs with agentic reinforcement learning, achieving superior performance on benchmarks like BFCLv3 and MCP-Atlas with fewer environments than prior work.

#agentic-rl

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Hugging Face Daily Papers ↗ · 2026-04-21 Cached

DR-Venus-4B is a 4B-parameter deep-research agent trained on only 10K open samples via agentic SFT+RL with turn-level rewards, outperforming prior sub-9B agents and rivaling 30B models on research benchmarks while remaining deployable on edge devices.
