@apurvasgandhi: Sub-agents are a promising inference-time scaling primitive: • Expand an agent's working memory • Divide-and-conquer ha…

X AI KOLs Timeline Papers

Summary

RAO (Recursive Agent Optimization) is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves, turning recursive inference into a learned capability.

Sub-agents are a promising inference-time scaling primitive:
• Expand an agent's working memory
• Divide-and-conquer hard problems
• Solve problems faster with parallel execution

But how do we train a model to best take advantage of sub-agents and make sure we get these benefits?

Very excited to release RAO: Recursive Agent Optimization. RAO is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves (that can themselves spawn other agents) - turning recursive inference into a learned capability.

1/10
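To make the spawn/delegate/coordinate pattern concrete, here is a minimal toy sketch of recursive delegation at inference time. It is not RAO and contains no reinforcement learning: the decision to delegate is a hard-coded rule, whereas the post describes learning that decision end to end. The names `run_agent`, `Result`, and `CONTEXT_BUDGET` are hypothetical, and summing integers stands in for a real task.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical stand-in for a context limit: the largest task one agent
# is allowed to handle without delegating.
CONTEXT_BUDGET = 4


@dataclass
class Result:
    answer: int
    agents_used: int  # agents in this subtree, including the caller
    max_depth: int    # deepest recursion level reached in this subtree


def run_agent(items, depth=0, depth_limit=8):
    """Toy agent: solve small tasks directly, otherwise spawn two copies of itself."""
    # Base case: the task fits in the budget (or the recursion limit is hit).
    if len(items) <= CONTEXT_BUDGET or depth >= depth_limit:
        return Result(answer=sum(items), agents_used=1, max_depth=depth)

    # Divide: split the task in half. A trained policy would choose how to
    # decompose; this fixed rule is an assumption made for illustration.
    mid = len(items) // 2
    parts = [items[:mid], items[mid:]]

    # Delegate: each sub-agent is a recursive copy that runs in parallel
    # and may itself spawn further sub-agents.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        children = list(
            pool.map(lambda part: run_agent(part, depth + 1, depth_limit), parts)
        )

    # Coordinate: merge the sub-agents' answers into one result.
    return Result(
        answer=sum(child.answer for child in children),
        agents_used=1 + sum(child.agents_used for child in children),
        max_depth=max(child.max_depth for child in children),
    )


if __name__ == "__main__":
    print(run_agent(list(range(20))))
```

Each agent only ever sees a slice of the task (working memory), the split is a divide-and-conquer step, and sibling sub-agents run concurrently, which mirrors the three benefits listed in the post.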

Similar Articles

Recursive Multi-Agent Systems

Papers with Code Trending

This paper introduces RecursiveMAS, a framework that extends recursive scaling principles to multi-agent systems for improved collaborative reasoning efficiency and accuracy. It demonstrates significant speedups and token reduction across various benchmarks compared to standard baselines.

Recursive Self-Evolving Agents via Held-Out Selection

arXiv cs.AI

Introduces RSEA, a method for recursive self-evolution of LLM agents using a three-layer natural-language state and a held-out selection gate to prevent regression. Evaluated across four benchmarks, it shows that context evolution is benchmark-dependent and that a strict selection gate is crucial for reliability.

@leerob: https://x.com/leerob/status/2065469795529588940

X AI KOLs Following

Cursor AI describes its recursive agent system for scaling training of its Composer model, using a fleet of agents that self-manage and alert humans when issues arise. The system enables parallel experiments and accelerates research, treating researcher time as the scarcest resource.

APPO: Agentic Procedural Policy Optimization

Hugging Face Daily Papers

APPO improves multi-turn tool-use in LLM agents by refining branching decisions and credit assignment using fine-grained decision points and procedure-level advantage scaling, outperforming baselines by 4 points on 13 benchmarks.

Stateful Inference for Low-Latency Multi-Agent Tool Calling

arXiv cs.LG

This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.