@apurvasgandhi: Sub-agents are a promising inference-time scaling primitive: • Expand an agent's working memory • Divide-and-conquer ha…
Summary
RAO (Recursive Agent Optimization) is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves, turning recursive inference into a learned capability.
Similar Articles
Recursive Multi-Agent Systems
This paper introduces RecursiveMAS, a framework that extends recursive scaling principles to multi-agent systems to make collaborative reasoning more efficient and accurate. It reports significant speedups and lower token usage than standard baselines across several benchmarks.
Recursive Self-Evolving Agents via Held-Out Selection
Introduces RSEA, a method for recursive self-evolution of LLM agents using a three-layer natural-language state and a held-out selection gate to prevent regression. Evaluated across four benchmarks, it shows that context evolution is benchmark-dependent and that a strict selection gate is crucial for reliability.
@leerob: https://x.com/leerob/status/2065469795529588940
Cursor AI describes the recursive agent system it uses to scale training of its Composer model: a fleet of self-managing agents that alert humans when issues arise. The system enables parallel experiments and accelerates research, treating researcher time as the scarcest resource.
APPO: Agentic Procedural Policy Optimization
APPO improves multi-turn tool use in LLM agents by refining branching decisions and credit assignment through fine-grained decision points and procedure-level advantage scaling, outperforming baselines by 4 points on 13 benchmarks.
Stateful Inference for Low-Latency Multi-Agent Tool Calling
This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.