SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
Summary
This paper introduces SearchSwarm, a model fine-tuned on synthesized, harness-guided delegation trajectories to improve long-horizon deep research through task decomposition and subagent coordination. It reports the best BrowseComp and BrowseComp-ZH results among models of comparable scale.
Source: https://huggingface.co/papers/2606.09730
Abstract
A large language model trained on synthesized delegation intelligence achieves superior performance on long-horizon research tasks through task decomposition and subagent coordination.
Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent’s context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent’s workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.
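The delegation pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's harness: `MainAgent`, `decompose`, `toy_subagent`, and `toy_summarize` are hypothetical names, and the subagent and summarizer are stand-in stubs rather than LLM calls. The sketch only shows the context-saving idea, where the main agent stores a short summary per subtask while the long working trace stays with the subagent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MainAgent:
    """Hypothetical main agent that keeps only summarized subagent results."""
    run_subagent: Callable[[str], str]  # assumed: executes a subtask, returns its full trace
    summarize: Callable[[str], str]     # assumed: compresses a trace into a short result
    context: List[str] = field(default_factory=list)

    def delegate(self, subtask: str) -> str:
        trace = self.run_subagent(subtask)   # long trace never enters self.context
        summary = self.summarize(trace)
        self.context.append(f"{subtask} -> {summary}")
        return summary

    def context_size(self) -> int:
        # Character count as a crude proxy for the context budget consumed.
        return sum(len(entry) for entry in self.context)


def decompose(task: str) -> List[str]:
    """Toy decomposition: split a task into subtasks on semicolons."""
    return [part.strip() for part in task.split(";") if part.strip()]


def toy_subagent(subtask: str) -> str:
    """Stub subagent: simulates a long browsing trace ending in one answer line."""
    pages = [f"page {i} about {subtask}: " + "filler " * 50 for i in range(20)]
    return "\n".join(pages) + f"\nANSWER: finding for {subtask}"


def toy_summarize(trace: str) -> str:
    """Stub summarizer: keeps only the final answer line of the trace."""
    return trace.splitlines()[-1]


def research(task: str, agent: MainAgent) -> List[str]:
    """Decompose the task, delegate each subtask, and return the main context."""
    for subtask in decompose(task):
        agent.delegate(subtask)
    return agent.context


if __name__ == "__main__":
    agent = MainAgent(run_subagent=toy_subagent, summarize=toy_summarize)
    for entry in research("who founded X; when was X founded", agent):
        print(entry)
    print("main-agent context size:", agent.context_size())
```

In this toy setup the main agent's context grows by one short line per subtask, regardless of how long each subagent's trace is. Deciding when and what to delegate, which the abstract identifies as the hard part, is reduced here to a fixed split and is not modeled.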
Similar Articles
AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning
AgentJet is a distributed swarm training framework for LLM agent reinforcement learning that decouples agent rollouts from model optimization, enabling heterogeneous multi-agent RL, multi-task training, fault tolerance, and live code iteration with 1.5-10x training speedup. It also introduces an automated research system capable of autonomously conducting multi-day RL studies on large-scale clusters.
SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating
SlimSearcher is a framework that improves efficiency in deep research agents by combining Pareto-efficient trajectory filtering and adaptive reward shaping, reducing tool-call rounds by 17-58% while maintaining accuracy on benchmarks like GAIA, BrowseComp, and XBenchDeepSearch.
@AdamRLucek: I'm bullish on agent swarms (aka workflows). Agents are increasingly being used to analyze and collate massive amounts …
The author discusses the growing use of agent swarms (workflows) for processing unstructured data at scale, notes that execution reliability drops significantly when more than 30 sub-agents are deployed in parallel, and teases a solution for combining intelligent decision-making with reliable task execution.
When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference
The paper proposes a delegation-based aggregator called Propagational Proxy Voting (PPV) that uses letter entropy and reasoning geometry to improve over majority voting for multi-sample LLM inference, achieving gains on MMLU-Pro without requiring gold labels or auxiliary training.
Search Discipline for Long-Horizon Research Agents
This paper identifies a failure mode in long-horizon research agents where optimizing an aggregate metric can select candidates that improve the headline number but break critical subgroups (inversion). It proposes a search-discipline protocol with an external control loop that audits candidates based on disaggregated behavior rather than the score.