Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
Summary
Introduces Agents-A1, a 35B Mixture-of-Experts agentic model that achieves trillion-parameter-level performance through long-horizon trajectory scaling and a three-stage training approach: SFT, domain-level teachers, and multi-teacher distillation. On long-horizon agent benchmarks, the model leads or stays competitive with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro.
Cached at: 06/30/26, 03:33 AM
Source: https://huggingface.co/papers/2606.30616
Published on Jun 29
#1 Paper of the day
Authors:
Abstract
Agents-A1, a 35B Mixture-of-Experts agentic model, achieves trillion-parameter-level performance by scaling long-horizon trajectories and heterogeneous agent abilities, using a three-stage training approach: supervised fine-tuning, domain-level teacher models, and multi-teacher distillation.
We introduce Agents-A1, a 35B Mixture-of-Experts agentic model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose a multi-teacher domain-routed on-policy distillation with salient vocabulary alignment to improve knowledge transfer efficiency across different domains, unifying six heterogeneous domains into one deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agent benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5). We hope this work provides the community with a practical path for scaling the horizon with a 35B agent that can reach or match the performance of 1T models on long-horizon tasks.
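The abstract names the distillation stage but gives no formula, so what follows is only one hedged reading of "multi-teacher domain-routed on-policy distillation with salient vocabulary alignment". The assumptions are ours, not the paper's. Each training sample carries a domain tag that selects exactly one teacher. The loss is the per-token reverse KL, KL(student || teacher), evaluated on a rollout the student itself sampled (the "on-policy" part). "Salient vocabulary alignment" is modeled as restricting both distributions to the union of their top-k tokens and renormalizing before comparing them. Models are toy functions over a four-token vocabulary, and the code computes only the scalar loss, with no gradients. The names `top_k`, `salient_indices`, `salient_reverse_kl`, `routed_distill_loss`, and the `teachers` mapping are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical toy interface: a model maps a token prefix to a probability
# distribution over a small vocabulary (a list of floats summing to 1).
Dist = List[float]
Model = Callable[[Sequence[int]], Dist]


def top_k(dist: Dist, k: int) -> List[int]:
    """Indices of the k most probable tokens (ties keep index order)."""
    return sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)[:k]


def salient_indices(student: Dist, teacher: Dist, k: int) -> List[int]:
    """Assumed 'salient vocabulary': union of both models' top-k tokens."""
    return sorted(set(top_k(student, k)) | set(top_k(teacher, k)))


def renormalize(dist: Dist, idx: List[int]) -> List[float]:
    """Restrict a distribution to the tokens in idx and rescale it to sum to 1."""
    total = sum(dist[i] for i in idx)
    return [dist[i] / total for i in idx]


def salient_reverse_kl(student: Dist, teacher: Dist, k: int, eps: float = 1e-12) -> float:
    """KL(student || teacher) computed only over the salient token subset."""
    idx = salient_indices(student, teacher, k)
    s, t = renormalize(student, idx), renormalize(teacher, idx)
    # eps only guards against a zero teacher probability on a salient token
    return sum(si * math.log(si / max(ti, eps)) for si, ti in zip(s, t) if si > 0)


def routed_distill_loss(prompt: List[int], rollout: List[int], domain: str,
                        student: Model, teachers: Dict[str, Model], k: int = 2) -> float:
    """Mean per-token loss over a student-sampled rollout, scored only by the
    teacher registered for this sample's domain."""
    if not rollout:
        raise ValueError("rollout must be non-empty")
    teacher = teachers[domain]  # domain routing: exactly one teacher per sample
    losses = []
    for step in range(len(rollout)):
        prefix = prompt + rollout[:step]  # context before the step-th sampled token
        losses.append(salient_reverse_kl(student(prefix), teacher(prefix), k))
    return sum(losses) / len(losses)


# Toy usage: a uniform student and two fixed-distribution teachers.
def student(prefix: Sequence[int]) -> Dist:
    return [0.25, 0.25, 0.25, 0.25]


def math_teacher(prefix: Sequence[int]) -> Dist:
    return [0.7, 0.1, 0.1, 0.1]


def code_teacher(prefix: Sequence[int]) -> Dist:
    return [0.25, 0.25, 0.25, 0.25]


teachers = {"math": math_teacher, "code": code_teacher}
rollout = [0, 2]  # pretend these tokens were sampled from the student
print(routed_distill_loss([1], rollout, "code", student, teachers))  # 0.0
print(routed_distill_loss([1], rollout, "math", student, teachers))  # ≈ 0.413
```

In this sketch, routing each sample to a single teacher keeps the per-token cost at one teacher evaluation no matter how many domains (six, per the abstract) are unified. A real implementation would compute these terms from student and teacher logits inside a training framework and backpropagate through the student's log-probabilities.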
Get this paper in your agent:
hf papers read 2606.30616
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
InternScience/Agents-A1 • Text Generation • 35B • Updated 42 minutes ago • 55 • 18
Datasets citing this paper: 0
No dataset linking this paper
Spaces citing this paper: 0
No Space linking this paper
Collections including this paper: 0
No Collection including this paper
Similar Articles
InternScience/Agents-A1 · Hugging Face
Agents-A1 is a 35B Mixture-of-Experts agentic model from InternScience that achieves competitive performance against frontier-scale systems like GPT-5.5 and DeepSeek-V4-pro using long-horizon trajectory scaling and multi-teacher multi-domain distillation.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Agent-World introduces a self-evolving training framework for general agent intelligence that autonomously discovers real-world environments and tasks via the Model Context Protocol, enabling continuous learning. Agent-World-8B and 14B models outperform strong proprietary models across 23 challenging agent benchmarks.
@ModelScope2022: Introducing Agents-A1, A 35B MoE agentic model built for long-horizon tasks across search, engineering, scientific rese…
ModelScope introduces Agents-A1, a 35B MoE agentic model with 256K context and function calling, achieving SOTA on long-horizon tasks and instruction following.
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
TMAS introduces a multi-agent framework that enhances large language model reasoning by scaling test-time compute through structured collaboration and hierarchical memory systems. The approach uses specialized agents, cross-trajectory information flow, and hybrid reward reinforcement learning to improve iterative scaling and stability on challenging reasoning benchmarks.
@dair_ai: Outstanding paper on long-horizon agents. (bookmark it) Similar to humans, how do you make agents persist on a difficul…
AutoLab is a new benchmark evaluating 17 frontier models on 36 expert-curated long-horizon tasks (system optimization, model development, CUDA kernels, puzzles), finding that persistence—not initial attempt quality—is the dominant predictor of success. Claude-opus-4.6 led all categories, while most other models terminated prematurely or exhausted budgets with minimal progress.