Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Hugging Face Daily Papers 06/29/26, 12:00 AM Papers

mixture-of-experts agentic-model long-horizon trajectory-scaling distillation benchmark

Summary

Introduces Agents-A1, a 35B Mixture-of-Experts agentic model that achieves trillion-parameter-level performance through long-horizon trajectory scaling and a three-stage training approach including SFT, domain-level teachers, and multi-teacher distillation. The model outperforms or matches much larger models on long-horizon agent benchmarks.

We introduce Agents-A1, a 35B Mixture-of-Experts Agentic Model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. Based on this, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose multi-teacher domain-routed on-policy distillation with salient vocabulary alignment to improve knowledge transfer efficiency across domains, unifying six heterogeneous domains into one deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agent benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5). We hope this work provides the community with a practical path for scaling the horizon using a 35B agent that can reach or match the performance of 1T models on long-horizon tasks.
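The third stage can be pictured with a small sketch. The abstract does not give the loss, the routing rule, or the definition of "salient vocabulary", so everything below is an assumption made for illustration: routing is a plain lookup by domain label, the salient vocabulary is taken to be the teacher's top-k tokens at each position, and the per-token objective is a reverse KL computed on positions the student itself generated. The domain names and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vocabulary row.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def salient_ids(teacher_probs, k):
    # ASSUMPTION: "salient vocabulary" = the teacher's k most probable tokens.
    order = sorted(range(len(teacher_probs)), key=lambda i: -teacher_probs[i])
    return order[:k]

def renormalize(probs, ids):
    # Restrict a distribution to the chosen token ids and renormalize.
    total = sum(probs[i] for i in ids)
    return [probs[i] / total for i in ids]

def reverse_kl(student, teacher, eps=1e-12):
    # KL(student || teacher), a common choice for on-policy distillation.
    return sum(s * (math.log(s + eps) - math.log(t + eps))
               for s, t in zip(student, teacher))

def routed_distill_loss(domain, student_logits, teacher_logits_by_domain, k=2):
    # Route to the teacher for this domain (hypothetical dict lookup), then
    # average the restricted reverse KL over student-sampled positions.
    teacher_logits = teacher_logits_by_domain[domain]
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        s, t = softmax(s_row), softmax(t_row)
        ids = salient_ids(t, k)
        total += reverse_kl(renormalize(s, ids), renormalize(t, ids))
    return total / len(student_logits)

# Toy usage: one position, a 3-token vocabulary, two hypothetical domains.
teachers = {"search": [[2.0, 0.0, -2.0]], "science": [[0.0, 0.0, 0.0]]}
loss = routed_distill_loss("search", [[0.0, 0.0, 0.0]], teachers, k=2)
```

Restricting the comparison to a small token subset is one plausible way to reduce the cost of matching full-vocabulary distributions; whether the paper does this, or something different, is not stated in the abstract.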

Cached at: 06/30/26, 03:33 AM

Source: https://huggingface.co/papers/2606.30616 Published on Jun 29

#1 Paper of the day

Abstract

Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher models, and multi-teacher distillation.
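To make the trajectory side concrete, here is a minimal sketch of a record that ties together the four elements the paper's infrastructure is said to connect: external knowledge, actions, observations, and verifier outcomes. The class names, field names, whitespace token count, and the filtering rule are all assumptions for illustration; the paper's actual schema and data-selection criteria are not described on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    knowledge: str    # external knowledge consulted at this step
    action: str       # tool call or command issued by the agent
    observation: str  # what the environment returned

@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    verifier_passed: Optional[bool] = None  # None means not yet verified

    def token_length(self) -> int:
        # ASSUMPTION: whitespace splitting stands in for a real tokenizer.
        return sum(
            len(f"{s.knowledge} {s.action} {s.observation}".split())
            for s in self.steps
        )

def keep_for_training(trajectories, min_tokens):
    # Hypothetical filter: keep verified trajectories above a length floor.
    return [
        t for t in trajectories
        if t.verifier_passed and t.token_length() >= min_tokens
    ]

# Toy usage with two short trajectories.
step = Step("doc a", "search x", "result y z")
good = Trajectory("task-1", [step], verifier_passed=True)
bad = Trajectory("task-2", [step], verifier_passed=False)
kept = keep_for_training([good, bad], min_tokens=5)
```

In a real pipeline the length floor would be set in model tokens rather than words, with the reported 45K-token average as a rough point of reference.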

Get this paper in your agent:

hf papers read 2606.30616

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### InternScience/Agents-A1
Text Generation • 35B • Updated 42 minutes ago • 55 • 18

Similar Articles

InternScience/Agents-A1 · Hugging Face

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

@ModelScope2022: Introducing Agents-A1, A 35B MoE agentic model built for long-horizon tasks across search, engineering, scientific rese…

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

@dair_ai: Outstanding paper on long-horizon agents. (bookmark it) Similar to humans, how do you make agents persist on a difficul…
