TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
Summary
TMAS introduces a multi-agent framework that enhances large language model reasoning by scaling test-time compute through structured collaboration and hierarchical memory systems. The approach uses specialized agents, cross-trajectory information flow, and hybrid reward reinforcement learning to improve iterative scaling and stability on challenging reasoning benchmarks.
Cached at: 05/12/26, 10:52 AM
Source: https://huggingface.co/papers/2605.10344
Abstract
TMAS is a multi-agent framework for test-time scaling that enhances large language model reasoning through structured collaboration and hierarchical memory systems.
Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rounds, and verification-based feedback. However, existing structured test-time scaling methods either weakly coordinate parallel reasoning trajectories or rely on noisy historical information without explicitly deciding what should be retained and reused, limiting their ability to balance exploration and exploitation. In this work, we propose TMAS, a framework for scaling test-time compute via multi-agent synergy. TMAS organizes inference as a collaborative process among specialized agents, enabling structured information flow across agents, trajectories, and refinement iterations. To support effective cross-trajectory collaboration, TMAS introduces hierarchical memories: the experience bank reuses low-level reliable intermediate conclusions and local feedback, while the guideline bank records previously explored high-level strategies to steer subsequent rollouts away from redundant reasoning patterns. Furthermore, we design a hybrid reward reinforcement learning scheme tailored to TMAS, which jointly preserves basic reasoning capability, enhances experience utilization, and encourages exploration beyond previously attempted solution strategies. Extensive experiments on challenging reasoning benchmarks demonstrate that TMAS achieves stronger iterative scaling than existing test-time scaling baselines, while hybrid reward training further improves scaling effectiveness and stability across iterations. Code and data are available at https://github.com/george-QF/TMAS-code.
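The abstract describes the two memory banks and the hybrid reward only at a high level. As a rough illustration, the Python sketch below shows one way they could be organized. It is not the authors' implementation. All names here (ExperienceBank, GuidelineBank, build_prompt, hybrid_reward) and the reward weights are hypothetical, and exact string matching stands in for whatever retention and redundancy criteria the paper actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExperienceBank:
    """Low-level memory: intermediate conclusions judged reliable (assumed design)."""
    entries: list[str] = field(default_factory=list)

    def add(self, conclusion: str, verified: bool) -> None:
        # Retain only verified conclusions and skip exact duplicates.
        if verified and conclusion not in self.entries:
            self.entries.append(conclusion)


@dataclass
class GuidelineBank:
    """High-level memory: strategies that earlier rollouts already explored."""
    strategies: list[str] = field(default_factory=list)

    def add(self, strategy: str) -> None:
        if strategy not in self.strategies:
            self.strategies.append(strategy)

    def is_redundant(self, strategy: str) -> bool:
        # Exact match is a placeholder for a real similarity check.
        return strategy in self.strategies


def build_prompt(problem: str, experience: ExperienceBank,
                 guidelines: GuidelineBank) -> str:
    """Assemble the context for the next rollout from both banks."""
    lines = [f"Problem: {problem}"]
    if experience.entries:
        lines.append("Verified intermediate conclusions to reuse:")
        lines += [f"- {entry}" for entry in experience.entries]
    if guidelines.strategies:
        lines.append("Strategies already tried (prefer a different one):")
        lines += [f"- {strategy}" for strategy in guidelines.strategies]
    return "\n".join(lines)


def hybrid_reward(correct: bool, used_experience: bool, novel_strategy: bool,
                  w_correct: float = 1.0, w_experience: float = 0.5,
                  w_explore: float = 0.5) -> float:
    """Weighted sum of three signals; the weights are illustrative assumptions."""
    return (w_correct * float(correct)
            + w_experience * float(used_experience)
            + w_explore * float(novel_strategy))
```

In this sketch the three reward terms loosely mirror the three goals named in the abstract: preserving basic reasoning capability (correctness), enhancing experience utilization, and encouraging exploration beyond strategies already recorded in the guideline bank.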
Similar Articles
Recursive Multi-Agent Systems
This paper introduces RecursiveMAS, a framework that extends recursive scaling principles to multi-agent systems for improved collaborative reasoning efficiency and accuracy. It demonstrates significant speedups and token reduction across various benchmarks compared to standard baselines.
Multi-Agent Transactive Memory
Proposes Multi-Agent Transactive Memory (MATM), a framework for population-level storage and retrieval of agent-generated trajectories to improve task performance and reduce interaction steps in interactive environments like ALFWorld and WebArena.
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
SMAC-Talk is a new benchmark that extends the StarCraft Multi-Agent Challenge to evaluate LLM-based agents in cooperative multi-agent environments with natural language communication. It includes scenarios with deceptive communicators and benchmarks agents using models from the Qwen3.5 family to study how reasoning, memory, and scale affect coordination.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
This paper introduces TacoMAS, a framework for test-time co-evolution of agent capabilities and communication topology in LLM-based multi-agent systems. It demonstrates that jointly adapting fast capability loops and slow topology loops improves performance and stability over existing baselines.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
This paper introduces AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies for LLMs by formulating it as controller synthesis. It demonstrates improved accuracy-cost tradeoffs on mathematical reasoning benchmarks with minimal computational overhead.