TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
Summary
This paper introduces TacoMAS, a framework for test-time co-evolution of agent capabilities and communication topology in LLM-based multi-agent systems. It argues that capabilities should adapt in a fast loop and topology in a slow loop, and reports that this joint adaptation outperforms nearly 20 multi-agent baselines on four benchmarks while driving the system toward a task-conditioned stable equilibrium.
Cached at: 05/13/26, 12:14 PM
Source: https://huggingface.co/papers/2605.09539
Abstract
Test-time co-evolution framework for multi-agent systems that jointly adapts agent capabilities and communication topology at different time scales to achieve task-conditioned stability and improved performance.
Multi-agent systems (MAS) have emerged as a promising paradigm for solving complex tasks. Recent work has explored self-evolving MAS that automatically optimize agent capabilities or communication topologies. However, existing methods either learn a topology that remains fixed at inference time or adapt only one of topology and capability during inference. We empirically and theoretically show that effective test-time evolution requires jointly adapting both axes, but on different time scales: capabilities should update rapidly to handle emerging subtasks, while the topology should evolve more slowly to preserve coordination stability. We then introduce TacoMAS, a test-time co-evolution framework for dynamic MAS. TacoMAS formulates MAS inference as a task of online graph adaptation, where nodes represent agents with role-specific capabilities and edges define their communication topology. During inference, a fast capability loop updates agent expertise using trajectory-level feedback, while a slow meta-LLM-driven topology loop performs agent birth-death operations on the MAS, including edge edits, agent addition, and agent removal. We further show that this fast-slow design drives MAS evolution toward a task-conditioned stable equilibrium. Experiments on four benchmarks demonstrate that TacoMAS outperforms nearly 20 multi-agent baselines, achieving an average improvement of 13.3% over the strongest baseline. The code is released at https://github.com/chenxu2-gif/TacoMAS-MultiAgent.
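The two-time-scale idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent`, `MAS`, and `co_evolve` names, the `topology_period` parameter, and the tuple-based edit format are all hypothetical, and the meta-LLM and trajectory-level feedback are replaced by plain callables supplied by the caller. It only shows a fast per-step capability update nested with a slower, periodic topology update that supports edge edits, agent addition, and agent removal.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    role: str
    # Hypothetical stand-in for "capability": notes accumulated from feedback.
    notes: list = field(default_factory=list)


@dataclass
class MAS:
    agents: dict = field(default_factory=dict)  # agent name -> Agent
    edges: set = field(default_factory=set)     # directed (sender, receiver) pairs

    def add_agent(self, name, role):
        self.agents[name] = Agent(role)

    def remove_agent(self, name):
        # Removing an agent also drops every edge that touches it.
        self.agents.pop(name, None)
        self.edges = {(u, v) for (u, v) in self.edges if name not in (u, v)}

    def add_edge(self, u, v):
        if u in self.agents and v in self.agents:
            self.edges.add((u, v))


def fast_capability_update(mas, feedback):
    """Fast loop: append per-agent feedback to each agent's notes."""
    for name, note in feedback.items():
        if name in mas.agents:
            mas.agents[name].notes.append(note)


def slow_topology_update(mas, ops):
    """Slow loop: apply a list of graph edits proposed by some controller."""
    for op in ops:
        kind = op[0]
        if kind == "add_agent":
            mas.add_agent(op[1], op[2])
        elif kind == "remove_agent":
            mas.remove_agent(op[1])
        elif kind == "add_edge":
            mas.add_edge(op[1], op[2])
        elif kind == "remove_edge":
            mas.edges.discard((op[1], op[2]))


def co_evolve(mas, steps, get_feedback, propose_ops, topology_period=3):
    """Run the fast update every step and the slow update every `topology_period` steps."""
    for t in range(1, steps + 1):
        fast_capability_update(mas, get_feedback(mas, t))
        if t % topology_period == 0:
            slow_topology_update(mas, propose_ops(mas, t))
    return mas


def demo():
    mas = MAS()
    mas.add_agent("planner", "plan")
    mas.add_agent("coder", "code")
    mas.add_edge("planner", "coder")

    # Stub feedback: every current agent receives one note per step.
    def get_feedback(m, t):
        return {name: f"lesson from step {t}" for name in m.agents}

    # Stub controller (in place of a meta-LLM): add a tester once, then do nothing.
    def propose_ops(m, t):
        if "tester" in m.agents:
            return []
        return [("add_agent", "tester", "test"), ("add_edge", "coder", "tester")]

    return co_evolve(mas, steps=6, get_feedback=get_feedback,
                     propose_ops=propose_ops, topology_period=3)
```

Calling `demo()` runs six steps with a topology update every third step, so the graph changes far less often than the agents' notes do. In this sketch the ratio is fixed by `topology_period`; how TacoMAS actually schedules the two loops is not specified in the abstract.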
Similar Articles
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
TMAS introduces a multi-agent framework that enhances large language model reasoning by scaling test-time compute through structured collaboration and hierarchical memory systems. The approach uses specialized agents, cross-trajectory information flow, and hybrid reward reinforcement learning to improve iterative scaling and stability on challenging reasoning benchmarks.
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
TACO introduces a self-evolving compression framework that automatically learns to shrink redundant terminal interaction history, cutting token overhead ~10% while boosting accuracy 1-4% across TerminalBench and other code-agent benchmarks.
SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents
SkillCAT is a training-free framework for LLM agent skill self-evolution that addresses limitations of single-trace bias, unverified merging, and full corpus loading via three stages: Contrastive Causal Extraction, Assessment-Augmented Evolution, and Topology-Aware Task Execution, achieving up to 40.40% improvement on benchmarks.
Multi-agent Framework for Time-Sensitive Complementary Collaboration in Minecraft
The paper introduces TickingCollabBench, a Minecraft-based multi-agent benchmark for time-sensitive complementary collaboration tasks with dynamic environments, and demonstrates that LLMs frequently fail under such conditions compared to a global-knowledge oracle.
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
EvoTest introduces J-TTL, a benchmark for measuring agent test-time learning capabilities, and proposes an evolutionary framework where an Actor Agent plays games while an Evolver Agent iteratively improves the system's prompts, memory, and hyperparameters without fine-tuning. The method demonstrates superior performance compared to reflection and memory-based baselines on complex text-based games.