Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Summary
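Evoflux uses evolutionary search at inference time to repair failed tool workflows for compact language models, raising execution feasibility on held-out MCP-Bench tasks from roughly 3% to 17-24%, where fine-tuning on the same data (SFT, SFT+DPO) matches, underperforms, or collapses below zero-shot performance.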
Cached at: 06/12/26, 06:51 AM
Source: https://huggingface.co/papers/2606.12674
Published on Jun 10 · Submitted by Leo Y (https://huggingface.co/LeoYML) on Jun 12
Abstract
Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods.
Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve dependencies across intermediate outputs, and ground final responses in executed evidence. Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. We argue that this failure mode is poorly handled by small-corpus distillation. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. We introduce Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows. It evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks spanning live MCP servers and 250 tools, Evoflux raises execution feasibility from roughly 3% to 17-24% across small planners. In contrast, SFT and SFT+DPO on the same search-mined data match, underperform, or collapse below zero-shot performance; ReAct reaches higher peaks, but with higher variance and token cost. These results show that execution-grounded search is more reliable under scarce teacher-trace budgets.
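To make the idea of repairing a workflow through structured edits concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the catalog, the `Step` type, and the `validate`, `repair`, and `evolve` functions are all hypothetical names invented for illustration. Execution feedback is approximated by static validation errors (unknown tool, missing parameter, invalid dependency) rather than real MCP calls, diversity pruning is reduced to deduplication plus truncation, and adaptive intensity and meta-guided redesign are not modeled at all.

```python
import difflib
from dataclasses import dataclass

# Hypothetical toy catalog: tool name -> required parameter names.
CATALOG = {
    "search_flights": {"origin", "destination"},
    "book_flight": {"flight_id"},
}

@dataclass(frozen=True)
class Step:
    tool: str
    params: tuple      # tuple of (name, value) string pairs
    deps: tuple = ()   # indices of earlier steps whose output this step uses

def validate(plan):
    """Return a list of (step_index, kind, detail) errors; empty means feasible."""
    errors = []
    for i, step in enumerate(plan):
        if step.tool not in CATALOG:
            errors.append((i, "unknown_tool", step.tool))
            continue  # parameters cannot be checked without a known schema
        given = {name for name, _ in step.params}
        for missing in sorted(CATALOG[step.tool] - given):
            errors.append((i, "missing_param", missing))
        for d in step.deps:
            if not 0 <= d < i:  # a dependency must point to an earlier step
                errors.append((i, "bad_dep", d))
    return errors

def repair(plan, error):
    """Apply one structured edit that targets a single validation error."""
    i, kind, detail = error
    step = plan[i]
    if kind == "unknown_tool":
        # Resolve to the closest catalog name (cutoff=0.0 always yields a match).
        match = difflib.get_close_matches(detail, list(CATALOG), n=1, cutoff=0.0)
        new = Step(match[0], step.params, step.deps)
    elif kind == "missing_param":
        # Insert a placeholder value; a real planner would fill it in.
        new = Step(step.tool, tuple(sorted(step.params + ((detail, "<fill>"),))), step.deps)
    else:  # bad_dep: drop the invalid edge
        new = Step(step.tool, step.params, tuple(d for d in step.deps if d != detail))
    return plan[:i] + (new,) + plan[i + 1:]

def evolve(plan, generations=10, population_size=4):
    """Evolve a population of plans until one passes validation."""
    population = [plan]
    for _ in range(generations):
        best = min(population, key=lambda p: len(validate(p)))
        if not validate(best):
            return best
        # One child per (parent, error) pair: each child fixes exactly one error.
        children = [repair(p, e) for p in population for e in validate(p)]
        # Simplified pruning: drop duplicates, keep the plans with fewest errors.
        unique = list(dict.fromkeys(population + children))
        unique.sort(key=lambda p: len(validate(p)))
        population = unique[:population_size]
    return min(population, key=lambda p: len(validate(p)))

# Usage: a plan with a misspelled tool, missing parameters, and a dangling dependency.
broken = (
    Step("search_flight", (("origin", "SFO"),)),
    Step("book_flight", (), deps=(0, 5)),
)
fixed = evolve(broken)
print(validate(broken))
print(validate(fixed))
```

The fitness signal here is simply the number of validation errors. A system like the one described in the abstract would presumably score candidates against live tool resolution and execution results instead, which is what makes the search execution-grounded.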
Similar Articles
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
EvoMAS is a framework for learning execution-time workflows in multi-agent systems by formulating workflow construction as a sequential decision problem. It outperforms static multi-agent design methods on complex tasks by adapting agent coordination dynamically based on evolving task states.
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
EnvFactory automates the creation of executable tool environments and natural multi-turn trajectories for training LLMs with agentic reinforcement learning, achieving superior performance on benchmarks like BFCLv3 and MCP-Atlas with fewer environments than prior work.
MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution
MetaEvo proposes a two-stage framework for continual evolution of LLM-based agents, using preference-based optimization to enhance principle abstraction and modular architecture for experience reuse, outperforming strong baselines on reasoning benchmarks.
Stateful Inference for Low-Latency Multi-Agent Tool Calling
This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.
@tom_doerr: Semi-autonomous agents optimize codebases through parallel experimentation https://github.com/evo-hq/evo
Evo is an open-source tool that provides semi-autonomous agents to optimize codebases through parallel experimentation, using tree search and multiple subagents to autonomously discover and improve metrics.