Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Hugging Face Daily Papers, 06/10/26

Summary

Evoflux uses inference-time evolutionary search to repair failed tool workflows for compact language models, raising execution feasibility from roughly 3% to 17-24% on held-out MCP-Bench tasks and outperforming SFT and SFT+DPO baselines trained on the same search-mined data.

Compact language models (LMs) reduce cost, latency, and deployment risk for tool agents. Yet MCP-style tool use requires more than isolated function calling: an agent must discover tools from live catalogs, satisfy schemas, preserve dependencies across intermediate outputs, and ground final responses in executed evidence. Small planners often generate plausible workflow graphs that fail under tool resolution, parameter validation, dependency tracking, or execution. We argue that this failure mode is poorly handled by small-corpus distillation. A few hundred teacher traces can teach workflow format, but rarely cover the recovery behavior needed to repair failed plans over changing tool catalogs. We introduce Evoflux, an inference-time evolutionary search method that treats compact tool use as the repair of executable tool workflows. It evolves typed workflow graphs through structured edits, execution feedback, adaptive intensity, meta-guided redesign, and diversity pruning. On held-out MCP-Bench tasks spanning live MCP servers and 250 tools, Evoflux raises execution feasibility from roughly 3% to 17-24% across small planners. In contrast, SFT and SFT+DPO on the same search-mined data match, underperform, or collapse below zero-shot performance; ReAct reaches higher peaks, but with higher variance and token cost. These results show that execution-grounded search is more reliable under scarce teacher-trace budgets.
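
The abstract names four places where a plausible plan can break: tool resolution, parameter validation, dependency tracking, and execution. As an illustration only, the first three can be expressed as static checks over a typed workflow graph. The sketch below assumes a toy representation: `ToolSpec`, `Step`, `check_workflow`, and the flight-booking tools are hypothetical names, not Evoflux's data structures, and real MCP tools describe their parameters with JSON Schema rather than Python types.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical catalog entry: a tool name and its required parameter types."""
    name: str
    params: dict[str, type]  # required parameter name -> expected Python type


@dataclass
class Step:
    """Hypothetical node of a typed workflow graph."""
    id: str
    tool: str
    args: dict[str, object] = field(default_factory=dict)
    # Parameters wired from earlier steps' outputs: parameter name -> producing step id.
    deps: dict[str, str] = field(default_factory=dict)


def check_workflow(steps: list[Step], catalog: dict[str, ToolSpec]) -> list[str]:
    """Return failure messages; an empty list means the plan passes the static checks."""
    errors: list[str] = []
    seen: set[str] = set()  # ids of steps that appear earlier in the plan
    for step in steps:
        spec = catalog.get(step.tool)
        if spec is None:  # tool resolution
            errors.append(f"{step.id}: unknown tool '{step.tool}'")
            seen.add(step.id)
            continue
        for pname, ptype in spec.params.items():  # parameter validation
            if pname in step.deps:
                continue  # value comes from an upstream output, checked below
            if pname not in step.args:
                errors.append(f"{step.id}: missing parameter '{pname}'")
            elif not isinstance(step.args[pname], ptype):
                errors.append(f"{step.id}: '{pname}' should be {ptype.__name__}")
        for pname, src in step.deps.items():  # dependency tracking
            if src not in seen:
                errors.append(f"{step.id}: '{pname}' depends on unavailable step '{src}'")
        seen.add(step.id)
    return errors


# Hypothetical catalog and a flawed plan.
catalog = {
    "search_flights": ToolSpec("search_flights", {"origin": str, "dest": str}),
    "book_flight": ToolSpec("book_flight", {"flight_id": str}),
}
plan = [
    Step("s1", "search_flights", {"origin": "SFO", "dest": 42}),
    Step("s2", "book_flight", deps={"flight_id": "s3"}),
]
print(check_workflow(plan, catalog))
```

On the example plan, the checker flags the wrongly typed `dest` argument and the dependency on a step `s3` that never runs; failures that only surface during execution would still need a live executor.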

Source: https://huggingface.co/papers/2606.12674 (published on Jun 10)

Submitted by Leo Y (https://huggingface.co/LeoYML) on Jun 12

Abstract

Evoflux enables compact language models to execute tool workflows more reliably by using evolutionary search to repair failed plans during inference, significantly improving execution feasibility compared to traditional fine-tuning methods.
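
The repair loop can be pictured as a small evolutionary search in which execution feedback serves as the fitness signal. The sketch below is a toy stand-in under stated assumptions, not the paper's algorithm: plans are lists of tool names from a hypothetical typed catalog, random replace/insert/delete edits substitute for planner-proposed structured edits, "adaptive intensity" is assumed to mean applying more edits per child while the best score stagnates, diversity pruning keeps only distinct plans, and meta-guided redesign (which would involve the language model) is omitted. `CATALOG`, `execute`, `fitness`, `mutate`, and `evolve` are all hypothetical names.

```python
import random

# Hypothetical toy catalog: tool name -> (input type, output type).
CATALOG = {
    "search": ("query", "urls"),
    "fetch": ("urls", "pages"),
    "extract": ("pages", "facts"),
    "summarize": ("facts", "answer"),
    "translate": ("answer", "answer"),
}


def execute(plan):
    """Toy executor: return (steps that ran, whether the plan ends in an answer)."""
    current = "query"
    for i, tool in enumerate(plan):
        spec = CATALOG.get(tool)
        if spec is None or spec[0] != current:
            return i, False  # execution stops at the first failing step
        current = spec[1]
    return len(plan), current == "answer"


def fitness(plan):
    """Execution feedback as a score: steps run, a bonus if feasible, a small length penalty."""
    steps_ok, feasible = execute(plan)
    return steps_ok + (100 if feasible else 0) - 0.1 * len(plan)


def mutate(plan, intensity, rng):
    """Apply `intensity` random structured edits (replace, insert, delete) to a copy."""
    plan = list(plan)
    tools = sorted(CATALOG)
    for _ in range(intensity):
        op = rng.choice(["replace", "insert", "delete"])
        if op == "replace" and plan:
            plan[rng.randrange(len(plan))] = rng.choice(tools)
        elif op == "insert" or not plan:
            plan.insert(rng.randrange(len(plan) + 1), rng.choice(tools))
        elif len(plan) > 1:  # delete, but never empty the plan
            del plan[rng.randrange(len(plan))]
    return plan


def evolve(seed_plan, generations=50, pop_size=8, seed=0):
    """Evolve a population of plans, keeping the best-scoring plan seen so far."""
    rng = random.Random(seed)
    population = [list(seed_plan)]
    best, intensity = list(seed_plan), 1
    for _ in range(generations):
        children = [mutate(rng.choice(population), intensity, rng) for _ in range(pop_size)]
        # Diversity pruning: keep one copy of each distinct plan, then the top scorers.
        unique = {tuple(p): p for p in population + children}
        population = sorted(unique.values(), key=fitness, reverse=True)[:pop_size]
        if fitness(population[0]) > fitness(best):
            best, intensity = population[0], 1  # improvement: return to small edits
        else:
            intensity = min(intensity + 1, 4)  # stagnation: edit more aggressively
        if execute(best)[1]:
            break  # stop once a feasible plan has been found
    return best


broken = ["search", "summarize"]  # "summarize" expects facts but receives urls
repaired = evolve(broken)
print(execute(broken), execute(repaired), repaired)
```

Because the surviving population always contains the top-scoring plans, the returned plan never scores below the seed plan; whether a fully feasible plan is found depends on the random seed and the generation budget.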

Similar Articles

EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution

Stateful Inference for Low-Latency Multi-Agent Tool Calling

@tom_doerr: Semi-autonomous agents optimize codebases through parallel experimentation https://github.com/evo-hq/evo