TACO: Tool-Augmented Credit Optimization for Agentic Tool Use
Summary
TACO introduces a novel credit optimization method for code-tool agents that uses a differential reward probe and outcome-gated advantage routing to distinguish useful from redundant or misleading tool calls, improving multimodal agent performance.
Source: https://huggingface.co/papers/2606.30251
Abstract
Tool-Augmented Credit Optimization (TACO) improves multimodal agent performance by distinguishing useful, redundant, or misleading code operations through dual advantage channels: Differential Answer-Probe Reward for individual tool contribution and Outcome-Gated Advantage Routing for final outcome distribution.
Agentic multimodal models perform diverse operations on an image via code and reason over the returned view, an effective paradigm for fine-grained visual question answering. However, code operations can be useful, redundant, or misleading. Outcome-only rewards cannot precisely distinguish these cases, and existing process rewards either fail to attribute final correctness to individual tool calls, or require an external judge model. To address this, we introduce Tool-Augmented Credit Optimization (TACO), a GRPO variant for code-tool agents built on two coupled advantage channels. The first, Differential Answer-Probe Reward (DAPR), is a self-supervised, judge-free tool-contribution advantage that credits each tool call by its own effect on answering correctly. Probe tokens inserted into the model’s reasoning elicit its predictions with and without the tool, and the difference in outcome reward is taken as the call’s value: positive for a useful call, negative for a misleading one, and zero for one that changes nothing. This reuses the existing answer checker with no auxiliary judge, and, being a difference rather than an absolute probe score, is naturally robust to probe-hacking. The second is the outcome advantage from the final answer, distributed by Outcome-Gated Advantage Routing (OGAR): a parameter-free rule that, conditioned on the call’s outcome, delivers this credit only to the responsible segments, suppressing wasted tool calls without any cost term. We train TACO through a two-stage SFT+RL pipeline. Extensive experiments across perception, reasoning, and general multimodal benchmarks show that it yields consistent accuracy gains and learns to invoke its tools only when they help.
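The differential idea behind DAPR can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how probe tokens are inserted or how probe answers are decoded, so the sketch assumes the probed predictions are already available as strings. The names `exact_match` and `dapr_credits` and the exact-match checker are hypothetical, and the OGAR routing rule is not sketched because the abstract does not specify it.

```python
from typing import Callable, List, Sequence


def exact_match(prediction: str, reference: str) -> float:
    """Toy answer checker (an assumption): 1.0 on a normalized match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def dapr_credits(
    probes_without_tool: Sequence[str],
    probes_with_tool: Sequence[str],
    reference: str,
    checker: Callable[[str, str], float] = exact_match,
) -> List[float]:
    """Credit each tool call by the change in outcome reward it causes.

    probes_without_tool[i] is the probed answer without tool call i,
    probes_with_tool[i] is the probed answer with it (hypothetical inputs).
    """
    if len(probes_without_tool) != len(probes_with_tool):
        raise ValueError("need one with/without probe pair per tool call")
    return [
        checker(with_tool, reference) - checker(without_tool, reference)
        for without_tool, with_tool in zip(probes_without_tool, probes_with_tool)
    ]


# Three tool calls on a question whose reference answer is "dog":
# call 0 flips a wrong answer to right, call 1 changes nothing,
# call 2 flips a right answer to wrong.
credits = dapr_credits(
    probes_without_tool=["cat", "dog", "dog"],
    probes_with_tool=["dog", "dog", "cat"],
    reference="dog",
)
print(credits)  # [1.0, 0.0, -1.0]
```

Because each credit is a difference of two checker scores, a probe answer that is right both with and without the tool earns zero, which is one way to read the abstract's claim that the signal resists probe-hacking.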
Similar Articles
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
TACO introduces a self-evolving compression framework that automatically learns to shrink redundant terminal interaction history, cutting token overhead ~10% while boosting accuracy 1-4% across TerminalBench and other code-agent benchmarks.
@omarsar0: Pay attention to this one, AI devs. This is particularly interesting if you work with long-horizon terminal agents that…
TACO is a self-evolving framework that automatically discovers and refines context compression rules for long-horizon terminal agents.
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution [R]
CANTANTE introduces a contrastive credit attribution method to optimize multi-agent LLM systems by decomposing global rewards into per-agent signals, enabling automated prompt tuning. It outperforms baselines on programming, math, and retrieval benchmarks, achieving up to +18.9 points improvement without increased inference cost.
ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents
ToolGate is a lightweight external controller that predicts whether to execute or skip perceptual tool calls in vision-language agents, reducing token cost to 64–69% of baseline while preserving accuracy in cross-domain settings.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
This paper introduces TacoMAS, a framework for test-time co-evolution of agent capabilities and communication topology in LLM-based multi-agent systems. It demonstrates that jointly adapting fast capability loops and slow topology loops improves performance and stability over existing baselines.