CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution [R]

Reddit r/MachineLearning 05/20/26, 11:43 AM Papers

Summary

CANTANTE introduces a contrastive credit attribution method for optimizing multi-agent LLM systems: it decomposes the global reward into per-agent signals, which enables automated prompt tuning. Against GEPA and MIPROv2 it achieves the best average rank across programming, math, and retrieval benchmarks, beating the strongest baseline by up to 18.9 points without increasing inference cost.

LLM-based multi-agent systems have demonstrated strong performance across complex real-world tasks, such as software engineering, predictive modeling, and retrieval-augmented generation. Yet automating their configuration remains a structural challenge. Researchers are often forced into manual, trial-and-error prompt tuning, where a change to a single agent shifts the global output in ways that are difficult to trace. The core bottleneck is **credit assignment**: the parameters governing agent behavior are local, but performance scores are only available at the global system level. This makes optimization fundamentally difficult, because we do not know which agents contributed positively or negatively to the outcome.

CANTANTE takes a different path: it treats agent prompts as parameters learned from task rewards rather than tuned by hand. By solving the credit assignment problem, we can move from brittle, hand-crafted agent demos to trustworthy systems that are actually autonomous and useful in practice.

CANTANTE's algorithm in short (see second image):

1. Let local optimizers suggest configurations (e.g., prompts).
2. Evaluate different configurations on the same queries, capturing reasoning traces and system scores.
3. Let an attributer compare these rollouts and assign each agent a credit, thereby decomposing the global reward into per-agent update signals.
4. Feed those credits to any local optimizer; for the experiments, we use CAPO, our prompt optimizer from prior work at AutoML 2025.

Evaluated against the DSPy optimizers GEPA and MIPROv2 on MBPP (programming), GSM8K (mathematical reasoning), and HotpotQA (retrieval), CANTANTE:

• achieves the best average rank,
• beats the strongest baseline by +18.9 points on MBPP and +12.5 on GSM8K, and
• keeps inference-time cost on par with unoptimized prompts.
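To make the contrastive step of the algorithm above more concrete, here is a toy sketch in Python. It is not the attributer from the paper or the repo: the post describes an attributer that compares full rollouts, including reasoning traces, whereas this stand-in looks only at global scores. The names `Rollout` and `attribute_credit`, and the rule of splitting each score gap equally among the agents whose prompt changed, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Rollout:
    """One system run on one query (hypothetical structure)."""
    query_id: str
    configs: dict   # agent name -> id of the prompt candidate it used
    score: float    # global system score; no per-agent score is observed


def attribute_credit(rollouts):
    """Toy contrastive attributer (an assumption, not the paper's method).

    For every pair of rollouts on the same query, the score gap is split
    equally among the agents whose configuration differs between the two.
    Returns the mean credit per (agent, prompt id).
    """
    by_query = {}
    for r in rollouts:
        by_query.setdefault(r.query_id, []).append(r)

    samples = {}  # (agent, prompt id) -> list of credit samples
    for group in by_query.values():
        for a, b in combinations(group, 2):
            changed = [ag for ag in a.configs if a.configs[ag] != b.configs[ag]]
            if not changed:
                continue  # identical configurations carry no contrast
            share = (a.score - b.score) / len(changed)
            for ag in changed:
                samples.setdefault((ag, a.configs[ag]), []).append(share)
                samples.setdefault((ag, b.configs[ag]), []).append(-share)

    return {key: sum(vals) / len(vals) for key, vals in samples.items()}


# Three rollouts on the same query with different prompt candidates.
rollouts = [
    Rollout("q1", {"planner": "p0", "coder": "c0"}, 0.0),
    Rollout("q1", {"planner": "p0", "coder": "c1"}, 1.0),
    Rollout("q1", {"planner": "p1", "coder": "c1"}, 1.0),
]
credits = attribute_credit(rollouts)
for (agent, prompt_id), credit in sorted(credits.items()):
    print(f"{agent}/{prompt_id}: {credit:+.2f}")
```

In this example the coder's prompt `c1` ends up with the largest credit, because switching to it coincides with the score jump, while the planner's prompts receive smaller credits. Per-agent values like these are the kind of signal that step 4 would hand to a local prompt optimizer.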
🔗 Link to the paper: [https://arxiv.org/abs/2605.13295](https://arxiv.org/abs/2605.13295)

💻 Link to the repo: [https://github.com/finitearth/cantante](https://github.com/finitearth/cantante)

If you're researching multi-agent architectures or automated prompt engineering, I'd love to hear what's working (and breaking) for you right now.

Similar Articles

Solving the Credit Assignment Problem in Multi-Agent Systems (CANTANTE Framework)

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

@NousResearch: Today we release Contrastive Neuron Attribution (CNA), a method for steering LLM behavior by identifying and ablating s…

Targeted Neuron Modulation via Contrastive Pair Search
