The agent bug I thought was the model turned out to be the harness

Reddit r/AI_Agents 05/08/26, 01:43 AM News

ai-agents debugging tool-use llm-harness anthropic openai software-engineering

Summary

The author describes debugging an agent stuck in a tool-call loop that turned out to be caused by the harness silently truncating tool outputs, not by the model, and argues that harness reliability now lags behind model reliability.

Spent 3 days debugging an agent that kept looping on the same web search tool call. The first thing that came to mind was that the model couldn't handle the schema. Swapped from Sonnet to Opus, then to GPT-5. Same loops. Swapped frameworks. Different loops, same shape.

Eventually traced it to the harness silently truncating tool outputs when they ran past the default token budget. The tool was returning a long JSON blob, the harness was cutting it mid-response, and the model, seeing what looked like an incomplete answer, kept calling the tool again. The truncation wasn't logged anywhere. The trace just showed the call going out and a partial response coming back.

In this day and age (almost mid-2026), the model is rarely the bottleneck on tool reliability. The harness layer is. There are plenty of leaderboards for model tool calling. None for which harness handles the actual tool I/O most reliably.

What are the most reliable harnesses people are actually shipping with?
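
For anyone hitting the same failure, here is a minimal sketch of what non-silent truncation could look like. It is not taken from any particular harness: the names `ToolResult` and `fit_tool_output`, the logger name, and the 4000-character default are all hypothetical, and the budget is counted in characters as a rough stand-in for tokens. The idea is only that a harness that must cut a tool output should log the cut and tell the model it happened.

```python
import json
import logging
from dataclasses import dataclass

# Hypothetical logger name; use whatever your harness already logs to.
logger = logging.getLogger("harness.tool_io")


@dataclass
class ToolResult:
    content: str         # text that will be passed to the model
    truncated: bool      # True if the harness cut the raw output
    original_chars: int  # size of the raw tool output before any cut


def fit_tool_output(raw: str, max_chars: int = 4000) -> ToolResult:
    """Clamp a tool output to a budget, but never silently.

    Assumption: the budget is in characters, as a crude proxy for tokens.
    """
    if len(raw) <= max_chars:
        return ToolResult(raw, False, len(raw))

    # 1. Make the cut visible in logs and traces.
    logger.warning("tool output truncated: %d -> %d chars", len(raw), max_chars)

    # 2. Make the cut visible to the model, so a partial blob is not
    #    mistaken for a failed call that should be retried.
    notice = (
        f"\n[harness notice: output truncated, showing {max_chars} of "
        f"{len(raw)} characters; repeating the same call returns the same data]"
    )
    return ToolResult(raw[:max_chars] + notice, True, len(raw))


if __name__ == "__main__":
    blob = json.dumps({"results": ["item"] * 2000})  # stand-in for a long search result
    result = fit_tool_output(blob, max_chars=200)
    print(result.truncated, result.original_chars)
    print(result.content[-120:])
```

Whether the notice belongs in the tool result itself or in a separate field depends on the harness. The sketch puts it inline because that is the only place the model is guaranteed to see it.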


Similar Articles

Your harness is failing your agent but there's no benchmark to prove it

Reddit r/AI_Agents

The article highlights a lack of benchmarks for evaluating the reliability of agent harnesses, specifically focusing on how MCP implementations handle tool calls and errors compared to the models themselves.

Your agent keeps failing after you upgrade the model. Cursor's engineering notes explain why.

Reddit r/AI_Agents

Cursor's engineering notes reveal that agent failures often stem from the harness (scaffolding) rather than the model itself, with different tool formats across providers causing silent errors and reliability issues.

@sydneyrunkle: let's assume agent = model + harness unfortunately, good models are getting really expensive! so you need a great harne…

X AI KOLs Following

A guide on optimizing AI agent performance by improving the harness component to compensate for expensive model costs, focusing on hill climbing techniques.

Your AI agent isn't broken. Your harness is. Here's the system that took mine from "liability" to shipping production code.

Reddit r/AI_Agents

The article argues that AI coding agent failures stem from poor system design rather than model limitations, outlining a three-layer 'harness' of knowledge, guardrails, and feedback loops to reliably ship production code.

The model is the CPU, not the computer — why the harness moves agent performance as much as a model upgrade

Reddit r/AI_Agents

The article argues that the harness (the system around the model) is as important as the model itself for agent performance, citing evidence from various benchmarks and experiments.
