The agent bug I thought was the model turned out to be the harness

Reddit r/AI_Agents News

Summary

The author shares a debugging experience where an agent loop was caused by a harness truncating tool outputs rather than model failure, highlighting the reliability gap in agent infrastructure compared to models.

Spent 3 days debugging an agent that kept looping on the same web search tool call. First things that came to mind was the model couldn't handle the schema. Swapped form Sonnet to Opus, then to GPT-5. Same loops. Swapped frameworks. Different loops, same shape. Eventually traced it to the harness silently truncating tool outputs when they ran past the default token budget. The tool was returning a long JSON blob, the harness was cutting it mid response, and the model, seeing what looked like an incomplete answer, kept calling the tool again. The truncating wasn't logged anywhere. Trace just showed the call going out and a partial response coming back. In this day and age (almost mid 2026) the model is mostly never the bottleneck on tool reliability. The harness layer is. There's plenty of leaderboards for model tool calling. None for which harness handles the actual tool I/O most reliably. What are the most reliable harness people are actually shipping with?
Original Article

Similar Articles