@rohanpaul_ai: This paper shows that agent performance depends less on prompts alone and more on the harness around them. “Agent intel…

X AI KOLs Following 05/23/26, 01:26 PM Papers

agents natural-language harness control-layer systems-problem research

Summary

This paper argues that AI agent performance depends more on the harness (control layer) than on prompts alone, proposing natural-language agent harnesses to make design choices inspectable and portable.

Original Article

Cached at: 05/23/26, 02:07 PM

This paper shows that agent performance depends less on prompts alone and more on the harness around them.

“Agent intelligence” is becoming partly a systems problem. Many AI agents look like a single model, but their real behavior comes from the surrounding code that controls planning, tools, memory, retries, checking, and stopping.
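
To make "surrounding code" concrete, here is a minimal sketch of a harness loop. It is not taken from the paper: `run_harness`, `HarnessResult`, and `toy_model` are hypothetical names, and the "model" is a stub function rather than a real LLM call. It only illustrates how retries, checking, memory, and stopping live outside the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HarnessResult:
    answer: Optional[str]              # None if no attempt passed the check
    attempts: int                      # number of model calls made
    log: List[str] = field(default_factory=list)


def run_harness(model: Callable[[str, List[str]], str],
                check: Callable[[str], bool],
                task: str,
                max_attempts: int = 3) -> HarnessResult:
    """Hypothetical control loop: call the model, verify, retry, stop."""
    memory: List[str] = []             # state the harness carries between steps
    for attempt in range(1, max_attempts + 1):
        candidate = model(task, memory)
        if check(candidate):           # verification step owned by the harness
            memory.append(f"attempt {attempt}: accepted {candidate!r}")
            return HarnessResult(candidate, attempt, memory)
        memory.append(f"attempt {attempt}: rejected {candidate!r}")
    # Stopping condition: retry budget exhausted without a passing answer.
    return HarnessResult(None, max_attempts, memory)


def toy_model(task: str, memory: List[str]) -> str:
    # Stand-in for an LLM call (assumption): wrong until it sees a rejection.
    return "4" if memory else "5"


result = run_harness(toy_model, lambda a: a == "4", "What is 2 + 2?")
print(result.answer, result.attempts)
```

The same stub model would simply return its first wrong answer without the loop; the second, correct attempt exists only because the harness kept the rejection in memory and retried.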

A model may reason well in one step, but long tasks fail in messier places: state disappears, verification drifts, tools return partial evidence, and the agent forgets which intermediate artifact actually matters.

Natural-Language Agent Harnesses try to make that control layer visible.

Instead of burying the logic in controller code, they express the stages, roles, contracts, state rules, failure modes, and stopping conditions in structured natural language that a shared runtime can execute.
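
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's format or runtime: the `HARNESS_SPEC` layout, the `execute` function, and `fake_llm` are all hypothetical, and contracts are mapped to plain Python predicates for simplicity, where a real runtime would presumably interpret the natural-language text itself.

```python
from typing import Callable, Dict

# Hypothetical harness spec: stages, roles, contracts, and a retry rule
# written as readable text plus a small amount of structure.
HARNESS_SPEC = {
    "stages": [
        {"name": "plan", "role": "planner",
         "instruction": "List the steps needed to fix the bug.",
         "contract": "Output must be non-empty."},
        {"name": "patch", "role": "coder",
         "instruction": "Write a patch that follows the plan.",
         "contract": "Output must start with 'diff'."},
    ],
    "max_retries_per_stage": 2,
}

# Simplification (assumption): each contract sentence maps to a predicate.
CONTRACT_CHECKS: Dict[str, Callable[[str], bool]] = {
    "Output must be non-empty.": lambda out: bool(out.strip()),
    "Output must start with 'diff'.": lambda out: out.startswith("diff"),
}


def execute(spec: dict, llm: Callable[[str], str]) -> Dict[str, str]:
    """Walk the stages in order, enforcing each contract with bounded retries."""
    state: Dict[str, str] = {}         # explicit state passed between stages
    for stage in spec["stages"]:
        check = CONTRACT_CHECKS[stage["contract"]]
        for _ in range(spec["max_retries_per_stage"] + 1):
            prompt = f"[{stage['role']}] {stage['instruction']}\nState: {state}"
            out = llm(prompt)
            if check(out):
                state[stage["name"]] = out
                break
        else:
            # Failure mode: contract never satisfied, so stop and record where.
            state["failed_stage"] = stage["name"]
            return state
    return state


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call (assumption), keyed on the role tag.
    if prompt.startswith("[coder]"):
        return "diff --git a/x b/x"
    return "1. find bug 2. fix"


final = execute(HARNESS_SPEC, fake_llm)
print(final)
```

Because the stages and contracts sit in data rather than in controller code, they can be read, diffed, and moved to another runtime without touching `execute`.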

The claim is not that natural language should replace code, but that the important design choices around an agent should become inspectable, portable, and testable instead of hiding inside one framework’s habits.

On SWE-bench, heavier harnessing changed behavior dramatically, with more calls, tools, delegation, and runtime, but it did not produce a simple win curve; sometimes added structure helped, and sometimes it pushed the agent away from the shortest benchmark-aligned repair.

A harness is not magic scaffolding around a model; it is a set of bets about where reliability comes from.

Paper Link – arxiv.org/abs/2603.25723

Paper Title: “Natural-Language Agent Harnesses”

Similar Articles

@rohanpaul_ai: This Meta + Stanford + Illinois survey paper argues that AI agents work better when code becomes their main working lay…

@omarsar0: // Self-Harness: Harnesses That Improve Themselves // (bookmark this one) Most of the agent scaffolds we rely on today …

@sairahul1: https://x.com/sairahul1/status/2063544956158185927

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2057153343081111582

@sydneyrunkle: let's assume agent = model + harness unfortunately, good models are getting really expensive! so you need a great harne…
