@dair_ai: // State-Externalizing Harnesses // A new paradigm is emerging on how to effectively build agents and harnesses. If the…

X AI KOLs Following Papers

Summary

Harness-1 introduces a state-externalizing harness that separates routine bookkeeping from policy decisions in search agents, enabling a 20B model to outperform larger frontier searchers across multiple benchmarks.

// State-Externalizing Harnesses // A new paradigm is emerging on how to effectively build agents and harnesses. If there is a state that the environment can maintain reliably, it probably doesn't belong inside the policy. Move it into the harness, and a 20B model trains better and generalizes further. Search agents are usually trained on one policy over a growing transcript, so RL has to learn semantic search and routine bookkeeping at the same time. This model, Harness-1, splits those apart. The harness keeps the working memory (candidate pool, evidence links, verification records, deduplicated observations, budget-aware context) outside the policy, and the 20B model only decides what to search, what to keep, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, it reaches 0.730 average curated recall, beating the next-best open search agent by 11.4 points and staying competitive with much larger frontier searchers. The gains are largest on the held-out transfer. Paper: https://arxiv.org/abs/2606.02373 Learn to build effective AI agents in our academy: https://academy.dair.ai
Original Article
View Cached Full Text

Cached at: 06/02/26, 03:47 PM

// State-Externalizing Harnesses //

A new paradigm is emerging on how to effectively build agents and harnesses.

If there is a state that the environment can maintain reliably, it probably doesn’t belong inside the policy. Move it into the harness, and a 20B model trains better and generalizes further.

Search agents are usually trained on one policy over a growing transcript, so RL has to learn semantic search and routine bookkeeping at the same time. This model, Harness-1, splits those apart.

The harness keeps the working memory (candidate pool, evidence links, verification records, deduplicated observations, budget-aware context) outside the policy, and the 20B model only decides what to search, what to keep, what to verify, and when to stop.

Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, it reaches 0.730 average curated recall, beating the next-best open search agent by 11.4 points and staying competitive with much larger frontier searchers. The gains are largest on the held-out transfer.

Paper: https://arxiv.org/abs/2606.02373

Learn to build effective AI agents in our academy: https://academy.dair.ai


Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Source: https://arxiv.org/abs/2606.02373 View PDF

Abstract:Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available atthis https URL.

Submission history

From: Pengcheng Jiang [view email] **[v1]**Mon, 1 Jun 2026 15:21:41 UTC (6,831 KB)

Similar Articles

@Potatoloogs: https://x.com/Potatoloogs/status/2057391224592667051

X AI KOLs Timeline

This article deeply analyzes the concept of Agent Harness, which is the engineering infrastructure wrapped around an LLM, including 12 components such as orchestration loops, tool calling, memory systems, context management, etc. The article cites practices from companies like Anthropic, OpenAI, and LangChain, arguing for the critical role of the harness in production-grade AI agents.