@dair_ai: // State-Externalizing Harnesses // A new paradigm is emerging on how to effectively build agents and harnesses. If the…

X AI KOLs Following 06/02/26, 03:01 PM Papers

Summary

Harness-1 introduces a state-externalizing harness that separates routine bookkeeping from policy decisions in search agents, enabling a 20B model to outperform larger frontier searchers across multiple benchmarks.

// State-Externalizing Harnesses //

A new paradigm is emerging on how to effectively build agents and harnesses.

If there is a state that the environment can maintain reliably, it probably doesn't belong inside the policy. Move it into the harness, and a 20B model trains better and generalizes further.

Search agents are usually trained on one policy over a growing transcript, so RL has to learn semantic search and routine bookkeeping at the same time. This model, Harness-1, splits those apart.

The harness keeps the working memory (candidate pool, evidence links, verification records, deduplicated observations, budget-aware context) outside the policy, and the 20B model only decides what to search, what to keep, what to verify, and when to stop.

Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, it reaches 0.730 average curated recall, beating the next-best open search agent by 11.4 points and staying competitive with much larger frontier searchers. The gains are largest on the held-out transfer.

Paper: https://arxiv.org/abs/2606.02373

Learn to build effective AI agents in our academy: https://academy.dair.ai
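To make the headline numbers concrete, here is a small sketch of how a recall-style score and the reported margin could be read. It rests on assumptions not stated in the post: that "curated recall" is the fraction of relevant documents that end up in the agent's final curated set, that the 0.730 figure is a plain average over benchmarks, and that "11.4 points" means 0.114 on the same 0-to-1 scale. The function names are hypothetical.

```python
def curated_recall(curated_ids, relevant_ids):
    """Fraction of relevant documents present in the curated set.

    ASSUMPTION: this is one plausible reading of "curated recall";
    the post does not define the metric.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(set(curated_ids) & relevant) / len(relevant)


def macro_average(scores):
    """Unweighted mean over per-benchmark scores (assumed aggregation)."""
    scores = list(scores)
    return sum(scores) / len(scores)


# Toy example: 2 of the 4 relevant documents were kept.
toy_score = curated_recall(["a", "b", "x"], ["a", "b", "c", "d"])

# Reading the reported margin: 0.730 average, 11.4 points ahead.
reported_average = 0.730
margin_points = 11.4
implied_next_best = round(reported_average - margin_points / 100, 3)
```

Under those assumptions, the next-best open search agent would sit at roughly 0.616 average curated recall.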


Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Source: https://arxiv.org/abs/2606.02373

Abstract: Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at this https URL.
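The split described in the abstract can be illustrated with a toy harness. This is a minimal sketch under stated assumptions, not the paper's implementation: the class names, the action dictionary format, the substring-based verification check, and the character budget are all hypothetical stand-ins chosen to keep the example runnable. The point it shows is that the harness owns the candidate pool, curated set, evidence links, verification records, and deduplication, while the policy only emits search, keep, discard, verify, and stop actions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Environment-side working memory (hypothetical layout)."""
    candidates: dict = field(default_factory=dict)   # doc_id -> text
    curated: dict = field(default_factory=dict)      # doc_id -> importance tag
    evidence: dict = field(default_factory=dict)     # claim -> supporting doc_ids
    verified: dict = field(default_factory=dict)     # claim -> bool
    seen_hashes: set = field(default_factory=set)    # for deduplication


class Harness:
    def __init__(self, search_fn, char_budget=2000):
        self.search_fn = search_fn      # assumed: query -> [(doc_id, text)]
        self.char_budget = char_budget  # assumed budget unit: characters
        self.state = SearchState()
        self.done = False

    def step(self, action):
        """Apply one policy action, then return the rendered context."""
        s, kind = self.state, action["type"]
        if kind == "search":
            for doc_id, text in self.search_fn(action["query"]):
                digest = hashlib.sha256(text.encode()).hexdigest()
                if digest in s.seen_hashes:
                    continue  # drop duplicate observations
                s.seen_hashes.add(digest)
                s.candidates[doc_id] = text
        elif kind == "keep":
            if action["doc_id"] in s.candidates:
                s.curated[action["doc_id"]] = action.get("importance", "medium")
        elif kind == "discard":
            s.candidates.pop(action["doc_id"], None)
            s.curated.pop(action["doc_id"], None)
        elif kind == "verify":
            # Toy check (assumption): a claim counts as supported if it
            # appears verbatim in the document text.
            claim = action["claim"]
            text = s.candidates.get(action["doc_id"], "")
            supported = claim.lower() in text.lower()
            if supported:
                s.evidence.setdefault(claim, []).append(action["doc_id"])
            s.verified[claim] = s.verified.get(claim, False) or supported
        elif kind == "stop":
            self.done = True
        else:
            raise ValueError(f"unknown action type: {kind}")
        return self.render()

    def render(self):
        """Budget-aware view: curated documents first, then other candidates."""
        s = self.state
        lines = [f"curated={len(s.curated)} candidates={len(s.candidates)}"]
        used = len(lines[0])
        ordered = list(s.curated) + [d for d in s.candidates if d not in s.curated]
        for doc_id in ordered:
            line = f"[{doc_id}|{s.curated.get(doc_id, 'candidate')}] {s.candidates[doc_id][:80]}"
            if used + len(line) + 1 > self.char_budget:
                break  # stop rendering once the budget is exhausted
            lines.append(line)
            used += len(line) + 1
        return "\n".join(lines)


def fake_search(query):
    """Stand-in retriever over a three-document corpus (d2 duplicates d1)."""
    corpus = {
        "d1": "Harness-1 is a 20B search agent.",
        "d2": "Harness-1 is a 20B search agent.",
        "d3": "Notes on a patents benchmark.",
    }
    return [(k, v) for k, v in corpus.items() if query.lower() in v.lower()]


h = Harness(fake_search)
h.step({"type": "search", "query": "harness-1"})
h.step({"type": "keep", "doc_id": "d1", "importance": "high"})
h.step({"type": "verify", "claim": "20B search agent", "doc_id": "d1"})
view = h.step({"type": "stop"})
```

In this sketch the policy never has to remember which documents it has already seen or which claims were checked; it reads the rendered view and chooses the next action, which is the division of labor the abstract argues for.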

Submission history

From: Pengcheng Jiang [v1] Mon, 1 Jun 2026 15:21:41 UTC (6,831 KB)

Similar Articles

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

@omarsar0: // Self-Harness: Harnesses That Improve Themselves // (bookmark this one) Most of the agent scaffolds we rely on today …

Self-Harness: Harnesses That Improve Themselves

best of the best agentic harnesses do this…

@rohanpaul_ai: So much recent work and research papers points to the same thing: the "harness" is becoming the real capability layer. …
