Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Hugging Face Daily Papers 06/04/26, 09:26 AM Papers

llm-agents self-supervised harness-optimization trajectory-rollouts ai-agents swe-bench

Summary

Retrospective Harness Optimization (RHO) is a self-supervised method that improves LLM agent performance using only past trajectories. A single optimization round raises the pass rate on SWE-Bench Pro from 59% to 78% without any external grading.

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.
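The final step described above, choosing among candidate harness updates by pairwise self-preference, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_by_pairwise_preference`, the `prefer` callable, and the round-robin win-count scheme are all assumptions made for illustration. In RHO the preference judgment comes from the agent itself; here any callable stands in for it. Querying each pair in both orders is a common precaution against position bias in model-based judges, not something the abstract states.

```python
from itertools import combinations
from typing import Callable, Sequence


def select_by_pairwise_preference(
    candidates: Sequence[str],
    prefer: Callable[[str, str], int],
) -> str:
    """Return the candidate with the most pairwise wins (round-robin).

    `prefer(a, b)` is a hypothetical judge that returns 0 when the first
    argument is preferred and 1 when the second is preferred.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        # Ask in both orders so that a judge favoring one position
        # cannot decide the comparison on position alone.
        first = prefer(candidates[i], candidates[j])
        second = prefer(candidates[j], candidates[i])
        wins[i if first == 0 else j] += 1
        wins[j if second == 0 else i] += 1

    # Ties are broken in favor of the earliest candidate.
    best = max(range(len(candidates)), key=lambda k: wins[k])
    return candidates[best]


def toy_judge(a: str, b: str) -> int:
    """Stand-in judge for the demo: prefers the longer description."""
    return 0 if len(a) >= len(b) else 1


if __name__ == "__main__":
    updates = ["add retry", "add retry and run tests first", "add logging"]
    print(select_by_pairwise_preference(updates, toy_judge))
```

A round-robin needs a number of comparisons quadratic in the number of candidates, which is reasonable only when few candidate updates are generated; a knockout bracket would be a cheaper alternative under the same assumptions.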

Original Article

Cached at: 06/10/26, 05:44 AM

Source: https://huggingface.co/papers/2606.05922

Abstract

Retrospective Harness Optimization (RHO) is a self-supervised method that improves AI agent performance by optimizing the agent harness using only past trajectories, through diverse task selection, parallel re-solving, and self-validation.

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent’s behavior patterns and sustains higher accuracy during long-horizon sessions.
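The abstract does not say how the diverse coreset of challenging tasks is chosen. One plausible reading is a greedy selection that trades off difficulty against distance from tasks already picked, sketched below. Everything here is an assumption for illustration: the `PastTask` fields, the idea that each task has an embedding and a difficulty score (for example, derived from disagreement among earlier rollouts), and the scoring rule `difficulty * nearest_distance`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PastTask:
    task_id: str
    embedding: tuple[float, ...]  # hypothetical feature vector for the task
    difficulty: float             # hypothetical score in [0, 1]; higher is harder


def select_coreset(tasks: list[PastTask], k: int) -> list[PastTask]:
    """Greedily pick up to k tasks that are both hard and mutually distant."""
    if k <= 0 or not tasks:
        return []

    # Seed with the hardest task, then grow the set one task at a time.
    remaining = sorted(tasks, key=lambda t: t.difficulty, reverse=True)
    chosen = [remaining.pop(0)]

    while remaining and len(chosen) < k:

        def score(task: PastTask) -> float:
            # Distance to the closest already-chosen task measures novelty.
            nearest = min(
                math.dist(task.embedding, c.embedding) for c in chosen
            )
            return task.difficulty * nearest

        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)

    return chosen


if __name__ == "__main__":
    history = [
        PastTask("a", (0.0, 0.0), 0.9),
        PastTask("b", (0.1, 0.0), 0.8),  # hard, but nearly identical to "a"
        PastTask("c", (5.0, 5.0), 0.5),  # easier, but very different
    ]
    print([t.task_id for t in select_coreset(history, k=2)])
```

In this toy history the near-duplicate task "b" is skipped in favor of the more distant task "c", which is the behavior a diversity-aware selection is meant to produce.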

Similar Articles

Retrospective Progress-Aware Self-Refinement for LLM Agent Training

@omarsar0: // Self-Harness: Harnesses That Improve Themselves // (bookmark this one) Most of the agent scaffolds we rely on today …

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

Harnesses for Inference-Time Alignment over Execution Trajectories

Stop Comparing LLM Agents Without Disclosing the Harness
