trajectory-rollouts


Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts

Hugging Face Daily Papers · 2026-06-04

Retrospective Harness Optimization (RHO) is a self-supervised method that improves LLM agent performance using only past trajectories, achieving a 78% pass rate on SWE-Bench Pro without external grading.


