Tag
HALO uses RLMs to optimize AI agent harnesses by analyzing execution traces and suggesting improvements, achieving 10%+ gains on several benchmarks like Terminal-Bench and AppWorld.
This demo shows how OpenHands acts as a control plane across multiple agent harnesses, such as Claude Code, Gemini CLI, and OpenHands itself, so that models or vendors can be swapped without rewriting orchestration.
Duetchat introduces Duet Agent, a new harness for running long-duration AI agent tasks with state machine relay, memory compaction, and a stateless runner for sandboxes.
Vex is an open-source CLI agent harness that lets users edit videos via natural language commands, automating tasks like silence removal, b-roll addition, and visual generation.
autoharness is an agent harness optimization tool that automatically generates proposals and runs evaluations based on benchmark commands to improve an agent's prompts, configurations, and source code. It supports Codex and Claude.
This article introduces Factory's Missions system, a multi-agent collaboration framework designed for long-running software engineering tasks. It addresses the drift that traditional agents commonly exhibit on such tasks through structured verification and handoff mechanisms.
This paper introduces ReFlect, a training-free harness system that wraps LLMs with deterministic error detection and recovery logic to improve performance on complex, long-horizon reasoning tasks.