Tag
This paper introduces Self-Harness, a new paradigm where LLM-based agents iteratively improve their own operating harness—prompts, tools, and control flow—without human engineers or stronger external agents, achieving significant performance gains across multiple models.
A weekly roundup of notable AI papers covering self-revising discovery systems from MIT, disentangling agent self-evolution, and Google's LEAP for formal mathematics using agentic scaffolds.
A paper analyzing AI agent reliability, accepted at ICML 2026, finds that even the latest frontier models (GPT 5.5, Gemini 3.1 Pro, Claude Opus 4.7) show only marginal reliability improvements over earlier versions, with low outcome consistency and persistent issues in agent scaffolding.
This paper challenges the assumption that adding more scaffolding components to LLM agents always improves performance, demonstrating through systematic experiments that cross-component interference often leads to degradation. The study finds that simpler, task-specific subsets of components frequently outperform fully equipped 'all-in' agents across various model scales.