A comprehensive guide to building AI agent harnesses, covering tool execution, context management, state/memory, and guardrails, based on lessons from building Claude Code and other harnesses for enterprise.
An experiment applying BEAM-style concurrency (Erlang VM model) to coding agents yielded surprising results, suggesting potential improvements in agent coordination and fault tolerance.
The author argues that many human approval gates for AI agents are ineffective rubber stamps, and proposes a framework for designing meaningful review mechanisms that actually catch errors.
People want to build agents easily and have them improve over time; automating the 'hill climbing' loop is hard but has high ROI.
Google Devs introduces the Agent Factory series with ADK 2.0 and Gemini 3.5 Flash, demonstrating how to build production-grade stateful agents that can run for days, with insights on skills, MCP, and code review strategies from engineers such as Rohde Davis.
The article outlines a five-step workflow for spec-driven development using Warp skills: write a product spec (PRODUCT.md), write a tech spec (TECH.md), implement with any AI agent, validate implementation matches specs, and validate using computer use with Oz. The skills are open-sourced and can be installed via npx.
The author argues that AI agents in production should be defined as declarative manifests with their own runtime, rather than being scattered across application code, in order to enable proper versioning, observability, and rollback. They present their own solution as an open-source tool.
This article introduces the concept of 'Harness Engineering,' a discipline focused on designing the systems that constrain and guide AI agents to make them reliable in production, arguing that the harness matters more than the model itself.
The author reflects on why AI agents that perform well in demos often fail in real workflows, arguing that execution quality may be more tied to data issues (task examples, tool traces, evaluation sets) than to reasoning or planning alone, and notes that they are exploring this problem through the OpenDCAI/DataFlow project.
Google announces the official launch of Managed Agents in the Gemini API, enabling agent development with zero infrastructure overhead. The article also highlights AGENTS.md, an open format for providing context to AI coding agents, adopted by over 60k open-source projects.
This paper introduces the Meta-Agent Challenge (MAC), a benchmark for evaluating AI models' ability to autonomously develop agent systems through iterative programming. Results show that current models rarely match human baselines and exhibit issues like reward hacking, highlighting gaps in self-improvement capabilities.
The author created the pi-docs-playbook repository, organizing the official documentation of the pi framework into categories so that coding agents (such as Codex/Claude) can read it efficiently and accurately, thereby assisting in building business agents and reducing hallucinations.
LangChain introduces Managed Deep Agents, maintaining a familiar project layout with AGENTS.md, skills/, subagents/, and tools.json, along with Context Hub for persistent context management across sessions.
The article explores whether the Model Context Protocol (MCP) effectively reduces integration work for AI agents by standardizing agent-tool communication, comparing native MCP integration in Evose to manual wiring in other stacks like LangGraph and CrewAI.
A product manager without an ML background adapts wikiLLM to create an 'agent-as-developer' that generates its own context from surprises and promotes repeated patterns to validated rules, reducing mandatory context by ~80% and preventing repetition of resolved issues.
LangChain's newsletter announces major product launches from Interrupt 2026: LangSmith Engine for automated agent failure diagnosis and fixes, and Sandboxes GA for secure code execution, alongside a new LangChain Labs research initiative and upcoming events.
The Hermes Agent and its ecosystem of tools, including an enhanced fork, an Alibaba Cloud memory plugin, a Felo skills pack, a community bible, and a lightweight web UI, have attracted attention in the developer community, showcasing the deep customization and cloud collaboration capabilities of AI agents.
Microsoft open-sourced the VS Code extension AI Toolkit, which integrates model selection, Playground, agent building, batch testing, and evaluation into one interface, simplifying the AI agent development process.
LangChain launches LangSmith Engine in public beta, an autonomous agent that monitors production traces, clusters failures, diagnoses root causes, and proposes fixes and eval coverage to streamline agent development.
This post provides practical advice on optimizing SKILL.md descriptions for Claude to ensure proper trigger activation, emphasizing the importance of specific keywords, negative constraints, and iterative testing over generic documentation.