Tag
Google Devs introduces the Agent Factory series with ADK 2.0 and Gemini 3.5 Flash, demonstrating how to build production-grade stateful agents that can run for days, with insights on skills, MCP, and code review strategies from engineers such as Rohde Davis.
The article outlines a five-step workflow for spec-driven development using Warp skills: write a product spec (PRODUCT.md), write a tech spec (TECH.md), implement with any AI agent, validate implementation matches specs, and validate using computer use with Oz. The skills are open-sourced and can be installed via npx.
The author argues that AI agents in production should be defined as declarative manifests with their own runtime, rather than being scattered across application code, in order to enable proper versioning, observability, and rollback. They present their own solution as an open-source tool.
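As a rough illustration of the manifest idea summarized above, the sketch below treats an agent definition as immutable data and keeps a deployment history so the active version can be rolled back. It is not the author's tool: the class names `AgentManifest` and `ManifestRegistry`, their fields, and the model string are all hypothetical, chosen only to show why a declarative definition makes versioning and rollback straightforward.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentManifest:
    """Hypothetical declarative agent definition (field names are assumptions)."""
    name: str
    version: str
    model: str
    tools: tuple[str, ...] = ()


@dataclass
class ManifestRegistry:
    """Keeps every deployed manifest so the active one can be rolled back."""
    history: list[AgentManifest] = field(default_factory=list)

    def deploy(self, manifest: AgentManifest) -> AgentManifest:
        # Deploying never mutates an old manifest; it only appends a new one.
        self.history.append(manifest)
        return manifest

    @property
    def active(self) -> AgentManifest:
        if not self.history:
            raise LookupError("no manifest has been deployed")
        return self.history[-1]

    def rollback(self) -> AgentManifest:
        # Drop the newest manifest and fall back to the previous one.
        if len(self.history) < 2:
            raise RuntimeError("no earlier manifest to roll back to")
        self.history.pop()
        return self.active


registry = ManifestRegistry()
registry.deploy(AgentManifest("support-bot", "1.0.0", "example-model", ("search",)))
registry.deploy(AgentManifest("support-bot", "1.1.0", "example-model", ("search", "email")))
restored = registry.rollback()
```

Because each manifest is a frozen value, a rollback is just a change of which record is active, which is the property the summary attributes to keeping agent definitions out of application code.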
This article introduces the concept of 'Harness Engineering,' a discipline focused on designing the systems that constrain and guide AI agents to make them reliable in production, arguing that the harness matters more than the model itself.
The author reflects on why AI agents that perform well in demos often fail in real workflows, arguing that execution quality may depend more on data (task examples, tool traces, evaluation sets) than on reasoning or planning alone. They note that they are exploring this problem through the OpenDCAI/DataFlow project.
Google announces the official launch of Managed Agents in the Gemini API, enabling agent development with zero infrastructure overhead. The article also highlights AGENTS.md, an open format for providing context to AI coding agents, adopted by over 60k open-source projects.
This paper introduces the Meta-Agent Challenge (MAC), a benchmark for evaluating AI models' ability to autonomously develop agent systems through iterative programming. Results show that current models rarely match human baselines and exhibit issues like reward hacking, highlighting gaps in self-improvement capabilities.
The author created the pi-docs-playbook repository, organizing the official documentation of the pi framework into categories so that coding agents (such as Codex/Claude) can read it efficiently and accurately, thereby assisting in building business agents and reducing hallucinations.
LangChain introduces Managed Deep Agents, maintaining a familiar project layout with AGENTS.md, skills/, subagents/, and tools.json, along with Context Hub for persistent context management across sessions.
The article explores whether the Model Context Protocol (MCP) effectively reduces integration work for AI agents by standardizing agent-tool communication, comparing native MCP integration in Evose to manual wiring in other stacks like LangGraph and CrewAI.
A product manager with no ML background adapts wikiLLM to create an 'agent-as-developer' that generates its own context from surprises and promotes repeated patterns to validated rules, reducing mandatory context by ~80% and preventing resolved issues from recurring.
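The promotion step described above can be pictured with a minimal sketch, assuming that a "surprise" is recorded as a short pattern string and that a pattern becomes a rule once it has been seen a fixed number of times. The `SurpriseLog` class and the `promote_after` threshold are hypothetical and are not taken from wikiLLM or the author's setup; the validation step mentioned in the summary is left out.

```python
from collections import Counter


class SurpriseLog:
    """Hypothetical tracker that promotes repeated surprises to rules."""

    def __init__(self, promote_after: int = 3) -> None:
        self.promote_after = promote_after  # assumed threshold, not from the source
        self.counts: Counter[str] = Counter()
        self.rules: set[str] = set()

    def record(self, pattern: str) -> bool:
        """Record one surprise; return True only when it is promoted just now."""
        if pattern in self.rules:
            # Already a rule, so it no longer needs to be tracked as a surprise.
            return False
        self.counts[pattern] += 1
        if self.counts[pattern] >= self.promote_after:
            self.rules.add(pattern)
            return True
        return False


log = SurpriseLog(promote_after=2)
first = log.record("migration needs a dry run")
second = log.record("migration needs a dry run")
```

Only the promoted rules would need to be loaded as standing context in this sketch, which is one plausible reading of how the mandatory context could shrink.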
LangChain's newsletter announces major product launches from Interrupt 2026: LangSmith Engine for automated agent failure diagnosis and fixes, and Sandboxes GA for secure code execution, alongside a new LangChain Labs research initiative and upcoming events.
Hermes Agent and its surrounding ecosystem have attracted attention in the developer community. The ecosystem includes an enhanced fork, an Alibaba Cloud memory plugin, a Felo skills pack, a community bible, and a lightweight web UI, showcasing the deep customization and cloud collaboration capabilities of AI agents.
Microsoft open-sourced AI Toolkit, a VS Code extension that integrates model selection, a Playground, agent building, batch testing, and evaluation into one interface, simplifying the AI agent development process.
LangChain launches LangSmith Engine in public beta, an autonomous agent that monitors production traces, clusters failures, diagnoses root causes, and proposes fixes and eval coverage to streamline agent development.
This post provides practical advice on optimizing SKILL.md descriptions for Claude to ensure proper trigger activation, emphasizing the importance of specific keywords, negative constraints, and iterative testing over generic documentation.
This article comments on Anthropic’s talk regarding Claude, noting that the model now includes built-in Agent scaffolding features such as routing and retries. It highlights a demo showcasing a smooth closed-loop workflow where Claude independently reproduces, fixes, and tests frontend bugs, marking a new era in Agent development.
Developer @real_kai42 describes an intense two-week sprint building AI agents, fueled by excitement over limitless possibilities.
OpenAI launches new tools for building agents including the Responses API, built-in tools (web search, file search, computer use), Agents SDK, and observability features designed to simplify agentic application development.