Tag
A benchmark of 8 AI coding agents on building a VPS management toolkit found that only one of four implementations was production-ready, with a total cost of $1.94 and a 1:28 ratio between planning and code costs.
The article discusses varying approaches to writing specifications for AI coding agents and asks for community input on effective methods.
A discussion of why AI coding agents need local safety boundaries to prevent unauthorized file access or command execution.
The article proposes that organizations adopting AI coding agents should create a company-wide AGENTS.md file, similar to a human onboarding doc, to standardize agent behavior and context.
An in-depth guide to loop engineering for AI coding agents, explaining how to build automated loops that repeatedly prompt agents, verify results, and avoid runaway costs, illustrated with a case study of one engineer shipping 259 PRs in a month.
AI coding agents using the open-source ENPIRE framework can autonomously train robots to perform tasks like installing GPUs and cutting zip-ties, with the system self-improving overnight.
Athena Desktop is a local command room for AI coding agents.
This paper analyzes 20,574 real-world coding-agent sessions to identify how AI agents misalign with developer intent, finding that constraint violations and inaccurate self-reporting are the most common failure modes, imposing trust and effort costs rather than irreversible damage.
A GitHub repository that packages production-grade engineering skills for AI coding agents, encoding senior engineer workflows and quality gates into slash commands like /spec, /plan, /build, etc., with setup instructions for Claude Code, Cursor, and other tools.
PROJECTMEM is an open-source, local-first memory and judgment layer for AI coding agents that records development events and provides deterministic warnings before repeating failed actions, reducing token waste and improving reproducibility.
This paper evaluates LLM-based coding agents (Claude Code and Codex) in social science analysis, finding they match or exceed human methodological diversity while remaining vulnerable to interpretation bias through verdict-layer manipulation.
Learn Harness Engineering, an open-source course on GitHub, teaches how to build a controllable workflow framework for AI coding assistants (e.g., Claude Code, Codex). It includes 12 theory lessons and 6 hands-on projects covering five core mechanisms: instruction, state, validation, scope, and session.
The author describes the problem of AI coding agents making unauthorized changes outside their approved task and introduces their local tool Ripple, which detects such boundary violations and suggests actions like continue, repair, or human review.
A proposed workflow for AI coding agents that emphasizes brainstorming and boundary enforcement before code editing, seeking community feedback on its utility.
Matt Van Horn spends $10K/month on AI coding agents, using Claude and Codex to build everything via voice commands and plan files, without typing code.
The author explores where trust checks should be placed in AI coding agent workflows (before coding, during coding, before the PR, or during review) and invites developers to share where trust broke down in their actual use of tools like Claude Code, Codex, and Cursor.
agentcontract, a new open-source tool, provides a portable JSON-based permission layer for AI coding agents, letting developers define allow/deny rules for tools, paths, and network access across different agent runtimes. Version 0.0.1 adds a local browser GUI for editing and testing contracts.
This tool provides context engineering for AI coding agents by converting any codebase into an interactive graph that agents can query. It is compatible with Claude Code, Codex, and Antigravity, and is fully open source.
Microsoft open-sourced AI Engineer Coach, a VS Code extension that analyzes developer usage of AI coding agents, providing insights and anti-pattern detection to improve AI workflows.
A discussion of strategies to prevent AI coding agents from accidentally modifying production databases, advocating read-only access, sandboxed environments, and approval gates rather than reliance on prompts alone.