@GitHub_Daily: Using AI agents for production-grade tasks—writing code, running workflows, calling APIs—works fine initially, but as the scale grows, things easily get out of control: permissions too broad, context loss, and debugging becomes impossible. That's where agents-best-practices comes in: a complete guide to designing a runtime framework for AI agents, not limited to coding scenarios, but also applicable to operations, sales...
Summary
Introduces the agents-best-practices repository, a production-grade AI agent runtime framework design guide covering tool permission tiers, context compression, etc., supporting Codex and Claude Code installation.
View Cached Full Text
Cached at: 06/03/26, 09:47 AM
Using AI agents for production-level tasks — writing code, running workflows, calling APIs — works fine at first, but as scale grows, things easily spiral out of control: overly broad permissions, context loss, and no way to debug. That’s when I found agents-best-practices, a complete design guide for agent runtime frameworks. It’s not limited to coding; it’s equally applicable to operations, sales, data analysis, and more. The core idea is to let the model only propose actions, while actual validation, authorization, execution, and recording are handled by an external runtime framework — ensuring every step is auditable and controllable. GitHub: http://github.com/DenisSergeevitch/agents-best-practices…
The content covers tool permission levels, context compression strategies, budget control mechanisms, workflow decomposition methods, and pre-launch checklists. Installation is flexible, supporting Codex and Claude Code — load it into your development environment with a single command. If you are building or optimizing your own AI agent system, this production-grade guide for AI agents is worth checking out.
Use cases
Once installed, the skill activates when a conversation touches agent architecture, harness design, tool permissions, planning mode, workflow orchestration, context and memory, skills, connectors, observability, evals, prompt caching, or production readiness. Three common ways to use it:
Case 1 - Generate an MVP agent blueprint
You have a domain and need the smallest useful production-safe agent harness, not a vague list of best practices.
You > Build an agent for account renewal risk. It should read CRM, support tickets, and usage data, then draft renewal actions.
Agent > Start with an approval-gated Level 2 harness. The MVP has one job: produce a renewal-risk brief plus draft next actions for a human account owner.
Core loop: user/task -> context builder -> model call -> typed tool call -> schema validation -> permission check -> execution or pause -> structured observation -> next step or final brief
Minimal tools:
- read_account_profile read_private_data
- list_support_tickets read_private_data
- fetch_usage_summary read_private_data
- draft_customer_email draft_external_message
- request_approval approval_gate
Launch gate: 20 historical accounts, trace review, no unapproved external sends, and human acceptance on at least 80% of draft actions.
The primary reference is references/mvp-agent-blueprint.md.
Case 2 - Audit an existing agent harness
You already have a working agent, but it is brittle, expensive, too broad, or hard to debug.
You > Our research agent sometimes runs tools forever and forgets why it made a decision after context compaction. Audit the harness.
Agent > The failure points are runtime-level, not prompt-level:
- no hard step/tool/time/cost budget
- compaction preserves prose but loses active approvals
- tool results are unbounded and mix trusted/untrusted data
- no event trace for model output -> tool call -> observation
Fix order:
1. Add loop budgets and termination reasons.
2. Store plan, approvals, todos, and artifacts outside the prompt.
3. Make compaction rehydrate active state, not chat history.
4. Add evals for injection, missing tool result, timeout, and budget exhaustion.
Start with references/agentic-loop.md, references/context-memory-compaction.md, and references/security-evals-observability.md.
Case 3 - Design tools, permissions, and connectors
You need an agent to work with real systems without turning every API into an unsafe broad tool.
You > The ops agent needs Slack, Linear, Google Drive, and our internal deploy API. How should tools and permissions work?
Agent > Split by risk class. Reads can be autonomous when scoped. Drafts can be autonomous when labeled. External writes, deploys, destructive actions, privileged access, and financial operations require an approval record outside the model. Do not expose generic send_message, write_database, or run_command. Wrap each action as a narrow typed tool with structured results and deterministic permission checks.
Use references/tools-and-permissions.md and references/skills-and-connectors.md.
Other things the skill is good for
- “How do I add planning mode without making the agent passive?” -> use
references/planning-and-goals.md. - “When should a large task become a decomposed workflow?” -> use
references/workflow-orchestration.md. - “How should auto-compaction preserve active work?” -> use
references/context-memory-compaction.md. - “How do I make prompt caching work in a long-running agent?” -> use
references/prompt-caching-and-cost.md. - “How do I support OpenAI, Anthropic, and OpenAI-compatible APIs?” -> use
references/provider-api-patterns.md. - “What should I check before launch?” -> use
references/checklists.md.
“Keep the loop simple and make the runtime rigorous.”
What this is
A reference for people building agentic systems where the model is only one part of the runtime. It helps design a harness that includes:
- a provider-neutral model-tool-observation loop,
- narrow typed tools and structured tool results,
- runtime permission checks outside the model,
- planning mode and approval-gated execution,
- workflow orchestration for large decomposable tasks,
- goal-like loops with budgets, checkpoints, validation, and stop rules,
- context, memory, retrieval, and auto-compaction,
- skills, MCP, and external connector governance,
- prompt-cache-aware context layout and cost telemetry,
- observability, evals, launch gates, and incident response. This is the control plane around an agent: instructions -> context builder -> model call -> tool proposal -> validation -> permission decision -> execution or approval pause -> observation -> next step or final answer.
What this is not
- Not only for coding agents.
- Not a multi-agent framework by default.
- Not a replacement for runtime authorization, sandboxing, or audit logs.
- Not a prompt-only safety strategy.
- Not a reason to expose broad tools like
execute_anything,send_message, orwrite_database. Use the single-agent MVP first. Add goal loops, connectors, and broader autonomy only after measured failures justify them.
Layout
agents-best-practices/
├── README.md # public-facing overview and install notes
├── SKILL.md # skill entry point and trigger rules
├── icon.jpeg # skill image used by the README
└── references/
├── mvp-agent-blueprint.md # domain-specific MVP harness blueprint
├── architecture.md # component model and harness boundaries
├── agentic-loop.md # loop invariants, retries, budgets, stopping
├── tools-and-permissions.md # typed tools, risk classes, approvals
├── planning-and-goals.md # planning mode and long-running goals
├── workflow-orchestration.md # decomposed workflows, packets, verification
├── context-memory-compaction.md # context, memory, retrieval, compaction
├── prompt-caching-and-cost.md # stable prefixes and cost-aware context
├── skills-and-connectors.md # Agent Skills, MCP, connectors, tool search
├── system-prompts-instructions.md # instruction hierarchy and templates
├── provider-api-patterns.md # OpenAI, Anthropic, compatible APIs
├── security-evals-observability.md # guardrails, tracing, evals, launch gates
├── agent-legibility-feedback-loops.md # source-of-truth artifacts and cleanup
├── checklists.md # implementation and audit checklists
├── coverage-audit.md # topic coverage verification
└── source-links.md # official references and further reading
Philosophy
The central tension this skill resolves: how can an agent do useful work in real systems without turning the model into an unaudited operator? The answer is a small set of runtime rules:
- The harness acts, not the model - the model proposes; application code validates, authorizes, executes, and records.
- Every tool call gets a result - denial, timeout, malformed arguments, and aborts are observations too.
- Risk changes the loop - reads, drafts, writes, external communications, financial actions, destructive actions, and privileged actions need different permission paths.
- Draft and commit are separate - high-risk side effects require approval records outside the prompt.
- Context is built, not dumped - retrieve just enough, label trust boundaries, and preserve active state across compaction.
- Long-running work needs budgets - step, time, token, cost, and tool-call budgets are part of the product.
- Skills and connectors are progressively disclosed - expose names and descriptions first; load detailed workflows only when relevant.
- Repeated failures become harness features - validators, tools, docs, evals, or policies beat repeating prompt advice.
Read
SKILL.mdfirst. Usereferences/mvp-agent-blueprint.mdwhen the user asks to make or build an agent.
About Agent Skills
Agent Skills package reusable domain knowledge so compatible agents can discover, load, and apply a workflow only when it is relevant. This repository uses the portable SKILL.md entrypoint and works as a Codex skill, a Claude Code skill, or a skill for other Agent-Skill-aware runtimes.
Sources
- Agent Skills specification: agentskills.io/specification (https://agentskills.io/specification)
- OpenAI function calling, tools, agents, guardrails, sandboxing, Responses, and prompt caching docs are listed in
references/source-links.md. - Anthropic agent, context engineering, tool writing, long-running harness, MCP, and Agent Skills references are listed in
references/source-links.md. - MCP specification and governance references are listed in
references/source-links.md.
License
MIT - see LICENSE.
Credits
Authored as an Agent Skill for provider-neutral agent harness design. The recommendations synthesize common production harness patterns across OpenAI, Anthropic, OpenAI-compatible APIs, Agent Skills, MCP, and external connector workflows.
Similar Articles
@Xudong07452910: Open-source framework recommendation: Agency Agents — 232 professional AI agents, divided by function, covering 16 business departments. If you've used Claude Code or Codex, you may have encountered this problem: AI is very capable at coding tasks, but when it comes to front-end design, writing marketing...
Agency Agents is an open-source framework providing 232 professional AI agents covering 16 business departments. Each agent has a unique personality, communication style, and delivery standards. It supports multiple development tools such as Claude Code, GitHub Copilot, and has community-translated versions.
@Xudong07452910: High-quality Open Source Project Recommendation: 'Agents Best Practices' — Production-level AI Agent Harness Design Guide
A guide titled 'Agents Best Practices' providing a provider-neutral Agent Skill for building production-level AI Agent harnesses, designed for tools like Claude Code and Codex.
@FakeMaidenMaker: The scariest thing about using an AI agent to write code is losing control: the agent runs wild, quality is inconsistent, you don’t know what stage it’s in, and it messes things up halfway through. AWS just open-sourced a set of development lifecycle workflow rules specifically designed for AI coding agents — AI-DLC — that make the agent…
AWS has open-sourced AI-DLC (AI-Driven Development Life Cycle), a set of development lifecycle workflow rules designed for AI coding agents to help developers control agent behavior and ensure quality. It supports multiple platforms including Claude Code, Cursor, and GitHub Copilot.
@nash_su: Official best practices for Claude Code in large codebases. Of course, the same methodology can also be applied to Codex or any Agent. AI can make mistakes and bluff, and the larger the project, the more AI debt accumulates. This article covers some basic safeguards and optimization methods. This article uses http://Wi…
Official best practices for Claude Code in large codebases, also applicable to Codex or other AI Agents, introducing basic safeguards and optimization methods.
This article systematically reviews AI Agent architecture and engineering practices, covering control flow, context engineering, tool design, memory, multi-agent organization, evaluation, tracing, and security. It is based on the OpenClaw implementation and emphasizes the critical role of Harness (testing and validation infrastructure) for system stability.
This article systematically reviews AI Agent architecture and engineering practices, covering control flow, context engineering, tool design, memory, multi-agent organization, evaluation, tracing, and security. It is based on the OpenClaw implementation and emphasizes the critical role of Harness (testing and validation infrastructure) for system stability.