@GitHub_Daily: 用 AI 智能体生产级事情，写代码、跑流程、调接口，一开始还行，但规模一大就容易失控，权限太宽、上下文丢失、调试无从下手。于是找到了 agents-best-practices 这套完整的智能体运行框架设计指南，不限于编码场景，运营、销…

X AI KOLs Timeline 2026/06/03 00:00 工具

ai-agents best-practices agent-framework production-readiness guidelines github

摘要

介绍了 agents-best-practices 仓库，这是一份生产级 AI 智能体运行框架设计指南，涵盖工具权限分级、上下文压缩等，支持 Codex 和 Claude Code 安装。

用 AI 智能体生产级事情，写代码、跑流程、调接口，一开始还行，但规模一大就容易失控，权限太宽、上下文丢失、调试无从下手。于是找到了 agents-best-practices 这套完整的智能体运行框架设计指南，不限于编码场景，运营、销售、数据分析等领域同样适用。核心是让模型只负责「提议」，真正的验证、授权、执行和记录全部交给外部运行框架来完成，确保每一步都可审计、可控制。 GitHub：http://github.com/DenisSergeevitch/agents-best-practices… 内容涵盖了工具权限分级、上下文压缩策略、预算控制机制、工作流拆解方法，还有上线前的检查清单。安装方式也很灵活，支持 Codex 和 Claude Code，一条命令就能加载到你的开发环境里。如果你正在搭建或优化自己的 AI 智能体系统，这份给 AI Agent 的生产级指南值得看看。

查看原文

查看缓存全文

缓存时间: 2026/06/03 09:47

用 AI 智能体生产级事情，写代码、跑流程、调接口，一开始还行，但规模一大就容易失控，权限太宽、上下文丢失、调试无从下手。

于是找到了 agents-best-practices 这套完整的智能体运行框架设计指南，不限于编码场景，运营、销售、数据分析等领域同样适用。

核心是让模型只负责「提议」，真正的验证、授权、执行和记录全部交给外部运行框架来完成，确保每一步都可审计、可控制。

GitHub：http://github.com/DenisSergeevitch/agents-best-practices…

内容涵盖了工具权限分级、上下文压缩策略、预算控制机制、工作流拆解方法，还有上线前的检查清单。

安装方式也很灵活，支持 Codex 和 Claude Code，一条命令就能加载到你的开发环境里。

如果你正在搭建或优化自己的 AI 智能体系统，这份给 AI Agent 的生产级指南值得看看。

DenisSergeevitch/agents-best-practices

Source: https://github.com/DenisSergeevitch/agents-best-practices

agents-best-practices

“The model proposes actions; the harness validates, authorizes, executes, records, and returns observations.”

A provider-neutral Agent Skill for designing, generating MVP blueprints for, auditing, refactoring, and explaining agentic harnesses.

It applies beyond coding agents: research, support, operations, sales, finance, data analysis, procurement, legal workflows, healthcare workflows, education, and workflow automation agents all need the same core runtime discipline.

Install - pick one:

A. With skills (any compatible agent):

npx skills add DenisSergeevitch/agents-best-practices -g

The -g flag installs globally at user level so every project can discover it.

B. Or paste this prompt to your AI agent:

Install the agents-best-practices skill for me:

1. Clone https://github.com/DenisSergeevitch/agents-best-practices into my
   user-level skills directory as `agents-best-practices/`.
   Use the skill directory my agent reads on this machine, for example:
   - Codex: ~/.codex/skills/
   - Claude Code: ~/.claude/skills/
2. Verify that SKILL.md, icon.jpeg, and the references/ directory are present.
3. Confirm the install path when done.

C. Manual install paths:

# Codex
mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  "${CODEX_HOME:-$HOME/.codex}/skills/agents-best-practices"

# Claude Code, user-level
mkdir -p "$HOME/.claude/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  "$HOME/.claude/skills/agents-best-practices"

# Claude Code, project-level
mkdir -p .claude/skills
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  .claude/skills/agents-best-practices

Use cases · What this is · Philosophy · Layout · Sources

Use cases

Once installed, the skill activates when a conversation touches agent architecture, harness design, tool permissions, planning mode, workflow orchestration, context and memory, skills, connectors, observability, evals, prompt caching, or production readiness. Three common ways to use it:

Case 1 - Generate an MVP agent blueprint

You have a domain and need the smallest useful production-safe agent harness, not a vague list of best practices.

You    > Build an agent for account renewal risk. It should read CRM,
         support tickets, and usage data, then draft renewal actions.

Agent  > Start with an approval-gated Level 2 harness.
         The MVP has one job: produce a renewal-risk brief plus draft
         next actions for a human account owner.

         Core loop:
           user/task -> context builder -> model call -> typed tool call
           -> schema validation -> permission check -> execution or pause
           -> structured observation -> next step or final brief

         Minimal tools:
           - read_account_profile      read_private_data
           - list_support_tickets      read_private_data
           - fetch_usage_summary       read_private_data
           - draft_customer_email      draft_external_message
           - request_approval          approval_gate

         Launch gate:
           20 historical accounts, trace review, no unapproved external
           sends, and human acceptance on at least 80% of draft actions.

The primary reference is references/mvp-agent-blueprint.md.

Case 2 - Audit an existing agent harness

You already have a working agent, but it is brittle, expensive, too broad, or hard to debug.

You    > Our research agent sometimes runs tools forever and forgets why
         it made a decision after context compaction. Audit the harness.

Agent  > The failure points are runtime-level, not prompt-level:
           - no hard step/tool/time/cost budget
           - compaction preserves prose but loses active approvals
           - tool results are unbounded and mix trusted/untrusted data
           - no event trace for model output -> tool call -> observation

         Fix order:
           1. Add loop budgets and termination reasons.
           2. Store plan, approvals, todos, and artifacts outside the prompt.
           3. Make compaction rehydrate active state, not chat history.
           4. Add evals for injection, missing tool result, timeout, and
              budget exhaustion.

Start with references/agentic-loop.md, references/context-memory-compaction.md, and references/security-evals-observability.md.

Case 3 - Design tools, permissions, and connectors

You need an agent to work with real systems without turning every API into an unsafe broad tool.

You    > The ops agent needs Slack, Linear, Google Drive, and our internal
         deploy API. How should tools and permissions work?

Agent  > Split by risk class. Reads can be autonomous when scoped.
         Drafts can be autonomous when labeled. External writes, deploys,
         destructive actions, privileged access, and financial operations
         require an approval record outside the model.

         Do not expose generic send_message, write_database, or run_command.
         Wrap each action as a narrow typed tool with structured results and
         deterministic permission checks.

Use references/tools-and-permissions.md and references/skills-and-connectors.md.

Other things the skill is good for

“How do I add planning mode without making the agent passive?” -> use references/planning-and-goals.md.
“When should a large task become a decomposed workflow?” -> use references/workflow-orchestration.md.
“How should auto-compaction preserve active work?” -> use references/context-memory-compaction.md.
“How do I make prompt caching work in a long-running agent?” -> use references/prompt-caching-and-cost.md.
“How do I support OpenAI, Anthropic, and OpenAI-compatible APIs?” -> use references/provider-api-patterns.md.
“What should I check before launch?” -> use references/checklists.md.

“Keep the loop simple and make the runtime rigorous.”

What this is

A reference for people building agentic systems where the model is only one part of the runtime. It helps design a harness that includes:

a provider-neutral model-tool-observation loop,
narrow typed tools and structured tool results,
runtime permission checks outside the model,
planning mode and approval-gated execution,
workflow orchestration for large decomposable tasks,
goal-like loops with budgets, checkpoints, validation, and stop rules,
context, memory, retrieval, and auto-compaction,
skills, MCP, and external connector governance,
prompt-cache-aware context layout and cost telemetry,
observability, evals, launch gates, and incident response.

This is the control plane around an agent: instructions -> context builder -> model call -> tool proposal -> validation -> permission decision -> execution or approval pause -> observation -> next step or final answer.

What this is not

Not only for coding agents.
Not a multi-agent framework by default.
Not a replacement for runtime authorization, sandboxing, or audit logs.
Not a prompt-only safety strategy.
Not a reason to expose broad tools like execute_anything, send_message, or write_database.

Use the single-agent MVP first. Add goal loops, connectors, and broader autonomy only after measured failures justify them.

Layout

agents-best-practices/
├── README.md                                 # public-facing overview and install notes
├── SKILL.md                                  # skill entry point and trigger rules
├── icon.jpeg                                 # skill image used by the README
└── references/
    ├── mvp-agent-blueprint.md                # domain-specific MVP harness blueprint
    ├── architecture.md                       # component model and harness boundaries
    ├── agentic-loop.md                       # loop invariants, retries, budgets, stopping
    ├── tools-and-permissions.md              # typed tools, risk classes, approvals
    ├── planning-and-goals.md                 # planning mode and long-running goals
    ├── workflow-orchestration.md             # decomposed workflows, packets, verification
    ├── context-memory-compaction.md          # context, memory, retrieval, compaction
    ├── prompt-caching-and-cost.md            # stable prefixes and cost-aware context
    ├── skills-and-connectors.md              # Agent Skills, MCP, connectors, tool search
    ├── system-prompts-instructions.md        # instruction hierarchy and templates
    ├── provider-api-patterns.md              # OpenAI, Anthropic, compatible APIs
    ├── security-evals-observability.md       # guardrails, tracing, evals, launch gates
    ├── agent-legibility-feedback-loops.md    # source-of-truth artifacts and cleanup
    ├── checklists.md                         # implementation and audit checklists
    ├── coverage-audit.md                     # topic coverage verification
    └── source-links.md                       # official references and further reading

Philosophy

The central tension this skill resolves: how can an agent do useful work in real systems without turning the model into an unaudited operator? The answer is a small set of runtime rules:

The harness acts, not the model - the model proposes; application code validates, authorizes, executes, and records.
Every tool call gets a result - denial, timeout, malformed arguments, and aborts are observations too.
Risk changes the loop - reads, drafts, writes, external communications, financial actions, destructive actions, and privileged actions need different permission paths.
Draft and commit are separate - high-risk side effects require approval records outside the prompt.
Context is built, not dumped - retrieve just enough, label trust boundaries, and preserve active state across compaction.
Long-running work needs budgets - step, time, token, cost, and tool-call budgets are part of the product.
Skills and connectors are progressively disclosed - expose names and descriptions first; load detailed workflows only when relevant.
Repeated failures become harness features - validators, tools, docs, evals, or policies beat repeating prompt advice.

Read SKILL.md first. Use references/mvp-agent-blueprint.md when the user asks to make or build an agent.

About Agent Skills

Agent Skills package reusable domain knowledge so compatible agents can discover, load, and apply a workflow only when it is relevant. This repository uses the portable SKILL.md entrypoint and works as a Codex skill, a Claude Code skill, or a skill for other Agent-Skill-aware runtimes.

Sources

Agent Skills specification: agentskills.io/specification
OpenAI function calling, tools, agents, guardrails, sandboxing, Responses, and prompt caching docs are listed in references/source-links.md.
Anthropic agent, context engineering, tool writing, long-running harness, MCP, and Agent Skills references are listed in references/source-links.md.
MCP specification and governance references are listed in references/source-links.md.

License

MIT - see LICENSE.

Credits

Authored as an Agent Skill for provider-neutral agent harness design. The recommendations synthesize common production harness patterns across OpenAI, Anthropic, OpenAI-compatible APIs, Agent Skills, MCP, and external connector workflows.

相似文章

@Xudong07452910: 开源框架推荐：《Agency Agents》—— 232 位专业 AI 智能体，按职能分工，覆盖 16 个业务部门如果你用过 Claude Code 或 Codex，可能遇到过这个问题：AI 在代码任务上很能干，但让它做前端设计、写营销…

X AI KOLs Timeline

Agency Agents 是一个开源框架，提供232个专业AI智能体覆盖16个业务部门，每个智能体具有独特个性、沟通风格和交付标准，支持Claude Code、GitHub Copilot等多种开发工具，并有社区翻译版本。

@Xudong07452910: 高质量开源项目推荐：《Agents Best Practices》—— 生产级AI Agent Harness设计指南这是一个 provider-neutral Agent Skill，专为 Claude Code、Codex 等 AI…

X AI KOLs Timeline

A guide titled 'Agents Best Practices' providing a provider-neutral Agent Skill for building production-level AI Agent harnesses, designed for tools like Claude Code and Codex.

@GitHub_Daily: 用 AI 处理长周期复杂任务，随着上下文越来越长，模型容易出现「忘事」，输出质量也直线下降。 LangChain 官方团队开源了一套教程：Deep Agents from Scratch，从零拆解主流 Agent 的核心设计模式，讲得很透…

X AI KOLs Timeline

LangChain 官方团队开源了教程 'Deep Agents from Scratch'，从零拆解主流 Agent 的核心设计模式，涵盖任务规划、上下文卸载到文件系统以及子代理隔离等思路，共 5 个渐进式 Notebook，可上手搭建完整深度研究 Agent。

@thinkszyg: https://x.com/thinkszyg/status/2066837941477920993

X AI KOLs Timeline

一篇面向开发者（尤其是AI编码工具使用者）的实用指南，介绍如何安全高效地使用Claude Code、Codex等工具进行多Agent并行开发，重点包括任务拆解、文件隔离（worktree）、边界控制、顺序合并等最佳实践，避免文件冲突和混乱。

@GitHub_Daily: 用 Claude Code 和 Codex 同时开好几个任务，在终端里切来切去查看，效率实在低。最近找到 Orca，用来统一管理多个 AI 编程智能体，让它们并行干活，结果集中在一个界面里查看。核心是支持并行工作区，把一个需求同时发给…

X AI KOLs Timeline

Orca 是一个统一的 AI 编程智能体管理工具，支持并行工作区，可同时向 Claude Code、Codex 等 AI 助手发送任务，各在独立 git 分支中生成代码并对比结果，并配有手机 App 实时跟进。

DenisSergeevitch/agents-best-practices

agents-best-practices

Use cases

Case 1 - Generate an MVP agent blueprint

Case 2 - Audit an existing agent harness

Case 3 - Design tools, permissions, and connectors

Other things the skill is good for

What this is

What this is not

Layout

Philosophy

About Agent Skills

Sources

License

Credits

相似文章

@Xudong07452910: 开源框架推荐：《Agency Agents》—— 232 位专业 AI 智能体，按职能分工，覆盖 16 个业务部门 如果你用过 Claude Code 或 Codex，可能遇到过这个问题：AI 在代码任务上很能干，但让它做前端设计、写营销…

@Xudong07452910: 高质量开源项目推荐：《Agents Best Practices》—— 生产级AI Agent Harness设计指南 这是一个 provider-neutral Agent Skill，专为 Claude Code、Codex 等 AI…

@GitHub_Daily: 用 AI 处理长周期复杂任务，随着上下文越来越长，模型容易出现「忘事」，输出质量也直线下降。 LangChain 官方团队开源了一套教程：Deep Agents from Scratch，从零拆解主流 Agent 的核心设计模式，讲得很透…

@thinkszyg: https://x.com/thinkszyg/status/2066837941477920993

提交意见反馈

@Xudong07452910: 开源框架推荐：《Agency Agents》—— 232 位专业 AI 智能体，按职能分工，覆盖 16 个业务部门如果你用过 Claude Code 或 Codex，可能遇到过这个问题：AI 在代码任务上很能干，但让它做前端设计、写营销…

@Xudong07452910: 高质量开源项目推荐：《Agents Best Practices》—— 生产级AI Agent Harness设计指南这是一个 provider-neutral Agent Skill，专为 Claude Code、Codex 等 AI…