Building Self-Repairing Agent Loops (39 minute read)

TLDR AI Tools

Summary

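This article presents a way to build self-repairing agent loops with OpenAI's Codex, in which an agent iteratively reviews, repairs, and validates its output through a structured feedback loop. It includes a worked example that repairs outdated API documentation.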

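OpenAI shared a Codex workflow in which an agent uses a structured feedback loop to iteratively review, repair, and validate outputs, improving reliability.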

Cached: 2026/05/14 00:10

# Build Iterative Repair Loops with Codex

Source: https://developers.openai.com/cookbook/examples/codex/build_iterative_repair_loops_with_codex

This cookbook covers closed-loop agent workflows: an agent produces an output, validates that output, and uses the feedback to improve the next pass. We explore a documentation-reliability workflow that detects, repairs, and validates stale or broken API and SDK examples. The example uses deliberately outdated notebooks derived from this Cookbook repository.

We build the agent loop with Codex. Codex reviews the current state, applies targeted changes, runs validation, and repeats whenever the feedback shows that problems remain. The notebook task is just an example: the pattern applies anywhere an agent's output can be measured with reliable feedback.

The workflow has three stages:

- **Review**: inspect the current artifact and return structured findings without editing any files.
- **Repair**: make targeted edits to a copy of the artifact, based on the findings and the latest validation feedback.
- **Validate**: run the relevant checks and report what still needs to improve.

Validation closes the loop. A repaired notebook must pass the relevant checks, and any remaining issues become the input to the next repair pass.

Iterative repair loop for Codex technical documentation

This notebook uses the headless mode of the **Codex CLI** (https://developers.openai.com/codex/cli), so the repair step can run from Python cells instead of a chat interface. The first code cell installs the CLI; skip it if the CLI is already installed.

Before running the live repair loop, set `OPENAI_API_KEY` in your environment. The notebook defaults to a fast repair model so the whole example finishes in a reasonable amount of time. To try a different model, set `REPAIR_MODEL` before you start. The install cell pins a known Codex CLI version for reproducibility; change that version deliberately when you need a newer one.

```
!npm install -g @openai/[email protected]
```

```
import concurrent.futures
import json
import os
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Any

# Find the companion sample notebooks whether running next to them or from a repo checkout.
CANDIDATE_EXAMPLE_DIRS = [Path("."), Path("examples/codex")]
EXAMPLE_DIR = next((base for base in CANDIDATE_EXAMPLE_DIRS if (base / "data" / "docs").exists()), None)
if EXAMPLE_DIR is None:
    raise RuntimeError(
        "This notebook needs its companion sample notebooks. "
        "Download the data folder that ships with this example and place it next to "
        "this notebook as ./data/docs, or run from a checkout where examples/codex/data/docs exists."
    )

DATA_DIR = EXAMPLE_DIR / "data" / "docs"
DEFAULT_RUNS_DIR = Path(tempfile.gettempdir()) / "codex_iterative_repair_loop_outputs"
RUNS_DIR = Path(os.getenv("CODEX_REPAIR_RUNS_DIR", str(DEFAULT_RUNS_DIR))).expanduser()
RUNS_DIR.mkdir(parents=True, exist_ok=True)

# Model settings can be overridden through environment variables.
MODEL = os.getenv("REPAIR_MODEL", "gpt-5.4-mini")
COOKBOOK_CHAT_MODEL = os.getenv("COOKBOOK_CHAT_MODEL", "gpt-5.5")
REPAIR_REASONING_EFFORT = os.getenv("REPAIR_REASONING_EFFORT", "low")

if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("Set the OPENAI_API_KEY environment variable before running the live Codex repair loop.")

CODEX_CLI = shutil.which("codex")
if CODEX_CLI is None:
    raise RuntimeError("Run the install cell before continuing; Codex CLI is not on PATH.")
```

The next cell loads the three companion notebooks and summarizes the metadata that drives the repair loop. The samples are deliberately small. They run quickly but still exercise the architecture: review finds substantive issues, repair makes targeted edits, and validation provides feedback for the next pass.

If you download this notebook on its own, also download the companion `data/docs/` folder and place it next to the notebook before running the cell below. The code expects the sample notebooks to be available locally.

In this example, validation executes each repaired notebook end to end. In other domains, validation might be unit tests, policy checks, schema validation, simulations, or a human approval step. What matters is that a failure becomes structured feedback rather than a dead end.

```
NOTEBOOKS = [
    DATA_DIR / "qdrant_embeddings_search_pre_repair.ipynb",
    DATA_DIR / "getting_started_evals_pre_repair.ipynb",
    DATA_DIR / "knowledge_retrieval_pre_repair.ipynb",
]


def read_notebook(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def case_metadata(path: Path) -> dict[str, Any]:
    return read_notebook(path).get("metadata", {}).get("codex_case_study", {})


# Summarize each sample: size, origin, and how many repair passes it is expected to need.
cases = []
for notebook_path in NOTEBOOKS:
    notebook = read_notebook(notebook_path)
    metadata = notebook.get("metadata", {}).get("codex_case_study", {})
    repair_story = metadata.get("repair_story", {})
    cases.append(
        {
            "notebook": notebook_path.name,
            "cells": len(notebook["cells"]),
            "code_cells": sum(cell["cell_type"] == "code" for cell in notebook["cells"]),
            "source": metadata.get("source_path"),
            "target_iteration": repair_story.get("target_iteration"),
            "repair_depth": repair_story.get("repair_depth", ""),
        }
    )

cases
```

```
[{'notebook': 'qdrant_embeddings_search_pre_repair.ipynb',
  'cells': 5,
  'code_cells': 4,
  'source': 'examples/vector_databases/qdrant/Using_Qdrant_for_embeddings_search.ipynb',
  'target_iteration': 1,
  'repair_depth': 'One-pass cleanup: modernize the local Qdrant query path and clarify the sampled fixture framing.'},
 {'notebook': 'getting_started_evals_pre_repair.ipynb',
  'cells': 5,
  'code_cells': 4,
  'source': 'examples/evaluation/Getting_Started_with_OpenAI_Evals.ipynb',
  'target_iteration': 2,
  'repair_depth': 'Two-pass cleanup: first modernize the obvious stale Evals flow, then use validation feedback to remove result-log brittleness.'},
 {'notebook': 'knowledge_retrieval_pre_repair.ipynb',
  'cells': 5,
  'code_cells': 4,
  'source': 'examples/How_to_call_functions_for_knowledge_retrieval.ipynb',
  'target_iteration': 3,
  'repair_depth': 'Three-pass cleanup: modernize model/API shape, then tighten runnable local setup, then restore the full retrieval teaching flow.'}]
```

Before asking Codex to review or repair an artifact, give it a small shared contract. This keeps the loop focused on the issues that matter instead of leaving the model to infer every product and style rule from scratch. The rules below define what "good" means for these sample notebooks: current API patterns, clear setup, runnable local samples, and preservation of the original teaching goal. In other workflows, this contract would describe the source of truth for that domain.

```
business_rules = {
    "preferred_chat_model": COOKBOOK_CHAT_MODEL,
    "preferred_embedding_model": "text-embedding-3-large",
    "modernize": [
        "client.chat.completions.create -> client.responses.create",
        "legacy function-calling schemas -> current tools schema",
        "qdrant.search -> qdrant.query_points",
        "oaieval CLI examples -> current Evals API workflow",
    ],
    "reader_experience": [
        "Make fresh-environment setup explicit.",
        "Keep the included examples runnable with local data and the standard library.",
        "Keep sample repairs self-contained unless the notebook explicitly teaches external setup.",
        "Remove manual result-file placeholders.",
        "State runtime prerequisites and side effects before readers run cells.",
        "Preserve the original teaching goal while modernizing the implementation.",
    ],
}

business_rules
```
Each stage returns structured data so the next stage has something concrete to work with. Review returns findings. Repair returns a change summary and the path to the updated artifact. Validation returns the remaining delta for the next pass. Structured handoffs make the loop easier to debug, rerun, and adapt to other artifact types.

```
def object_schema(properties: dict[str, Any], required: list[str] | None = None) -> dict[str, Any]:
    # Strict object schema: all properties are required by default and extra keys are rejected.
    return {
        "type": "object",
        "properties": properties,
        "required": required or list(properties),
        "additionalProperties": False,
    }


def string_array() -> dict[str, Any]:
    return {"type": "array", "items": {"type": "string"}}


finding_schema = object_schema(
    {
        "artifact": {"type": "string"},
        "issue_type": {"type": "string"},
        "severity": {"type": "string"},
        "description": {"type": "string"},
        "suggested_fix_direction": {"type": "string"},
    }
)

review_schema = object_schema(
    {"findings": {"type": "array", "items": finding_schema}}
)

fix_schema = object_schema(
    {
        "artifact": {"type": "string"},
        "iteration": {"type": "integer"},
        "changes_made": string_array(),
        "unresolved_items": string_array(),
        "updated_artifact_path": {"type": "string"},
    }
)

validation_case_schema = object_schema(
    {
        "name": {"type": "string"},
        "passed": {"type": "boolean"},
        "severity": {"type": "string"},
        "evidence": {"type": "string"},
        "feedback": {"type": "string"},
    }
)

validation_schema = object_schema(
    {
        "overall_passed": {"type": "boolean"},
        "cases": {"type": "array", "items": validation_case_schema},
        "remaining_delta": string_array(),
    }
)
```

The review stage reads the artifact and returns structured findings. It does not run validation or edit files. This separation keeps the first step focused: identify likely problems before changing anything. We send the review prompt to `codex exec` together with a JSON schema. The schema makes the result machine-readable, so later cells can pass the findings directly into the repair prompt instead of scraping text from an earlier answer.

```
def notebook_text(path: Path, max_chars: int = 7000) -> str:
    # Flatten the notebook into labeled cell text, truncated to keep the prompt small.
    chunks = []
    for index, cell in enumerate(read_notebook(path)["cells"]):
        source = "".join(cell.get("source", []))
        chunks.append(f"cell {index} ({cell['cell_type']})\n{source}")
    text = "\n\n".join(chunks)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n\n[truncated for prompt size]"


def run_command(command: str, *, stdin: str | None = None, cwd: Path | None = None, timeout: int | None = None):
    cwd = Path.cwd() if cwd is None else cwd
    return subprocess.run(
        shlex.split(command),
        input=stdin,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


def run_codex_json(prompt: str, schema: dict[str, Any], run_dir: Path) -> dict[str, Any]:
    run_dir.mkdir(parents=True, exist_ok=True)
    prompt_file = run_dir / "prompt.txt"
    schema_file = run_dir / "schema.json"
    answer_file = run_dir / "answer.json"
    prompt_file.write_text(prompt, encoding="utf-8")
    schema_file.write_text(json.dumps(schema, indent=2), encoding="utf-8")

    # Quote paths so shlex.split keeps them intact (spaces, Windows backslashes).
    # The prompt is passed on stdin ("-"); the final answer must match the schema.
    command = f"""
    {shlex.quote(CODEX_CLI)} exec
      --model {MODEL}
      --sandbox workspace-write
      --ask-for-approval never
      --config model_reasoning_effort={REPAIR_REASONING_EFFORT}
      --output-schema {shlex.quote(str(schema_file))}
      --output-last-message {shlex.quote(str(answer_file))}
      -
    """
    result = run_command(command, stdin=prompt)
    (run_dir / "stdout.txt").write_text(result.stdout, encoding="utf-8")
    (run_dir / "stderr.txt").write_text(result.stderr, encoding="utf-8")
    if result.returncode != 0:
        raise RuntimeError(f"Codex exited with {result.returncode}. See {run_dir / 'stderr.txt'}.")
    return json.loads(answer_file.read_text(encoding="utf-8"))


def review_notebook(path: Path, run_dir: Path) -> list[dict[str, Any]]:
    prompt = "\n".join(
        [
            "You are reviewing a public OpenAI Cookbook notebook before publication.",
            f"Artifact: {path.name}",
            "Find issues that would make the notebook stale, hard to run, or confusing for a developer reader.",
            "Do not execute the notebook or edit files.",
            "Use concise issue_type labels such as stale_model, deprecated_api, setup_gap, runtime_risk, or clarity_issue.",
            f"Business rules: {json.dumps(business_rules)}",
            "Base findings only on the notebook content below.",
            "Keep the findings focused; three strong findings are better than a long list.",
            "",
            notebook_text(path),
        ]
    )
    return run_codex_json(prompt, review_schema, run_dir)["findings"]


def run_initial_review(path: Path) -> tuple[str, list[dict[str, Any]]]:
    return path.name, review_notebook(path, RUNS_DIR / "initial_review" / path.stem)


# Review the notebooks in parallel; each call is an independent Codex run.
with concurrent.futures.ThreadPoolExecutor(max_workers=min(3, len(NOTEBOOKS))) as executor:
    initial_reviews = dict(executor.map(run_initial_review, NOTEBOOKS))

initial_reviews
```

The repair stage takes the current artifact, the review findings, the business rules, and the previous pass's validation feedback. The prompt gets more specific as the loop learns. Codex edits a copy inside the iteration directory and returns a short summary of its changes. The loop does not assume the edit succeeded; validation makes that call in the next step.

```
def repair_prompt(path: Path, updated_path: Path, findings: list[dict[str, Any]], remaining_delta: list[str], iteration: int) -> str:
    repair_story = case_metadata(path).get("repair_story", {})
    return "\n".join(
        [
            "You are repairing a copy of a public OpenAI Cookbook notebook.",
            f"Source notebook: {path}",
            f"Editable copy: {updated_path}",
            f"Iteration: {iteration}",
            "Make the smallest useful edits that address the review findings and validation delta.",
            "Preserve the notebook's teaching flow and original purpose.",
            "Keep sample repairs self-contained unless the notebook explicitly teaches external setup.",
            "For staged examples, focus on the most important remaining issue for this pass instead of rewriting everything at once.",
            "Edit only the editable copy. Do not claim the notebook passes validation.",
            f"Repair depth: {json.dumps(repair_story, indent=2)}",
            f"Business rules: {json.dumps(business_rules, indent=2)}",
            f"Review findings: {json.dumps(findings, indent=2)}",
            f"Remaining validation delta: {json.dumps(remaining_delta, indent=2)}",
        ]
    )


def repair_notebook(path: Path, iteration: int, findings: list[dict[str, Any]], remaining_delta: list[str], case_dir: Path) -> dict[str, Any]:
    # Work on a copy so the source notebook is never edited in place.
    updated_path = case_dir / "updated.ipynb"
    updated_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, updated_path)
    prompt = repair_prompt(path, updated_path, findings, remaining_delta, iteration)
    return run_codex_json(prompt, fix_schema, case_dir / "repair")
```

Validation works like a small eval: we define the expected behavior, run the relevant checks, and have a grader score the results against that rubric. For documentation examples, execution comes first. Many notebook problems only surface at run time: missing imports, stale file paths, cells that depend on old API response shapes, or setup instructions that are clear to the author but ambiguous to a new reader. When validation fails, the failure becomes evidence for the next repair pass. That grounds the next repair in observed behavior rather than only in what looks right in a diff.

```
VALIDATION_CASES = [
    {
        "name": "api_modernization",
        "question": "Does the notebook avoid stale OpenAI API patterns, legacy function-calling syntax, and outdated model names?",
    },
    {
        "name": "setup_reproducibility",
        "question": "Could a reader run the notebook from a fresh environment without hidden manual steps?",
    },
    {
        "name": "artifact_integrity",
        "question": "Did the update preserve the notebook's teaching flow and avoid deleting substantive cells?",
    },
]


def short_output(value: Any, limit: int = 1200) -> str:
    # Keep only the tail of long output, where errors usually appear.
    if value is None:
        return ""
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    return str(value)[-limit:]


def execute_notebook(path: Path) -> dict[str, Any]:
    code_cells = sum(cell["cell_type"] == "code" for cell in read_notebook(path)["cells"])
    command = f"jupyter nbconvert --to notebook --execute --inplace {shlex.quote(path.name)}"
    try:
        result = run_command(
            command,
            cwd=path.parent,
            timeout=int(os.getenv("SAMPLE_NOTEBOOK_TIMEOUT_SECONDS", "300")),
        )
    except FileNotFoundError:
        return {
            "status": "failed",
            "executed_code_cells": 0,
            "error": "Jupyter or nbconvert is not installed or is not available on PATH.",
            "summary": "Install Jupyter with nbconvert before running the validation loop.",
        }
    except subprocess.TimeoutExpired as exc:
        return {
            "status": "failed",
            "executed_code_cells": 0,
            "error": f"Notebook execution timed out after {exc.timeout} seconds.",
            "summary": short_output(exc.stderr or exc.stdout),
        }
    output = result.stderr or result.stdout
    return {
        "status": "pass
```
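The three stages described above (review, repair, validate) can be tied together in a small driver loop. The following is a minimal sketch, assuming each stage is a plain function whose inputs and outputs follow `review_schema`, `fix_schema`, and `validation_schema`. `run_repair_loop` and the `toy_*` stubs are hypothetical stand-ins, not the cookbook's implementation. In the notebook, `review_notebook`, `repair_notebook`, and a validation function built on `execute_notebook` would fill those roles, with thin adapters for their extra arguments such as `case_dir`.

```python
from typing import Any, Callable

# Hypothetical stage signatures mirroring the structured handoffs:
# review -> findings, repair -> fix summary, validate -> overall_passed + remaining_delta.
Review = Callable[[str], list[dict[str, Any]]]
Repair = Callable[[str, int, list[dict[str, Any]], list[str]], dict[str, Any]]
Validate = Callable[[str], dict[str, Any]]


def run_repair_loop(
    artifact: str,
    review: Review,
    repair: Repair,
    validate: Validate,
    max_iterations: int = 3,
) -> list[dict[str, Any]]:
    """Review once, then alternate repair and validation until validation passes."""
    findings = review(artifact)
    remaining_delta: list[str] = []
    history = []
    for iteration in range(1, max_iterations + 1):
        fix = repair(artifact, iteration, findings, remaining_delta)
        # Judge the repaired copy, not the artifact that was handed in.
        artifact = fix["updated_artifact_path"]
        validation = validate(artifact)
        history.append({"iteration": iteration, "fix": fix, "validation": validation})
        if validation["overall_passed"]:
            break
        # Leftover problems become the input to the next repair pass.
        remaining_delta = validation["remaining_delta"]
    return history


# Toy stages: each repair writes a new copy, and validation passes on the second copy.
def toy_review(artifact: str) -> list[dict[str, Any]]:
    return [{"issue_type": "deprecated_api", "description": f"{artifact} uses a stale call"}]


def toy_repair(artifact: str, iteration: int, findings: list[dict[str, Any]], remaining_delta: list[str]) -> dict[str, Any]:
    return {
        "artifact": artifact,
        "iteration": iteration,
        "changes_made": [f"pass {iteration}"],
        "unresolved_items": list(remaining_delta),
        "updated_artifact_path": f"copy_v{iteration}",
    }


def toy_validate(artifact: str) -> dict[str, Any]:
    passed = artifact == "copy_v2"
    return {
        "overall_passed": passed,
        "cases": [],
        "remaining_delta": [] if passed else [f"issue left after {artifact}"],
    }


history = run_repair_loop("original.ipynb", toy_review, toy_repair, toy_validate)
print([step["iteration"] for step in history])
```

Capping `max_iterations` keeps a loop that never converges from running forever. Validating the updated copy, rather than the original path, ensures that each pass is judged on what the repair actually produced.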
