@krystal_ning: Thanks for sharing our survey! We also maintain an Awesome Code as Agent Harness Papers repository that collects recent work on…

X AI KOLs Following 2026/05/20 05:25 Tools

awesome-list survey agent-harness code-as-agent agentic-systems research github

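Summary

Krystal Ning shares a curated Awesome-list repository of papers on code-centric agent systems and harness engineering. The list accompanies a survey titled "Code as Agent Harness".

Thanks for sharing our survey! We also maintain an Awesome Code as Agent Harness Papers repository that collects recent work on code-centric agent systems and harness engineering: https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers…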


YennNing/Awesome-Code-as-Agent-Harness-Papers

Source: https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers

Awesome Code as Agent Harness Papers

Awesome (https://awesome.re)
arXiv (https://arxiv.org/abs/2605.18747)
Official website (https://code-as-harness.github.io/code-as-harness-webpage/)
Hugging Face #1 Paper of the Day (https://huggingface.co/papers/2605.18747)
@_akhaliq (https://x.com/_akhaliq/status/2056900568921133565?s=20)

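This repository is the companion resource for the survey paper "Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems" (https://arxiv.org/abs/2605.18747).
We study the emerging role of code in agentic AI: code is no longer only a generated artifact but increasingly serves as an executable, inspectable, and stateful harness through which agents reason, act, model their environment, receive feedback, and coordinate. The repository organizes representative papers around three interconnected layers (Harness Interface, Harness Mechanisms, and Scaling the Harness) and covers coding assistants, GUI/OS automation, scientific discovery, and embodied intelligence.

👋 We welcome paper suggestions, pull requests, and collaborations related to code as agent harness. Please contact [email protected], [email protected], [email protected], [email protected], and [email protected]. We will keep updating this repository with the latest work on code-centric agent systems and harness engineering.

📚 If you find this resource useful, please cite the paper and star the repository (https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers):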

@article{ning2026codeasharness,
  title   = {Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems},
  author  = {Ning, Xuying and Tieu, Katherine and Fu, Dongqi and Wei, Tianxin and Li, Zihao and Bei, Yuanchen and others},
  journal = {arXiv preprint arXiv:2605.18747},
  year    = {2026}
}

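[Figure: framework overview]

🔔 News

[2026-05] 🚀 Our survey "Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems" is now available on arXiv (https://arxiv.org/abs/2605.18747). Links to the slides and project page will be added once they are available.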

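🧩 Harness Interface

Code serves as the basic interface between the model and the task environment. Programs turn model outputs into executable, inspectable, and stateful structures: code makes reasoning executable, actions programmable, and environment state inspectable.

[Figure: harness interface]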

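💭 Code for Reasoning

Programs externalize internal logic as verifiable computation, allowing interpreters, symbolic solvers, execution traces, or process rewards to check and refine intermediate steps.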
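As a rough illustration of delegating computation to an interpreter, the sketch below executes a program that stands in for model output and reads the result from a variable. The program text, the `ans` convention, and the `run_program` helper are assumptions made for this example, not code from any of the listed papers.

```python
import math

# Hypothetical model output: a short program whose variable `ans` holds the answer.
generated_program = """
principal = 1000
rate = 0.05
years = 3
ans = principal * (1 + rate) ** years
"""

def run_program(source: str) -> float:
    """Execute model-written code in a small namespace and return `ans`."""
    # Emptying __builtins__ only narrows what the code can reach;
    # it is NOT a real sandbox and is used here for illustration only.
    namespace = {"__builtins__": {}, "math": math}
    exec(source, namespace)
    return namespace["ans"]

answer = run_program(generated_program)
print(round(answer, 3))
```

The arithmetic is done by the interpreter rather than by the model, so the intermediate values (`principal`, `rate`, `years`) can be inspected or checked independently of the final answer.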

Program-Delegated Reasoning

Paper	Venue
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks (https://arxiv.org/abs/2211.12588)	TMLR 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning (https://arxiv.org/abs/2310.03731)	ICLR 2024
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (https://arxiv.org/abs/2312.04474)	ICML 2024
Method-Based Reasoning for Large Language Models: Extraction, Reuse, and Continuous Improvement (https://arxiv.org/abs/2508.04289)	arXiv 2025
Code-Enabled Language Models Can Outperform Reasoning Models on Diverse Tasks (https://arxiv.org/abs/2510.20909)	arXiv 2025
When Do Program-of-Thought Works for Reasoning? (https://ojs.aaai.org/index.php/AAAI/article/view/29721)	AAAI 2024
PAL: Program-aided Language Models (https://proceedings.mlr.press/v202/gao23f.html)	ICML 2023
Show Your Work: Scratchpads for Intermediate Computation with Language Models (https://arxiv.org/abs/2112.00114)	arXiv 2021
Reasoning Like Program Executors (https://aclanthology.org/2022.emnlp-main.48/)	EMNLP 2022
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments (https://aclanthology.org/2025.findings-acl.817/)	ACL 2025 Findings
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (https://openreview.net/forum?id=_VjQlMeSB_J)	NeurIPS 2022

Hybrid Symbolic-Neural Execution

Paper	Venue
Self-Verifying Reflection Helps Transformers with CoT Reasoning (https://neurips.cc/virtual/2025/poster/119948)	NeurIPS 2025
SSR: Socratic Self-Refine for Large Language Model Reasoning (https://arxiv.org/abs/2511.10621)	arXiv 2025
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance (https://arxiv.org/abs/2502.04350)	ICML 2025
Graph of Thoughts: Solving Elaborate Problems with Large Language Models (https://ojs.aaai.org/index.php/AAAI/article/view/29720)	AAAI 2024
Code-as-Symbolic-Planner: Foundation Model-Based Robot Planning via Symbolic Code Generation (https://arxiv.org/abs/2503.01700)	IROS 2025

Iterative Code-Grounded Reasoning

Paper	Venue
NExT: Teaching Large Language Models to Reason about Code Execution (https://arxiv.org/abs/2404.14662)	ICML 2024
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces (https://arxiv.org/abs/2503.05703)	arXiv 2025
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation (https://arxiv.org/abs/2412.15118)	ICML 2025
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment (https://arxiv.org/abs/2510.18471)	arXiv 2025
RLTF: Reinforcement Learning from Unit Test Feedback (https://arxiv.org/abs/2307.04349)	TMLR 2023
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning (https://arxiv.org/abs/2410.02089)	ICML 2025
Execution guided line-by-line code generation (https://openreview.net/forum?id=ySFDPoiANu)	NeurIPS 2025
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning (https://arxiv.org/abs/2505.21668)	arXiv 2025
CYCLE: Learning to Self-Refine the Code Generation (https://dl.acm.org/doi/full/10.1145/3649825)	OOPSLA 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (https://aclanthology.org/2024.acl-long.251/)	ACL 2024
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (https://openreview.net/forum?id=WaGvb7OzySA)	NeurIPS 2022
CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation (https://aclanthology.org/2025.findings-acl.428/)	ACL 2025 Findings
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting (https://openreview.net/forum?id=8tt9KxyV2s)	NeurIPS 2023
Self-Edit: Fault-Aware Code Editor for Code Generation (https://aclanthology.org/2023.acl-long.45/)	ACL 2023

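🤖 Code for Acting

Generated programs serve as policies, tool calls, behavior trees, or reusable skills in embodied, GUI, software, and tool-use environments.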
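One way to picture a generated program acting as a policy and then becoming a reusable skill is sketched below. The primitives `pick` and `place`, the `skills` registry, and the `tidy` policy text are all hypothetical; a real system would call robot or GUI APIs instead of appending to a log.

```python
from typing import Callable, Dict, List

skills: Dict[str, Callable] = {}   # hypothetical skill library
log: List[str] = []                # stands in for real actuator calls

def skill(fn: Callable) -> Callable:
    """Register a function as a reusable skill."""
    skills[fn.__name__] = fn
    return fn

@skill
def pick(obj: str) -> str:
    log.append(f"pick({obj})")
    return obj

@skill
def place(obj: str, target: str) -> str:
    log.append(f"place({obj}, {target})")
    return target

# Hypothetical model-written policy that composes the primitives above.
policy_source = """
def tidy(objects, bin_name):
    for obj in objects:
        place(pick(obj), bin_name)
"""

def load_policy(source: str, name: str) -> Callable:
    """Compile a generated policy against the skill library and store it as a new skill."""
    env = dict(skills)          # the policy can only see registered skills (plus builtins)
    exec(source, env)
    skills[name] = env[name]
    return env[name]

tidy = load_policy(policy_source, "tidy")
tidy(["cup", "spoon"], "sink")
print(log)
```

After loading, `tidy` sits in the same library as the primitives, so a later generated program could call it the way it calls `pick` or `place`.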

Grounded Skill Selection

Paper	Venue
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (https://arxiv.org/abs/2204.01691)	CoRL 2022
Robots That Ask for Help: Uncertainty Alignment for Large Language Model Planners (https://arxiv.org/abs/2307.01928)	CoRL 2023
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance (https://arxiv.org/abs/2310.10021)	CoRL 2023
SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse (https://arxiv.org/abs/2603.03836)	arXiv 2026
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (https://proceedings.mlr.press/v229/ha23a.html)	CoRL 2023
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models (https://ieeexplore.ieee.org/document/10611448/)	ICRA 2024

Programmatic Policy Generation

Paper	Venue
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis (https://arxiv.org/abs/2402.16117)	ICML 2024
CP-Agent: Agentic Constraint Programming (https://arxiv.org/abs/2508.07468)	arXiv 2025
LLM-Driven Corrective Robot Operation Code Generation with Static Text-Based Simulation (https://arxiv.org/abs/2512.02002)	ICRA 2026
NormCode: A Semi-Formal Language for Auditable AI Planning (https://arxiv.org/abs/2512.10563)	arXiv 2025
ALRM: Agentic LLM for Robotic Manipulation (https://arxiv.org/abs/2601.19510)	arXiv 2026
RACAS: Controlling Diverse Robots With a Single Agentic System (https://arxiv.org/abs/2603.05621)	arXiv 2026
ReAct: Synergizing Reasoning and Acting in Language Models (https://openreview.net/forum?id=WE_vluYUL-X)	ICLR 2023
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models (https://www.nature.com/articles/s44182-025-00065-w)	npj Robotics 2026
Code as Policies: Language Model Programs for Embodied Control (https://ieeexplore.ieee.org/document/10160591/)	ICRA 2023
Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation (https://arxiv.org/abs/2501.04268)	arXiv 2025
Code-BT: A Code-Driven Approach to Behavior Tree Generation for Robot Tasks Planning with Large Language Models (https://www.ijcai.org/proceedings/2025/980)	IJCAI 2025

Lifelong Code-Based Agents

Paper	Venue
Growing with Your Embodied Agent: A Human-in-the-Loop Lifelong Code Generation Framework for Long-Horizon Manipulation Skills (https://arxiv.org/abs/2509.18597)	arXiv 2025
ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning (https://arxiv.org/abs/2509.24219)	arXiv 2025
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience (https://arxiv.org/abs/2603.24533)	arXiv 2026
Voyager: An Open-Ended Embodied Agent with Large Language Models (https://openreview.net/forum?id=ehfRiF0R3a)	TMLR 2023
Lifelong Language-Conditioned Robotic Manipulation Learning (https://arxiv.org/abs/2603.05160)	arXiv 2026

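🌍 Code for Environment Modeling

Program state, repositories, trajectories, simulators, and tests represent the state, dynamics, and feedback signals of agent interaction.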
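A minimal sketch of this idea, assuming a toy environment in which persistent program state is the environment state, the list of submitted snippets is the trajectory, and unit tests supply the feedback signal. The `CodeEnv` class and its fields are invented for illustration and do not correspond to any benchmark in the list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class CodeEnv:
    tests: List[Callable[[Dict], bool]]             # feedback signal
    state: Dict = field(default_factory=dict)       # persistent program state
    trace: List[str] = field(default_factory=list)  # trajectory of submitted code

    def step(self, source: str) -> Dict:
        """Run one snippet against the shared state and report test feedback."""
        error: Optional[str] = None
        try:
            exec(source, self.state)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
        self.trace.append(source)
        passed = sum(self._check(t) for t in self.tests)
        return {"error": error, "passed": passed, "total": len(self.tests)}

    def _check(self, test: Callable[[Dict], bool]) -> bool:
        # A test that raises (for example, a missing function) counts as failed.
        try:
            return bool(test(self.state))
        except Exception:
            return False

env = CodeEnv(tests=[
    lambda s: s["add"](2, 3) == 5,
    lambda s: s["add"](-1, 1) == 0,
])
obs1 = env.step("def add(a, b): return a - b")   # buggy first attempt
obs2 = env.step("def add(a, b): return a + b")   # corrected attempt
print(obs1, obs2)
```

Because the state dictionary survives across steps, the second snippet overwrites the first definition of `add`, and the observation reflects the current program rather than only the latest submission.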

Structured World Representations

Paper	Venue
From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries (https://openreview.net/forum?id=Ew8bJkSt3g)	NeurIPS 2025
PoE-World: Compositional World Modeling with Products of Programmatic Experts (https://openreview.net/forum?id=obwRcksFZw)	NeurIPS 2025
Code2World: A GUI World Model via Renderable Code Generation (https://arxiv.org/abs/2602.09856)	arXiv 2026
Code2Worlds: Empowering Coding LLMs for 4D World Generation (https://arxiv.org/abs/2602.11757)	arXiv 2026
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation (https://aclanthology.org/2023.emnlp-main.824/)	EMNLP 2023

Execution-Trace World Modeling

Paper	Venue
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning (https://arxiv.org/abs/2406.01006)	NeurIPS 2024
CWM: An Open-Weights LLM for Research on Code Generation with World Models (https://arxiv.org/abs/2510.02387)	arXiv 2025
Reinforcement World Model Learning for LLM-based Agents (https://arxiv.org/abs/2602.05842)	arXiv 2026
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning (https://arxiv.org/abs/2602.10090)	arXiv 2026
Aligning Agentic World Models via Knowledgeable Experience Learning (https://arxiv.org/abs/2601.13247)	arXiv 2026
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment (https://proceedings.neurips.cc/paper_files/paper/2024/file/820c61a0cd419163ccbd2c33b268816e-Paper-Conference.pdf)	NeurIPS 2024

Code-Grounded Evaluation Environments

Paper	Venue
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution (https://arxiv.org/abs/2401.03065)	ICML 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code (https://openreview.net/forum?id=chfJJYC3iL)	ICLR 2025
SWE-bench: Can Language Models Resolve Real-world Github Issues? (https://arxiv.org/abs/2310.06770)	ICLR 2024
AgentBench: Evaluating LLMs as Agents (https://arxiv.org/abs/2308.03688)	ICLR 2024
CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks (https://neurips.cc/virtual/2025/poster/121601)	NeurIPS 2025
Geogrambench: Benchmarking the geometric program reasoning in modern LLMs (https://arxiv.org/abs/2505.17653)	arXiv 2025
CodeGlance: Understanding Code Reasoning Challenges in LLMs through Multi-Dimensional Feature Analysis (https://arxiv.org/abs/2602.13962)	arXiv 2026
Endless Terminals: Scaling RL Environments for Terminal Agents (https://arxiv.org/abs/2601.16443)	arXiv 2026
Reflexion: Language Agents with Verbal Reinforcement Learning (https://openreview.net/forum?id=vAElhFcKW6)	NeurIPS 2023
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution (https://aclanthology.org/2025.acl-long.1158/)	ACL 2025
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback (https://proceedings.neurips.cc/paper_files/paper/2023/hash/4b175d846fb008d540d233c188379ff9-Abstract-Datasets_and_Benchmarks.html)	NeurIPS 2023

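🛠️ Harness Mechanisms

Once code is placed inside the agent loop, the harness must decide what to execute next, retain useful state, expose the right tools, and turn failures into corrective actions.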
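The four responsibilities named above can be sketched as a small loop. Everything here is an assumption for illustration: `run_harness`, the `tools` table, and the scripted `propose` function that stands in for a model are not taken from the survey.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, tuple]

def run_harness(propose: Callable, tools: Dict[str, Callable], max_steps: int = 5) -> List[Dict]:
    memory: List[Dict] = []            # retained state across steps
    feedback: Optional[str] = None     # last failure, if any
    for _ in range(max_steps):
        action: Optional[Action] = propose(memory, feedback)  # decide what to execute next
        if action is None:
            break
        name, args = action
        try:
            result = tools[name](*args)   # only exposed tools are callable
            feedback = None
        except Exception as exc:
            result = None
            feedback = f"{name} failed: {exc}"   # failure becomes input to the next decision
        memory.append({"tool": name, "args": args, "result": result, "error": feedback})
    return memory

def scripted_propose(memory: List[Dict], feedback: Optional[str]) -> Optional[Action]:
    """Stand-in for a model: try a bad call, then correct it after seeing the failure."""
    if not memory:
        return ("divide", (1, 0))
    if feedback is not None:
        return ("divide", (1, 2))   # corrective action
    return None                     # nothing left to do

history = run_harness(scripted_propose, {"divide": lambda a, b: a / b})
print(history)
```

In this run the first call fails, the error text is passed back to the proposer, and the second call succeeds, so `history` records both the failure and its correction.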

[Figure: harness mechanisms]

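🗺️ Planning for Code Agents

Planning is the control layer of the harness: it structures how the agent externalizes intent into executable steps, schedules interactions with code artifacts and tools, and regulates the trajectory.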
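A minimal sketch of a plan expressed as executable steps, assuming a hypothetical four-step bug-fix task. The step names, the dependency map, and the `budget` cap are illustrative; ordering uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical plan: each step maps to the set of steps it depends on.
plan: Dict[str, Set[str]] = {
    "read_issue": set(),
    "locate_file": {"read_issue"},
    "write_patch": {"locate_file"},
    "run_tests": {"write_patch"},
}

def execute(plan: Dict[str, Set[str]], actions: Dict[str, Callable[[], None]], budget: int) -> List[str]:
    """Run steps in dependency order, stopping once the step budget is spent."""
    done: List[str] = []
    for step in TopologicalSorter(plan).static_order():
        if len(done) >= budget:   # simple way to regulate the trajectory
            break
        actions[step]()
        done.append(step)
    return done

calls: List[str] = []
# Each action only records its own name; a real harness would invoke tools here.
actions = {name: (lambda n=name: calls.append(n)) for name in plan}

full_order = list(TopologicalSorter(plan).static_order())
executed = execute(plan, actions, budget=3)
print(full_order, executed)
```

With a budget of 3 the run stops before `run_tests`, which shows how a harness-level limit can cut a plan short without changing the plan itself.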

