@janehu07: https://x.com/janehu07/status/2058359677843599494

X AI KOLs Timeline 05/24/26, 01:29 AM Papers

agent-harness llm-agents agent-infrastructure coding-agent taxonomy etclovg

Summary

This learning note introduces the concept of an agent harness as the infrastructure layer around an LLM, proposing the ETCLOVG taxonomy (Execution, Tooling, Context, Lifecycle, Observability, Verification, Governance) and demonstrating its application through a coding agent case study.

https://t.co/6p3vxHrf6s

Original Article


Cached at: 05/24/26, 12:29 PM

Learning note: What is an agent harness?

Recently I started learning more about agents from a systems perspective. One framing I found particularly useful is:

Agent = Model + Harness

I used to think agent performance was mostly about model capability: better reasoning, better coding ability, better tool use. But for long-running tasks, the paper linked at the end of this note argues that the harness around the model can be just as important. With the exact same model, changes in tool interfaces, context management, execution environments, verification, or orchestration can lead to large performance gains.

This made me realize that “agent infra” is not one narrow thing. It is more like the complete system stack that turns model calls into reliable task execution.

The paper proposes a taxonomy called ETCLOVG to break down this infrastructure:

Execution: Where the agent runs.
Tooling: How the agent discovers and calls tools.
Context: What information the model sees.
Lifecycle: How the task is orchestrated over time.
Observability: How traces, cost, latency, and failures are monitored.
Verification: How we evaluate whether the agent actually succeeded.
Governance: How permissions, policies, and security boundaries are enforced.
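
A minimal sketch of the taxonomy as a Python enum may help fix the seven names in memory. The class name `HarnessLayer`, the helper `acronym`, and the question-style phrasing are assumptions for illustration; the paper itself is not quoted here.

```python
from enum import Enum


class HarnessLayer(Enum):
    """The seven ETCLOVG layers, each paired with the question it answers.

    Hypothetical naming: only the layer names and their descriptions come
    from the list above.
    """

    EXECUTION = "Where does the agent run?"
    TOOLING = "How does the agent discover and call tools?"
    CONTEXT = "What information does the model see?"
    LIFECYCLE = "How is the task orchestrated over time?"
    OBSERVABILITY = "How are traces, cost, latency, and failures monitored?"
    VERIFICATION = "How do we evaluate whether the agent actually succeeded?"
    GOVERNANCE = "How are permissions, policies, and security boundaries enforced?"


def acronym() -> str:
    """Join the first letter of each layer name, in definition order."""
    return "".join(layer.name[0] for layer in HarnessLayer)


if __name__ == "__main__":
    print(acronym())  # ETCLOVG
    for layer in HarnessLayer:
        print(f"{layer.name.title():<14} {layer.value}")
```

Keeping the layers in one enum makes it easy to tag traces, tickets, or design notes with the layer they belong to.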

Case Study: A Coding Agent

A coding agent is a great example. When we say “an agent fixes a bug,” it is not just one LLM call that magically generates a patch. A full workflow looks much more like this:

⚙️ Execution: The agent starts inside a sandboxed repo environment, so it can inspect files, run commands, and execute tests without touching the user’s real machine.
🛠️ Tooling: It uses tools like search, grep, read_file, edit_file, and run_tests to interact with the codebase. These tools need clear inputs, structured outputs, and reliable error messages.
🧠 Context: It brings relevant files, error logs, issue descriptions, and previous attempts into the context window, instead of loading the entire repo.
🔄 Lifecycle: It follows an edit-test-debug loop: understand the bug → locate relevant code → propose a fix → edit files → run tests → inspect failures → iterate. In real systems, this lifecycle can involve much more: retries, rollbacks, state summarization, failure recovery, or splitting work across multiple agents.
📊 Observability: During the process, the system records traces: which files were opened, which tools were called, how many tokens were used, where time was spent, and what failed.
✅ Verification: The agent validates the patch by running tests or benchmark-specific checks and, if the fix does not work, tries to attribute the failure to a cause.
🛡️ Governance: The system enforces boundaries: what files the agent can access, whether it can use the network, whether it needs approval before destructive commands, and how actions are audited.
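
The workflow above can be sketched as a small loop. This is a toy under stated assumptions, not the paper's implementation: the names `Harness`, `ToolResult`, `call_tool`, and `demo` are hypothetical, the "model" is a scripted stub rather than an LLM, and an in-memory dict stands in for the sandboxed repo. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    """Structured tool output: a success flag plus a message."""
    ok: bool
    output: str


Tool = Callable[[str], ToolResult]
# Hypothetical model interface: sees the context, returns (tool_name, argument).
Model = Callable[[list[str]], tuple[str, str]]


@dataclass
class Harness:
    tools: dict[str, Tool]                           # Tooling
    allowed: set[str]                                # Governance
    verify: Callable[[], bool]                       # Verification
    max_steps: int = 5                               # Lifecycle budget
    trace: list[dict] = field(default_factory=list)  # Observability

    def call_tool(self, name: str, arg: str) -> ToolResult:
        if name not in self.allowed:
            result = ToolResult(False, f"tool {name!r} is not permitted")
        elif name not in self.tools:
            result = ToolResult(False, f"unknown tool {name!r}")
        else:
            result = self.tools[name](arg)
        self.trace.append({"tool": name, "arg": arg, "ok": result.ok})
        return result

    def run(self, model: Model, task: str) -> bool:
        context = [task]  # Context: only the task plus prior tool results
        for _ in range(self.max_steps):
            name, arg = model(context)
            result = self.call_tool(name, arg)
            context.append(f"{name}({arg!r}) -> {result.output}")
            # Re-run verification after every successful edit.
            if name == "edit_file" and result.ok and self.verify():
                return True
        return False


def demo() -> tuple[bool, list[dict]]:
    # Execution: an in-memory "repo" stands in for a real sandbox.
    repo = {"calc.py": "def add(a, b): return a - b"}

    def read_file(path: str) -> ToolResult:
        if path not in repo:
            return ToolResult(False, f"no such file: {path}")
        return ToolResult(True, repo[path])

    def edit_file(arg: str) -> ToolResult:
        # Toy convention: first line is the path, the rest is the new content.
        path, _, new_text = arg.partition("\n")
        repo[path] = new_text
        return ToolResult(True, f"wrote {path}")

    def tests_pass() -> bool:
        # A real harness would run tests inside the sandbox, not via exec.
        namespace: dict = {}
        try:
            exec(repo["calc.py"], namespace)
            return namespace["add"](2, 3) == 5
        except Exception:
            return False

    script = iter([
        ("read_file", "calc.py"),
        ("shell", "rm -rf /"),  # blocked: not in the allowed set
        ("edit_file", "calc.py\ndef add(a, b): return a + b"),
    ])
    harness = Harness(
        tools={"read_file": read_file, "edit_file": edit_file},
        allowed={"read_file", "edit_file"},
        verify=tests_pass,
    )
    solved = harness.run(lambda context: next(script), "add(2, 3) returns -1")
    return solved, harness.trace


if __name__ == "__main__":
    solved, trace = demo()
    print(solved)  # True
    for entry in trace:
        print(entry["tool"], entry["ok"])
```

Even in this toy, the scripted model never changes; what decides whether the destructive command runs, what gets logged, and when the task counts as done is the harness around it.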

This framework helped me connect a lot of topics that used to feel separate:

RAG / Memory → Context
MCP / Tool Schema → Tooling
SWE-bench / Terminal-Bench → Verification
Sandboxing → Execution
Trace Analysis / Cost Tracking → Observability
Permissions / Audit → Governance

Open question I’m still thinking about:

As models become stronger, will the impact of harness engineering become smaller? Or will the harness become even more important, because stronger models can take more actions and therefore need better control, verification, and governance?

My current guess is that the relative impact of some harness tricks may decrease over time, but the need for a robust harness probably will not disappear.

Curious what others think 🤔

Link to the paper: https://picrew.github.io/LLM-Harness/
