@dongxi_nlp: https://x.com/dongxi_nlp/status/2065200644802101633

X AI KOLs Timeline 06/11/26, 10:32 PM Papers

coding-agent tools contracts agent-design harness function-calling

Summary

The article argues that in a Coding Agent, tool invocations should be treated as contracts rather than plain functions. It emphasizes the Harness's adjudication role in validation, permissions, and lifecycle management, and walks through what a tool contract contains and how its lifecycle runs.

https://t.co/VQEiwfiJiO


Tools Are Contracts, Not Functions

Harness series, part 3: Tools Are Contracts.

If a Coding Agent has a world, then every tool call is a request, made through generated text, to change the state of that world.

Tool calls can easily appear too simple: the model outputs JSON, the Harness parses it, some local function gets executed, and the result is placed back into the next prompt.

That’s the tiny agent version.

But a coding agent’s tool call is not a simple function call.

A tool call comes from generated text; it is requesting access to the workspace, shell, network, transcript, or another Agent.

This difference changes how the Coding Agent's entire state and environment have to be treated.

The model can ask. The harness decides.

The Proposal and Contract

Imagine the model outputs:
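The sketch below stands in for such an output. The tool name, the `tool` and `args` field names, and the argument values are illustrative assumptions, not the author's example.

```python
import json

# Hypothetical model output; field names "tool" and "args" are assumptions.
raw_output = '{"tool": "write_file", "args": {"path": "src/app.py", "content": "print(1)\\n"}}'

# Parsing succeeds, but the result is only data: nothing has been written yet.
proposal = json.loads(raw_output)
print(proposal["tool"], proposal["args"]["path"])
```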

It looks structured. It is still just a proposal.

Producing seemingly valid JSON does not automatically grant the model write access.

The Harness must still answer:

Does this tool exist?
Do the args match the schema?
Is the path within the workspace?
Is this a create, overwrite, or patch?
Has the model recently read the existing file?
Is the file-state baseline fresh?
Does this action require approval?
What should be shown to the human before approval?
How much of the result output can safely enter the context?
Which transcript entries and state need updating after execution?

This string of questions is the tool contract.
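One of these questions, whether the file-state baseline is fresh, can be sketched with a content digest recorded at read time. This is a minimal illustration under assumptions: the `baselines` store and the helper names are hypothetical, and a real Harness might track modification times or versions instead.

```python
import hashlib

# Hypothetical store: path -> digest of the content the model last read.
baselines = {}


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def note_read(path, text):
    # Called whenever the model reads a file through the Harness.
    baselines[path] = digest(text)


def baseline_is_fresh(path, current_text):
    # Fresh only if the file was read and has not changed since that read.
    return baselines.get(path) == digest(current_text)


note_read("a.py", "x = 1\n")
print(baseline_is_fresh("a.py", "x = 1\n"))  # unchanged since the read
print(baseline_is_fresh("a.py", "x = 2\n"))  # changed behind the model's back
```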

The Naive Design

A naive design treats tools as a function map:

tools = {
    "read_file":   read_file,
    "write_file":  write_file,
    "run_shell":   run_shell,
    "search_code": search_code,
}

The code is short, so it looks clean, but it packs many responsibilities into a single call.

Parsing, schema validation, path safety, permission policy, sandbox choice, execution, output clipping, transcript recording, and state updates are all crammed into tool(...).

A demo can be written this way. But a coding agent that modifies real repositories needs stronger boundaries. The problem isn’t the function itself; the problem is that a tool boundary follows different trust rules than an ordinary function call.
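To make the risk concrete, here is a sketch of what a dispatcher built on the function map tends to look like. The `naive_dispatch` name, the stub handler, and the `tool`/`args` field names are assumptions for illustration.

```python
import json


def read_file(path):
    # Stub handler for illustration; a real one would touch the filesystem.
    return f"<contents of {path}>"


tools = {"read_file": read_file}


def naive_dispatch(raw_output):
    call = json.loads(raw_output)      # parsing: raises on malformed JSON
    handler = tools[call["tool"]]      # lookup: KeyError on an unknown tool
    return handler(**call["args"])     # everything else hides in this one call


# Nothing stops a path that escapes the workspace.
print(naive_dispatch('{"tool": "read_file", "args": {"path": "../../etc/passwd"}}'))
```

The call goes straight through: no schema check, no path check, no policy, no bound on the result, and no record of what happened.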

Function call vs Tool call

In a normal program, a function call is an implementation detail.

In a coding agent, a tool call is generated text requesting a change to real state, such as the workspace, the shell, or the network.

What A Tool Contract Contains

A useful tool contract cannot consist of just a name and a handler. It should describe:

tool:
  name: patch_file
  description: Apply a surgical edit to an existing file.
  args:
    path:     string  # absolute or workspace-relative path
    old_text: string  # exact substring to replace
    new_text: string  # replacement content
  returns:
    success: boolean
    message: string
  policies:
    path:  must be inside workspace
    state: file must have a fresh baseline
    size:  patch must be smaller than N tokens
  lifecycle:
    - parse
    - schema
    - path
    - policy
    - exec
    - bound
    - record

Some fields are for the model. Some belong only to the Harness.

The model needs enough information to make a correct request. The Harness needs enough metadata to make a correct ruling.
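One way to keep the two audiences apart is to hold the whole contract in one object and expose only a model-facing view. The sketch below is an assumption about how that split could look; the `ToolContract` class and `model_view` method are hypothetical names, and which fields count as model-facing is a design choice, not something the article fixes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolContract:
    # Model-facing fields: enough to make a correct request.
    name: str
    description: str
    args_schema: dict
    # Harness-only fields: enough to make a correct ruling.
    policies: dict = field(default_factory=dict)
    lifecycle: tuple = ("parse", "schema", "path", "policy", "exec", "bound", "record")

    def model_view(self):
        """Return only what is advertised to the model."""
        return {"name": self.name, "description": self.description, "args": self.args_schema}


patch_file = ToolContract(
    name="patch_file",
    description="Apply a surgical edit to an existing file.",
    args_schema={"path": "string", "old_text": "string", "new_text": "string"},
    policies={"path": "must be inside workspace", "state": "file must have a fresh baseline"},
)
print(patch_file.model_view())
```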

The Lifecycle

A minimal viable lifecycle:

parse → schema → path → policy → exec → bound → record

Each step intercepts a different type of problem.

parse handles malformed output
schema validation handles missing fields and wrong types
path validation handles workspace escape
policy handles risky actions, approval, sandboxing, denial
execution runs the real handler
bounding prevents a single command or file read from flooding the next prompt
recording enables recovery and auditing in the next round

So the mental model of “just call the function” is insufficient; the function is only a tiny piece of the lifecycle.
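The seven steps can be sketched as one function in which the handler is a single line. Everything here is an assumption made for illustration: the `/workspace` root, the character-based bound, the schema registry, the stub handler, and the in-memory transcript.

```python
import json
from pathlib import Path

WORKSPACE = Path("/workspace").resolve()   # assumed workspace root
MAX_RESULT_CHARS = 200                     # assumed bound (characters, for simplicity)
SCHEMAS = {"read_file": {"path": str}}     # hypothetical schema registry
transcript = []                            # hypothetical in-memory transcript


def fake_read_file(path):
    # Stand-in handler so the sketch runs without touching the disk.
    return "x" * 1000


def record(call, result):
    # record: pair every call with its result, including failures.
    transcript.append({"call": call, "result": result})
    return result


def run_tool_call(raw):
    # parse: malformed output stops here
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return record(raw, {"ok": False, "stage": "parse"})
    if not isinstance(call, dict):
        return record(raw, {"ok": False, "stage": "parse"})

    # schema: unknown tool, missing or extra fields, wrong types
    tool, args = call.get("tool"), call.get("args")
    schema = SCHEMAS.get(tool) if isinstance(tool, str) else None
    if (schema is None or not isinstance(args, dict) or set(args) != set(schema)
            or not all(isinstance(args[k], t) for k, t in schema.items())):
        return record(call, {"ok": False, "stage": "schema"})

    # path: reject anything that resolves outside the workspace
    target = (WORKSPACE / args["path"]).resolve()
    if not target.is_relative_to(WORKSPACE):
        return record(call, {"ok": False, "stage": "path"})

    # policy: in this sketch only plain reads are allowed without approval
    if tool != "read_file":
        return record(call, {"ok": False, "stage": "policy"})

    output = fake_read_file(target)            # exec: the real handler
    bounded = output[:MAX_RESULT_CHARS]        # bound: clip before it reaches the prompt
    return record(call, {"ok": True, "output": bounded,
                         "truncated": len(output) > len(bounded)})


print(run_tool_call('{"tool": "read_file", "args": {"path": "../etc/passwd"}}'))
```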

Validate Before Approval

Validation should happen before approval.

If patch_file points outside the workspace, reject first.
If old_text is missing or ambiguous, reject first.
If the tool call immediately repeats the previous failed request, reject or retry first.
If the write target is an existing file but lacks a fresh baseline, reject first.

Approval is a product surface.

Users should not be asked to judge an already invalid request. When approval is needed, the summary should remain bounded.

The approval prompt for a file edit should show the affected path and the shape of the change, rather than dump unbounded raw content.

The Harness can later show a diff, count, or preview. The approval boundary should be small and clear.
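A minimal sketch of this ordering, assuming a hypothetical `PatchRequest` type and an `ask_user` callback supplied by the product layer: validation runs first, and only a valid request produces a short, bounded summary for the human.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PatchRequest:
    path: str
    old_text: str
    new_text: str


def validate_patch(req: PatchRequest, current_text: str) -> Optional[str]:
    """Return a rejection reason, or None if the request is well-formed."""
    if not req.old_text:
        return "old_text is missing"
    matches = current_text.count(req.old_text)
    if matches == 0:
        return "old_text not found"
    if matches > 1:
        return "old_text is ambiguous"
    return None


def approval_summary(req: PatchRequest) -> str:
    """Bounded summary: the path and the change shape, never the raw content."""
    removed = len(req.old_text.splitlines())
    added = len(req.new_text.splitlines())
    return f"patch {req.path}: -{removed} +{added} lines"


def request_approval(req: PatchRequest, current_text: str,
                     ask_user: Callable[[str], bool]) -> str:
    reason = validate_patch(req, current_text)
    if reason is not None:
        return f"rejected: {reason}"   # the user is never asked
    return "approved" if ask_user(approval_summary(req)) else "denied"


file_text = "a = 1\nb = 2\na = 1\n"
print(request_approval(PatchRequest("f.py", "a = 1", "a = 9"), file_text, lambda s: True))
```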

Bounded Results Are Part Of Safety

Tool output becomes context. Context influences Agent behavior.

If a search returns 10,000 results, the model is no better informed.
If run_shell emits a huge log, the next round may lose the real user request.
If read_file repeats the same unchanged file every round, useful context is displaced by duplicate text.

So the tool contract needs result limits:

max_result_tokens: 2000

This isn’t just about token cost; it concerns the accuracy of the model’s working set.

The Harness should decide which result evidence goes into the transcript, which becomes durable state, and which remains as an external artifact reference.
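A result limit of this kind could be enforced with a small helper. The sketch below makes two assumptions worth flagging: whitespace-separated words stand in for tokens (a real Harness would use the model's tokenizer), and the `artifact://` reference format is hypothetical.

```python
def bound_result(output, max_result_tokens=2000, artifact_ref="artifact://result-1"):
    # Crude token proxy: whitespace-separated words (an assumption).
    tokens = output.split()
    if len(tokens) <= max_result_tokens:
        return output
    kept = " ".join(tokens[:max_result_tokens])
    omitted = len(tokens) - max_result_tokens
    # The clipped text goes to the model; the full output stays external.
    return f"{kept}\n[truncated: {omitted} more tokens; full output at {artifact_ref}]"


print(bound_result(" ".join(["line"] * 10), max_result_tokens=4, artifact_ref="artifact://log-1"))
```

The model sees a clipped result plus a pointer, so it knows evidence was omitted and where to ask for it.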

When evaluating a coding-agent tool, don’t start by asking:

Can the model call it?

Instead, ask:

Is the argument schema precise?
What can this tool read or mutate?
Which paths, commands, and resources are allowed?
What kind of fresh state is needed before running?
What policy determines allow, ask, sandbox, deny?
What result evidence will come back to the model?
What durable state changes after success or failure?
How will this call and result be paired later?
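The policy question in this checklist can be sketched as a function that returns one of the four rulings. The rules below are deliberately crude and purely illustrative: the mapping of tools to decisions, the `command` argument name, and the substring check are all assumptions, not a recommended policy.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    SANDBOX = "sandbox"
    DENY = "deny"


def decide(tool, args):
    if tool in {"read_file", "search_code"}:
        return Decision.ALLOW              # read-only tools run directly
    if tool in {"write_file", "patch_file"}:
        return Decision.ASK                # mutations need human approval
    if tool == "run_shell":
        command = args.get("command", "")  # "command" is an assumed argument name
        if "rm -rf" in command:            # crude denylist, for illustration only
            return Decision.DENY
        return Decision.SANDBOX            # other commands run isolated
    return Decision.DENY                   # unknown tools are denied by default


print(decide("run_shell", {"command": "pytest -q"}))
```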

The model can request. The Harness is responsible for ruling.

A Tool Call is only executed after the Harness has approved it.

This process is the contract.

In short: tools are contracts.
