@mfpiccolo: https://x.com/mfpiccolo/status/2060069083878408689
Summary
The article argues that current agent harness frameworks like LangChain and CrewAI bundle independent concerns into a monolithic block, leading to inflexibility. It introduces the iii engine, where each responsibility is a separate, swappable worker connected via a shared bus and a single trigger primitive, allowing developers to compose their own harness by swapping workers rather than forking a framework.
Cached at: 05/29/26, 02:10 PM
How to build your own agent harness
Most agent teams don’t build a harness. They adopt one: LangChain, LangGraph, OpenAI Agents SDK, Anthropic SDK, CrewAI, AutoGen. The loop, the tools, the memory, and the orchestration are picked off the shelf as a single decision. The harness is a framework you import. If something inside it doesn’t fit, you fork it, fight it, or work around it.
I think that shape is wrong, and it’s the reason every long-running agent team eventually ends up rewriting its harness from scratch. The harness isn’t one thing. It’s ten or twelve different things bundled together because the surrounding ecosystem doesn’t give you a way to compose them. Pi agent packages are on the right track, but they are still in the paradigm of “add another service and integrate it with all the others.” The iii engine treats all workers the same and removes the integration logic completely. The **provider router, the credential vault, the policy engine, the approval gate, the model catalog, the session storage, the budget tracker, the after-call hook fanout,** and the durable turn loop are independent concerns. They all interoperate with your queue, HTTP/API server, streaming, and even browser workers. A framework that ships them as one block is selling you a tradeoff you didn’t have to make.
The bet underneath iii is that they shouldn’t be one block. There should be a set of workers on a shared engine, each replaceable, each versioned independently, each connected by a single primitive: a trigger (iii.trigger()) that every other worker also uses. The harness becomes a stack of installable workers, and “build your own” stops meaning “fork a framework.” It means “swap a few workers.”
This post walks through what that actually looks like: the complete stack that drives an iii agent turn today, why each layer is its own worker, and how you replace any of them.
The 15 jobs an agent harness has to do
If you strip a production agent harness back to its responsibilities, you get a list that looks roughly like this:
- Accept a turn request from a client and persist it
- Resolve credentials for whichever model provider gets called
- Look up what the chosen model can actually do (vision, tools, streaming, context window)
- Drive the per-turn state machine: provision, stream assistant, run tools, steer, tear down
- Load and serve skill bodies that describe each function’s request shape, error codes, and usage notes
- Assemble the system prompt, mode paragraph, identity preamble, working directory, and default skills appendix
- Stream tokens back to the client as the model produces them
- Check every tool call (that’s just a function) against a policy before it runs
- Pause tool calls that need a human decision and route the answer back to the right turn
- Track LLM spend against per-workspace or per-agent budgets
- Run hooks before and after tool calls (logging, redaction, custom side effects)
- Persist the session as a branching tree so forks and resumes work
- Compact session history when the context window fills up
- Emit an event stream that the UI subscribes to
- Carry one OpenTelemetry trace across every step so you can debug it (the piece I see missing at nearly every company building agents)
Every serious agent harness does most of these. The expensive ones do all of them. The cheap ones cut corners and rebuild the corners later when they hit production. The frameworks bundle them into a monolith and ship one version of each. That last part is the part that costs you, because a year in, you find out that the policy engine you want is not the policy engine the framework ships, and replacing it means replacing the harness.
The iii harness ships every one of those fifteen jobs as a separate worker on the workers.iii.dev registry. Each speaks the same WebSocket protocol. Each registers functions and triggers on the same engine bus. Each can be installed with iii worker add, swapped out, and written in any language with an SDK.
The stack, by worker
Here is the actual production stack from the iii-hq/workers monorepo, with each worker’s job in one line. The whole bundle ships at github.com/iii-hq/workers/harness:
Eleven workers. One engine. Each is on a published version. Each is independently runnable as a standalone process (pnpm dev:
The reason this matters: every box in that table is a place where someone can hand you a different worker, and you keep the rest. Don’t like the static model catalogue? Plug in a worker that registers models::list and reads from a live API. Don’t like file-backed credentials? Plug in a worker that registers auth::get_token and reads from a secrets manager. Want a different turn FSM for a workflow that branches differently? Replace turn-orchestrator. Every dependent calls run::start and reads turn_state through the same bus, so the rest of the stack doesn’t change.
How the loop actually runs
The shape of one turn looks like this, walking through the workers in the order they fire.
A browser/cli/chat POSTs a turn through harness::trigger with {session_id, message_id, payload}. The harness meta-worker forwards payload to run::start. That hop exists so the OpenTelemetry span wrapper can seed the session and message IDs as baggage, which propagates to every nested iii.trigger call across every worker in the stack. The trace tree on the other side is one connected graph.
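The baggage hop can be pictured with a small in-memory sketch. Nothing here is the iii SDK or the OpenTelemetry API: `DemoBus`, `Baggage`, `Ctx`, and the handler signature are hypothetical names invented for illustration. The sketch only shows the idea that IDs seeded once at the entry point reach every nested trigger call.

```typescript
// Hypothetical in-memory stand-in for the engine bus; not the iii SDK.
type Baggage = { sessionId: string; messageId: string };

interface Ctx {
  baggage: Baggage;
  // Nested calls reuse the caller's baggage automatically.
  trigger: (functionId: string, payload: unknown) => unknown;
}

type Handler = (payload: unknown, ctx: Ctx) => unknown;

class DemoBus {
  private handlers = new Map<string, Handler>();
  readonly spans: Array<{ functionId: string } & Baggage> = [];

  register(functionId: string, handler: Handler): void {
    this.handlers.set(functionId, handler);
  }

  trigger(functionId: string, payload: unknown, baggage: Baggage): unknown {
    const handler = this.handlers.get(functionId);
    if (!handler) throw new Error(`no worker registered ${functionId}`);
    // Record one "span" per call, tagged with the inherited IDs.
    this.spans.push({ functionId, ...baggage });
    const ctx: Ctx = {
      baggage,
      trigger: (id, p) => this.trigger(id, p, baggage),
    };
    return handler(payload, ctx);
  }
}

const demoBus = new DemoBus();
demoBus.register("auth::get_token", () => "token");
demoBus.register("run::start", (payload, ctx) => {
  ctx.trigger("auth::get_token", {});
  return { accepted: true, payload };
});
// The entry hop forwards only the payload; the IDs travel as baggage.
demoBus.register("harness::trigger", (req, ctx) => {
  const { payload } = req as { payload: unknown };
  return ctx.trigger("run::start", payload);
});

const turnRequest = { session_id: "s1", message_id: "m1", payload: { text: "hi" } };
demoBus.trigger("harness::trigger", turnRequest, {
  sessionId: turnRequest.session_id,
  messageId: turnRequest.message_id,
});
```

After the call, `demoBus.spans` holds one entry per hop, all carrying the same session and message IDs, which is the property that lets a trace viewer stitch the hops into one tree.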
run::start lands on the turn-orchestrator. It persists the run request, seeds the initial TurnStateRecord in iii state at session/
The two terminal states are stopped (clean exit via finishSession()) and failed (an unexpected handler throw routes here, acks the queue so it stops retrying, and surfaces message_complete{stop_reason:‘error’} plus agent_end so the UI shows the reason). Teardown is an inline finishSession() port called from any turn-end path, not a separate enqueued step.
provisioning does three things. It boots an iii-sandbox microVM if the run needs isolated execution. It calls directory::skills::download for every namespace in system_default_skills (default ["iii://iii-directory/index"]) so iii-directory pre-caches the skill bodies the run starts with. And it assembles the system prompt in three layers: a mode paragraph picked from run_request.mode (plan, ask, or agent), the iii identity preamble that teaches the model the agent_trigger convention and the directory::skills::get on-demand discovery pattern, and an appended index of the default skills the agent boots with. The caller can override the whole prompt by passing system_prompt on run::start; otherwise the orchestrator builds it. Function schemas come from the live engine catalog.
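The three-layer assembly and the override rule can be sketched as a pure function. This is an illustration under assumptions, not the orchestrator's code: `buildSystemPrompt`, the `RunRequest` shape beyond `mode` and `system_prompt`, and the bracketed placeholder strings are all hypothetical, since the post does not show the real paragraphs or preamble.

```typescript
type Mode = "plan" | "ask" | "agent";

interface RunRequest {
  mode: Mode;
  system_prompt?: string; // caller override, used verbatim when present
}

// Placeholder text only; the real paragraphs and preamble are not shown in the post.
const MODE_PARAGRAPHS: Record<Mode, string> = {
  plan: "[plan mode paragraph]",
  ask: "[ask mode paragraph]",
  agent: "[agent mode paragraph]",
};
const IDENTITY_PREAMBLE = "[iii identity preamble]";

function buildSystemPrompt(req: RunRequest, defaultSkills: string[]): string {
  // An explicit override skips all three layers.
  if (req.system_prompt !== undefined) return req.system_prompt;
  const skillsIndex = ["Default skills:", ...defaultSkills.map((s) => `- ${s}`)].join("\n");
  // Layer order follows the description: mode, identity, skills index.
  return [MODE_PARAGRAPHS[req.mode], IDENTITY_PREAMBLE, skillsIndex].join("\n\n");
}

const defaultSkillList = ["iii://iii-directory/index"];
const assembledPrompt = buildSystemPrompt({ mode: "plan" }, defaultSkillList);
const overriddenPrompt = buildSystemPrompt(
  { mode: "agent", system_prompt: "You are terse." },
  defaultSkillList,
);
```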
assistant_streaming calls provider::
When the assistant returns tool calls, the FSM enters function_execute. Every tool call passes through dispatchWithHook, the single chokepoint in the orchestrator. consultBefore calls policy::check_permissions directly with a 5-second timeout. The policy worker (the harness meta-worker, in the default stack) reads iii-permissions.yaml, matches the call’s function_id against the rule set, and returns one of three outcomes:
- allow: dispatch proceeds; the orchestrator triggers the target function and writes the result
- deny: dispatch short-circuits with a DenialEnvelope, the result becomes a denial record
- needs_approval: the individual call parks into the turn’s awaiting_approval list. The rest of the batch keeps dispatching. The turn transitions to function_awaiting_approval only when one or more entries are pending.
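A minimal sketch of how a batch behaves under those three outcomes, assuming a synchronous `check` callback in place of the real policy::check_permissions round trip. `dispatchBatch`, `ToolCall`, `BatchResult`, and the function ids in the usage lines are hypothetical; only the outcome names and the two state names come from the description above.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

interface ToolCall {
  id: string;
  function_id: string;
}

interface BatchResult {
  results: Array<{ id: string; status: "ok" | "denied" }>;
  awaiting_approval: string[];
  next_state: "steering_check" | "function_awaiting_approval";
}

function dispatchBatch(calls: ToolCall[], check: (call: ToolCall) => Decision): BatchResult {
  const results: BatchResult["results"] = [];
  const awaiting_approval: string[] = [];
  for (const call of calls) {
    const decision = check(call);
    if (decision === "allow") {
      results.push({ id: call.id, status: "ok" }); // the real stack would trigger the function here
    } else if (decision === "deny") {
      results.push({ id: call.id, status: "denied" }); // denial record instead of a result
    } else {
      awaiting_approval.push(call.id); // park this call only; the loop keeps going
    }
  }
  return {
    results,
    awaiting_approval,
    next_state: awaiting_approval.length > 0 ? "function_awaiting_approval" : "steering_check",
  };
}

// Hypothetical rule set and function ids, for illustration only.
const demoRules: Record<string, Decision> = {
  "fs::read": "allow",
  "shell::exec": "needs_approval",
  "db::drop": "deny",
};
const demoBatch = dispatchBatch(
  [
    { id: "c1", function_id: "fs::read" },
    { id: "c2", function_id: "shell::exec" },
    { id: "c3", function_id: "db::drop" },
  ],
  (call) => demoRules[call.function_id] ?? "deny",
);
```

In this run the parked call `c2` does not block `c3`: the batch finishes with two results and one pending approval, and only then does the turn move to the waiting state.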
The approval wake is reactive and shared. The orchestrator registers exactly one **turn::on_approval** state trigger on scope approvals. When the console calls approval::resolve, the approval-gate worker writes approvals/
Fail-closed by construction: if the policy worker is unreachable or the 5-second timeout fires, consultBefore denies the call with a gate_unavailable envelope. If iii::durable::publish itself errored, the hook fanout returns publish_failed: true and the orchestrator treats it as a deny.
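The fail-closed rule reduces to a small decision function. The sketch below models the gate's possible outcomes as a discriminated union, which is my own framing rather than the orchestrator's actual types; `GateOutcome`, `GateVerdict`, `resolveGate`, and `hookAllows` are hypothetical names. Only the `gate_unavailable` envelope and the `publish_failed` flag come from the text.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

// What happened when the gate was consulted (hypothetical modelling).
type GateOutcome =
  | { kind: "reply"; decision: Decision }
  | { kind: "timeout" } // the 5-second budget elapsed
  | { kind: "unreachable" }; // no policy worker answered

interface GateVerdict {
  decision: Decision;
  envelope?: "gate_unavailable";
}

function resolveGate(outcome: GateOutcome): GateVerdict {
  if (outcome.kind === "reply") return { decision: outcome.decision };
  // Anything other than an explicit reply is treated as a denial.
  return { decision: "deny", envelope: "gate_unavailable" };
}

// Hook fanout side: a failed publish is also read as a deny.
function hookAllows(reply: { publish_failed: boolean }): boolean {
  return !reply.publish_failed;
}
```

The design choice worth noticing is that there is no code path where silence means allow: the only way to get `allow` out of `resolveGate` is an explicit reply that says so.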
A few latency wins fall out of this shape. The after-function-call hook short-circuits publish_collect via a subscriber-presence cache when no durable subscriber is registered for the topic, removing roughly 500 ms per executed function call. tearing_down is inlined into finishSession(), removing one durable queue hop per turn. context-compaction subscribes to a dedicated **agent::turn_end** stream the orchestrator emits at turn boundaries, so compactor wakeups are per-turn instead of per-event. The session-create fanout state trigger gates by scope alone and matches in-process, so the previous per-write **harness::session::is_create_event** RPC is gone.
After the batch completes, steering_check decides whether to continue, stop, or hit max_turns. If continue, loop back to assistant_streaming. If stop or max, finishSession() runs inline: emit agent_end, free the sandbox, transition to stopped.
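The turn loop described in this section can be written down as a transition table. This is a sketch under assumptions: the state names are the ones that appear in the post, but the event names are invented here, and two transitions (leaving provisioning, and resuming after approvals resolve) are guesses marked as such in the comments.

```typescript
type State =
  | "provisioning"
  | "assistant_streaming"
  | "function_execute"
  | "function_awaiting_approval"
  | "steering_check"
  | "stopped"
  | "failed";

// Event names are hypothetical labels for the conditions described in the text.
type TurnEvent =
  | "provisioned"
  | "tool_calls"
  | "approval_pending"
  | "approvals_resolved"
  | "batch_done"
  | "continue"
  | "stop"
  | "max_turns"
  | "handler_threw";

const TRANSITIONS: Partial<Record<State, Partial<Record<TurnEvent, State>>>> = {
  provisioning: { provisioned: "assistant_streaming" }, // assumed
  assistant_streaming: { tool_calls: "function_execute" },
  function_execute: {
    batch_done: "steering_check",
    approval_pending: "function_awaiting_approval",
  },
  function_awaiting_approval: { approvals_resolved: "function_execute" }, // assumed
  steering_check: {
    continue: "assistant_streaming",
    stop: "stopped",
    max_turns: "stopped",
  },
};

function step(state: State, event: TurnEvent): State {
  if (state === "stopped" || state === "failed") return state; // terminal states absorb events
  if (event === "handler_threw") return "failed"; // an unexpected throw routes to failed
  const next = TRANSITIONS[state]?.[event];
  if (next === undefined) throw new Error(`illegal transition: ${state} on ${event}`);
  return next;
}

function runTurn(events: TurnEvent[]): State {
  return events.reduce<State>(step, "provisioning");
}
```

Keeping the table as data means a replacement orchestrator for a differently branching workflow is mostly a different `TRANSITIONS` object, with `step` unchanged.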
Throughout the whole run, every worker that participates emits OTel spans tagged with iii.session.id, iii.message.id, and iii.function.id. Those tags are what the engine’s engine::traces::group_by reads to populate “Group by Session” / “Group by Message” / “Group by Function” in the traces UI. The instrumentation is automatic: src/runtime/worker.ts wraps every registerFunction in a Proxy so no per-worker code has to remember to add spans.
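The Proxy technique can be sketched in isolation. The code below is not the contents of src/runtime/worker.ts; `WorkerLike`, `instrument`, and `recordedSpans` are hypothetical, and the "span" is a plain array entry rather than a real OpenTelemetry span. It only shows how intercepting `registerFunction` lets every handler be wrapped without the worker author doing anything.

```typescript
type Fn = (payload: unknown) => unknown;

interface WorkerLike {
  registerFunction(functionId: string, fn: Fn): void;
}

interface SpanRecord {
  functionId: string;
  ok: boolean;
}

const recordedSpans: SpanRecord[] = [];

function instrument<T extends WorkerLike>(worker: T): T {
  return new Proxy(worker, {
    get(target, prop, receiver) {
      // Pass every other property through untouched.
      if (prop !== "registerFunction") return Reflect.get(target, prop, receiver);
      return (functionId: string, fn: Fn): void => {
        // Wrap the handler so each invocation records a span-like entry.
        const traced: Fn = (payload) => {
          try {
            const out = fn(payload);
            recordedSpans.push({ functionId, ok: true });
            return out;
          } catch (err) {
            recordedSpans.push({ functionId, ok: false });
            throw err;
          }
        };
        target.registerFunction(functionId, traced);
      };
    },
  });
}

// Usage: the worker author writes a plain handler; tracing is added by the wrapper.
const registry = new Map<string, Fn>();
const plainWorker: WorkerLike = {
  registerFunction: (id, fn) => {
    registry.set(id, fn);
  },
};
const tracedWorker = instrument(plainWorker);
tracedWorker.registerFunction("models::list", () => ["model-a"]);
const listedModels = registry.get("models::list")!({});
```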
Build your own
The interesting part is that none of the workers above are special. Each one is a process that opens a WebSocket to the engine, registers some functions and triggers, and runs. The contract is the same as the contract every application worker uses. The harness is built on the same primitive your business logic is built on.
Which means “build your own harness” decomposes into the same operation as “write any worker.” You pick the layer you want to replace, you write a worker that registers the same functions on the bus, you iii worker add it, and the rest of the stack starts using your worker.
Two layers don’t show up in the worker table above but matter for how the harness behaves. Skills are how each worker advertises what its functions do. Every worker can publish a skill at iii://
Five concrete examples.
Replace the model catalogue with a live API. Write a worker that registers models::list, models::get, models::supports. Have it fetch from your provider’s catalog endpoint every N minutes and cache. Publish it. iii worker add your-org/dynamic-models-catalog. Stop the static models-catalog worker. The turn-orchestrator never knows the difference. It calls iii.trigger('models::list') and the engine routes to whichever worker registered that function id most recently.
Add a new provider. provider-kimi and provider-lmstudio already prove out the shape. Each is one worker that registers provider::
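The swap can be demonstrated with a toy bus that routes each function id to its most recent registration. `MiniBus` is a hypothetical in-memory stand-in, not the iii engine or SDK, and the model names are made up. The point is only that the caller's `trigger` line is identical before and after the replacement worker connects.

```typescript
type BusHandler = (payload: unknown) => unknown;

class MiniBus {
  // functionId -> registrations in arrival order, newest last
  private routes = new Map<string, Array<{ worker: string; handler: BusHandler }>>();

  register(worker: string, functionId: string, handler: BusHandler): void {
    const stack = this.routes.get(functionId) ?? [];
    stack.push({ worker, handler });
    this.routes.set(functionId, stack);
  }

  disconnect(worker: string): void {
    // Drop every registration owned by the departing worker.
    this.routes.forEach((stack, id) => {
      this.routes.set(id, stack.filter((r) => r.worker !== worker));
    });
  }

  trigger(functionId: string, payload: unknown = {}): unknown {
    const stack = this.routes.get(functionId) ?? [];
    const newest = stack[stack.length - 1];
    if (!newest) throw new Error(`no worker registered ${functionId}`);
    return newest.handler(payload);
  }
}

const miniBus = new MiniBus();
miniBus.register("models-catalog", "models::list", () => ["static-model"]);
const staticList = miniBus.trigger("models::list");

// The replacement registers the same id; callers do not change.
miniBus.register("dynamic-models-catalog", "models::list", () => ["live-model-a", "live-model-b"]);
const liveList = miniBus.trigger("models::list");

// Stopping the static worker afterwards leaves the live one in place.
miniBus.disconnect("models-catalog");
const afterStopList = miniBus.trigger("models::list");
```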
Serve skills from a private artifact store. Write a worker that registers directory::skills::get and directory::skills::list, backed by your internal docs system or a private S3 bucket. Disconnect or rename the default iii-directory worker. The orchestrator’s bootstrap calls directory::skills::download per namespace; your worker answers. The agent’s “fetch the per-function skill before calling a new function” pattern keeps working unchanged because the wire shape is the same.
Override the system prompt entirely. **run::start** accepts an optional system_prompt field. Pass it and the orchestrator uses your string verbatim, skipping the mode paragraph + identity preamble + skills appendix assembly. Useful when you have an existing prompt asset you want the harness to honour without modification. Skill download still runs in bootstrap, so the agent keeps directory::skills::get on-demand discovery even with a custom prompt.
Replace the approval gate UI surface. The default approval-gate worker registers approval::resolve. The wire schema is one function call:
The handler persists **approvals/
If you want a different policy engine (OPA, Cedar, your own DSL), write a worker that registers policy::check_permissions and returns **{ decision, rule_id?, matched_constraint? }**. Disconnect the default policy worker (which is wrapped inside the harness meta-worker, so you’d disable that handler or run a stripped-down meta-worker). The turn-orchestrator’s consultBefore doesn’t know the difference. Same 5-second timeout, same fail-closed semantics, same wire shape.
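A replacement policy handler could be as small as the sketch below. Everything except the reply field names is an assumption: the `Rule` shape, the trailing-`*` pattern syntax, the default-deny fallback, and the example function ids are hypothetical, and `matched_constraint` is left unset because the post does not describe what it carries. Wiring the function to the bus is omitted, since that depends on the SDK.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

interface PolicyReply {
  decision: Decision;
  rule_id?: string;
  matched_constraint?: string;
}

interface Rule {
  id: string;
  pattern: string; // exact function id, or a prefix ending in "*"
  decision: Decision;
}

// Hypothetical pattern language: exact match, or prefix match with a trailing "*".
function patternMatches(pattern: string, functionId: string): boolean {
  return pattern.endsWith("*")
    ? functionId.startsWith(pattern.slice(0, -1))
    : pattern === functionId;
}

// First matching rule wins; no match falls back to deny (an assumption).
function checkPermissions(rules: Rule[], functionId: string): PolicyReply {
  for (const rule of rules) {
    if (patternMatches(rule.pattern, functionId)) {
      return { decision: rule.decision, rule_id: rule.id };
    }
  }
  return { decision: "deny" };
}

const customRules: Rule[] = [
  { id: "r1", pattern: "fs::read", decision: "allow" },
  { id: "r2", pattern: "shell::*", decision: "needs_approval" },
];
const readReply = checkPermissions(customRules, "fs::read");
const shellReply = checkPermissions(customRules, "shell::exec");
const unknownReply = checkPermissions(customRules, "db::drop");
```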
The point of these examples isn’t the specific replacements. It’s the shape of the operation. Every harness layer in the iii stack is reachable through one or two function ids on the bus. Replacing a layer is writing a worker that registers those ids. The rest of the system stays.
The harness is a slider, not a fork in the road
The classic harness debate frames itself as thin vs thick. Anthropic’s thin loop versus LangGraph’s explicit DAG. The framing assumes you pick one side and live with it.
When the harness is composed of workers on the same bus, thin vs thick is just a count of how many workers you install. A thin harness is turn-orchestrator plus provider-anthropic plus auth-credentials plus a minimal harness meta-worker. That’s it. No approvals, no budgets, no policy engine, no hook fanout. Run anything. Trust the model. Useful for autonomous research agents, experimental loops, anything internal.
A thick harness is all thirteen workers plus context-compaction plus a custom policy worker plus a custom approval-gate plus a Slack-integrated approval surface plus the budget worker enforcing per-workspace caps. Useful for an agent running customer workflows where every tool call needs to be auditable and every model spend has to roll up to a finance dashboard.
The architectural distance between thin and thick isn’t a rewrite. It’s a config change. Same wire protocol, same trace shape, same observability story. The slider moves by adding and removing workers from your config.yaml. Everything else holds.
It applies inside a single worker too. The turn-orchestrator just shipped a refactor that collapsed its FSM from eleven states to seven, deleted the per-call turn::approval_resume::
This is the part the framework model can’t give you. A framework picks a position on the slider for you and locks you in. The worker model leaves the slider in your hand.
What this means in practice
If you’ve been running an agent on top of a framework and feeling the same boundary problems most teams hit at scale, the answer is probably not “rewrite the harness in our own framework.” The policy engine doesn’t extend the way you need. The approval UI is wired into the framework’s chat surface. The credential store can’t talk to your secrets manager. The budget tracker is in a sidecar database the trace can’t see. The answer is to switch to a substrate where the harness is decomposed in the first place.
The fastest way to feel the argument is to clone github.com/iii-hq/workers, pnpm install, pnpm build, and run the composite entry point. You’ll get the full fourteen-worker harness pointed at an iii engine. You can disable any worker by removing its entry from the boot list. You can swap any worker by writing a replacement that registers the same function ids. You can extend any worker by adding a subscriber to its hook topics. hook-fanout::publish_collect is the generic primitive every iii hook builds on.
The docs live at iii.dev/docs. The engine is at github.com/iii-hq/iii. The worker registry is at workers.iii.dev. The harness bundle is at github.com/iii-hq/workers/harness.
The bet
A harness is not a thing you install. A harness is a set of jobs your system has to do for an agent to run durably, safely and observably. The framework era bundled those jobs together because nothing underneath gave you a way to compose them.
iii’s bet is that one primitive, a worker that connects to the engine over WebSocket and registers functions and triggers, is small enough to absorb every one of those jobs separately, and that the resulting stack is more useful than any framework because every layer is independently replaceable.
You don’t adopt the iii harness. You install the workers you want, write the ones you need, and end up with a harness shaped exactly like your system. Same protocol on every layer. Same trace across every call. Same iii worker add for the parts you take from the registry as for the parts you publish yourself.
That’s what “build your own agent harness” looks like when the substrate is the right shape. Pick the workers. Write the missing ones. Compose. The harness is the composition.
Join us in building the perfect agent harness that the modern world needs: discord.gg/iiidev
iii is open source. Get started at iii.dev/docs. The harness workers are at github.com/iii-hq/workers and the engine is at github.com/iii-hq/iii.
— Mike Piccolo, Founder & CEO @iiidevs