@cellinlab: https://x.com/cellinlab/status/2064144608242679822

X AI KOLs Timeline 06/09/26, 12:36 AM News

loop-engineering ai-agents claude-code codex autonomous-agents prompt-engineering coding-agents

Summary

This article introduces Loop Engineering: instead of writing prompts for AI agents directly, you design a system (a loop) that prompts the agent and lets it iterate on a task until it is complete. The article compares in detail how Claude Code and Codex implement the five building blocks (automations, worktrees, skills, plugins and connectors, and sub-agents), plus external memory for state. It suggests this may be how we collaborate with coding agents in the future, but also warns about token costs and AI slop.

https://t.co/mPd6We8obR

Original Article

Cached at: 06/09/26, 12:46 PM

Loop Engineering

Loop Engineering replaces you as the person who writes prompts for the agent. Its core idea is: you no longer directly prompt the agent; instead, you design a system that prompts the agent.

Here, a “loop” can be understood as a recursive goal: you define a purpose, then let the AI iterate until the task is complete. It roughly consists of five building blocks, and both Claude Code and Codex already have all five capabilities.

I believe this might be how we collaborate with coding agents in the future. However, it’s still very early. I’m also skeptical, and you need to be very careful about token costs, because consumption can vary wildly between different usage patterns, especially depending on whether you are token-rich or token-poor. You still need a way to ensure quality doesn’t degrade, and concerns about “AI slop” are entirely justified. That said, let’s see what it actually is.

Recently, Peter Steinberger said:

You should no longer prompt a coding agent. You should design loops, and let the loops prompt your agents.

Similarly, Boris Cherny, the head of Claude Code at Anthropic, said:

I no longer prompt Claude directly. I have a set of loops running that prompt Claude and decide what to do next. My job is to write loops.

So, what do these statements really mean?

For about the last two years, the basic way to get a coding agent to do something was to write a good prompt and provide enough context. You’d type something in, read its response, then type the next thing. The agent was a tool, and you held that tool, round after round.

That part is somewhat over, or at least some people think it’s ending.

Now, you build a small system. This system discovers work, distributes work, checks work, records completion, and decides what to do next. You let this system trigger agents, rather than triggering them yourself.

I previously wrote about a concept close to this: the harness. A harness is the runtime environment built for a single agent — the system that builds software. Loop Engineering sits above the harness. It’s like a harness, but it runs on a schedule, spawns little helpers, and feeds itself.

What surprised me is that this is no longer a “tool problem.”

A year ago, if you wanted to make a loop, you’d write a bunch of bash scripts and maintain them long-term. That was your own thing, and only you could use it.

But now, these capabilities are built directly into products. The abilities Steinberger listed map almost one-to-one to the Codex app; similarly, they map nearly one-to-one to Claude Code. Once you realize their shapes are the same, you stop agonizing over which tool to use and start designing a loop that works regardless of which tool you’re sitting in.

A loop needs five things, plus a place to remember state. Let’s list them, then map them out.

Automations: triggered automatically on a schedule, doing discovery and triage.
Worktrees: allow two agents working in parallel to avoid stepping on each other.
Skills: write down project knowledge so the agent doesn’t have to guess.
Plugins and connectors: connect the agent to tools you already use.
Sub-agents: let one agent propose ideas, and another agent review them.

And the sixth thing is memory.

This memory can be a markdown file, a Linear board, or any place outside a single conversation that records “what’s done” and “what’s next.”

It sounds too simple to be important. But it’s the same trick every long-running agent relies on. The model forgets everything between runs, so memory must live on disk, not just in context.

The agent forgets, but the repo does not.
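To make "memory lives on disk" concrete, here is a minimal sketch of a state file. The file name loop_state.json, the "done"/"open" keys, and the helper names are my own assumptions for illustration; the article only requires that state live somewhere outside a single conversation.

```python
import json
from pathlib import Path

# Hypothetical location; a markdown file or a Linear board would serve the same role.
STATE_PATH = Path("loop_state.json")


def load_state(path: Path = STATE_PATH) -> dict:
    """Return the saved loop state, or an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "open": []}


def save_state(state: dict, path: Path = STATE_PATH) -> None:
    """Persist the state so the next run can pick up where this one stopped."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def record_result(state: dict, item: str, passed: bool) -> dict:
    """Return a new state with `item` under 'done' if it passed, else under 'open'."""
    open_items = [i for i in state["open"] if i != item]
    done_items = list(state["done"])
    if passed:
        if item not in done_items:
            done_items.append(item)
    else:
        open_items.append(item)
    return {"done": done_items, "open": open_items}
```

Each run would call load_state at the start and save_state at the end, so nothing depends on what the model happens to remember.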

Now both products have all five capabilities.

The names differ slightly between places, but the abilities are essentially the same. Let’s go through each one, because frankly, the details are what make a loop really run or quietly leak everywhere.

Automations

Automations are what make a loop a true loop, not just a one-shot task.

In the Codex app, you can create an automation in the Automations tab. You choose the project, the prompt it runs, the frequency, and whether it runs on your local checkout or on a background worktree.

Runs that find issues go into the Triage inbox; runs that find nothing are automatically archived, which is nice.

Internally, OpenAI uses them for boring tasks like daily issue triage, summarizing CI failures, writing commit briefings, finding bugs someone introduced last week.

An automation can also call a skill. That makes repetitive tasks more maintainable. You trigger a skill instead of pasting a wall of instructions that nobody will ever update into the schedule.

Claude Code reaches the same destination via scheduling and hooks.

You can use /loop to run a prompt or command at intervals; you can schedule a cron task; you can use hooks to trigger shell commands at certain points in the agent’s lifecycle; if you want it to keep running after you close your laptop, you can push the whole thing to GitHub Actions.

The essence is exactly the same: you define an autonomous task, give it a cadence, and have findings come to you instead of you checking everywhere.

There’s also an in-session primitive worth knowing, closer to the core of this article.

/loop runs repeatedly on a cadence.

/goal runs until a condition you write is actually met. After each round, a separate small model checks whether the task is done. That is, the agent writing code is not the one scoring itself.

You can give it a condition like:

all tests in test/auth pass and lint is clean

Then you can walk away.

Codex has the same thing, also called /goal. It continues across multiple rounds until a verifiable stop condition is met, and supports pause, resume, and clear.

Same primitive, both tools have it. That’s basically the pattern repeating throughout this article.
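The shape of a goal-style loop can be sketched in a few lines. This is not how either tool implements /goal; it is an illustration under my own assumptions: the function names are hypothetical, the worker and checker are plain callables standing in for two different models, and the max_rounds cap is something I added as a guard against runaway token spend.

```python
from typing import Callable


def run_until_goal(
    work: Callable[[int], str],
    is_done: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[bool, int]:
    """Call `work` until the separate `is_done` check accepts its output.

    Returns (goal_met, rounds_used). The worker never scores itself.
    """
    for round_number in range(1, max_rounds + 1):
        output = work(round_number)
        if is_done(output):
            return True, round_number
    return False, max_rounds


# Stand-ins for illustration only: a worker that fixes one failing test per round,
# and a checker that accepts only a fully clean report.
def fake_worker(round_number: int) -> str:
    failing = max(0, 3 - round_number)
    return f"failing_tests={failing} lint_errors=0"


def fake_checker(report: str) -> bool:
    return report == "failing_tests=0 lint_errors=0"


if __name__ == "__main__":
    print(run_until_goal(fake_worker, fake_checker))
```

The point of the design is that the stop condition is evaluated outside the worker, so "done" is decided by something that did not write the code.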

So this part surfaces work. The rest of the loop acts on that work.

Worktrees

As soon as you run more than one agent at a time, files start conflicting, and that becomes a point of failure.

Two agents modifying the same file at the same time is essentially as troublesome as two engineers committing the same piece of code without communicating beforehand.

git worktree solves this. It’s an independent working directory on its own branch, sharing the same repo history. So one agent’s changes literally cannot touch another agent’s checkout.
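If you were wiring this up by hand, the underlying git commands are small. The sketch below builds them from Python; the agent/<task> branch naming and the sibling-directory layout are assumptions of mine, not a convention from either tool.

```python
import subprocess
from pathlib import Path


def worktree_add_command(repo: Path, task_id: str) -> list[str]:
    """Build `git worktree add` for a new branch in a sibling directory."""
    branch = f"agent/{task_id}"                      # hypothetical naming scheme
    checkout = repo.parent / f"{repo.name}-{task_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout)]


def worktree_remove_command(repo: Path, task_id: str) -> list[str]:
    """Build `git worktree remove` for the checkout created above."""
    checkout = repo.parent / f"{repo.name}-{task_id}"
    return ["git", "-C", str(repo), "worktree", "remove", str(checkout)]


def run(command: list[str]) -> None:
    """Execute a command and raise if git reports an error."""
    subprocess.run(command, check=True)
```

Calling run(worktree_add_command(Path("/path/to/repo"), "issue-42")) would give one agent its own checkout on its own branch, and the matching remove command cleans it up afterwards.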

Codex has built-in worktree support, so multiple threads can work on the same repo simultaneously without stepping on each other.

Claude Code also provides the same isolation via git worktree. You can use the --worktree flag to open a session in an independent checkout, or set isolation: worktree on a subagent so each helper gets a fresh checkout that is cleaned up when done.

I’ve written about the “human” side of this before: worktrees remove mechanical conflicts, but you are still the bottleneck. The thing that determines how many agents you can run in parallel is not the tool — it’s your review bandwidth.

Skills

Skills let you avoid re-explaining the same project context from scratch every session like a goldfish.

Both tools use the same format: a folder containing a SKILL.md file with instructions and metadata, possibly with scripts, references, and assets.

Codex runs a skill when you invoke it with $ or /skills; it may also auto-invoke when your task matches the skill’s description. That’s why a tight, plain description is more useful than a clever but vague one.

Claude Code does the same, and I’ve written about this pattern before.
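As a rough picture of the folder layout, here is a small scaffolding sketch. It assumes the metadata is YAML front matter with name and description fields at the top of SKILL.md; check each tool's documentation for the exact fields it reads. The helper names and the example skill are hypothetical.

```python
import json
from pathlib import Path


def render_skill(name: str, description: str, instructions: str) -> str:
    """Render SKILL.md text: assumed YAML front matter, then the instructions."""
    # json.dumps yields a double-quoted string, which is also valid YAML,
    # so descriptions containing colons do not break the front matter.
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {json.dumps(description)}\n"
        "---\n\n"
        f"{instructions.strip()}\n"
    )


def write_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/<name>/SKILL.md and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(render_skill(name, description, instructions), encoding="utf-8")
    return skill_file
```

Scripts, references, and assets would sit next to SKILL.md in the same folder. The description deserves the most care, since it is what auto-invocation matches against.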

Skills are also where intent stops burning tokens over and over.

I’ve said before that every agent session starts cold. Any gap in your intent gets filled with a confident guess.

A skill writes that intent externally: project conventions, build steps, “we don’t do this because of a past incident,” etc. You write it once, and the agent reads it every run.

Without skills, every cycle of the loop has to re-derive your entire project from scratch.

With skills, it starts to have a bit of compound interest.

One distinction: a skill is an authoring format, while a plugin is how you distribute it.

When you want to share a skill across repos, or bundle several skills together, you wrap them into a plugin.

Codex does this. Claude Code does this too.

Connectors

A loop that can only see the filesystem is a very small loop.

Connectors, based on MCP, let the agent read your issue tracker, query databases, call staging APIs, send messages in Slack.

Both Codex and Claude Code support MCP, so a connector you write for one usually works in the other as well.

Plugins can also bundle connectors and skills together. That way your teammates just install your setup, instead of reconstructing everything from memory.

This is the difference between “the agent says: here’s the fix” and “the loop itself opens a PR, links the Linear ticket, and pings the channel when CI turns green.”

Connectors are why the loop can act in your real environment, not just tell you what it would do if it could.

Sub-agents

In a loop, the most useful structural design is splitting the “maker” from the “checker.”

The model that writes code is too kind when grading its own homework.

A second agent with different instructions, and sometimes a different model, catches problems the first agent talked itself into ignoring.

Codex only creates sub-agents when you ask for them. They run in parallel and then merge results into a single answer.

You can define your own agents as TOML files in .codex/agents/. Each file contains name, description, instructions, and optionally model and reasoning effort.

This way, your security reviewer can use a strong model with high effort, while your explorer can be a fast read-only agent.
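To show the idea of giving each role its own settings, here is a sketch that mirrors the listed fields in a plain data structure. It is not the TOML schema either tool uses: the model names, the effort labels, and pick_checker are placeholders I made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentSpec:
    name: str
    description: str
    instructions: str
    model: Optional[str] = None             # None means "use the tool's default"
    reasoning_effort: Optional[str] = None  # hypothetical labels: low, medium, high


SECURITY_REVIEWER = AgentSpec(
    name="security-reviewer",
    description="Reviews diffs for security issues.",
    instructions="Check the diff for injection, auth, and secrets problems.",
    model="strong-model",   # placeholder, not a real model name
    reasoning_effort="high",
)

EXPLORER = AgentSpec(
    name="explorer",
    description="Maps the codebase without editing files.",
    instructions="Read files and report findings. Do not modify anything.",
    model="fast-model",     # placeholder, not a real model name
    reasoning_effort="low",
)

ROSTER = {spec.name: spec for spec in (SECURITY_REVIEWER, EXPLORER)}


def pick_checker(maker: AgentSpec, roster: dict[str, AgentSpec]) -> AgentSpec:
    """Return an agent other than the maker, preferring the highest effort."""
    rank = {None: 0, "low": 1, "medium": 2, "high": 3}
    candidates = [s for s in roster.values() if s.name != maker.name]
    if not candidates:
        raise ValueError("no agent available to check the maker's work")
    return max(candidates, key=lambda s: rank.get(s.reasoning_effort, 0))
```

The one rule the sketch enforces is the important one: the checker is never the same agent as the maker.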

Claude Code does the same with sub-agents and agent teams: you define agents in .claude/agents/, and work is passed between them.

The common split in both tools is:

One agent for exploration;
One agent for implementation;
One agent to verify results against a spec.

I’ve written about this from both angles before.

It’s especially important in a loop because the loop runs when you’re not watching. So a verifier you truly trust is the only reason you can walk away.

Of course, sub-agents consume more tokens because each one does its own model and tool work. So spend them where a second opinion is worth paying for.

This is basically what Claude Code’s /goal does under the hood: a new model judges whether the loop is done, not the one that did the work.

In other words, the maker/checker split is applied even to the stopping condition itself.

Putting It All Together

Glue these together, and a single-threaded task becomes a small control panel.

Here’s one shape I’ve been using.

Every morning, an automation runs on the repo.

Its prompt invokes a triage skill. That skill reads yesterday’s CI failures, open issues, recent commits, and writes findings into a markdown file or a Linear board.

For each finding worth addressing, this thread opens an isolated worktree, spawns a sub-agent to draft a fix, and then spawns a second sub-agent to review that draft against project skills and existing tests.

Connectors let the loop open a PR and update the ticket.

Anything the loop couldn’t handle lands in my triage inbox.

The state file is the backbone of the whole thing. It remembers what was attempted, what passed, and what’s still open. So the next morning’s run picks up from where today left off.
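The morning run described above can be sketched as one function. Everything here is a stand-in: triage, draft_fix, review, and open_pr are hypothetical callables representing the triage skill, the maker sub-agent, the checker sub-agent, and a connector, and the state dict uses assumed "done" and "open" keys.

```python
from typing import Callable


def morning_run(
    state: dict,
    triage: Callable[[], list[str]],
    draft_fix: Callable[[str], str],
    review: Callable[[str, str], bool],
    open_pr: Callable[[str, str], None],
) -> list[str]:
    """Run one pass of the loop, updating `state` in place.

    Returns the findings the loop could not handle (the triage inbox).
    """
    inbox: list[str] = []
    # Yesterday's open items come first, then new findings; drop duplicates.
    findings = list(dict.fromkeys(state["open"] + triage()))
    state["open"] = []
    for finding in findings:
        if finding in state["done"]:
            continue
        draft = draft_fix(finding)       # maker, ideally in its own worktree
        if review(finding, draft):       # checker with different instructions
            open_pr(finding, draft)      # connector acts in the real environment
            state["done"].append(finding)
        else:
            state["open"].append(finding)
            inbox.append(finding)
    return inbox
```

Whatever comes back in the inbox is the part that still needs a person, and the updated state is what gets written to disk for the next morning.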

Look at what you actually did.

You designed it once.

You didn’t personally prompt any of those steps.

That’s what Steinberger’s point looks like in practice. And it’s the same loop whether you’re in Codex or Claude Code, because these pieces are essentially the same pieces.

Loops Change the Work, But Don’t Remove You from It

Loops change the work, but they don’t remove you from it.

And as loops get better, three problems become sharper, not easier.

First, verification still falls on you.

A loop running unattended is also making mistakes unattended.

The whole reason you separate the verifier sub-agent from the maker is to give the loop’s “done” some meaning. Even so, “done” is only a claim, not proof.

I keep repeating the same line:

Your job is to deliver code that you have verified works.

Second, your understanding still rots if you let it.

The faster the loop delivers code you didn’t write yourself, the larger the gap between the system that actually exists and the system you actually understand.

A smooth loop only widens that gap faster, unless you actually read what it produces.

Third, the most comfortable posture may also be the most dangerous.

When the loop starts running itself, it’s easy to stop owning your own judgment and just accept whatever it gives you.

I call that a dangerous state.

Designing the loop with judgment is the antidote.

Designing the loop to avoid thinking is the accelerant.

The same action produces opposite results.

This Might Be a Preview of How Work Evolves

I think this is a preview of how our way of working is about to evolve.

But if I don’t personally review the code, or completely rely on automated loops to fix issues, my product quality will definitely drop. I’m likely to fall into a continuous downward spiral, digging myself into a deeper hole.

That said, go set up your loops.

But don’t forget that directly prompting your agents is still effective. The key is finding the right balance.

Loops also produce completely different results depending on the person using them.

Two people can build exactly the same loop and get opposite outcomes.

One person uses it to go faster on work they deeply understand.

Another uses it to avoid understanding the work itself.

The loop doesn’t know the difference.

But you do.

That’s why loop design is harder than prompt engineering, not easier.

Cherny’s point isn’t that the work got simpler.

It’s that the lever moved.

Build the loop.

But build it like someone who still intends to be an engineer, not like someone who just presses the “go” button.

Original article: Addy Osmani (@addyosmani), “Loop Engineering”

