@shmidtqq: https://x.com/shmidtqq/status/2068704187492221405

Summary

An in-depth guide to loop engineering for AI coding agents, explaining how to build automated loops that repeatedly prompt agents, verify results, and avoid runaway costs, illustrated with a case study of one engineer shipping 259 PRs in a month.

https://t.co/wXfbMqws2J

Cached at: 06/22/26, 05:37 AM

Loop Engineering: How One Loop Ships 259 PRs a Month

Last December, one engineer shipped 259 finished code changes in a single month (in the jargon they are called PRs, pull requests). His AI wrote every one. He says he never opened a code editor.

Peter Steinberger put it in two lines that more than eight million people have seen:

Peter Steinberger (@steipete), Jun 8:

"Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore.

You should be designing loops that prompt your agents."

The flip side showed up the same month. Someone else’s loop ran unwatched for 11 days and burned $47,000 before anyone noticed. So there are two skills, and only the first gets taught: build a loop that does the work, and build the brakes that keep it from driving you off a cliff.

The loop in one picture

A loop is a small program wrapped around an agent. The agent takes a step. Something checks the result. Not done? It takes the next step. Round and round, until your condition comes true or you stop it.

One question separates a real loop from a money pit: is there an honest way to know the work was done right? A test that passes or fails. A build that compiles or does not. No check, and the agent just agrees with itself in circles and sends you the bill.
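The shape described above can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual implementation: `run_loop`, `fake_agent`, and `fake_check` are hypothetical names, and the "agent" is a stand-in function rather than a real model call.

```python
from typing import Callable


def run_loop(step: Callable[[int], str],
             check: Callable[[str], bool],
             max_steps: int = 10) -> tuple[bool, int]:
    """Call step() until check() accepts its result or the step cap is hit."""
    for attempt in range(1, max_steps + 1):
        result = step(attempt)    # the agent takes one step
        if check(result):         # an external check decides, not the agent
            return True, attempt
    return False, max_steps       # stopped by the cap, not by success


# Toy stand-ins: this "agent" only produces a passing result on its third try.
def fake_agent(attempt: int) -> str:
    return "tests pass" if attempt >= 3 else "tests fail"


def fake_check(result: str) -> bool:
    return result == "tests pass"


done, steps_used = run_loop(fake_agent, fake_check, max_steps=5)
print(done, steps_used)  # True 3
```

The point of the sketch is that `check` is a separate function from `step`: the loop ends on a verdict the agent cannot write for itself, or on the cap.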

Step 0. The 30-second test: do you need a loop

A loop pays off under four conditions. Miss one, and it costs more than it returns.

  • The task repeats at least weekly. Otherwise it is a one-time script, not a loop.

  • Verification is automated. A test, a linter (a code-style checker), a build. No check, and you are back to reading every change by hand.

  • Your budget can absorb waste. A loop re-reads context and explores, so it burns money even on empty runs.

  • The agent has an engineer’s tools. Logs, an environment to reproduce a failure, the ability to run its own code.

Anatomy: the 6 parts of any loop

Strip the noise and a working loop is six parts. You no longer build them by hand: they ship inside the tools and map the same way onto Claude Code and the OpenAI Codex app.

1. State: the only thing the next run inherits

A model forgets everything when a run ends. The chat’s memory dies with it, so the memory has to live on disk. In practice that is one file.

Treat the loop like a night shift. You are judged not by what it did at three in the morning, but by the note on your desk at nine. Design the note, and half the loop designs itself.
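As one way to picture "the note on your desk," here is a small sketch of state that survives between runs. The file name `loop_state.json` and the fields `run`, `done`, and `todo` are assumptions made for illustration; the article only says the state lives in one file on disk.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the previous run's note, or an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"run": 0, "done": [], "todo": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then rename, so a crash cannot leave half a note."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


# Usage: each run starts by reading the note and ends by rewriting it.
with tempfile.TemporaryDirectory() as folder:
    state_file = Path(folder) / "loop_state.json"  # hypothetical file name
    state = load_state(state_file)                 # first run: empty note
    state["run"] += 1
    state["done"].append("fixed flaky test")
    save_state(state_file, state)
    print(load_state(state_file))
```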

2. Automations: what turns a run into a loop

A loop becomes a loop when it starts on its own. In Codex that is the Automations tab: project, prompt, schedule. In Claude Code it is three separate commands, and this is where popular retellings get it wrong, so to be exact:

The key one is /goal. The agent that wrote the code does not get to grade itself. Write the stop condition like a contract: “all tests in test/auth pass” is a contract, “make it better” is how you keep a loop spinning until payday.
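A contract-style stop condition can be reduced to one question: did the verification command exit with status 0? The sketch below is an illustration of that idea only, not how /goal is implemented. `contract_met` is a hypothetical helper, and the commands are stand-ins so the example runs anywhere.

```python
import subprocess
import sys


def contract_met(command: list[str], timeout_s: int = 600) -> bool:
    """True only if the verification command runs and exits with status 0."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # a check that hangs or cannot start has not passed
    return completed.returncode == 0


# Stand-in commands. In a real project this might be something like
# ["pytest", "test/auth"], which is an assumption about your test runner.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(contract_met(passing), contract_met(failing))  # True False
```

A condition like "make it better" has no command to run, which is exactly why it never returns `True`.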

3. Worktrees: parallel agents without chaos

Two agents writing to one file are two engineers editing the same lines in silence. A git worktree is the wall between them: a separate folder on its own branch, sharing the history.

The caveat the tool will not mention: worktrees remove the collision, not the bottleneck. The bottleneck is you. However many agents you start, your review speed decides how many you can trust.
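For readers who have not used worktrees, this sketch builds the `git worktree add` command that gives one agent its own folder and branch. The branch prefix `agent/`, the sibling-folder layout, and the base branch `main` are assumptions chosen for illustration. The function only builds the command; running it is left to the caller.

```python
from pathlib import Path


def worktree_add_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one agent's task."""
    branch = f"agent/{task_id}"                      # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"    # sibling folder next to the repo
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


cmd = worktree_add_command(Path("/work/app"), "fix-123")
print(" ".join(cmd))
# To actually create the worktree: subprocess.run(cmd, check=True)
```

Each task gets its own branch and directory, so two agents never touch the same working files. The review queue those branches feed into is still a single person.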

4. Skills: your intent, written once, on the outside

The agent starts from scratch and fills any gap with a confident guess. A skill is your intent, written where the model reads it every time. The format is the same everywhere: a folder with a SKILL.md file.

Without skills, the loop relearns your project from a blank page. With them, it gets smarter every morning.
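A skill folder can be created by hand, but a short script shows the layout. This is a sketch under stated assumptions: the skill name `triage`, its text, and the `name`/`description` front matter are illustrative, so check your tool's documentation for the fields it actually reads.

```python
import tempfile
from pathlib import Path

# Assumed layout: <skills root>/<skill name>/SKILL.md with a short front matter.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

{body}
"""


def write_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    text = SKILL_TEMPLATE.format(name=name, description=description, body=body.strip())
    skill_file.write_text(text, encoding="utf-8")
    return skill_file


with tempfile.TemporaryDirectory() as folder:
    created = write_skill(
        Path(folder),
        name="triage",  # hypothetical skill
        description="Read overnight failures and write findings to STATUS.md.",
        body="1. Collect failed CI runs.\n2. Group by root cause.\n3. Update STATUS.md.",
    )
    print(created.read_text(encoding="utf-8"))
```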

5. Connectors: so the loop acts, not just talks

A loop that only sees files is a toy. Connectors (the MCP standard) let the agent read your issue tracker, hit a staging server, post to chat.

This is the line between “here is how to fix it” and a loop that opens the pull request itself, links the ticket, and reports the tests green. Both tools speak MCP, so a connector for one usually drops into the other. Fastest payback: a code tracker, an issue tracker (Linear, Jira), chat (Slack), an error tracker.

6. Sub-agents: the maker apart from the checker

The most valuable move: split the one who writes from the one who checks. A model reviewing itself always gives a pass. A second agent, with different instructions and on a different model, catches what the first talked itself into.

One explores, one writes, one checks against the spec. That is what /goal runs under the hood. A second opinion costs money, since each agent runs its own model. Spend it where being wrong is expensive.
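The maker/checker split can be sketched with two plain functions standing in for the two agents. This is a toy under stated assumptions, not the internals of /goal: `make_and_check`, `toy_maker`, and `toy_checker` are hypothetical, and a real setup would call two differently instructed models where these functions sit.

```python
from typing import Callable, Optional

Maker = Callable[[str, Optional[str]], str]     # (task, feedback) -> draft
Checker = Callable[[str, str], Optional[str]]   # (task, draft) -> objection or None


def make_and_check(task: str, maker: Maker, checker: Checker,
                   max_rounds: int = 3) -> tuple[Optional[str], int]:
    """Alternate maker and checker until the checker has no objection."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        draft = maker(task, feedback)
        feedback = checker(task, draft)   # the verdict comes from the other side
        if feedback is None:
            return draft, round_no
    return None, max_rounds               # never accepted within the cap


def toy_maker(task: str, feedback: Optional[str]) -> str:
    # The first draft has a bug; the maker only fixes it once told.
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"


def toy_checker(task: str, draft: str) -> Optional[str]:
    namespace: dict = {}
    exec(draft, namespace)                # run the draft against the spec
    return None if namespace["add"](2, 3) == 5 else "add(2, 3) should be 5"


accepted, rounds = make_and_check("write add()", toy_maker, toy_checker)
print(accepted, rounds)
```

Here the checker runs the draft instead of reading it, so a confident but wrong first draft is rejected and the objection is fed back to the maker.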

One whole loop

Put the six parts together and a single thread becomes a machine.

  • An automation runs each morning on the project.

  • The triage skill reads the overnight failures, issues, and commits and writes findings to STATUS.md.

  • For each finding, a separate worktree opens.

  • One sub-agent writes the fix, a second checks it against your tests.

  • Connectors open the pull request and update the ticket.

  • Whatever failed waits in the triage inbox. STATUS.md remembers the rest.

You wake up not to a wall of logs, but to a short note:

Brakes go on before the engine

That $47,000 bill was not an evil AI. It was two agents politely asking each other for more work: no step limit, no money ceiling, no stop condition. One cap would have killed it on day one.
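A step limit and a money ceiling fit in a few lines. The sketch below is one possible shape, with hypothetical names (`Brakes`, `BrakeTripped`) and made-up per-step costs; a real loop would feed in the cost its provider reports for each call.

```python
from dataclasses import dataclass


class BrakeTripped(RuntimeError):
    """Raised when the loop must stop, whatever the agent says."""


@dataclass
class Brakes:
    max_steps: int
    max_usd: float
    steps: int = 0
    spent_usd: float = 0.0

    def charge(self, step_cost_usd: float) -> None:
        """Record one step; raise as soon as either cap is exceeded."""
        self.steps += 1
        self.spent_usd += step_cost_usd
        if self.steps > self.max_steps:
            raise BrakeTripped(f"step cap hit after {self.steps} steps")
        if self.spent_usd > self.max_usd:
            raise BrakeTripped(f"money ceiling hit at ${self.spent_usd:.2f}")


brakes = Brakes(max_steps=100, max_usd=5.0)
try:
    while True:                 # two agents that never run out of work
        brakes.charge(0.25)     # hypothetical cost of one step
except BrakeTripped as stop:
    print(stop, "| steps:", brakes.steps)
```

With these illustrative numbers the ceiling trips on step 21, the first step that pushes spending past $5.00, well before the step cap.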

The core rule: scope the loop by what it can break, not by what you want it to do. Blast radius first, task second. And price it honestly:

The technology is the same in all three cases. The only difference is whether the brakes are on.

The four deaths of a loop

  • Runaway recursion. Two agents feed each other work forever. Cure: a step cap and a money ceiling.

  • Silent death. The context filled up, the loop stalled, but it looks alive and reports “progress.” Cure: a heartbeat and a fresh context per phase.

  • Walking in circles. With no verifiable condition, the loop drifts from the goal. Green tests are a point it can reach. “Looks done” is not.

  • Comprehension debt. The faster the loop writes code you never read, the further you drift from your own project. Cure: a mandatory “a human read this” step you never skip.
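The heartbeat cure for silent death can be sketched as a clock that only verified progress resets. The class and field names are hypothetical, and timestamps are passed in explicitly so the example is deterministic; a real watchdog would likely use `time.monotonic()`.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    max_silence_s: float
    last_progress_at: float

    def beat(self, now: float, verified_progress: bool) -> None:
        """Only verified progress resets the clock; self-reported progress does not."""
        if verified_progress:
            self.last_progress_at = now

    def stalled(self, now: float) -> bool:
        """True when too much time has passed since the last verified progress."""
        return now - self.last_progress_at > self.max_silence_s


hb = Heartbeat(max_silence_s=600, last_progress_at=0.0)
hb.beat(now=300, verified_progress=True)    # a check passed
hb.beat(now=800, verified_progress=False)   # the loop merely says "progress"
print(hb.stalled(now=850), hb.stalled(now=1000))  # False True
```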

A bonus with a name: the “Ralph loop.” The agent signals “done” too early and exits halfway. It happens where there is no real checker and no hard stop. A useful self-check metric: count not tokens, but the cost per accepted change. Below half accepted, and the loop runs at a loss.
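The metric is simple arithmetic, sketched here with made-up numbers. `loop_economics` is a hypothetical helper, and the 50% threshold is the article's rule of thumb, not a universal constant.

```python
def loop_economics(total_cost_usd: float, accepted: int, proposed: int) -> tuple[float, float]:
    """Return (acceptance rate, cost per accepted change)."""
    if proposed <= 0:
        raise ValueError("proposed must be positive")
    rate = accepted / proposed
    cost_each = total_cost_usd / accepted if accepted else float("inf")
    return rate, cost_each


# Illustrative numbers: $120 spent, 30 changes proposed, 12 accepted after review.
rate, cost_each = loop_economics(120.0, accepted=12, proposed=30)
print(f"accepted {rate:.0%}, ${cost_each:.2f} per accepted change")
print("running at a loss" if rate < 0.5 else "paying its way")
```

In this example the loop accepts 40% of its output at $10.00 per accepted change, which by the rule above means it is running at a loss.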

Ordinary budget: 5 cheaper plays

The loop is built for people who do not pay for tokens. Here is how to get most of the value for less.

  • Spend the thinking before the tokens. “Build me auth” spins for 30 steps of guessing. A finished spec (login method, token lifetime, error states, definition of done) hits the target in one pass.

  • Plan cheap, execute expensive. Your top model should not read files. Give reading and planning to a small model, leave the hard part to the strong agent.

  • Turn on caching. The loop sends the same context every step, and caching is exactly what makes the repeat cheaper. Enabling it is roughly a 15-minute change and cuts the cost of repeated input by about 90%.

  • Engineer the context. The rule from the Devin team: delegate the reading, centralize the writing. Send the bulky search to a cheap sub-agent, the main one reads a short summary, not forty files.

  • Be the loop yourself. Three passes you steer beat thirty run alone. If you automate, cap it by step count.
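To see what the caching play is worth, here is a back-of-the-envelope calculation. Every number is an assumption chosen for illustration: a 50,000-token context, 30 steps, $3 per million input tokens, and repeated input billed at 10% of full price (the "about 90%" above). The sketch ignores output tokens and any surcharge for writing to the cache.

```python
def input_cost_usd(context_tokens: int, steps: int, usd_per_mtok: float,
                   cached_discount: float = 0.0) -> float:
    """Input cost when the same context is sent on every step.

    The first step pays full price; each later step pays (1 - cached_discount)
    of that price for the repeated context.
    """
    first = context_tokens * usd_per_mtok / 1_000_000
    repeats = first * (1.0 - cached_discount) * (steps - 1)
    return first + repeats


without_cache = input_cost_usd(50_000, steps=30, usd_per_mtok=3.0)
with_cache = input_cost_usd(50_000, steps=30, usd_per_mtok=3.0, cached_discount=0.9)
print(f"no cache: ${without_cache:.3f}, cached: ${with_cache:.3f}")
```

Under these assumptions the input bill drops from about $4.50 to about $0.59 per run. The saving grows with step count, which is why it matters more for loops than for single prompts.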

When a loop is worth it: the finish line is green tests, the run is bounded, and an hour of your time beats the tokens. Overnight refactors and large backlogs you cannot clear by hand are exactly that shape. But that is a narrow slice of the work.

Build the loop. Stay the engineer

Two people build the same loop and get opposite results. One speeds up on work they understand to the bone. The other stops understanding the work at all. The loop cannot tell them apart. You can.

The leverage moved: from the prompt to the loop, from typing to judgment. That is a harder job, not a softer one.

So the move is small, on purpose. Tomorrow morning, take the most boring job you still do by hand: triaging failures, closing stale issues, chasing a flaky test. Wrap one capped loop around it. Brakes first. Small enough that you read every change.

Nobody who ships two hundred changes a month started with a hundred agents. They started with one loop they trusted. Build that one.
