@freeman1266: https://x.com/freeman1266/status/2064702757773496552

X AI KOLs Timeline 06/10/26, 01:34 PM Tools

loop-engineering claude-code codex agent-automation ai-programming development-workflow automation

Summary

This article introduces Loop Engineering: designing automated systems in which AI agents work in autonomous loops. The building blocks are automated tasks, worktrees, skills, plugins, and sub-agents, which together replace manual prompting and improve development efficiency.

https://t.co/MqDFz5d3VY

Original Article

Cached at: 06/10/26, 05:55 PM

What is Loop Engineering

At Sequoia Capital’s AI Ascent 2026, Claude Code’s Boris Cherny said something striking: “I don’t prompt Claude anymore. I have running loops that prompt Claude and decide what to do next. My job is to write the loop.”

He submits 150 PRs a day, all from his phone, without writing a single line of code himself.

This isn’t about showing off output volume. His point is that the entry point for writing code is no longer in the prompt.

Loop Engineering Debuts

For the past year, the way to get results from AI coding tools has been the same: you write a prompt, feed enough context, wait for the output, read the response, then write the next prompt. The agent is a tool you always hold in your hands, round after round.

If you’re still doing it this way, you’re already behind.

The new approach: you build a small system that discovers work, distributes work, checks results, records progress, and decides next steps. The system calls the agents — you no longer touch them directly.

This is Loop Engineering.

Google Chrome Engineering Director Addy Osmani puts it more bluntly: Loop Engineering means moving yourself out of the position of “the person who prompts the agent” and instead designing a system that does that for you.

At its core, a loop is a recursive goal — you define what you want, and the AI iterates until completion.
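
As a rough illustration of that shape, here is a minimal Python sketch of one pass of such a system. The names run_pass, discover, work, and check are hypothetical stand-ins, not part of Claude Code or Codex; in a real loop, work would call an agent and check would run tests or a reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class LoopReport:
    """What the loop remembers about one pass."""
    done: list[str] = field(default_factory=list)
    escalated: list[str] = field(default_factory=list)


def run_pass(
    discover: Callable[[], Iterable[str]],
    work: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> LoopReport:
    """One pass: find tasks, hand each to an agent, verify, record, decide."""
    report = LoopReport()
    for task in discover():                      # discover work
        for _ in range(max_attempts):            # decide: retry or give up
            result = work(task)                  # distribute work to an agent
            if check(task, result):              # check the result
                report.done.append(task)         # record progress
                break
        else:
            report.escalated.append(task)        # out of attempts: hand to a human
    return report


if __name__ == "__main__":
    # Stand-ins for real agents: "fix-lint" verifies, "rewrite-auth" never does.
    report = run_pass(
        discover=lambda: ["fix-lint", "rewrite-auth"],
        work=lambda task: f"patch for {task}",
        check=lambda task, result: task == "fix-lint",
    )
    print(report)
```

The for/else is the "decide next steps" part: a task that never passes the check is escalated instead of being retried forever.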

What a Loop Needs

A loop is roughly made of five pieces, and both Claude Code and Codex now have them all.

1. Automated Tasks — the heartbeat

With this, the loop runs itself, instead of you manually running it once then stopping.

Claude Code provides two core primitives:

- /loop: runs repeatedly at a cadence, for periodic checks
- /goal: runs until your written condition is actually met. After each round, a separate small model judges whether it’s done, so the agent writing the code isn’t the one grading it

/loop 30m /goal All tests in test/auth pass and lint is clean. Scan src/auth for new failures, propose fixes in claude/auth-fixes, open draft PR when goal condition holds.

This is the heart of loop engineering — separating the creator from the verifier — except here it’s applied to “when is it done.”
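
The split between the agent doing the work and the model judging it can be sketched in a few lines of Python. This is not how /goal is implemented; worker and judge are hypothetical callables standing in for two separate model calls, and max_rounds is an assumed safety budget.

```python
from typing import Callable


def run_until_goal(
    worker: Callable[[str], str],
    judge: Callable[[str, str], bool],
    goal: str,
    max_rounds: int = 5,
) -> tuple[bool, int]:
    """Call the worker until the judge accepts, or the round budget runs out.

    Returns (goal_met, rounds_used).
    """
    prompt = goal
    for round_no in range(1, max_rounds + 1):
        output = worker(prompt)
        if judge(goal, output):          # the judge, not the worker, decides "done"
            return True, round_no
        # Feed the rejected attempt back so the next round can improve on it.
        prompt = f"{goal}\nPrevious attempt was rejected: {output}"
    return False, max_rounds
```

The worker never sees a "done" flag it can set itself; the only way out of the loop is the judge's verdict or the budget.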

2. Worktrees — parallelism without chaos

As soon as you run multiple agents at once, files start conflicting. Two agents writing to the same file is exactly like two engineers committing to the same line of code without telling each other.

git worktree solves this: each gets an independent working directory, each on its own branch, sharing the same repository history. One agent’s changes don’t touch another’s checkout.

Claude Code provides a --worktree flag, and sub-agents can set isolation: worktree for the same effect: each helper agent gets a clean checkout and cleans up automatically when done.
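
A minimal sketch of per-agent worktrees driven from Python, assuming git is installed and the repository already exists. The agent/<name> branch scheme and the helper names are illustrative assumptions; git worktree add -b and git worktree remove are standard git commands.

```python
import subprocess
from pathlib import Path


def worktree_add_command(worktrees_dir: Path, agent: str, base: str = "HEAD") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one agent."""
    branch = f"agent/{agent}"                 # hypothetical branch naming scheme
    path = worktrees_dir / agent
    return ["git", "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, worktrees_dir: Path, agent: str) -> Path:
    """Create the worktree from inside `repo` and return its directory."""
    subprocess.run(worktree_add_command(worktrees_dir, agent), cwd=repo, check=True)
    return worktrees_dir / agent


def remove_worktree(repo: Path, worktree: Path) -> None:
    """Delete the checkout once the agent is done; the branch itself is kept."""
    subprocess.run(["git", "worktree", "remove", str(worktree)], cwd=repo, check=True)
```

Keeping the command builder separate from the function that runs it makes the naming scheme easy to test without touching a real repository.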

3. Skills — no need to re-explain your project every time

Skills address an awkward problem: without them, you have to explain your project from scratch every time you start a new session, as if the agent had amnesia.

A Skill is a folder with a SKILL.md containing instructions and metadata, plus optional scripts, reference materials, and assets.

Without Skills, each loop round has to rediscover your project; once written down, that knowledge accumulates. Conventions, build steps, “we don’t do that because of a past incident” — write it once, the agent reads it every time.

---
name: ci-triage
description: Classify CI failures (environment/flaky/real bug/dependency/infrastructure),
  draft fixes for simple issues, escalate the rest.
  Triggered on workflow run failure or during morning triage loop.
---

Classification rules:

- env: missing secrets, wrong env vars → manual
- flake: passes on retry, no code change → archive after one retry
- bug: deterministic failure related to recent commits → draft fix
- dependency: failure related to version upgrade → draft rollback
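
To make the rules concrete, here is one way they might look as plain Python. The Failure fields and the order of the checks are assumptions made for illustration; the Skill itself only lists the four categories and their actions.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Hypothetical summary of one failed CI run."""
    missing_secret_or_env: bool = False
    passed_on_retry: bool = False
    code_changed: bool = False
    deterministic: bool = False
    follows_version_upgrade: bool = False


# Category -> next action, copied from the rules in the Skill.
ACTIONS = {
    "env": "manual",
    "flake": "archive after one retry",
    "bug": "draft fix",
    "dependency": "draft rollback",
}


def classify(f: Failure) -> str:
    """Map a failure to a category, or "escalate" when no rule applies."""
    if f.missing_secret_or_env:
        return "env"
    if f.passed_on_retry and not f.code_changed:
        return "flake"
    # Checked before "bug" because an upgrade commit is also a recent commit.
    if f.follows_version_upgrade:
        return "dependency"
    if f.deterministic and f.code_changed:
        return "bug"
    return "escalate"   # nothing matched: leave it for a human
```

The explicit "escalate" fallback mirrors the Skill's "escalate the rest": a failure the rules cannot place is never silently given a fix.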

4. Plugins & Connectors — loops reach your real tools

A loop that only sees the filesystem is a miniature loop.

Connectors based on MCP let the agent read your issue tracker, query the database, call test APIs, and send messages to Slack.

The difference between “an agent that can say ‘here’s a fix’” and “a loop that opens a PR, links a Linear ticket, and chirps in the channel when CI goes green” is this layer.

The quickest returns come from: GitHub → Linear/Jira → Slack → Sentry.
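
One way to picture this layer is as a small interface that the loop calls without caring what sits behind it. The sketch below is an assumption for illustration only: Connector, RecordingConnector, and the action names are hypothetical and are not the MCP API; a real connector would translate send into an MCP tool call.

```python
from typing import Protocol


class Connector(Protocol):
    """Hypothetical interface; a real connector would wrap an MCP tool call."""
    def send(self, action: str, payload: dict) -> None: ...


class RecordingConnector:
    """In-memory stand-in that only records what it was asked to do."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: list[tuple[str, dict]] = []

    def send(self, action: str, payload: dict) -> None:
        self.calls.append((action, payload))


def on_ci_green(branch: str, ticket: str, code_host: Connector,
                tracker: Connector, chat: Connector) -> None:
    """The three side effects described above, in order."""
    code_host.send("open_pr", {"branch": branch})
    tracker.send("link_ticket", {"ticket": ticket, "branch": branch})
    chat.send("post_message", {"text": f"CI is green on {branch} ({ticket})"})
```

Swapping RecordingConnector for real connectors changes where the calls land, not the loop logic, which is why this layer can be added one tool at a time.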

5. Sub-agents — keep creators away from verifiers

The single most useful structural design in a loop: separate the code writer from the code verifier.

The model that writes the code grades its own homework too leniently. A second agent with different instructions — sometimes even a different model — catches problems the first agent convinced itself to ignore.

Claude Code uses sub-agents in .claude/agents/ and agent teams that pass work between them. Common division:

Explorer Agent → Implementer Agent → Verifier Agent (checks against spec)

Sub-agents burn more tokens, so use them where a second opinion truly matters.
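
A minimal sketch of that division, assuming each agent can be treated as a plain function. The Explorer, Implementer, and Verifier type aliases and run_pipeline are hypothetical names, not Claude Code APIs; the point is only that the verifier's objection, not the implementer's confidence, decides whether a patch is accepted.

```python
from typing import Callable, Optional

Explorer = Callable[[str], str]          # task -> findings
Implementer = Callable[[str, str], str]  # (findings, feedback) -> patch
Verifier = Callable[[str, str], str]     # (spec, patch) -> "" if OK, else an objection


def run_pipeline(task: str, spec: str, explore: Explorer, implement: Implementer,
                 verify: Verifier, max_rounds: int = 3) -> Optional[str]:
    """Return an accepted patch, or None if the verifier never signs off."""
    findings = explore(task)
    feedback = ""
    for _ in range(max_rounds):
        patch = implement(findings, feedback)
        feedback = verify(spec, patch)    # a separate agent checks against the spec
        if not feedback:
            return patch
    return None
```

Each extra round costs another implementer call and another verifier call, which is the token cost the text warns about; max_rounds is the cap on it.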

A Sixth Thing: Memory

A Markdown file, or a Linear board — anything that lives outside a single conversation and remembers “what’s done, what’s left.”

It sounds too simple to mention, but every long-running agent relies on this. The model forgets everything on restart, so memory must live on disk, not in context. The agent forgets; the repository doesn’t.

Loop state · ci-triage

Last run
2026-06-09 03:30 UTC · 7 failures classified, 3 fixes drafted, 4 escalated

In progress
· claude/fix-auth-token-refresh — local tests pass, waiting on CI
· claude/fix-flaky-payment-webhook — retry pattern applied, monitoring

Done today
· claude/bump-axios-1.7.4 → merged (CI passed)

Escalated to human
· src/billing/refund.ts — tests fail in three ways, root cause unclear

Lessons learned (write here, not in chat)
· 2026-06-08: PowerShell has TLS 1.2 issues on this Windows runner, switched to bash
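
A state file like the one above can be kept with a few lines of Python. This sketch stores the state as JSON rather than Markdown, which is an assumption made to keep parsing trivial; the keys and function names are illustrative.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the loop's memory; start empty on the first run."""
    if not path.exists():
        return {"in_progress": [], "done": [], "escalated": [], "lessons": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def mark_done(state: dict, branch: str) -> None:
    """Move a branch from in_progress to done."""
    if branch in state["in_progress"]:
        state["in_progress"].remove(branch)
    state["done"].append(branch)
```

Every run starts with load_state and ends with save_state, so a restarted agent picks up where the last one stopped instead of relying on its context.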

What a Loop Actually Looks Like

Put the five pieces and the state file together, and a conversation thread becomes a small console:

Every morning, an automated task runs on the repository. Its prompt calls the triage Skill, reads yesterday’s CI failures, open issues, recent commits, and writes findings into the state file.

For each finding worth acting on, the loop opens an isolated Worktree, assigns a sub-agent to draft a fix, then assigns a second sub-agent to review that draft against the project Skills and existing tests.

Connectors let it open PRs and update tickets. Anything the loop can’t handle falls into the triage inbox, waiting for you.

What did you do? You designed it, once. You didn’t prompt a single step in between.

Check Before You Build a Loop

Not every task fits a loop. Only go for it if all four conditions are met:

Condition	Explanation
Task repeats at least weekly	Otherwise setup cost never amortizes
Verification can be automated	Tests, type checks, builds, or linters can judge correctness automatically
Token budget is sufficient	Loops re-read context, retry, explore — more expensive than single conversations
Agent tooling is complete	Can see logs, run code, see runtime results
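
The checklist is easy to encode, which makes it harder to talk yourself past. A minimal sketch; the Candidate fields and the break-even helper are illustrative assumptions, and "at least weekly" is read here as at least one run per week.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of a task you are thinking of looping."""
    runs_per_week: float
    has_automated_check: bool      # tests, type checks, builds, or linters
    token_budget_ok: bool
    agent_can_observe: bool        # logs, code execution, runtime results


def fits_a_loop(c: Candidate) -> bool:
    """All four conditions must hold; three out of four is a no."""
    return (
        c.runs_per_week >= 1
        and c.has_automated_check
        and c.token_budget_ok
        and c.agent_can_observe
    )


def weeks_to_break_even(setup_hours: float, hours_saved_per_run: float,
                        runs_per_week: float) -> float:
    """How long until the setup cost is paid back by the time saved."""
    return setup_hours / (hours_saved_per_run * runs_per_week)
```

With made-up numbers: a loop that takes 20 hours to set up and saves half an hour per run, four runs a week, pays for itself after 20 / (0.5 × 4) = 10 weeks. The same loop run once a month would take well over three years.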

Good for loops: CI failure triage, dependency upgrade PRs, lint fixes, issue-to-PR drafts.

Not good: architecture rewrites, core code like auth/payments, production deployments, “make it look better” tasks requiring human judgment.

Three Things Loops Can’t Help You With

Verification is still your job. A loop running unattended is also a loop making mistakes unattended. It says “done” — that’s its claim, not the truth.

Understanding debt compounds faster. The faster loops ship code, the more code you haven’t written yourself, and the gap between “what’s in the codebase” and “what you truly understand” widens. If you don’t read what it produces, you’re taking out understanding debt at compound interest.

The most insidious is cognitive laziness. Once a loop runs on its own, it’s easy to stop exercising judgment and just accept whatever it returns. Designing a loop with your judgment makes it a helper; using it to avoid thinking makes it an accomplice.

The Leverage Point Has Shifted

Boris isn’t saying the work got easier. The work didn’t get easier — it just moved: from how to write prompts to how to design loops.

Two people can build identical loops and get opposite results. One uses it to accelerate work they already understand deeply. The other uses it to skip the “understanding deeply” part. The loop can’t tell the difference. You can.

So loop design is actually harder than prompt engineering.

A Few Different Takes

This narrative is smooth — smooth enough to poke a few holes in.

First, that “150 PRs a day from a phone.” That’s a tool author’s achievement — he has the complete mental model of Claude Code in his head, and the repo he’s running loops on is likely clean and tuned for loops. Using the throughput of a creator in an ideal environment to set expectations for your ten-year-old legacy codebase isn’t helpful. What actually holds most people back isn’t “can I write a loop” — it’s “does my repo deserve a loop.” For most enterprise codebases, the answer is no.

Second, the workload. The article implies loops save you work, but they really just shift it. You stop writing prompts, but you start maintaining Skills, tuning connectors, building verification scaffolding, and watching state files. “You only designed it once” never holds in reality — Skills go stale, connectors break, verification conditions get bypassed by the next requirement, state files slowly drift from the code. For teams whose tasks aren’t repetitive enough and are hard to verify automatically (most teams), the setup cost never amortizes. You’ve just traded prompt-tuning hours for loop-tuning hours.

Third, the “separate creator and verifier” design. I’m not that convinced. The problem is both sides are LLMs. The second agent isn’t an independent auditor — it’s a highly correlated one: same training data, same blind spots, same overconfidence. It can catch simple typos, but when it comes to “the whole direction is wrong” type systemic issues, both models are likely to nod and approve together. Having someone who makes the same mistakes review someone else who makes the same mistakes — that’s not quality assurance.

Verification itself deserves more thought. Passing automated checks doesn’t mean correct. Once the stop condition is “tests pass, lint clean,” Goodhart’s law kicks in: the loop will find every shortcut to turn tests green — loosen assertions, water down mocks, swallow exceptions with try/catch. You think you’re measuring “is the code right,” but you’re measuring “did the checker shut up.” The more mechanical the condition, the better the loop gets at satisfying it without solving the problem.

The most uncomfortable part: the engineering that’s actually valuable is exactly what loops can’t do. Judgment, architecture, trade-offs among bad options — the loops themselves explicitly exclude these. So they industrialize the busywork, generating a pile of low-value PRs waiting for human review. Once the speed of writing code exceeds the speed of reading code, the bottleneck shifts from “writing” to “reading,” and review capacity doesn’t magically increase because you built a loop. In the end, you’re probably not faster — you’re drowning in the merge queue your own loop generated.

Loops are great. But they only reward people who already know what they’re doing. Expect them to think for you, and they’ll happily think wrong alongside you.

