<p>A few months ago I wrote about <a class="reference external" href="https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/">using LLM agents to help restructure one of my
Python projects</a>.
It's worth beginning by saying that the
rewrite has been successful by all reasonable measures; I've been able to
continue maintaining that project since then without an issue.</p>
<p>In this post, I want to discuss another project I've recently completed with
significant help from agents: <a class="reference external" href="https://eli.thegreenplace.net/2026/watgo-a-webassembly-toolkit-for-go/">watgo</a>. In
this project many things are different; most notably, it's a from-scratch
project rather than a rewrite, and it uses a different programming language
(Go). This post describes my experience working on the project, and some lessons
learned along the way.</p>
<div class="section" id="the-process">
<h2>The process</h2>
<p>This is a new project, so it required extensive design. I began by iterating on
the design with the agent, with a sketch of the API. For this purpose, I
recommend using a Markdown file <a class="reference external" href="https://github.com/eliben/watgo/blob/main/doc/notes.md">committed into the repository</a>
for future reference.</p>
<p>After that, I started asking the agent to write CLs <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a> in a logical order that
made sense to me, keeping them small
and reviewable (more on this in a later section). Sometimes it's not easy to
have a small CL, and multiple rounds of revision may confuse the agent;
in this case, I commit the CL and then go back and ask the agent to modify
or refactor the code, as much as needed, with separate CLs. In the worst case,
the whole sequence can be reverted if I feel we've taken the wrong direction
(branches could also be helpful here for more complicated scenarios).</p>
<p>This point is worth reiterating: sometimes a single CL is a huge step forward,
but requires lots of review, cleanup and refactoring to be viable. I've had
multiple instances where an agent produced several days of work in a single
CL, but I then spent hours instructing it to clean up and refactor. Overall,
it's still a productivity gain, just not as much as some pundits would like us
to believe.</p>
</div>
<div class="section" id="keeping-the-human-in-the-loop">
<h2>Keeping the human in the loop</h2>
<p>Given the current state of agent capabilities, I think it's worth splitting
projects into two categories:</p>
<ol class="arabic simple">
<li>Low importance / prototype / throwaway projects where deep code
understanding is unnecessary. These can be "vibe-coded" (submitting agent
code without even reviewing it).</li>
<li>High importance projects that I actually want to maintain; here, vibe-coding
is ill-advised and I insist on reviewing and guiding all code the agent
writes before it's submitted (or shortly after, as discussed above).</li>
</ol>
<p>The <tt class="docutils literal">watgo</tt> project is a clear example of (2): I certainly intend to maintain
this project in the long term, so I insist on code that I understand. With very
few exceptions, no code gets in without full review and often multiple rounds
of revisions.</p>
<p>Even if the cost of writing code has gone down, maintaining a project is so much
more than that. It's triaging and fixing bugs, it's thinking through what needs
to be done rather than how to do it, it's keeping the code healthy over time,
and so on. As <a class="reference external" href="https://www.goodreads.com/quotes/273375-everyone-knows-that-debugging-is-twice-as-hard-as-writing">Brian Kernighan said</a>:</p>
<blockquote>
Everyone knows that debugging is twice as hard as writing a program in the
first place. So if you're as clever as you can be when you write it, how will
you ever debug it?</blockquote>
<p>Maybe at some point agents will become good enough that projects in category
(2) can be implemented and maintained completely autonomously. Maybe. But
we're certainly not there yet. My hunch is that getting there will require
crossing the AGI line <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>, after which little in our world remains certain.</p>
<div class="section" id="practical-workflow">
<h3>Practical workflow</h3>
<p>If you're using an agent to send an actual PR and only review <em>that</em>, it's
difficult to be disciplined enough to actually perform a thorough review. I find
the following method to be more reliable:</p>
<p>I use a CLI agent running locally in my repository, and ask it to update the
code there. In parallel, I have a VSCode window open in the same project, where
I can:</p>
<ol class="arabic simple">
<li>Review the agent's changes using VSCode's diff view</li>
<li>Make my own tweaks and code changes if needed</li>
</ol>
<p>Once I'm pleased with the change, I manually create a commit.</p>
</div>
</div>
<div class="section" id="keeping-the-cls-small">
<h2>Keeping the CLs small</h2>
<p>As mentioned above, it's imperative to keep making progress in small chunks,
with small enough CLs that a human can fully understand in a single review. It's
very tempting to sprint ahead submitting thousands of lines of code every day,
but this temptation has to be avoided. Coding with an agent is like
speed-reading; yes, you're making more progress, but comprehension suffers
the faster you go.</p>
<p>Particularly for refactoring, agents still take the shortest route to the
destination. It's important to guide them to think about the "big picture" at
all times, and to find all instances where X is better done as Y, not just the
single place noticed during a review. This is why it's sometimes OK to have
a CL submitted before you fully agree with everything, and go back to it later
for several refactoring rounds. Source control works amazingly well when
pair-coding with agents.</p>
</div>
<div class="section" id="testing-strategy">
<h2>Testing strategy</h2>
<p>It's a key point discussed in every "how to succeed with AI" article, but
still critical enough to reiterate here: a solid testing strategy is absolutely
crucial for success. Agents produce - by far - the best results when they have
a solid test suite to test their code against.</p>
<p>With the <a class="reference external" href="https://github.com/eliben/pycparser">pycparser</a> rewrite, I had
a large existing test suite. For <a class="reference external" href="https://github.com/eliben/watgo">watgo</a>,
the very first thing I did was think through how to adapt the test suites of
the <a class="reference external" href="https://github.com/WebAssembly/spec/">WASM spec</a> and of the
<a class="reference external" href="https://github.com/WebAssembly/wabt">wabt project</a> for my needs.</p>
<p>If your project doesn't have such a test suite to rely on, this should be your
first order of business - finding one, or building one from scratch. Beware of
self-reinforcing loops though; it's dangerous to trust agents for both the
tests and the implementations tested against them.</p>
</div>
<div class="section" id="language-choice-go-for-agent-written-projects">
<h2>Language choice - Go for agent-written projects</h2>
<p>Go is a fantastic language for agents to write, because it's designed to be
very readable by humans. The biggest strengths of Go
are exactly what makes the experience of reviewing agent code so positive:</p>
<ul class="simple">
<li>Go changes very infrequently, so you don't have to wonder "are we using the
most modern / idiomatic approach" or "what the hell is this construct"
as often as with other languages (looking at you, Python and TypeScript).</li>
<li>There are relatively few ways to accomplish the same thing in Go, further
lowering the mental burden.</li>
<li>The standard library is rich and there's much less need to keep abreast of
the package-everyone-uses du jour.</li>
<li>In general, Go is designed for readability, with a mild-but-still-strong type
system, uniform formatting, explicit error propagation and opinionated choices
already made for you.</li>
</ul>
<p>Since most of the time spent by humans when using agents is <em>reading</em> rather
than <em>writing</em> code, these effects compound and produce a great experience.
Recall the discussion of how some languages are optimized for writability (Perl)
while others are optimized for readability (Go)? Well, when working on a project
with an agent we live in a world of 99% reading vs. 1% writing, so this really
matters.</p>
<p>I find this aspect really crucial in light of the earlier points made in this
post - namely, keeping the human in the loop by understanding and reviewing
all of the agent's design choices and code.</p>
</div>
<div class="section" id="final-thoughts">
<h2>Final thoughts</h2>
<p>If you're working on a subject that's completely new to you, I would strongly
recommend <em>against</em> the approach described in this post. To really learn
something, you have to work through it from scratch, yourself, reading,
designing, writing the code. Agents don't change this basic fact; even before
agents, if you wanted to learn X, copying it from Stack Overflow or some other
project clearly wasn't the right way to go. Similarly, while agents can be used
as a prop for learning, they cannot learn <em>for you</em>.</p>
<p>As a corollary, junior engineers should exercise <em>extreme caution</em> when relying
on LLMs. There's no substitute for hard-won experience and the sweat and tears
of learning new, challenging topics. Learning is supposed to be hard; if it's
too easy, you're probably not learning.</p>
<p>For senior engineers, agents are a boon: a great tool to increase
productivity, avoid the boring stuff, and get unstuck from procrastination, but
only when used judiciously.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>CL stands for Changelist, also known as a "patch" or a "diff" - basically
a standalone commit that touches one or more files. This term originates
from the source control systems Perforce and Subversion.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>Programming is the ultimate realization of thought; if machines can
design, produce, maintain and understand code better than humans, it
means they can start improving themselves, which is the definition of the
<a class="reference external" href="https://en.wikipedia.org/wiki/Technological_singularity">singularity</a>.</td></tr>
</tbody>
</table>
</div>
# Thoughts on starting new projects with LLM agents
Source: [https://eli.thegreenplace.net/2026/thoughts-on-starting-new-projects-with-llm-agents](https://eli.thegreenplace.net/2026/thoughts-on-starting-new-projects-with-llm-agents)