Thoughts on starting new projects with LLM agents

Eli Bendersky News

Summary

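Drawing on the author's experience building the Go project watgo from scratch with LLM agents, this post discusses how to use AI agents effectively in a project and stresses the importance of keeping human review and guidance in the loop.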

<p>A few months ago I wrote about <a class="reference external" href="https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/">using LLM agents to help restructure one of my Python projects</a>. It's worth beginning by saying that the rewrite has been successful by all reasonable measures; I've been able to continue maintaining that project since then without an issue.</p> <p>In this post, I want to discuss another project I've recently completed with significant help from agents: <a class="reference external" href="https://eli.thegreenplace.net/2026/watgo-a-webassembly-toolkit-for-go/">watgo</a>. In this project many things are different; most notably, it's a from-scratch project rather than a rewrite, and it uses a different programming language (Go). This post describes my experience working on the project, and some lessons learned along the way.</p> <div class="section" id="the-process"> <h2>The process</h2> <p>This is a new project, so it required extensive design. I began by iterating on the design with the agent, starting from a sketch of the API. For this purpose, I recommend using a Markdown file <a class="reference external" href="https://github.com/eliben/watgo/blob/main/doc/notes.md">committed into the repository</a> for future reference.</p> <p>After that, I started asking the agent to write CLs <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a> in a logical order that made sense to me, keeping them small and reviewable (more on this below). Sometimes it's not easy to have a small CL, and multiple rounds of revision may confuse the agent; in this case, I commit the CL and then go back and ask the agent to modify or refactor the code, as much as needed, with separate CLs. 
In the worst case, the whole sequence can be reverted if I feel we've taken the wrong direction (branches could also be helpful here for more complicated scenarios).</p> <p>This point is worth reiterating: sometimes a single CL is a huge step forward, but requires lots of review, cleanup and refactoring to be viable. I've had multiple instances where an agent produced several days of work in a single CL, but I then spent hours instructing it to clean up and refactor. Overall, it's still a productivity gain, just not as much as some pundits would like us to believe.</p> </div> <div class="section" id="keeping-the-human-in-the-loop"> <h2>Keeping the human in the loop</h2> <p>Given the current state of agent capabilities, I think it's worth splitting projects into two categories:</p> <ol class="arabic simple"> <li>Low importance / prototype / throwaway projects where deep code understanding is unnecessary. These can be &quot;vibe-coded&quot; (submitting agent code without even reviewing it).</li> <li>High importance projects that I actually want to maintain; here, vibe-coding is ill-advised and I insist on reviewing and guiding all code the agent writes before it's submitted (or shortly after, as discussed above).</li> </ol> <p>The <tt class="docutils literal">watgo</tt> project is a clear example of (2): I certainly intend to maintain this project in the long term, so I insist on code that I understand. With very few exceptions, no code gets in without full review and often multiple rounds of revisions.</p> <p>Even if the cost of writing code has gone down, maintaining a project is so much more than that. It's triaging and fixing bugs, it's thinking through what needs to be done rather than how to do it, it's keeping the code healthy over time, and so on. 
As <a class="reference external" href="https://www.goodreads.com/quotes/273375-everyone-knows-that-debugging-is-twice-as-hard-as-writing">Brian Kernighan said</a>:</p> <blockquote> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?</blockquote> <p>Maybe at some point agents will become good enough that projects in category (2) can be implemented and maintained completely autonomously. Maybe. But we're certainly not there yet. My hunch is that getting there will require crossing the AGI line <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>, after which little in our world remains certain.</p> <div class="section" id="practical-workflow"> <h3>Practical workflow</h3> <p>If you're using an agent to send an actual PR and only review <em>that</em>, it's difficult to be disciplined enough to actually perform a thorough review. I find the following method to be more reliable:</p> <p>I use a CLI agent running locally in my repository, and ask it to update the code there. In parallel, I have a VSCode window open in the same project, where I can:</p> <ol class="arabic simple"> <li>Review the agent's changes using VSCode's diff view</li> <li>Make my own tweaks and code changes if needed</li> </ol> <p>Once I'm pleased with the change, I manually create a commit.</p> </div> </div> <div class="section" id="keeping-the-cls-small"> <h2>Keeping the CLs small</h2> <p>As mentioned above, it's imperative to keep making progress in small chunks, with small enough CLs that a human can fully understand in a single review. It's very tempting to sprint ahead submitting thousands of lines of code every day, but this temptation has to be avoided. 
Coding with an agent is like speed-reading; yes, you're making more progress, but comprehension suffers the faster you go.</p> <p>Particularly for refactoring, agents still take the shortest route to the destination. It's important to guide them to think about the &quot;big picture&quot; at all times and to find all instances where X is better done as Y, not just the single place noticed during a review. This is why it's sometimes OK to have a CL submitted before you fully agree with everything, and go back to it later for several refactoring rounds. Source control works amazingly well when pair-coding with agents.</p> </div> <div class="section" id="testing-strategy"> <h2>Testing strategy</h2> <p>It's a key point discussed in every &quot;how to succeed with AI&quot; article, but still critical enough to reiterate here: a solid testing strategy is absolutely crucial for success. Agents produce - by far - the best results when they have a solid test suite to test their code against.</p> <p>With the <a class="reference external" href="https://github.com/eliben/pycparser">pycparser</a> rewrite, I had a large existing test suite. For <a class="reference external" href="https://github.com/eliben/watgo">watgo</a>, the very first thing I did was think through how to adapt the test suites of the <a class="reference external" href="https://github.com/WebAssembly/spec/">WASM spec</a> and of the <a class="reference external" href="https://github.com/WebAssembly/wabt">wabt project</a> for my needs.</p> <p>If your project doesn't have such a test suite to rely on, this should be your first order of business - finding one, or building one from scratch. 
Beware of self-reinforcing loops though; it's dangerous to trust agents for both the tests and the implementations tested against them.</p> </div> <div class="section" id="language-choice-go-for-agent-written-projects"> <h2>Language choice - Go for agent-written projects</h2> <p>Go is a fantastic language for agents to write, because it's designed to be very readable by humans. The biggest strengths of Go are exactly what makes the experience of reviewing agent code so positive:</p> <ul class="simple"> <li>Go changes very infrequently, so you don't have to wonder &quot;are we using the most modern / idiomatic approach&quot; or &quot;what the hell is this construct&quot; as often as with other languages (looking at you, Python and TypeScript).</li> <li>There are relatively few ways to accomplish the same thing in Go, further lowering the mental burden.</li> <li>The standard library is rich and there's much less need to keep abreast of the package-everyone-uses du jour.</li> <li>In general, Go is designed for readability, with a mild-but-still-strong type system, uniform formatting, explicit error propagation and opinionated choices already made for you.</li> </ul> <p>Since most of the time spent by humans when using agents is <em>reading</em> rather than <em>writing</em> code, these effects compound and produce a great experience. Recall the discussion of how some languages are optimized for writability (Perl) while others are optimized for readability (Go)? Well, when working on a project with an agent we live in a world of 99% reading vs. 
1% writing, so this really matters.</p> <p>I find this aspect really crucial in light of the earlier points made in this post - namely, keeping the human in the loop by understanding and reviewing all of the agent's design choices and code.</p> </div> <div class="section" id="final-thoughts"> <h2>Final thoughts</h2> <p>If you're working on a subject that's completely new to you, I would strongly recommend <em>against</em> the approach described in this post. To really learn something, you have to work through it from scratch, yourself: reading, designing, writing the code. Agents don't change this basic fact; even before agents, if you wanted to learn X, copying it from Stack Overflow or some other project clearly wasn't the right way to go. Similarly, while agents can be used as a prop for learning, they cannot learn <em>for you</em>.</p> <p>As a corollary, junior engineers should exercise <em>extreme caution</em> when relying on LLMs. There's no replacement for hard-won experience and the sweat and tears of learning new, challenging topics. Learning is supposed to be hard; if it's too easy, you're probably not learning.</p> <p>For senior engineers, agents are a boon: a great tool to increase productivity, avoid the boring stuff, and get unstuck from procrastination, but only when used judiciously.</p> <hr class="docutils" /> <table class="docutils footnote" frame="void" id="footnote-1" rules="none"> <colgroup><col class="label" /><col /></colgroup> <tbody valign="top"> <tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>CL stands for Changelist, also known as a &quot;patch&quot; or a &quot;diff&quot; - basically a standalone commit that touches one or more files. 
This term originates from the source control systems Perforce and Subversion.</td></tr> </tbody> </table> <table class="docutils footnote" frame="void" id="footnote-2" rules="none"> <colgroup><col class="label" /><col /></colgroup> <tbody valign="top"> <tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>Programming is the ultimate realization of thought; if machines can design, produce, maintain and understand code better than humans, it means they can start improving themselves, which is the definition of <a class="reference external" href="https://en.wikipedia.org/wiki/Technological_singularity">singularity</a>.</td></tr> </tbody> </table> </div>
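
A note on the testing strategy discussed above: one way to avoid the self-reinforcing loop is to keep the expected results in a table that comes from a trusted external source (a spec test suite, say) and let the agent touch only the implementation. The following is a minimal sketch of that idea and not watgo's actual test harness; `specCase`, `maxDepth` and `runSpec` are hypothetical names, and the toy "implementation" merely measures parenthesis nesting in an S-expression.

```go
package main

import "fmt"

// specCase is one expectation taken from an external, trusted source.
// (Hypothetical type: in a real project these would be loaded from the
// upstream test suite rather than written inline.)
type specCase struct {
	name  string
	input string
	want  int
}

// maxDepth is a stand-in for the implementation under test. It returns the
// maximum nesting depth of parentheses, or -1 if they are unbalanced.
func maxDepth(src string) int {
	depth, deepest := 0, 0
	for _, r := range src {
		switch r {
		case '(':
			depth++
			if depth > deepest {
				deepest = depth
			}
		case ')':
			depth--
			if depth < 0 {
				return -1
			}
		}
	}
	if depth != 0 {
		return -1
	}
	return deepest
}

// runSpec runs every case against impl and returns a description of each
// failure. An empty result means all cases passed.
func runSpec(cases []specCase, impl func(string) int) []string {
	var failed []string
	for _, c := range cases {
		if got := impl(c.input); got != c.want {
			failed = append(failed, fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return failed
}

func main() {
	cases := []specCase{
		{"empty module", "(module)", 1},
		{"nested func", "(module (func (nop)))", 3},
		{"unbalanced", "(module (func)", -1},
	}
	failed := runSpec(cases, maxDepth)
	fmt.Printf("%d/%d cases passed\n", len(cases)-len(failed), len(cases))
	for _, f := range failed {
		fmt.Println("FAIL", f)
	}
}
```

In an actual Go project the same table would normally drive a `testing`-based test with one `t.Run` subtest per case; the plain `main` here only keeps the sketch self-contained.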

Cached at: 06/08/26, 03:31 AM

Source: [https://eli.thegreenplace.net/2026/thoughts-on-starting-new-projects-with-llm-agents](https://eli.thegreenplace.net/2026/thoughts-on-starting-new-projects-with-llm-agents)
