@morganlinton: https://x.com/morganlinton/status/2069794086220116452


Summary

A CTO shares insights on the execution layer deficiency in agentic coding workflows, tracing the concept back to 1975 and discussing modern model choices.

https://t.co/JHNQlBHUuj

Cached at: 06/25/26, 01:20 PM

What I Learned Rethinking the Execution Layer in our Agentic Coding Workflow

I moved our engineering team over to agentic coding workflows in early 2025. Not fully, but enough to really begin the foundational change. Then in Q4, we ripped off the band-aid and went all in. As everyone knows, and as @mattshumer_ nailed in his article, everything changed in February of this year, and I was even more glad we had made those moves as early as we did.

If you haven’t read Matt’s article and you’re still thinking in, or living in, the pre-February 2026 world, read it before going any further. It is required reading, imho, for understanding where we are today with modern reasoning models from companies like OpenAI, Anthropic, and now SpaceXAI.

Matt Shumer (@mattshumer_) · Feb 11 · Article: “Something Big Is Happening”
Think back to February 2020. If you were paying close attention, you might have noticed a few people talking about a virus spreading overseas. But most of us weren’t paying close attention. The stock…

So now we have an entirely new class of models, and with them, a new agentic coding workflow that is often divided into three layers:

  • Orchestration

  • Execution

  • Code review
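As a rough illustration of how the three layers hand work to each other, here is a minimal Python sketch. Everything in it is hypothetical: `call_model` is a stub standing in for whatever model API you actually use, and the stage functions only pass placeholder text along rather than planning, writing, or reviewing real code.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    approved: bool
    notes: str


def call_model(model: str, prompt: str) -> str:
    # Hypothetical stub standing in for a real model API call.
    return f"[{model}] {prompt}"


def orchestrate(task: str, model: str) -> list[str]:
    # Orchestration layer: break the task into plan steps.
    # A real planner would decide the steps; this stub always makes two.
    return [call_model(model, f"plan step {i} for: {task}") for i in (1, 2)]


def execute(steps: list[str], model: str) -> list[str]:
    # Execution layer: one model call per plan step.
    return [call_model(model, f"implement: {step}") for step in steps]


def review(outputs: list[str], model: str) -> ReviewResult:
    # Code review layer: a real reviewer would critique the changes.
    # This stub only checks that every step produced non-empty output.
    approved = all(outputs)
    return ReviewResult(approved, call_model(model, f"reviewed {len(outputs)} outputs"))


def run_pipeline(task: str, planner: str, executor: str, reviewer: str) -> ReviewResult:
    # Each layer can be assigned a different model.
    steps = orchestrate(task, planner)
    outputs = execute(steps, executor)
    return review(outputs, reviewer)


result = run_pipeline("add rate limiting", "planner", "executor", "reviewer")
print(result.notes)  # [reviewer] reviewed 2 outputs
```

Because each layer is a separate call, swapping the model behind any one layer doesn't touch the others, which is what makes per-layer model choices possible in the first place.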

Before I go any further, I want to share a little history, because while this may sound like a new concept, it’s actually a very old one. Believe it or not, it came out of an AI lab at the Stanford Research Institute (now SRI International) in 1975. Yes, 1975, 51 years ago.

This concept was originally conceived by Earl D. Sacerdoti and published in a paper titled “The Nonlinear Nature of Plans.”

And yes, you can still read the paper today; it is published in a number of places online. Here’s one place you can read it.

In the fifty years since, countless papers have been published that build on this topic. The most recent one that applies it directly to the core agentic coding workflow that has become fairly standard today comes from Purdue University: “Verification-Aware Planning for Multi-Agent Systems,” published in October of last year.

This paper both discusses the current model for agentic coding, i.e. breaking it down into three layers, and proposes a further optimization called VeriMap. If you want to read this paper, here’s the link.

I’ll stop the history lesson here, but my point is that this concept has been around since the ’70s. Over the last couple of years, it has been applied to modern agentic coding workflows, and people continue to think about it a lot.

As a CTO leading a team of engineers, I get to see how all of this comes together in practice. I have learned a lot, particularly over the last year, and I’ve come to see what I consider a real deficiency in this workflow at the execution layer, one that my team and I are currently testing different solution paths for.

So what’s the problem with the execution layer?

The problem with the current framework is that, typically, one model is selected for orchestration, let’s say something like Opus, which is good at planning. Then another model is selected for execution, maybe Composer 2.5, maybe GLM 5.2, and a third model runs a validation/code review step; lately I’ve been liking Grok Build for this.

Now, you can change the model at each layer based on the complexity of the task. Suppose you have a less complex task: you might be able to use Opus with Medium reasoning effort for orchestration, and then pick Composer 2.5 or maybe DeepSeek v4 at the execution layer. And in some cases, this will do the trick.

The challenge is that for more complex tasks, the usual choice at the execution layer is a model like Opus 4.8 High or Extra, or GPT 5.5 High or xHigh. And while, yes, these models are typically better at complex tasks, they also think a lot longer and use a lot more tokens.
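The baseline described here can be sketched in Python as a function that picks one (model, effort) pair per layer from an overall complexity label. The model names come from the text and are display labels, not API identifiers; the two-level `Complexity` enum, the "default" effort label, and the high-effort orchestrator for complex tasks are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class ModelChoice:
    model: str   # display label from the article, not an API model ID
    effort: str  # reasoning-effort label; "default" is a placeholder


@dataclass(frozen=True)
class LayerModels:
    orchestration: ModelChoice
    execution: ModelChoice
    review: ModelChoice


def pick_layer_models(complexity: Complexity) -> LayerModels:
    # Baseline pattern: exactly one model per layer, chosen per task.
    review = ModelChoice("Grok Build", "default")
    if complexity is Complexity.LOW:
        return LayerModels(
            orchestration=ModelChoice("Opus", "medium"),
            execution=ModelChoice("Composer 2.5", "default"),
            review=review,
        )
    # Complex task: the entire execution layer gets a high-effort model.
    return LayerModels(
        orchestration=ModelChoice("Opus", "high"),  # assumption
        execution=ModelChoice("Opus 4.8", "high"),
        review=review,
    )


print(pick_layer_models(Complexity.HIGH).execution)
# ModelChoice(model='Opus 4.8', effort='high')
```

The last branch is where the cost goes: once a task is labeled complex, every step of execution runs on the high-effort model, however routine that step is.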

And here’s the aha moment I had a few months ago.

Pretty much no complex task is uniformly complex in every aspect. Usually only part of it is highly complex; within it there are often relatively easy parts, plus a nice chunk of medium-difficulty coding that needs to be done.

By picking just one model at the execution layer, and saying something like, “well this is a really complex task, so we need Opus 4.8 Max to make sure it’s done right,” you end up doing the easy, medium, and difficult components, all with a model that thinks for a long time, calls a ton of tools, and uses an insane amount of tokens.

In reality, for that complex task, yes, you likely do need the horsepower of an Opus 4.8 Max or GPT 5.5 xHigh, but only for 20% of it. The rest can probably be done with two other models: one for the easier stuff, and one for the less easy, but still not crazy hard, stuff.
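The arithmetic behind that intuition is easy to sketch. In the Python below, only the 20% hard share comes from the text; the 50/30 split of the remaining work and the relative per-unit token costs (1, 3, and 10) are invented for illustration, not measurements.

```python
# Hypothetical relative token cost per unit of work for each tier.
# These multipliers are made up for illustration, not measured.
COST_PER_UNIT = {"easy": 1.0, "medium": 3.0, "hard": 10.0}


def single_model_cost(work: dict[str, float]) -> float:
    # Every slice runs on the hard-tier model.
    return sum(work.values()) * COST_PER_UNIT["hard"]


def routed_cost(work: dict[str, float]) -> float:
    # Each slice runs on the tier that matches its difficulty.
    return sum(units * COST_PER_UNIT[tier] for tier, units in work.items())


# 20% hard work (per the text); the medium/easy split is an assumption.
task = {"hard": 20, "medium": 50, "easy": 30}

single = single_model_cost(task)  # 100 * 10 = 1000.0
routed = routed_cost(task)        # 200 + 150 + 30 = 380.0
print(f"{single / routed:.2f}x")  # 2.63x
```

With these made-up numbers, routing uses about 2.6x fewer tokens than running everything on the top tier. The real ratio depends entirely on the cost gap between your models, and this sketch ignores the extra tokens the orchestrator spends deciding how to slice the work.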

Over the last few months, I have been experimenting a lot with this new pattern, using three models at the execution layer for more complex tasks. I have seen very meaningful gains in both speed and resulting code quality, along with significantly lower token use.

To make this possible, I have been working on some more detailed customizations at the orchestration layer, because it shouldn’t be my puny human brain figuring out how best to separate a task into three components. An insanely smart, deep-thinking LLM can beat me at this any day, and can get better at it over time.

The pattern I’m using now is this: create a plan, then pass it to an Execution Layer Orchestrator. This orchestrator knows which models I have available. It divides the plan into three separate slices based on difficulty and the thinking depth it thinks each slice needs, then picks not just the model for each slice, but also the effort level to run it at.
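Here is a minimal Python sketch of that routing step under stated assumptions. It is not the actual Execution Model Router: in the pattern described above an LLM does the slicing and picks the effort level, whereas this sketch assumes each plan step already carries a numeric difficulty estimate between 0.0 and 1.0 and uses fixed, made-up thresholds to form the three slices. The tier-to-model pairings reuse model names from this article, but which model goes with which tier is an assumption, and the strings are display labels rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    model: str   # display label, not an API model ID
    effort: str  # reasoning-effort label


# Hypothetical roster of available models, one per difficulty tier.
ROSTER = {
    "easy": ModelOption("Fable", "low"),
    "medium": ModelOption("Opus 4.8", "medium"),
    "hard": ModelOption("Opus 4.8", "max"),
}


def tier_for(difficulty: float) -> str:
    # Fixed thresholds stand in for the orchestrator LLM's judgment.
    if difficulty < 0.35:
        return "easy"
    if difficulty < 0.7:
        return "medium"
    return "hard"


def route(plan: list[tuple[str, float]]) -> list[tuple[str, ModelOption, list[str]]]:
    # Split the plan into three slices by difficulty, preserving step order
    # within each slice, then attach the model/effort pair for that tier.
    slices: dict[str, list[str]] = {"easy": [], "medium": [], "hard": []}
    for step, difficulty in plan:
        slices[tier_for(difficulty)].append(step)
    return [(tier, ROSTER[tier], steps) for tier, steps in slices.items() if steps]


plan = [
    ("rename config keys", 0.1),
    ("add CRUD endpoints", 0.5),
    ("write unit tests for the endpoints", 0.3),
    ("redesign locking in the job queue", 0.9),
]
for tier, option, steps in route(plan):
    print(tier, option.model, option.effort, steps)
# easy Fable low ['rename config keys', 'write unit tests for the endpoints']
# medium Opus 4.8 medium ['add CRUD endpoints']
# hard Opus 4.8 max ['redesign locking in the job queue']
```

Because slices group steps by difficulty rather than by plan order, steps that depend on each other can land in different slices; a real router would also need to respect the plan's dependency order when scheduling them.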

I don’t know what the heck to call this methodology yet. For now, I’m just calling it Execution Model Router. I’m sure I’ll come up with a better name over time.

What I can say is this: if you’re leading an engineering team and everyone just uses Opus High or GPT 5.5 High for everything, I can promise you that you are overspending on tokens, and also moving slower because of the excess thinking depth. And even if you go a step further and decide, okay, let’s use a different model for orchestration, execution, and code review, I’d argue you’re still not optimizing model use, specifically at the execution layer.

Just look at what percent of the time your team is using a model with high thinking depth to write code. You should be using the horsepower of an Opus 4.8 High or Extra, or GPT 5.5 High or xHigh, for some of that code, but if that’s all you’re using at the execution layer, there’s a real optimization path you can take.
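One way to get that number is to tally your usage logs. The sketch below assumes a hypothetical log format (one record per model call with `layer`, `effort`, and `tokens` fields) and a hypothetical set of effort labels that count as "high"; adapt both to whatever your tooling actually records.

```python
# Hypothetical usage records; a real version would read your provider's
# usage export or your own agent logs.
usage_log = [
    {"layer": "execution", "effort": "high", "tokens": 120_000},
    {"layer": "execution", "effort": "xhigh", "tokens": 300_000},
    {"layer": "execution", "effort": "medium", "tokens": 40_000},
    {"layer": "orchestration", "effort": "high", "tokens": 50_000},
    {"layer": "review", "effort": "medium", "tokens": 20_000},
]

# Assumed labels that count as "high thinking depth".
HIGH_EFFORT = {"high", "xhigh", "extra", "max"}


def high_effort_share(log: list[dict], layer: str = "execution") -> tuple[float, float]:
    # Returns (share of calls, share of tokens) at high effort for one layer.
    rows = [r for r in log if r["layer"] == layer]
    if not rows:
        return 0.0, 0.0
    high = [r for r in rows if r["effort"] in HIGH_EFFORT]
    call_share = len(high) / len(rows)
    token_share = sum(r["tokens"] for r in high) / sum(r["tokens"] for r in rows)
    return call_share, token_share


calls, tokens = high_effort_share(usage_log)
print(f"{calls:.0%} of execution calls, {tokens:.0%} of execution tokens")
# 67% of execution calls, 91% of execution tokens
```

Looking at token share alongside call share matters: a handful of high-effort calls can dominate spend even when most calls are cheap.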

Of course, once a workflow is deployed to your team, changing it can be tricky. But what if you found out you’re using 2x to 3x the tokens you need to get the same code quality, and that you could get it faster? These are the token efficiencies I think people will actually see when they put this into practice, and the speed gains are massive.

Dividing the execution layer into three parts like this also means you don’t need to use models in “fast” mode nearly as much, if ever. Models like Opus 4.8 Medium, or, as I shared a couple of weeks ago, Fable Low, are very fast. While you wouldn’t throw them at high-complexity tasks, they can handle routine tasks very well, at a fraction of the cost/tokens, and yes, much faster than at a higher effort level/thinking depth.

Okay, I wanted to get this out of my head and into the world. I’ve been working on this for a few months now and have seen the impact, so I’m confident there’s something here.

More than anything, I hope engineering leaders who read this think more deeply about how their teams choose models and effort levels at the execution layer. If your team always uses a model in high-effort mode, this is an easy place to start.

More to come. I’ve been tinkering with some interesting tooling around this and hope to share more soon, but for now, this is what’s on my mind, and I’m excited to explore it further.
