The author reflects that the primary bottleneck in running AI agents is not the model's capability but the human's ability to precisely define what 'good' or 'done' means, drawing parallels to managing people.
Been running agents on real work for a while now. Coding agents in a loop, content extraction, a few internal jobs that just churn away. And the same pattern keeps showing up, so I want to put it to people who are actually doing this rather than theorising about it.

The loop in practice looks like this. I set a goal and a standard. The agent runs, checks its own output against that standard, and keeps going until it either hits the bar or burns through the token budget I gave it. Plan, act, check, repeat.

When it works, it's genuinely good. When it doesn't, almost every time I trace it back, the failure isn't the model being thick. It's me. I didn't say what "done" actually meant tightly enough for the loop to know when it had got there.

That's the bit I keep snagging on. A self-checking loop is only as good as the thing it checks against. If "make this better" is the standard, the agent will happily decide it's better and stop. If the standard is specific and testable, it'll grind until it actually meets it or runs out of budget.

So the scarce skill stops being prompting tricks or picking the right framework. It becomes being able to say what good looks like precisely enough that a loop can self-assess without me in the seat.

There's a model going round at the moment that splits an AI-native company into three layers: humans doing strategy and taste, agents doing execution, and a shared context layer in between that both read and write to. I broadly buy it. But running it day to day, it collapses for me. The "context layer" isn't a separate tier, it's just the medium. The actual job is one thing: a human conveying the standard clearly enough that the execution layer can run unsupervised against it.

And here's where it gets uncomfortable for me. That skill, knowing when to step in and when to leave it alone, communicating the standard without ambiguity, being consistent so the agent isn't chasing a moving target, that's not a new agentic discipline. That's just managing people. Same job. I'm not orchestrating agents so much as managing a worker who is fast, tireless, literal, and has no instinct for what I actually meant.

So the genuine question, and I don't think I've landed it: is "AI native" mostly just decent management pointed at agents?

If yes, that's a bit deflating, because most of us were never that good at the management version either. The loop just exposes it faster. A vague brief to a human gets you a few days of drift before anyone notices. A vague brief to an agent gets you the same drift by lunchtime, on repeat, until the budget's gone.

Or maybe I've got the seam wrong and there's something genuinely new in the agent version that doesn't map back onto managing people. That's the part I can't settle.

For people running agents on real work, not demos: where does the analogy break for you? Is the hard skill defining "good" tightly, or is it somewhere else entirely? Still early and figuring this out, so I'd rather hear where I'm wrong than get agreement.
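
For anyone who wants the plan, act, check, repeat loop from the post in concrete form, here is a rough Python sketch. Everything in it is made up for illustration: `run_until_done`, `toy_act`, `LoopResult` and the token costs are hypothetical names and numbers, not any framework's API, and `toy_act` is a stand-in for a real model call. The only point is to show how the loop's behaviour depends on the `is_done` check it is handed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopResult:
    output: str
    met_standard: bool
    tokens_used: int
    iterations: int


def run_until_done(
    act: Callable[[str], Tuple[str, int]],  # hypothetical: returns (new draft, tokens spent)
    is_done: Callable[[str], bool],         # the "standard" the loop checks itself against
    token_budget: int,
    initial: str = "",
) -> LoopResult:
    """Act, check, repeat until the standard is met or the budget is spent."""
    output, used, iterations = initial, 0, 0
    while used < token_budget:
        output, cost = act(output)
        used += cost  # note: the final step can overshoot the budget if costs vary
        iterations += 1
        if is_done(output):
            return LoopResult(output, True, used, iterations)
    return LoopResult(output, False, used, iterations)


def toy_act(draft: str) -> Tuple[str, int]:
    """Stand-in for a model call: adds one word and 'costs' 10 tokens (assumed)."""
    return (draft + " word").strip(), 10


# Vague standard ("make this better"): anything passes, so the loop stops at once.
vague = run_until_done(toy_act, lambda draft: True, token_budget=100)

# Specific, testable standard: at least five words in the draft.
specific = run_until_done(toy_act, lambda draft: len(draft.split()) >= 5, token_budget=100)

# Same standard, too little budget: the loop stops without meeting it.
starved = run_until_done(toy_act, lambda draft: len(draft.split()) >= 5, token_budget=30)

print(vague.iterations, specific.iterations, starved.met_standard)
```

In this toy setup the vague check passes after a single step, the specific check keeps the loop working until the draft actually meets it, and the starved run reports `met_standard=False` rather than quietly declaring success. The loop code is identical in all three cases; only the definition of "done" and the budget change.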
The article argues that the primary failure point for AI agents in production is not the model itself, but the lack of infrastructure such as stop buttons, billing oversight, and traceability for tool calls.
The author reflects on the challenges of moving AI agents from prototype to production, concluding that reliable orchestration and safeguarding mechanics are more critical than incremental model improvements.
The article argues that effective AI agents require restraint and explicit 'stop conditions' rather than endless autonomy, highlighting Ling-2.6-1T as a model suited for conservative planning roles.
A reflection on how AI coding agents often stall while waiting for human approval, highlighting that human availability can be a bigger bottleneck than model capability.
A developer recounts how many challenges in building AI agents actually stem from workflow and state management issues, not model intelligence, emphasizing the need for robust state handling and observability.