Running agents all day, I keep noticing the bottleneck is me defining "good", not the model

Reddit r/AI_Agents 06/10/26, 12:00 PM News

ai-agents agent-loop supervision human-in-the-loop management defining-good

Summary

The author reflects that the primary bottleneck in running AI agents is not the model's capability but the human's ability to precisely define what 'good' or 'done' means, drawing parallels to managing people.

Been running agents on real work for a while now. Coding agents in a loop, content extraction, a few internal jobs that just churn away. And the same pattern keeps showing up, so I want to put it to people who are actually doing this rather than theorising about it. The loop in practice looks like this. I set a goal and a standard. The agent runs, checks its own output against that standard, and keeps going until it either hits the bar or burns through the token budget I gave it. Plan, act, check, repeat. When it works, it's genuinely good. When it doesn't, almost every time I trace it back, the failure isn't the model being thick. It's me. I didn't say what "done" actually meant tightly enough for the loop to know when it had got there. That's the bit I keep snagging on. A self-checking loop is only as good as the thing it checks against. If "make this better" is the standard, the agent will happily decide it's better and stop. If the standard is specific and testable, it'll grind until it actually meets it or runs out of budget. So the scarce skill stops being prompting tricks or picking the right framework. It becomes being able to say what good looks like precisely enough that a loop can self-assess without me in the seat. There's a model going round at the moment that splits an AI-native company into three layers: humans doing strategy and taste, agents doing execution, and a shared context layer in between that both read and write to. I broadly buy it. But running it day to day, it collapses for me. The "context layer" isn't a separate tier, it's just the medium. The actual job is one thing: a human conveying the standard clearly enough that the execution layer can run unsupervised against it. And here's where it gets uncomfortable for me. That skill, knowing when to step in and when to leave it alone, communicating the standard without ambiguity, being consistent so the agent isn't chasing a moving target, that's not a new agentic discipline. That's just managing people. Same job. I'm not orchestrating agents so much as managing a worker who is fast, tireless, literal, and has no instinct for what I actually meant. So the genuine question, and I don't think I've landed it: is "AI native" mostly just decent management pointed at agents? If yes, that's a bit deflating, because most of us were never that good at the management version either. The loop just exposes it faster. A vague brief to a human gets you a few days of drift before anyone notices. A vague brief to an agent gets you the same drift by lunchtime, on repeat, until the budget's gone. Or maybe I've got the seam wrong and there's something genuinely new in the agent version that doesn't map back onto managing people. That's the part I can't settle. For people running agents on real work, not demos: where does the analogy break for you? Is the hard skill defining "good" tightly, or is it somewhere else entirely? Still early and figuring this out, so I'd rather hear where I'm wrong than get agreement.

Original Article

Running agents all day, I keep noticing the bottleneck is me defining "good", not the model

Similar Articles

Your agent isn't failing because of the model, it's failing because nobody built a stop button

After months of building agents, I've changed my mind about what matters most.

The best agent model is the one that knows when to stop

Do AI agents spend more time waiting for humans than actually working?

Most of our “agent” problems turned out to be workflow/state problems

Submit Feedback

Similar Articles

Your agent isn't failing because of the model, it's failing because nobody built a stop button

After months of building agents, I've changed my mind about what matters most.

The best agent model is the one that knows when to stop

Do AI agents spend more time waiting for humans than actually working?

Most of our “agent” problems turned out to be workflow/state problems