72% of teams are running coding agents in production. Most of them can't say which agent they'd trust with a critical path change at 11pm, or why.

Reddit r/AI_Agents News

Summary

While 72% of teams use coding agents in production, most lack formal governance or empirical data on agent reliability. The article argues for session-level tracking over policy frameworks to ensure trust in critical deployments.

There's a governance gap stat making the rounds this week: 72% of firms are in production with agentic AI, 60% have no formal governance in place. Most of the discussion treats this as a policy problem, org charts, risk frameworks, sign-off procedures. That's not wrong, but I think it's the wrong layer to start at. The layer underneath the policy question is this: can your team actually answer, for any given coding agent instance you're running, what that instance has demonstrated it can be trusted to do? Not "what is this model good at" in the general sense. What has this specific instance, running in your environment, on your codebase, shown it can handle reliably, and what has it consistently gotten wrong? Most teams I've talked to can't answer that. The routing decisions are based on whoever used the agent last, what they remember working, and occasionally a benchmark rank that says nothing about performance in your specific context. That's not governance. It's informed guessing. The evidence that would actually support a governance decision, ie session traces, behavioral data per instance, scores across dimensions like reasoning quality, constraint compliance, and handling ambiguity, most teams aren't capturing it. You get the output. The session disappears. So you end up with a team that's in production with agents but couldn't reconstruct, for any critical deployment that went wrong, what the agent actually did step by step and whether it behaved consistently with prior sessions. For those running agents, how are you handling this? Are you capturing session-level data, or operating on output and vibes?
Original Article

Similar Articles

I analyzed how 50+ AI teams debug production agent failures and got surprised

Reddit r/AI_Agents

Based on interviews with 50+ AI teams, the author highlights that production agent failures often stem from minor prompt or configuration issues rather than deep model problems. The article advocates for adopting software engineering practices like versioning, A/B testing, and experiment tracking to improve reliability.

AI agent management tools by governance layer not by feature list

Reddit r/AI_Agents

An analysis highlighting that most enterprise AI agent security investments focus on model layer guardrails and observability, leaving critical gaps at the access and protocol layers. Citing a 2026 report, 75% of enterprise AI agents remain unsecured due to near-zero coverage in these layers.