Unpopular opinion: most production AI agents are flying blind and their developers don't know it

Reddit r/AI_Agents News

Summary

A developer argues that most production AI agents lack essential observability like session traces and cost tracking, comparing it to deploying a web app without monitoring. The article questions whether agent observability is an unsolved problem.

Talked to several dev agencies building LangChain/LangGraph agents for clients lately, plus seen a lot more in threads here and on r/LangChain. A pattern keeps showing up: zero production observability. No session traces. No per-session cost tracking. No alerting when the agent starts behaving differently. The usual answer: "we check the OpenAI dashboard" or "our client would tell us if something was wrong." This is insane to me. We wouldn't deploy a web app without Sentry and uptime monitoring. But somehow AI agents — which are way more unpredictable — get deployed with nothing. Is this just early days and everyone knows it? Or is observability for agents genuinely an unsolved problem? Curious what production setups actually look like at companies doing this seriously.
Original Article

Similar Articles

I analyzed how 50+ AI teams debug production agent failures and got surprised

Reddit r/AI_Agents

Based on interviews with 50+ AI teams, the author highlights that production agent failures often stem from minor prompt or configuration issues rather than deep model problems. The article advocates for adopting software engineering practices like versioning, A/B testing, and experiment tracking to improve reliability.

Most agent observability feels like crash footage

Reddit r/AI_Agents

The author argues that current agent observability provides a trace of actions but lacks runtime justification for why actions were permitted, which is critical for production deployments involving money, data, or communications.

The Real Truth About AI Agents

Reddit r/AI_Agents

An experienced practitioner shares hard-won lessons from deploying 25+ AI agents to production, arguing that memory, orchestration, and auditability matter far more than model choice. The article details common failure modes like context loss and silent cost loops, and recommends a stack including Claude Sonnet 4, Pydantic AI, and dedicated memory layers like Octopodas.