At what point does adding another agent actually hurt your system? Asking because my 6-agent pipeline is slower and less reliable than my old 2-agent one

Reddit r/AI_Agents News

Summary

A developer shares real-world experiences with AI orchestration frameworks (LangGraph, CrewAI, AutoGen), noting trade-offs between ease of prototyping and production reliability, and asks the community about handling failures, human-in-the-loop, and token costs.

I've been evaluating orchestration frameworks for the past few months and I'm getting tired of benchmark posts and YouTube tutorials that all conveniently end right before deployment. Here's where I landed after actually shipping a few things:

**LangGraph** - solid for stateful workflows where you need explicit control over the graph. The checkpointing is genuinely useful. But the debugging story is rough. When something breaks mid-graph in production, tracing back what state you were in is painful unless you've built your own observability layer on top.

**CrewAI** - great for prototyping fast. Role-based agents feel intuitive to set up. But I hit a wall when I needed anything non-standard. The abstraction that makes it easy early on becomes a ceiling. Also had reliability issues with longer tasks - agents would go off-script in ways that were hard to reproduce.

**AutoGen** - haven't shipped this one, only used it in demos. The conversational multi-agent loop looks impressive but I genuinely don't know how you'd put guardrails around it in a real production environment. Happy to be wrong on this.

What I actually use now is a lighter custom setup for anything customer-facing, and LangGraph only when I need durable state across long-running tasks.

Curious what others have actually shipped - not what looked good in a notebook. Specifically interested in:

1. How do you handle failures mid-workflow?
2. Are you using any of these with human-in-the-loop steps?
3. Token costs at scale - did the framework choice affect this at all?

Thanks in advance
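
To make question 1 concrete, here is a minimal sketch of checkpoint-and-resume for a linear workflow in plain Python. It is not LangGraph's checkpointer or any framework's API; `run_with_checkpoints`, the `(name, step)` tuple layout, and the JSON checkpoint file are all hypothetical names chosen for illustration, and it assumes the state is JSON-serializable.

```python
import json
import os
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical step signature: takes the current state dict, returns an updated one.
Step = Callable[[Dict], Dict]


def run_with_checkpoints(
    steps: List[Tuple[str, Step]],
    path: str,
    state: Optional[Dict] = None,
) -> Dict:
    """Run named steps in order, saving state after each; resume if a checkpoint exists."""
    done: List[str] = []
    if os.path.exists(path):
        # A previous run got partway through: pick up its state and progress.
        with open(path) as f:
            saved = json.load(f)
        state, done = saved["state"], saved["done"]
    state = dict(state or {})

    for name, step in steps:
        if name in done:
            continue  # completed in an earlier run, skip it
        state = step(state)  # if this raises, the last checkpoint stays on disk
        done.append(name)
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"state": state, "done": done}, f)
        os.replace(tmp, path)
    return state
```

Calling `run_with_checkpoints(steps, "run.json")` a second time after a crash skips every step already recorded in the file and reruns the failed step from its start, so steps need to be safe to retry. The checkpoint file also doubles as a crude answer to "what state was I in when it broke", which is the tracing pain described above.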