How are teams handling prompt QA at scale?

Reddit r/AI_Agents News

Summary

A practitioner at a company handling ~40k conversations/month describes the bottleneck of manual prompt QA and asks how teams are using automated systems to detect regressions and user frustration in production.

Curious how teams are handling prompt QA once volume gets high. We’re at \~40k conversations/month and currently have PMs manually reading transcripts to figure out: * what broke * where users get frustrated * which prompt/workflow changes helped or hurt The annoying part is the review workload scales almost linearly with conversation volume. We ship a lot of prompt updates every month, so keeping quality high is becoming a real bottleneck. I keep feeling there *has* to be a better way than “read more transcripts.” Are people actually using automated systems to surface issues/regressions in production? Like: * “this flow started failing more after version X” * “users in this branch churn more” * “these conversations became longer after the prompt change” Not looking for vendor pitches honestly — more interested in what’s genuinely working in production.
Original Article

Similar Articles