I built an AI support agent where the main metric is unsafe auto-action rate, not just accuracy

Reddit r/AI_Agents Tools

Summary

A technical walkthrough of building a telecom customer support agent that prioritizes safety metrics over classifier accuracy, using a deterministic access gate, scoped tool execution, and route-level evaluation.

**I built a production-shaped AI customer support agent for telecom, and the biggest lesson was that classifier accuracy is not enough.** I recently finished **RelayOps v1.2**, a telecom/subscription customer-support agent built as a vertical slice of a production system. The goal was not to build another chatbot. I wanted to test what it takes to make an agent safer around customer data, billing, tool access, and hallucinated offers. What it includes: * deterministic access gate before any model * scoped tool execution for account/device actions * fine-tuned Qwen2.5-1.5B LoRA intent classifier * hybrid RAG with citations * guardrails for invented offers/prices and PII * human escalation for billing/payment/plan changes * adversarial agent evals * live Streamlit demo on Railway * public Hugging Face adapter The most useful part was moving from **classifier accuracy** to **route-level safety metrics**. A classifier can be wrong and still safe if the router escalates. The dangerous case is when a wrong prediction causes an unsafe auto-action. For v1.2, I added a 100-case adversarial routing eval: * classifier accuracy: 0.880 * macro-F1: 0.872 * safe-route rate: 1.000 * route-correct rate: 0.890 * unsafe auto-action: 0.000 * billing escape: 0.000 That changed how I think about agent evaluation. For production-style agents, the question is not only: “Did the model classify correctly?” It is also: “Did the system still make the safe decision?” Would love feedback on the eval design, especially the route-level safety metrics.
Original Article

Similar Articles

how to fix ai agent reliability?

Reddit r/AI_Agents

Discusses the challenge of moving AI agents from sandbox to production, highlighting high sensitivity causing noise, and proposes solutions like secondary evaluators, heuristics, and cascading architectures. Asks the community about their approaches to filtering.

AI safety is arguing about the wrong boundary

Reddit r/AI_Agents

This article argues that the AI safety debate is misdirected, focusing on model alignment and internal controls instead of the critical boundary: external admission authority over agent execution. It warns that systems capable of self-authorizing high-impact actions (e.g., deploying code, moving money) pose a fundamental risk that logging and monitoring cannot mitigate.