I built an AI support agent where the main metric is unsafe auto-action rate, not just accuracy

Reddit r/AI_Agents 06/07/26, 08:15 PM Tools

ai-agent customer-support safety evaluation telecom fine-tuning guardrails

Summary

A technical walkthrough of building a telecom customer support agent that prioritizes safety metrics over classifier accuracy, using a deterministic access gate, scoped tool execution, and route-level evaluation.

**I built a production-shaped AI customer support agent for telecom, and the biggest lesson was that classifier accuracy is not enough.**

I recently finished **RelayOps v1.2**, a telecom/subscription customer-support agent built as a vertical slice of a production system. The goal was not to build another chatbot. I wanted to test what it takes to make an agent safer around customer data, billing, tool access, and hallucinated offers.

What it includes:

* deterministic access gate before any model
* scoped tool execution for account/device actions
* fine-tuned Qwen2.5-1.5B LoRA intent classifier
* hybrid RAG with citations
* guardrails for invented offers/prices and PII
* human escalation for billing/payment/plan changes
* adversarial agent evals
* live Streamlit demo on Railway
* public Hugging Face adapter

The most useful part was moving from **classifier accuracy** to **route-level safety metrics**. A classifier can be wrong and still safe if the router escalates. The dangerous case is when a wrong prediction causes an unsafe auto-action.

For v1.2, I added a 100-case adversarial routing eval:

* classifier accuracy: 0.880
* macro-F1: 0.872
* safe-route rate: 1.000
* route-correct rate: 0.890
* unsafe auto-action: 0.000
* billing escape: 0.000

That changed how I think about agent evaluation. For production-style agents, the question is not only: “Did the model classify correctly?” It is also: “Did the system still make the safe decision?”

Would love feedback on the eval design, especially the route-level safety metrics.
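To make the difference between classifier accuracy and route-level metrics concrete, here is a minimal Python sketch of how such metrics could be computed. It is not the RelayOps eval code. The `Case` record, the intent names, the `"auto"`/`"human"` route labels, and the exact metric definitions (including the denominator used for billing escape) are all assumptions made for illustration; the post only states that billing, payment, and plan changes go to human escalation.

```python
from dataclasses import dataclass

# Assumption: intents that must never be auto-handled. The post names
# billing/payment/plan changes; the identifiers here are hypothetical.
ESCALATE_INTENTS = {"billing", "payment", "plan_change"}


@dataclass(frozen=True)
class Case:
    true_intent: str       # ground-truth label for the eval case
    predicted_intent: str  # what the classifier said
    route: str             # what the system did: "auto" or "human"


def expected_route(intent: str) -> str:
    """Route the system should take for a given ground-truth intent."""
    return "human" if intent in ESCALATE_INTENTS else "auto"


def route_metrics(cases: list[Case]) -> dict[str, float]:
    """Compute classifier accuracy alongside route-level safety metrics."""
    if not cases:
        raise ValueError("need at least one eval case")
    n = len(cases)

    correct = sum(c.true_intent == c.predicted_intent for c in cases)
    # Unsafe auto-action: the system acted on its own although the true
    # intent required a human.
    unsafe = sum(
        c.route == "auto" and expected_route(c.true_intent) == "human"
        for c in cases
    )
    # Route-correct: the route matches the ideal one, so unnecessary
    # escalations count as safe but not route-correct.
    route_ok = sum(c.route == expected_route(c.true_intent) for c in cases)

    billing = [c for c in cases if c.true_intent == "billing"]
    escaped = sum(c.route == "auto" for c in billing)

    return {
        "classifier_accuracy": correct / n,
        "safe_route_rate": 1 - unsafe / n,
        "route_correct_rate": route_ok / n,
        "unsafe_auto_action": unsafe / n,
        # Assumption: measured over billing cases only.
        "billing_escape": escaped / len(billing) if billing else 0.0,
    }


cases = [
    Case("device_reset", "device_reset", "auto"),  # correct and safe
    Case("billing", "device_reset", "human"),      # wrong label, escalated anyway
    Case("device_reset", "billing", "human"),      # wrong label, over-escalated
    Case("billing", "billing", "human"),           # correct and escalated
]
metrics = route_metrics(cases)
```

In this toy set the classifier is right on only half the cases, yet no case produces an unsafe auto-action. That is the same pattern as the reported numbers, where accuracy is 0.880 while the safe-route rate is 1.000. Under these assumed definitions, over-escalation lowers the route-correct rate without lowering the safe-route rate.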

Similar Articles

I built an AI support-agent prototype and realized the hard part is not the chatbot it is the handoff and audit trail. Looking for critique from people who run support/CX workflows.

agent gamed our ticket-resolution KPI. what runtime guardrails are people actually using?

My agent emailed my boss at 3 AM — the 2-line human-in-the-loop guard that prevents dangerous tool calls

how to fix ai agent reliability?

AI safety is arguing about the wrong boundary
