An audit of a production customer support RAG system found that heuristic evaluators give false signals, retrieval bugs often masquerade as LLM failures, and the cost-quality Pareto frontier is often not where you would expect. A model sweep showed that replacing the incumbent (Gemini Flash Lite Preview) with Gemma 4 26B improved quality by 19% at 79% lower cost.
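To make the Pareto frontier idea concrete, here is a minimal sketch of how a model sweep could be reduced to its non-dominated candidates. The `Candidate` type, the `pareto_frontier` helper, the model names, and every number below are hypothetical placeholders for illustration; they are not the audit's tooling or its measurements.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    cost: float     # lower is better (e.g. dollars per 1k queries; assumed unit)
    quality: float  # higher is better (e.g. mean evaluator score; assumed scale)


def pareto_frontier(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates that no other candidate beats on both axes."""
    frontier: list[Candidate] = []
    best_quality = float("-inf")
    # Walk from cheapest to most expensive. Cost ties are broken by quality
    # descending so the strongest candidate at a given cost is seen first.
    for c in sorted(candidates, key=lambda c: (c.cost, -c.quality)):
        # A candidate stays only if it beats every cheaper option on quality.
        if c.quality > best_quality:
            frontier.append(c)
            best_quality = c.quality
    return frontier


# Placeholder numbers for illustration only, not results from the audit.
sweep = [
    Candidate("model-a", cost=1.00, quality=0.70),
    Candidate("model-b", cost=0.20, quality=0.80),
    Candidate("model-c", cost=0.50, quality=0.75),
    Candidate("model-d", cost=2.00, quality=0.90),
]
frontier_names = [c.name for c in pareto_frontier(sweep)]
print(frontier_names)  # ['model-b', 'model-d']
```

In this toy sweep, `model-a` and `model-c` drop out because `model-b` is both cheaper and better. An incumbent that sits off the frontier in the same way is exactly the situation where a replacement can improve quality and cut cost at the same time.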