Tag
A tweet promoting a GitHub repository of more than 300 real-world ML system design case studies from companies such as Google, Amazon, Microsoft, and Netflix. The repository aims to show how production ML systems are actually built.
An article on the challenges of evaluating and monitoring LLM-based agents in production. It covers offline evals, prompt-engineering pitfalls, observability tools, review queues, labeling, clustering, and topic classification. It also describes how to layer human review, LLM-as-a-judge, and small classifiers cost-effectively.