The author describes losing faith in public AI model benchmarks because of vendor-created metrics, self-reported parameters, and a lack of independent verification. Instead, the author advocates building custom evaluation sets from real production traffic, so that model comparisons reflect the workloads that actually matter.
OrcaRouter is a learning-based LLM router that dynamically routes prompts to appropriate models based on quality, cost, speed, and reliability, improving over time with production traffic.