#oracle-routing

The Capability Frontier: Benchmarks Miss 82% of Model Performance

arXiv cs.AI · 4d ago

The paper introduces the Capability Frontier, a Pareto frontier over models that corrects for biases in single-model and single-run evaluations, showing that standard benchmarks miss up to 82% of model performance and that collective LLM capabilities are substantially underestimated.

