oracle-routing


The Capability Frontier: Benchmarks Miss 82% of Model Performance

arXiv cs.AI · 4d ago

The paper introduces the Capability Frontier, a Pareto frontier over models that corrects for biases in single-model and single-run evaluations, showing that standard benchmarks miss up to 82% of model performance and that collective LLM capabilities are substantially underestimated.
