low-rank-matrix-completion

You Don't Need to Run Every Eval

arXiv cs.LG · 2d ago

This paper shows that the scores of frontier AI models across 133 benchmarks form an approximately rank-2 matrix: two latent factors explain over 90% of the variation. The authors introduce BenchPress, a logit-space matrix completion method that predicts a model's full scorecard from only a few benchmark results, substantially reducing the cost of evaluation.
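
To make the idea of logit-space low-rank completion concrete, here is a minimal sketch. It is not BenchPress itself, whose exact algorithm is not described in this summary. It assumes scores are fractions in (0, 1), maps them to logits, fits a rank-2 factorization to the observed entries with ridge-regularized alternating least squares, and maps the predictions back. The function name `complete_rank_k`, the ridge penalty, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def logit(p, eps=1e-6):
    """Map scores in (0, 1) to log-odds, clipping to avoid infinities."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))


def sigmoid(z):
    """Inverse of logit: map log-odds back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def complete_rank_k(scores, observed, rank=2, n_iters=100, ridge=1e-3, seed=0):
    """Hypothetical helper: predict all scores from the observed ones.

    scores   -- (models x benchmarks) array; unobserved entries are ignored
    observed -- boolean mask of the same shape, True where a score was measured
    """
    rng = np.random.default_rng(seed)
    n_models, n_benchmarks = scores.shape
    # Placeholder 0.5 for unobserved cells; they never enter the fits below.
    Z = logit(np.where(observed, scores, 0.5))
    U = rng.normal(scale=0.1, size=(n_models, rank))      # model factors
    V = rng.normal(scale=0.1, size=(n_benchmarks, rank))  # benchmark factors
    reg = ridge * np.eye(rank)
    for _ in range(n_iters):
        # Fix V, solve a small ridge regression for each model's factors.
        for i in range(n_models):
            Vo = V[observed[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ Z[i, observed[i]])
        # Fix U, solve a small ridge regression for each benchmark's factors.
        for j in range(n_benchmarks):
            Uo = U[observed[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ Z[observed[:, j], j])
    return sigmoid(U @ V.T)


# Synthetic demo (assumed data): 30 models, 20 benchmarks, exactly rank 2 in logit space.
rng = np.random.default_rng(42)
true_scores = sigmoid(rng.normal(size=(30, 2)) @ rng.normal(size=(20, 2)).T)
observed = rng.random(true_scores.shape) < 0.6  # roughly 60% of evals were run
predicted = complete_rank_k(true_scores, observed)
held_out_mae = np.abs(predicted - true_scores)[~observed].mean()
print(f"mean absolute error on held-out scores: {held_out_mae:.4f}")
```

Working in logit space keeps every prediction inside (0, 1) and lets a linear low-rank model capture saturation near 0% and 100%. On real scorecards the rank, the regularization strength, and the choice of which benchmarks to observe would all need validation.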
