stereology

The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models

arXiv cs.LG · 4d ago

This paper applies stereological theory to LLM benchmarks, revealing that current leaderboards measure only 3–5 independent dimensions, creating geometric blind spots that dominate statistical noise. It provides theoretical bounds on benchmark coverage and a submodular algorithm for efficient benchmark selection.
