Tag
This paper applies stereological theory to LLM benchmarks and finds that current leaderboards measure only 3–5 independent dimensions, leaving geometric blind spots that dominate statistical noise. It provides theoretical bounds on benchmark coverage and a submodular algorithm for efficient benchmark selection.
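
To make the idea of submodular benchmark selection concrete, here is a minimal sketch. It is not the paper's algorithm: the summary does not state the objective, so this example assumes a simple set-coverage objective, which is monotone and submodular, and applies the standard greedy rule. The name `greedy_select`, the `toy` data, and the notion of numbered "capability regions" are all hypothetical and for illustration only.

```python
from typing import Dict, List, Set


def greedy_select(coverage: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick up to k benchmarks, each time taking the largest marginal gain.

    `coverage` maps a benchmark name to the set of (hypothetical)
    capability regions it exercises. The objective is the number of
    distinct regions covered, a monotone submodular function.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(k):
        best_name, best_gain = None, 0
        for name in sorted(coverage):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(coverage[name] - covered)  # regions not yet covered
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining benchmark adds coverage
            break
        chosen.append(best_name)
        covered |= coverage[best_name]
    return chosen


# Hypothetical data: assumption for illustration, not taken from the paper.
toy = {
    "bench_a": {0, 1, 2, 3},
    "bench_b": {2, 3, 4},
    "bench_c": {0, 1},
    "bench_d": {5, 6},
}
selected = greedy_select(toy, 2)
print(selected)
```

For a monotone submodular objective under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value, which is why greedy selection is a common default in this setting. In the toy data, `bench_b` and `bench_c` overlap heavily with `bench_a`, so the second pick goes to the non-overlapping `bench_d` even though `bench_b` covers more regions on its own.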