Tag
This paper introduces ECC, an algorithm that calibrates semantic embeddings using a limited number of model comparisons so that queries are clustered by their latent capability requirements, improving LLM capability ranking quality by more than 17 percentage points relative to baselines.
This paper proposes Evidence-Calibrated Query Clustering (ECC), an algorithm that aligns semantic embeddings with latent LLM capability demands using posterior model comparisons and Bradley-Terry modeling, significantly improving capability ranking quality for LLM evaluation.
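For readers unfamiliar with the Bradley-Terry model mentioned above, the following is a minimal sketch of how pairwise model comparisons can be turned into strength scores. It is not the paper's implementation: the function name `fit_bradley_terry`, the win-count matrix layout, and the choice of the standard minorization-maximization (MM) update are all assumptions made for illustration. The sketch also assumes every model has at least one win and that all models are connected through comparisons.

```python
import numpy as np


def fit_bradley_terry(wins, iters=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from a win-count matrix.

    wins[i, j] is the number of comparisons in which model i beat model j
    (hypothetical input format; the diagonal is ignored).
    Returns strengths normalized to sum to 1.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T              # comparisons played between each pair
    np.fill_diagonal(games, 0.0)       # a model is never compared with itself
    total_wins = wins.sum(axis=1) - np.diag(wins)

    p = np.full(n, 1.0 / n)            # uniform starting point
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()           # fix the scale, which is otherwise arbitrary
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p
```

Under this model, the probability that model i beats model j is p[i] / (p[i] + p[j]), so sorting the returned strengths in descending order yields a capability ranking. In a per-cluster setting, one such fit could be run on the comparisons belonging to each query cluster, though how ECC combines the two steps is specified by the paper rather than by this sketch.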