Tag
This Systematization of Knowledge (SoK) paper proposes a Multi-Trait Multi-Method (MTMM) geometric framework for evaluating Large Language Models. The framework maps disparate metrics into a shared latent coordinate space to address construct-validity problems in current benchmarks.
This paper critiques the "Proxy Presumption" in NLP, the practice of treating geometric properties of embeddings as if they were the social constructs they are meant to measure. It introduces two methods, the Construct Validity Protocol and Counterfactual Neutralization, for rigorously validating social measures derived from semantic embeddings.