benchmark-design

The Evaluation Trap: Benchmark Design as Theoretical Commitment

arXiv cs.AI

This paper identifies an "evaluation trap": by narrowing what counts as progress, AI benchmarks can inadvertently entrench the dominant paradigms they are meant to assess. It introduces Epistematics, a meta-evaluative methodology intended to ensure that evaluation criteria distinguish genuine capability from proxy behaviors.