dark-current

LLM Judges Have Dark Current: A Psychometric Datasheet for LLM-as-a-Judge Evaluation

arXiv cs.CL · yesterday

This paper introduces a psychometric datasheet protocol that treats LLM judges as measurement instruments and characterizes each judge along four properties: dark current, positional false preference, stable cross-sensitivity, and target sensitivity. A case study on three open-weight models reveals significant differences in judge quality and behavior.
