Tag
This paper formalizes pairwise reference alignment as a model-level ordinal observable. It defines a statistic that measures agreement between a model's scoring and a reference preference distribution, provides finite-sample estimators, and reports an empirical study on Qwen2.5 models and RewardBench.