hypothesis-testing

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

arXiv cs.LG · 21h ago

This paper introduces a margin-based confidence ranking method for LLM-as-a-judge systems. It learns a dedicated confidence estimator so that confidence is monotonic in the risk of human disagreement (higher confidence, lower risk that a human would disagree with the judge's verdict). The authors give generalization guarantees and report improved ranking accuracy across datasets.
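
The summary does not specify the estimator or the exact evaluation metric. As a hedged illustration of the property being targeted, the sketch below scores a given set of judge confidences against binary human-disagreement labels in two ways: a pairwise ranking accuracy (the fraction of agree/disagree pairs where the agreeing item has higher confidence, ties counted as half, which is the AUROC for separating agreement from disagreement), and a binned check that the disagreement rate does not rise as confidence rises. The function names, the equal-count binning, and the choice of metric are assumptions for illustration only, not the paper's method.

```python
from typing import List, Sequence


def pairwise_ranking_accuracy(confidence: Sequence[float], disagree: Sequence[int]) -> float:
    """Fraction of (agree, disagree) pairs where the agreeing item gets higher confidence.

    Ties count as half a correct pair, so this equals the AUROC for
    separating human agreement (label 0) from disagreement (label 1).
    """
    agree_conf = [c for c, d in zip(confidence, disagree) if d == 0]
    disagree_conf = [c for c, d in zip(confidence, disagree) if d == 1]
    if not agree_conf or not disagree_conf:
        raise ValueError("need at least one agreeing and one disagreeing item")
    score = 0.0
    for a in agree_conf:
        for b in disagree_conf:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(agree_conf) * len(disagree_conf))


def binned_disagreement_rates(
    confidence: Sequence[float], disagree: Sequence[int], n_bins: int = 4
) -> List[float]:
    """Sort items by confidence (lowest first), split them into n_bins
    near-equal-count bins, and return the human-disagreement rate per bin."""
    n = len(confidence)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one item per bin")
    order = sorted(range(n), key=lambda i: confidence[i])
    rates = []
    for k in range(n_bins):
        bin_idx = order[k * n // n_bins : (k + 1) * n // n_bins]
        rates.append(sum(disagree[i] for i in bin_idx) / len(bin_idx))
    return rates


def is_monotone_nonincreasing(rates: Sequence[float]) -> bool:
    """True if disagreement risk never increases as confidence increases."""
    return all(x >= y for x, y in zip(rates, rates[1:]))


if __name__ == "__main__":
    # Hypothetical toy data: judge confidences and whether a human disagreed (1) or not (0).
    conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
    disagree = [0, 0, 0, 1, 0, 1, 1, 1]
    print(pairwise_ranking_accuracy(conf, disagree))  # 15 of 16 pairs ordered correctly
    rates = binned_disagreement_rates(conf, disagree, n_bins=4)
    print(rates, is_monotone_nonincreasing(rates))
```

On the toy data, 15 of the 16 agree/disagree pairs are ordered correctly (0.9375), and the per-bin disagreement rates are 1.0, 0.5, 0.5, 0.0 from lowest to highest confidence, which is monotone non-increasing.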
