hypothesis-testing

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

arXiv cs.LG · 21h ago

This paper introduces a margin-based confidence ranking method for LLM-as-a-judge systems. It learns a dedicated estimator to ensure that confidence is monotonically related to the risk of disagreeing with human judgments, and it provides generalization guarantees and improved ranking accuracy across datasets.
