life-eval


Confidence Calibration in Large Language Models

arXiv cs.AI · 2026-05-26

This paper analyzes the confidence calibration of 11 popular LLMs and finds that they are overconfident overall, most markedly on hard tasks, yet underconfident on easy tasks. It introduces LifeEval, a test for evaluating calibration across difficulty levels.
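
To make "overconfident" and "underconfident" concrete, here is a minimal sketch of two common calibration measures: the signed gap between mean stated confidence and accuracy, and the expected calibration error (ECE). This is a sketch under stated assumptions, not the paper's method: the summary does not say which metrics LifeEval uses, the function names are hypothetical, and the toy confidences and outcomes are invented for illustration only.

```python
from typing import Sequence


def confidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy.

    Positive means overconfident, negative means underconfident.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    return sum(confidences) / n - sum(1 for ok in correct if ok) / n


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |confidence - accuracy| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Invented toy data (NOT results from the paper), split by difficulty.
hard_conf, hard_ok = [0.9, 0.9, 0.8, 0.8], [True, False, False, False]
easy_conf, easy_ok = [0.6, 0.6, 0.7, 0.7], [True, True, True, True]

print("hard gap:", confidence_gap(hard_conf, hard_ok))  # positive: overconfident
print("easy gap:", confidence_gap(easy_conf, easy_ok))  # negative: underconfident
print("hard ECE:", expected_calibration_error(hard_conf, hard_ok))
```

Computing the gap separately per difficulty level, rather than once over the pooled data, is what exposes the pattern described above: opposite-signed errors on hard and easy items can partly cancel in an aggregate score.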

