underconfidence

Confidence Calibration in Large Language Models

arXiv cs.AI · 2026-05-26

This paper analyzes the confidence calibration of 11 popular LLMs. It finds that the models are overconfident overall, most strongly on hard tasks, but underconfident on easy tasks. It also introduces LifeEval, a test for evaluating calibration across difficulty levels.
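To make the over/underconfidence distinction concrete, here is a minimal sketch of one common way to measure it: the signed gap between a model's mean stated confidence and its actual accuracy, computed per difficulty level. This is not LifeEval and not necessarily the metric the paper uses; the function name `calibration_gap_by_difficulty`, the record layout, and the toy numbers are all assumptions made up for illustration.

```python
from collections import defaultdict


def calibration_gap_by_difficulty(records):
    """Return {difficulty: mean confidence - accuracy}.

    `records` is an iterable of (difficulty, confidence, correct) tuples,
    a hypothetical layout chosen for this sketch.
    A positive gap means overconfidence; a negative gap means underconfidence.
    """
    # Per difficulty: [sum of confidences, number correct, number of items]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for difficulty, confidence, correct in records:
        bucket = totals[difficulty]
        bucket[0] += confidence
        bucket[1] += int(correct)
        bucket[2] += 1
    return {
        difficulty: conf_sum / n - n_correct / n
        for difficulty, (conf_sum, n_correct, n) in totals.items()
    }


# Toy data invented for illustration only; not results from the paper.
toy_records = [
    ("easy", 0.70, True), ("easy", 0.80, True),
    ("easy", 0.75, True), ("easy", 0.75, True),
    ("hard", 0.90, False), ("hard", 0.85, True),
    ("hard", 0.95, False), ("hard", 0.90, False),
]

gaps = calibration_gap_by_difficulty(toy_records)
for difficulty, gap in gaps.items():
    label = "overconfident" if gap > 0 else "underconfident" if gap < 0 else "calibrated"
    print(f"{difficulty}: gap={gap:+.2f} ({label})")
```

In this toy data the easy bucket has a negative gap (confidence below accuracy) and the hard bucket a positive one, which is the pattern the summary describes. A real evaluation would use many more items per bucket and typically a binned metric such as expected calibration error.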
