Large Language Models Are Overconfident in Their Own Responses
Summary
This paper investigates why instruction-tuned LLMs are overconfident in their own responses, identifying an 'ownership bias' that gives higher confidence to self-generated answers. It proposes a simple inference-time strategy to reframe the model's answer as user input, improving calibration by up to 26% without retraining.
View Cached Full Text
Cached at: 06/11/26, 01:37 PM
Paper page - Large Language Models Are Overconfident in Their Own Responses
Source: https://huggingface.co/papers/2606.03437
Abstract
Instruction tuning degrades calibration in large language models, with chat templates exacerbating overconfidence through ownership bias, which can be mitigated by reframing model responses as user input during confidence assessment.
Prior work has shown thatinstruction-tuned large language models(LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently usedchat template’s effect on thecalibrationof conversational LLMs. In this work, we investigate the mechanisms driving this miscalibrationby decoupling the effects of the post-training algorithm and the chat format. We find that, while instruction tuning fundamentally harmscalibration, thechat templateaggravates the issue through an “ownership bias” -- models are significantly more confident in their own answers than in identical answers provided by a user. Extensive experiments across six recent open-weight LLMs, three benchmarks, and threeconfidence elicitationmethods show that models assign up to 26% higher confidence to their own responses. Leveraging this insight, we propose a simple inference-time strategy: framing the model’s answer as user input duringconfidence elicitation. This approach significantly reducesoverconfidenceand improvescalibrationby up to 26% without the need forretraining, narrowing the gap between base and instruction-tuned models.
View arXiv pageView PDFAdd to collection
Get this paper in your agent:
hf papers read 2606\.03437
Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash
Models citing this paper0
No model linking this paper
Cite arxiv.org/abs/2606.03437 in a model README.md to link it from this page.
Datasets citing this paper0
No dataset linking this paper
Cite arxiv.org/abs/2606.03437 in a dataset README.md to link it from this page.
Spaces citing this paper0
No Space linking this paper
Cite arxiv.org/abs/2606.03437 in a Space README.md to link it from this page.
Collections including this paper0
No Collection including this paper
Add this paper to acollectionto link it from this page.
Similar Articles
Confidence Calibration in Large Language Models
This paper analyzes the confidence calibration of 11 popular LLMs, finding that they are generally overconfident, especially on hard tasks, and underconfident on easy tasks. It introduces LifeEval, a test for evaluating calibration across difficulty levels.
Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R]
This research presents probe-targeted fine-tuning (LoRA) to make LLMs verbally express their internal confidence, achieving causal control over confidence outputs and demonstrating that models often know when they are right or wrong but fail to articulate it.
Can LLMs Take Retrieved Information with a Grain of Salt?
This paper investigates how large language models adapt to the certainty of retrieved information, identifying systematic limitations in handling uncertainty. It proposes an interaction strategy that reduces obedience errors by 25% without modifying model weights.
Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer
This paper investigates the phenomenon where large language models hallucinate despite having the correct answer available in their generation-time distribution. By introducing a semantic notion of answer availability, the authors show that 16-47% of instruction-tuned model hallucinations occur when the correct concept is already represented, and that this rate increases with scale. They identify that instruction tuning sharpens answer commitment, making helpfulness and confident hallucination two sides of the same coin.
A better method for identifying overconfident large language models
MIT researchers developed a new method for identifying overconfident LLMs by measuring cross-model disagreement across similar models, rather than relying solely on self-consistency metrics. This approach better captures epistemic uncertainty and more accurately identifies unreliable predictions in high-stakes applications.