Large Language Models Are Overconfident in Their Own Responses

Hugging Face Daily Papers Papers

Summary

This paper investigates why instruction-tuned LLMs are overconfident in their own responses, identifying an 'ownership bias' that gives higher confidence to self-generated answers. It proposes a simple inference-time strategy to reframe the model's answer as user input, improving calibration by up to 26% without retraining.

Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template's effect on the calibration of conversational LLMs. In this work, we investigate the mechanisms driving this miscalibration by decoupling the effects of the post-training algorithm and the chat format. We find that, while instruction tuning fundamentally harms calibration, the chat template aggravates the issue through an "ownership bias" -- models are significantly more confident in their own answers than in identical answers provided by a user. Extensive experiments across six recent open-weight LLMs, three benchmarks, and three confidence elicitation methods show that models assign up to 26% higher confidence to their own responses. Leveraging this insight, we propose a simple inference-time strategy: framing the model's answer as user input during confidence elicitation. This approach significantly reduces overconfidence and improves calibration by up to 26% without the need for retraining, narrowing the gap between base and instruction-tuned models.
Original Article
View Cached Full Text

Cached at: 06/11/26, 01:37 PM

Paper page - Large Language Models Are Overconfident in Their Own Responses

Source: https://huggingface.co/papers/2606.03437

Abstract

Instruction tuning degrades calibration in large language models, with chat templates exacerbating overconfidence through ownership bias, which can be mitigated by reframing model responses as user input during confidence assessment.

Prior work has shown thatinstruction-tuned large language models(LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently usedchat template’s effect on thecalibrationof conversational LLMs. In this work, we investigate the mechanisms driving this miscalibrationby decoupling the effects of the post-training algorithm and the chat format. We find that, while instruction tuning fundamentally harmscalibration, thechat templateaggravates the issue through an “ownership bias” -- models are significantly more confident in their own answers than in identical answers provided by a user. Extensive experiments across six recent open-weight LLMs, three benchmarks, and threeconfidence elicitationmethods show that models assign up to 26% higher confidence to their own responses. Leveraging this insight, we propose a simple inference-time strategy: framing the model’s answer as user input duringconfidence elicitation. This approach significantly reducesoverconfidenceand improvescalibrationby up to 26% without the need forretraining, narrowing the gap between base and instruction-tuned models.

View arXiv pageView PDFAdd to collection

Get this paper in your agent:

hf papers read 2606\.03437

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.03437 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.03437 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.03437 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

Confidence Calibration in Large Language Models

arXiv cs.AI

This paper analyzes the confidence calibration of 11 popular LLMs, finding that they are generally overconfident, especially on hard tasks, and underconfident on easy tasks. It introduces LifeEval, a test for evaluating calibration across difficulty levels.

Can LLMs Take Retrieved Information with a Grain of Salt?

arXiv cs.CL

This paper investigates how large language models adapt to the certainty of retrieved information, identifying systematic limitations in handling uncertainty. It proposes an interaction strategy that reduces obedience errors by 25% without modifying model weights.

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

arXiv cs.CL

This paper investigates the phenomenon where large language models hallucinate despite having the correct answer available in their generation-time distribution. By introducing a semantic notion of answer availability, the authors show that 16-47% of instruction-tuned model hallucinations occur when the correct concept is already represented, and that this rate increases with scale. They identify that instruction tuning sharpens answer commitment, making helpfulness and confident hallucination two sides of the same coin.

A better method for identifying overconfident large language models

MIT News — Artificial Intelligence

MIT researchers developed a new method for identifying overconfident LLMs by measuring cross-model disagreement across similar models, rather than relying solely on self-consistency metrics. This approach better captures epistemic uncertainty and more accurately identifies unreliable predictions in high-stakes applications.