The First Token Knows: Single-Decode Confidence for Hallucination Detection

Hugging Face Daily Papers 05/06/26, 12:00 AM Papers

Summary

This paper introduces a method for detecting hallucinations in large language models by leveraging the confidence of the first generated token, requiring only a single decode step.

Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.
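As a rough illustration of the signal described above, the sketch below turns the logits at one token position into a confidence score and then scores that confidence against correctness labels with AUROC. It is not the authors' implementation. The form 1 - H(p) / log(K), with p the softmax over the top-K logits, is an assumed reading of "normalized entropy"; the default K of 10 and the function names `phi_first` and `auroc` are hypothetical. Locating the first content-bearing answer token is outside the sketch, which assumes the logits at that position have already been extracted.

```python
import math
from typing import Sequence


def phi_first(logits: Sequence[float], k: int = 10) -> float:
    """Confidence in [0, 1] from the top-k logits at one token position.

    Assumed form: 1 - H(p) / log(k), where p is the softmax over the
    top-k logits. The paper's exact definition may differ.
    """
    top = sorted(logits, reverse=True)[:k]
    if len(top) < 2:
        return 1.0  # a single candidate carries no measurable uncertainty
    peak = top[0]
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Normalize by the maximum possible entropy, then clamp rounding error.
    return min(1.0, max(0.0, 1.0 - entropy / math.log(len(top))))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a correct answer outscores an incorrect one.

    labels: 1 for a correct answer, 0 for a hallucinated one.
    Ties between a correct and an incorrect answer count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this assumed form, a flat top-K distribution gives a score near 0 and a sharply peaked one gives a score near 1, so a higher value means the model is more committed to its first answer token.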

Original Article

Cached at: 05/08/26, 07:56 AM

Source: https://huggingface.co/papers/2605.05166

Similar Articles

HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders

Hallucination Detection via Activations of Open-Weight Proxy Analyzers

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation
