Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics
Summary
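Reformulates token-level hallucination detection as a quickest change detection problem and establishes a theoretical lower bound on detection delay. It shows that a causal recurrent labeler behaves like a CUSUM with a learned increment and detects hallucination onset faster than a linear per-token baseline (11-13 vs. 31 tokens), though it remains an order of magnitude above the bound.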
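For orientation, the classical minimax criterion behind Lorden’s bound is written below in its textbook form for independent observations. The notation is introduced here, not taken from the paper: observations X_1, X_2, ..., pre-change density f_0, post-change density f_1, change time ν (the first post-change token), stopping time T, and target mean time to false alarm γ. The paper derives its bound under a first-order Markov model of the faithful/hallucinated state, so its divergence rate and constants may differ from this i.i.d. sketch.

```latex
\mathrm{ADD}(T) = \sup_{\nu \ge 1} \operatorname*{ess\,sup}\,
  \mathbb{E}_\nu\!\left[(T-\nu+1)^+ \,\middle|\, X_1,\dots,X_{\nu-1}\right],
\qquad
I = D_{\mathrm{KL}}\!\left(f_1 \,\|\, f_0\right),

\inf_{T:\ \mathbb{E}_\infty[T] \ge \gamma} \mathrm{ADD}(T) \sim \frac{\log \gamma}{I}
\quad \text{as } \gamma \to \infty .
```

The CUSUM test with threshold h = log γ attains this rate asymptotically, which is why a larger per-token divergence I translates directly into a shorter achievable delay.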
Cached at: 06/15/26, 04:59 PM
Source: https://huggingface.co/papers/2606.12476
Abstract
Token-level hallucination detection is reformulated as a quickest change detection problem, revealing fundamental limits on detection delay and showing that causal recurrent modeling detects hallucination onset faster than a linear per-token baseline.
Token-level hallucination detectors are evaluated as classifiers, by AUC over all tokens, yet a streaming monitor is judged by its reaction time: the number of tokens that pass between the onset of a hallucination and the alarm. We formulate hallucination onset detection as a quickest change detection problem. A first-order Markov model of the latent faithful/hallucinated state, validated on RAGTruth, places the task inside classical change-point theory and yields Lorden’s lower bound on detection delay: about 1.3 tokens at a false-alarm rate of 0.01. We then show that a causal recurrent labeler acts as a CUSUM with a learned increment; at a matched false-alarm rate it detects in 11-13 tokens, against 31 for a linear per-token baseline, and a controlled decomposition attributes most of this advantage to a better per-token score rather than to temporal accumulation. An information-rate optimality theorem of Donsker-Varadhan type explains the remaining order-of-magnitude gap: the learned score realizes only 1/4.5 of the divergence the features carry, a deficit that recalibration cannot remove, with the remainder a finite-horizon effect. Classification metrics conceal this delay structure; sequential analysis makes it measurable.
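The abstract describes the recurrent labeler as a CUSUM with a learned increment. As a rough illustration of the classical recursion it is compared against, the sketch below runs S_t = max(0, S_{t-1} + z_t) over per-token increments and raises an alarm when S_t reaches a threshold h. It is not the paper's method: the learned increment is replaced by a hypothetical Gaussian log-likelihood ratio (unit variance, mean shift mu after onset), and the helper names cusum_alarm, gaussian_llr, simulate_delay and average_delay are illustrative only.

```python
import math
import random


def cusum_alarm(increments, threshold):
    """Return the first index t at which the CUSUM statistic
    S_t = max(0, S_{t-1} + increments[t]) reaches `threshold`, else None."""
    s = 0.0
    for t, z in enumerate(increments):
        s = max(0.0, s + z)  # reset at zero: evidence never goes negative
        if s >= threshold:
            return t
    return None


def gaussian_llr(x, mu):
    """Assumed increment: log N(x; mu, 1) - log N(x; 0, 1) = mu*x - mu^2/2."""
    return mu * x - 0.5 * mu * mu


def simulate_delay(onset, mu, threshold, length=200, seed=0):
    """Simulate one token stream whose scores shift from N(0,1) to N(mu,1)
    at index `onset`. Return alarm - onset (0 = alarm on the first
    hallucinated token), or None on a false alarm or no alarm at all."""
    rng = random.Random(seed)
    scores = [rng.gauss(mu if t >= onset else 0.0, 1.0) for t in range(length)]
    alarm = cusum_alarm([gaussian_llr(x, mu) for x in scores], threshold)
    if alarm is None or alarm < onset:
        return None
    return alarm - onset


def average_delay(mu, threshold, onset=50, runs=200):
    """Mean detection delay over `runs` seeded streams, ignoring runs
    that ended in a false alarm or no alarm."""
    delays = [simulate_delay(onset, mu, threshold, seed=s) for s in range(runs)]
    valid = [d for d in delays if d is not None]
    return sum(valid) / len(valid) if valid else math.nan
```

For example, average_delay(1.0, math.log(100)) uses the common threshold choice h = log γ with γ = 100. Raising mu (a more informative per-token score) shortens the delay, which is the same lever the abstract's decomposition credits for most of the recurrent labeler's advantage.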
Similar Articles
PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts
This paper reveals that much of the reported progress in LLM hallucination detection is due to benchmark construction artifacts, where ground-truth answers are embedded in prompts, allowing a simple text-similarity baseline to achieve near-perfect scores. Through a large-scale controlled evaluation, the authors show that most methods perform near chance under proper controls, except for supervised probes on upper-layer hidden states such as SAPLMA and their proposed DRIFT.
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
This paper presents PCNet, a probabilistic circuit trained as a tractable density estimator on LLM residual streams to detect hallucinations as geometric anomalies. It also introduces PC-LDCD, a dynamic correction method that only intervenes on hallucinated tokens, achieving near-perfect detection and reduced corruption rates.
Max-pooling Network Revisited: Analyzing the Role of Semantic Probability in Multiple Instance Learning for Hallucination Detection
This paper analyzes hallucination detection in LLMs, proposing a max-pooling approach that improves efficiency by eliminating costly semantic consistency computations while maintaining competitive performance.
Automatic Layer Selection for Hallucination Detection
This paper proposes automatic layer selection for hallucination detection in LLMs and introduces First Effective Peak of Intrinsic Dimension (FEPoID), a training-free criterion that consistently identifies optimal intermediate layers, outperforming existing heuristics.
Zero-source LLM Hallucination Detection with Human-like Criteria Probing
Proposes HCPD, a zero-source hallucination detection method that uses a human-like criteria probing mechanism to decompose judgments into interpretable criteria, outperforming state-of-the-art baselines.