hallucination-detection

#hallucination-detection

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2054617475484938719

X AI KOLs Timeline · 2026-05-13

Academic Research Skills is billed as the first installable Claude Code workflow that packages a multi-agent pipeline to detect and prevent hallucinated citations in academic papers. The post cites a count of 146,932 hallucinated citations in 2025 preprints as the problem it addresses.

Max-pooling Network Revisited: Analyzing the Role of Semantic Probability in Multiple Instance Learning for Hallucination Detection

arXiv cs.CL · 2026-05-12

This paper analyzes hallucination detection in LLMs, proposing a max-pooling approach that improves efficiency by eliminating costly semantic consistency computations while maintaining competitive performance.
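
As a rough illustration of max-pooling in a multiple instance learning setting, the sketch below treats a response as a "bag" of per-token scores and labels the bag by its highest-scoring token. The function names, the 0.5 threshold, and the idea that the scores come from some upstream token-level scorer are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def bag_score(instance_scores: Sequence[float]) -> float:
    """Max-pool per-token (instance) scores into one response (bag) score.

    Assumption: each score is a hallucination likelihood in [0, 1] produced
    by a hypothetical upstream token-level scorer.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance score")
    return max(instance_scores)


def is_hallucinated(instance_scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag the whole response if its most suspicious token exceeds the threshold."""
    return bag_score(instance_scores) > threshold
```

Under this scheme a single high-scoring token is enough to flag the response, which is the usual motivation for max-pooling when only response-level labels are available.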

Source or It Didn't Happen: A Multi-Agent Framework for Citation Hallucination Detection

arXiv cs.CL · 2026-05-12

This paper introduces CiteTracer, a multi-agent framework for detecting citation hallucinations in LLM-generated scientific writing, achieving high accuracy on synthetic and real-world benchmarks.

Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment

arXiv cs.CL · 2026-05-12

This paper investigates whether standard benchmarks underestimate LLM performance by re-evaluating hallucination detection datasets using an LLM-first, human-adjudicated assessment method. The study finds that incorporating LLM reasoning into the adjudication process improves agreement and suggests that model-assisted re-evaluation yields more reliable benchmarks for ambiguity-prone tasks.

Sanity Checks for Long-Form Hallucination Detection

arXiv cs.CL · 2026-05-12

This paper introduces a controlled-invariance methodology and two oracle tests (Force and Remove) to determine whether LLM hallucination detectors rely on reasoning traces or on final-answer artifacts. It also proposes TRACT, a lightweight scorer built on lexical features, whose performance holds up independently of answer-level cues.

Hallucination Detection via Activations of Open-Weight Proxy Analyzers

arXiv cs.CL · 2026-05-11

This paper introduces a proxy-analyzer framework that detects hallucinations in large language models by analyzing internal activations of small, open-weight models rather than the generator itself. The method achieves superior performance on benchmarks like RAGTruth compared to existing methods like ReDeEP, demonstrating that model size is less critical than the analysis approach.

Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

arXiv cs.CL · 2026-05-08

This paper presents PCNet, a probabilistic circuit trained as a tractable density estimator on LLM residual streams to detect hallucinations as geometric anomalies. It also introduces PC-LDCD, a dynamic correction method that only intervenes on hallucinated tokens, achieving near-perfect detection and reduced corruption rates.

The First Token Knows: Single-Decode Confidence for Hallucination Detection

Hugging Face Daily Papers · 2026-05-06

This paper introduces a method for detecting hallucinations in large language models by leveraging the confidence of the first generated token, requiring only a single decode step.
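
One simple way to read "confidence of the first generated token" is as the largest softmax probability over that token's logits. The sketch below assumes this reading. The function names and the 0.5 threshold are hypothetical, and the paper's actual scoring rule may differ.

```python
import math
from typing import Sequence


def first_token_confidence(logits: Sequence[float]) -> float:
    """Return the largest softmax probability over the first token's logits.

    Assumption: `logits` is the vocabulary-sized logit vector from the single
    decode step that produces the first answer token.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    peak = max(logits)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)


def flag_low_confidence(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a possible hallucination when first-token confidence is below the threshold."""
    return first_token_confidence(logits) < threshold
```

Because only one decode step is inspected, a score like this needs no repeated sampling. The threshold would have to be calibrated per model and task.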

Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation

arXiv cs.CL · 2026-04-22

Researchers introduce SHADE, a hybrid estimator that combines Good-Turing coverage with graph-spectral cues to quantify semantic uncertainty and detect LLM hallucinations when only a few black-box samples are available.
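
For readers unfamiliar with Good-Turing coverage, the classical estimate of the unseen probability mass is the fraction of samples whose category was observed exactly once. The sketch below shows only that textbook estimator, applied to sampled answers that are assumed to be already grouped into semantic-equivalence labels. It is not SHADE itself, and the graph-spectral component is not represented.

```python
from collections import Counter
from typing import Hashable, Sequence


def good_turing_missing_mass(labels: Sequence[Hashable]) -> float:
    """Estimate the probability mass of unseen categories as N1 / N.

    N1 is the number of categories seen exactly once and N is the sample count.
    Assumption: `labels` are hypothetical semantic-cluster ids, one per sampled answer.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


def good_turing_coverage(labels: Sequence[Hashable]) -> float:
    """Estimated share of probability mass covered by the categories already seen."""
    return 1.0 - good_turing_missing_mass(labels)
```

For example, four samples labeled `["a", "a", "b", "c"]` contain two singleton categories, so the estimated missing mass is 2/4 = 0.5. A high value suggests the model's answers are spread over many meanings that the few samples have not yet revealed.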

HalluSAE: Detecting Hallucinations in Large Language Models via Sparse Auto-Encoders

arXiv cs.CL · 2026-04-21

Researchers from Beihang University and other institutions propose HalluSAE, a framework that uses sparse autoencoders and phase transition theory to detect hallucinations in LLMs. It models generation as trajectories through a potential energy landscape and identifies the critical transition zones where factual errors occur.

TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

arXiv cs.CL · 2026-04-20

TPA is a method for detecting hallucinations in RAG systems that attributes next-token probabilities to seven distinct sources (Query, RAG Context, Past Token, Self Token, FFN, Final LayerNorm, Initial Embedding) and aggregates the attributions by Part-of-Speech tag. The approach achieves state-of-the-art performance across five LLMs, including Llama2, Llama3, Mistral, and Qwen.

Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation

arXiv cs.CL · 2026-04-20

This paper introduces FRANQ, a method for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems by applying distinct uncertainty quantification techniques to distinguish between factuality and faithfulness to retrieved context. The authors construct a new dataset annotated for both factuality and faithfulness, and demonstrate that FRANQ outperforms existing approaches in detecting factual errors across multiple datasets and LLMs.

RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

arXiv cs.CL · 2026-04-20

RAGognizer introduces a hallucination-aware fine-tuning approach that integrates a lightweight detection head into LLMs for joint optimization of language modeling and hallucination detection in RAG systems. The paper presents RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and demonstrates state-of-the-art hallucination detection while reducing hallucination rates without degrading language quality.

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

arXiv cs.CL · 2026-04-20

This paper introduces SIVR (Sequential Internal Variance Representation), a supervised framework for detecting hallucinations in LLMs by analyzing token-wise and layer-wise variance patterns in hidden states without relying on strict architectural assumptions. The method aggregates full sequence variance features to learn temporal patterns of factual errors and demonstrates improved generalization with smaller training sets.

Introducing SimpleQA

OpenAI Blog · 2024-10-30

OpenAI introduces SimpleQA, a new factuality benchmark dataset of 4,326 short fact-seeking questions designed to evaluate whether frontier language models can answer accurately without hallucinating. Quality is maintained through dual independent annotation and rigorous question criteria, giving an estimated error rate of only ~3%. GPT-4o scores less than 40% on the benchmark.
