Tag
Introduces Magnet, a multi-agent goal-driven narrative engine for long-form story generation with persona-grounded characters, and Atlas, a graph-based pipeline for detecting hallucinations in generated narratives. The framework improves coherence and reduces hallucinations compared to single-model baselines and IBSEN.
This paper introduces a unified benchmark for span-level hallucination detection in RAG systems that extends beyond natural language to code, tool output, and structured documents, and presents a fine-tuned Qwen3.5-2B detector that outperforms existing methods on these new domains while remaining competitive on standard NLP benchmarks.
Proposes CORTEX, a token-level hallucination detection method for RAG that compares LLM internal representations with and without retrieved documents to identify ungrounded spans. It improves fine-grained localization of hallucinations in long-form RAG outputs.
This paper presents a factorised study of probe-based uncertainty estimation in LLMs, showing that raw hidden states and attention features perform well in-domain but structured features are more robust under distribution shift, and provides pretrained probes as off-the-shelf baselines.
GAVEL introduces a new task for verifying, explaining, and localizing errors in image-text pairs, along with a dataset and benchmark. A supervised baseline shows improvements over strong closed-source models.
MedBench v5 is a dynamic, process-oriented benchmark for clinical multimodal models that integrates hallucination detection and stress testing, moving beyond static QA to evaluate reasoning and stability under information-flow stressors.
Proposes HCPD, a zero-source hallucination detection method that uses a human-like criteria probing mechanism to decompose judgments into interpretable criteria, outperforming state-of-the-art baselines.
This paper extends optimal transport-based hallucination detection to all decoder layers in NMT and abstractive summarization, finding that detection is concentrated in early layers and that the geometric signal transfers poorly to summarization due to faithfulness failures not detectable via attention concentration.
Reformulates token-level hallucination detection as a quickest change detection problem, establishing theoretical lower bounds on detection delay and showing that causal recurrent models achieve near-optimal performance, outperforming linear baselines.
Accepted at ICML 2026, this paper introduces predictable hallucination via an information-budget abstention gate and releases ntkMirror, a training-free open-weight implementation that reduces hallucination by abstaining when information is insufficient; it achieves 0.0–0.7% hallucination at ~24% abstention.
OpenHalDet is a unified benchmark for hallucination detection in LLMs, standardizing evaluation across diverse generation scenarios and supporting black-box, gray-box, and white-box detection methods.
Proposes Evidence Graph Consistency (EGC), a framework using graph-based structural consistency for hallucination detection in RAG, revealing that effectiveness varies across model families.
This paper demonstrates that Whisper's hallucination failures on silence, noise, or music can be detected and mitigated purely from internal activations using sparse autoencoders, achieving large reductions in hallucination rate without fine-tuning.
This paper introduces CHARM, a framework for detecting and mitigating cascading hallucinations in multi-step agentic RAG pipelines, where early-stage errors propagate and amplify across reasoning steps. CHARM achieves an 89.4% cascade detection rate and 82.1% error propagation reduction across multiple benchmarks with low latency overhead.
KG-Guard is a lightweight graph-based framework for detecting hallucinations in LLM-based knowledge base question answering. It treats the LLM as a black box and uses a graph encoder with an MLP classifier to identify hallucinated answer nodes, outperforming baselines while having far fewer parameters.
FLaG is a lightweight framework for hallucination detection in LLMs that models correctness via latent evidence groups and energy-based routing, achieving state-of-the-art performance across benchmarks.
LLM-FACETS is an open-source evaluation framework designed to help practitioners assess LLM transparency and accountability with a focus on privacy and data flow transparency. It provides a browser interface, plugin architecture, and supports multiple auditing mechanisms including token-level log-probability visualization and RAG Triad metrics.
Introduces HDSR and HDSR-PL, methods that use hallucination detectors to guide iterative self-refinement and preference learning, achieving up to a 48% reduction in hallucinations for clinical summarization using Llama and Gemma models on MIMIC-IV-Note.
This paper presents a neuro-symbolic verification architecture for LLM outputs in high-stakes domains, combining formal symbolic methods with neural semantic analysis. Evaluated on a medical device damage assessment system, it achieves over 83% hallucination detection for structured entities and a 30% reduction in report creation time.
This paper proposes automatic layer selection for hallucination detection in LLMs and introduces First Effective Peak of Intrinsic Dimension (FEPoID), a training-free criterion that consistently identifies optimal intermediate layers, outperforming existing heuristics.