Our ICML paper on predictable hallucination (information-budget abstention gate) + ntkMirror, a training-free open-weight implementation we're releasing today

Reddit r/LocalLLaMA Papers

Summary

A paper accepted at ICML 2026 frames hallucination in evidence-grounded QA as predictable and proposes an information-budget abstention gate. The authors also release ntkMirror, a training-free open-weight implementation of the gate that abstains when the evidence carries too little information. In the paper's held-out audit, the gate reaches 0.0–0.7% hallucination at ~24% abstention.

Our paper, *Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication*, was accepted at ICML 2026. Paper: [https://arxiv.org/abs/2509.11208](https://arxiv.org/abs/2509.11208)

**The idea:** in evidence-grounded QA, the order in which you present exchangeable evidence changes the model's answer probability (permutation dispersion). We treat order as a nuisance variable and derive the Expectation-level Decompression Law (EDFL), which relates the expected information budget to achievable reliability. We then turn it into a fixed ISR=1 answer/abstain gate with no threshold tuning. When information is insufficient, the model abstains instead of guessing. In the paper's pre-specified held-out audit, the gate reaches 0.0–0.7% hallucination at ~24% abstention (80.5% accuracy on attempted answers), with the ISR=1 boundary fixed by theory rather than tuned.

**What we're releasing today (ntkMirror):** a training-free implementation of that gate for local open-weight models. It scores each claim under multiple evidence orderings (order-marginal verifier, exact tied-branch scoring), computes ISR from the per-permutation probabilities, and gates answer/abstain. It needs no fine-tuning and no second model, and it runs offline on your own weights. We also ship a fused kernel that batches the permutation forward passes. It is bit-identical to the naive loop at fp32 and 2.6–10× faster.

**New results (not in the paper):** run as a hallucination detector on small local models, ntkMirror reaches the following AUROC on VitaminC, BoolQ, and SciFact:

|Model|VitaminC|BoolQ|SciFact|
|:-|:-|:-|:-|
|Qwen2.5-0.5B|0.78|0.69|0.80|
|Qwen2.5-1.5B|0.69|0.78|0.91|
|Gemma E4B|0.88|0.84|0.96|
|Qwen2.5-7B|0.90|0.87|0.94|

Separation broadly improves with model size and is strongest on SciFact and on the larger models. The trend is not monotonic: VitaminC drops from 0.78 at 0.5B to 0.69 at 1.5B. Used as a gate on balanced data, it raises the grounded fraction of accepted claims from the 50% base rate to roughly 75–90%, depending on model and dataset, at the cost of dropping ~10–20% of valid claims.
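The ntkMirror pipeline described above has three steps: score a claim under several evidence orderings, compute ISR from the per-permutation probabilities, then answer or abstain. It can be sketched in a few lines of Python. This is a hypothetical illustration, not the repo's code. `toy_score` stands in for a model forward pass. The ISR formula is an assumption modeled loosely on the post's description of EDFL: the average information the evidence supplies, divided by the information needed to reach a target reliability. The paper's exact definitions, and ntkMirror's tied-branch scoring, may differ.

```python
import math
from itertools import permutations
from statistics import mean, pstdev


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Ber(p) || Ber(q)) in nats; clipped so log(0) never occurs."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def order_marginal_probs(score_fn, claim, evidence):
    """Score the claim under every ordering of the evidence.

    Exhaustive enumeration costs n!, so this is only practical for a few items.
    """
    return [score_fn(claim, order) for order in permutations(evidence)]


def isr_gate(perm_probs, prior, target_h=0.05):
    """Hypothetical ISR=1 answer/abstain rule (assumed form, not ntkMirror's).

    delta_bar: mean KL of each ordering's posterior from the evidence-free prior.
    b2t:       KL needed to move the prior to reliability 1 - target_h.
    Answer iff delta_bar / b2t >= 1; the boundary 1 is fixed, not tuned.
    """
    p_bar = mean(perm_probs)
    if p_bar < 0.5:  # evaluate whichever label the order-marginal favours
        perm_probs = [1 - p for p in perm_probs]
        prior, p_bar = 1 - prior, 1 - p_bar
    delta_bar = mean(bernoulli_kl(p, prior) for p in perm_probs)
    b2t = bernoulli_kl(1 - target_h, prior)
    isr = delta_bar / b2t if b2t > 0 else math.inf
    return {"p_bar": p_bar, "dispersion": pstdev(perm_probs),
            "isr": isr, "answer": isr >= 1.0}


def toy_score(claim, order):
    """Hypothetical stand-in for a model forward pass.

    Each evidence item is a log-odds contribution. Later positions get more
    weight (a recency bias), which makes the score order-sensitive.
    """
    n = len(order)
    logit = sum(x * (i + 1) / n for i, x in enumerate(order))
    return 1 / (1 + math.exp(-logit))


evidence = [2.0, 1.5, -0.5]
probs = order_marginal_probs(toy_score, "claim", evidence)
strict = isr_gate(probs, prior=0.5, target_h=0.05)
loose = isr_gate(probs, prior=0.5, target_h=0.20)
print(len(probs), round(strict["isr"], 2), strict["answer"], loose["answer"])
```

With these six orderings, the order-marginal probability is roughly 0.87. The sketch gate therefore abstains at a 5% target hallucination rate, because 0.87 falls short of 0.95, and answers at a 20% target. The `dispersion` field reports the spread across orderings, which is the permutation dispersion the post treats as a nuisance.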
The kernel doesn't hurt detection quality (AUROC gap ≤ 0.008); it just makes the gate cheap. Please let me know if you find it useful: [https://github.com/leochlon/ntkmirror](https://github.com/leochlon/ntkmirror)
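The post doesn't describe how the fused kernel is built. The sketch below is a hypothetical illustration of the batching idea only. It scores every evidence ordering of a claim in two ways: one ordering at a time, or stacked into a single `(n_perm, n_items)` batch. Both paths use a toy position-weighted scorer instead of a real model. The sketch then checks that the two paths give the same scores and AUROC. `POS_WEIGHTS`, `naive_loop`, `batched`, and the synthetic data are assumptions; a real implementation would batch full model forward passes.

```python
import itertools

import numpy as np

# Hypothetical toy "model": per-item log-odds weighted by position (recency
# bias), so the score depends on evidence order. Not ntkMirror's model.
POS_WEIGHTS = np.array([1 / 3, 2 / 3, 1.0])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def naive_loop(items):
    """One toy forward per ordering, then average (order-marginal score)."""
    probs = [sigmoid(float(np.dot(np.array(order), POS_WEIGHTS)))
             for order in itertools.permutations(items)]
    return float(np.mean(probs))


def batched(items):
    """Stack all orderings into one (n_perm, n_items) batch and score it at once."""
    orders = np.array(list(itertools.permutations(items)))
    return float(sigmoid(orders @ POS_WEIGHTS).mean())


def auroc(scores, labels):
    """Chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = np.random.default_rng(0)
labels = [1] * 50 + [0] * 50
# Synthetic claims: grounded ones get evidence with higher log-odds on average.
claims = [rng.normal(1.0 if y else -0.2, 1.0, size=3) for y in labels]
loop_scores = [naive_loop(c) for c in claims]
batch_scores = [batched(c) for c in claims]

max_diff = max(abs(a - b) for a, b in zip(loop_scores, batch_scores))
auroc_gap = abs(auroc(loop_scores, labels) - auroc(batch_scores, labels))
print(f"max score diff: {max_diff:.2e}, AUROC gap: {auroc_gap:.4f}")
```

In this toy, the two paths agree to within floating-point rounding. Unlike the post's fp32 kernel, the sketch makes no bit-identity guarantee, because NumPy may route the batched matrix product through a different BLAS routine than the per-ordering dot product.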

Similar Articles

Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits

arXiv cs.CL

This paper presents PCNet, a probabilistic circuit trained as a tractable density estimator on LLM residual streams to detect hallucinations as geometric anomalies. It also introduces PC-LDCD, a dynamic correction method that only intervenes on hallucinated tokens, achieving near-perfect detection and reduced corruption rates.

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

arXiv cs.CL

This paper reveals that much of the reported progress in LLM hallucination detection is due to benchmark construction artifacts, where ground-truth answers are embedded in prompts, allowing a simple text-similarity baseline to achieve near-perfect scores. Through a large-scale controlled evaluation, the authors show that most methods perform near chance under proper controls, except for supervised probes on upper-layer hidden states such as SAPLMA and their proposed DRIFT.

RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

arXiv cs.CL

RAGognizer introduces a hallucination-aware fine-tuning approach that integrates a lightweight detection head into LLMs for joint optimization of language modeling and hallucination detection in RAG systems. The paper presents RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and demonstrates state-of-the-art hallucination detection while reducing hallucination rates without degrading language quality.