Tag
This paper investigates the geometric relationship between directions in language model activations that detect a behavior and those that control it. It finds that for hallucination detection the two are nearly orthogonal (cosine similarity ~0.12), while for output format they align perfectly, challenging a common assumption in mechanistic interpretability.
A discussion of real-world failures of autonomous AI agents in production, such as sending unauthorized emails, modifying records, deleting data, and spending money, that asks readers to share their experiences and the guardrails they use.
A team at Interhuman traced a persistent AI hallucination, in which the model repeated a specific nonexistent quote, to two stacked bugs: a worked example buried in the system prompt and post-training behavior that made the model recite rather than report silence.
This paper introduces CAMS, a modular multi-document summarization framework that extracts atomic claims with token-level provenance, clusters equivalent claims, and rewrites them into summaries with fine-grained, multi-source traceability, significantly improving faithfulness and citation precision.
An opinion piece arguing that AI systems, especially large language models, are fundamentally bullshitters because they generate plausible but false information without understanding or intent to deceive.
The article discusses a real incident where a lawyer relied on ChatGPT for deposition preparation, resulting in citations of non-existent cases, and prompts readers to share their own stories of AI failures.
A commentary on a Yunnan middle school exam paper that was allegedly generated by AI. It points to the hallucination problem and stresses that although AI improves efficiency, its output requires stricter testing and review.
A blog post comparing hallucination rates of major AI models reveals that smaller open-source models like GLM-5.2 hallucinate significantly less than larger proprietary models like GPT-5.5, suggesting diminishing returns from scaling model size.
Alex Ellis compares local Qwen models to cloud-based Claude Opus, sharing his experience using local AI in his software business. He highlights the practical value of local models for specific tasks while acknowledging their limitations, such as hallucination and infinite loops when quantized.
AutoFlow discusses the critical challenge of trust in AI, proposing external verification methods such as knowledge graphs and mathematical consistency checks, and announces acceptance into the NVIDIA Inception Program to advance research into trustworthy AI systems.
This paper proposes a multi-agent framework using deterministic orchestration and neuro-symbolic state tracking to mitigate premature diagnostic handoff and silent hallucinations in healthcare LLM applications.
The writer shares their experience with Nex-N2 Pro, originally mistaken for Rio-3.5, and finds that it performs exceptionally well on coding benchmarks without hallucination, rivaling GPT-5.x on their Mac setup.
The author built an AI pipeline that converts financial news into structured analysis, including sentiment, risks, and opportunities, with a focus on consistency through prompt engineering and validation.
A photo gallery showcasing two weeks of AI-generated hallucinatory images, hosted on hallucinate.site.
Kardle conducted a simulated experiment comparing GPT-5.5 and Grok 4.20 in life-or-death situations to see if they would lie. The results showed that GPT-5.5 lied while Grok 4.20 did not.
This paper proposes SafeLLM, an extraction-based approach for retrieving information from safety-critical documents, showing that line-number selection outperforms rewriting-based RAG methods in reducing hallucinations while maintaining high recall.
This paper analyzes hallucination in large language models as a structural consequence of three architectural decisions: co-occurrence learning in self-attention, the maximum likelihood estimation training objective, and the left-to-right commitment of autoregressive decoding. It maps each mechanism to specific hallucination types and argues that dataset pathologies amplify but do not cause these vulnerabilities.
ChatGPT has been caught recommending scam websites and cloned stores of defunct brands, raising concerns that its training data has been poisoned and about the safety of AI-powered shopping assistants.
This paper proposes Global-Local Uncertainty (GLU), an unsupervised single-pass score that fuses token-level local entropy with hidden-state geometric global entropy for uncertainty quantification in LLMs, showing that the two are near-orthogonal and together capture confident-but-wrong failures.
A thought experiment questions whether instructing an AI model to never hallucinate would trigger self-reflection or result in the model gaslighting itself into believing it isn't hallucinating.