Tag
An opinion piece arguing that AI systems, especially large language models, are fundamentally bullshitters because they generate plausible but false information without understanding or intent to deceive.
The article discusses a real incident where a lawyer relied on ChatGPT for deposition preparation, resulting in citations of non-existent cases, and prompts readers to share their own stories of AI failures.
A commentary on a Yunnan middle school exam paper that was allegedly generated by AI. It points out AI's hallucination problem and emphasizes that while AI improves efficiency, its output requires stricter testing and review.
A blog post comparing hallucination rates of major AI models reveals that smaller open-source models like GLM-5.2 hallucinate significantly less than larger proprietary models like GPT-5.5, suggesting diminishing returns from scaling model size.
Alex Ellis compares local Qwen models to cloud-based Claude Opus, sharing his experience using local AI in his software business. He highlights the practical value of local models for specific tasks while acknowledging their limitations, such as hallucination and infinite loops when quantized.
AutoFlow discusses the critical challenge of trust in AI, proposing external verification methods such as knowledge graphs and mathematical consistency checks, and announces acceptance into the NVIDIA Inception Program to advance research into trustworthy AI systems.
This paper proposes a multi-agent framework using deterministic orchestration and neuro-symbolic state tracking to mitigate premature diagnostic handoff and silent hallucinations in healthcare LLM applications.
The writer shares their experience running Nex-N2 Pro, originally mistaken for Rio-3.5, on their Mac setup, and finds that it performs exceptionally well on coding benchmarks without hallucination, rivaling GPT-5.x.
The author built an AI pipeline that converts financial news into structured analysis, including sentiment, risks, and opportunities, with a focus on consistency through prompt engineering and validation.
A photo gallery showcasing two weeks of AI-generated hallucinatory images, hosted on hallucinate.site.
Kardle conducted a simulated experiment comparing GPT-5.5 and Grok 4.20 in life-or-death situations to see if they would lie. The results showed that GPT-5.5 lied while Grok 4.20 did not.
This paper proposes SafeLLM, an extraction-based approach for retrieving information from safety-critical documents, showing that line-number selection outperforms rewriting-based RAG methods in reducing hallucinations while maintaining high recall.
This paper analyzes hallucination in large language models as a structural consequence of three architectural decisions: self-attention's co-occurrence learning, maximum likelihood estimation training objective, and autoregressive decoding's left-to-right commitment. It maps each mechanism to specific hallucination types and argues that dataset pathologies amplify but do not cause these vulnerabilities.
ChatGPT has been caught recommending fake scam websites and cloned stores of defunct brands, raising concerns about its training data being poisoned and the safety of AI-powered shopping assistants.
This paper proposes Global-Local Uncertainty (GLU), an unsupervised single-pass score that fuses token-level local entropy with hidden-state geometric global entropy for uncertainty quantification in LLMs, showing that the two are near-orthogonal and together capture confident-but-wrong failures.
A thought experiment questions whether instructing an AI model to never hallucinate would trigger self-reflection or result in the model gaslighting itself into believing it isn't hallucinating.
A post discussing a common failure mode in AI agents, where the model confidently claims to have performed an action (e.g., sending an email) without actually executing the required tool call. The author asks the community how they detect and handle such silent failures in production.
An author created a new fictional identity with zero web presence and found that AI models cited it correctly within 6 days despite a firewall blocking all AI crawlers from the website, revealing that AIs stitch together information from Knowledge Graphs and third-party mentions rather than direct crawling.
An opinion piece argues that current AI research tools like Perplexity and Gemini are flawed due to hallucinations, and advocates for using AI with a curated siloed knowledge base of credible books to ensure grounded truth and prevent distorted worldviews from harming future generations.
A practitioner discusses the calibration vs. utility tradeoff in LLM agents, sharing experience with a verifier-based pipeline that reduces hallucinated tool calls by ~60% but introduces latency costs and drops easy correct answers.
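Several entries above mention agents that claim an action (such as sending an email) without making the tool call, and verifier steps meant to catch this. The following is a minimal sketch of one possible check, not a method taken from any of the linked posts: it compares action claims in the model's reply against a log of executed tool calls. The `ToolCall` record, the `CLAIM_PATTERNS` table, the tool names, and the function `unbacked_claims` are all hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One entry in a hypothetical tool-execution log."""
    name: str
    succeeded: bool


# Assumed mapping from tool name to a phrase pattern that suggests the
# model is claiming the action was done. Real systems would need broader,
# task-specific patterns or a classifier.
CLAIM_PATTERNS = {
    "send_email": re.compile(r"\b(?:sent|emailed)\b", re.IGNORECASE),
    "create_ticket": re.compile(
        r"\b(?:created|opened|filed)\s+(?:a\s+)?ticket\b", re.IGNORECASE
    ),
}


def unbacked_claims(reply: str, calls: list[ToolCall]) -> list[str]:
    """Return tool names the reply claims to have used with no successful call logged."""
    executed = {call.name for call in calls if call.succeeded}
    return [
        tool
        for tool, pattern in CLAIM_PATTERNS.items()
        if pattern.search(reply) and tool not in executed
    ]


if __name__ == "__main__":
    reply = "I've sent the email to Dana."
    # No tool call was logged, so the claim is flagged.
    print(unbacked_claims(reply, []))
    # A successful send_email call backs the claim, so nothing is flagged.
    print(unbacked_claims(reply, [ToolCall("send_email", True)]))
```

A pattern-based check like this is cheap but will miss paraphrased claims and can flag harmless mentions, which mirrors the latency and false-rejection tradeoff described in the last entry above.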