The article details three common failure modes for legal AI systems in production: treating all sources as equally credible, failing to handle conflicting legal opinions, and lacking firm-specific institutional knowledge. It suggests solutions such as authority weighting, disagreement detection, and annotation layers to build trust and utility.
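As an illustration of the first suggestion, here is a minimal sketch of authority weighting over retrieved legal sources; the tier names, weights, and the Source fields are assumptions for illustration, not the article's implementation.

```python
from dataclasses import dataclass

# Illustrative authority tiers (assumed weights, not the article's values);
# higher weight = more binding / credible source.
AUTHORITY_WEIGHTS = {
    "supreme_court": 1.0,
    "appellate_court": 0.8,
    "trial_court": 0.6,
    "law_review": 0.4,
    "commentary": 0.2,
}

@dataclass
class Source:
    text: str
    kind: str          # one of the AUTHORITY_WEIGHTS keys
    similarity: float  # retrieval similarity in [0, 1]

def rank_sources(sources: list[Source]) -> list[Source]:
    """Re-rank retrieved passages by similarity scaled by source authority."""
    return sorted(
        sources,
        key=lambda s: s.similarity * AUTHORITY_WEIGHTS.get(s.kind, 0.1),
        reverse=True,
    )
```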
A Maine attorney faces sanctions, including mandatory training, after relying on AI for a court filing that contained citation errors and mischaracterizations of case law.
The article highlights practical system-level failures in AI agent workflows, such as context bleed and hallucinated details, arguing that these are often infrastructure issues rather than model defects.
This paper presents a unified geometric framework for understanding transformer memory failures, distinguishing between conflict arbitration and hallucination through hidden-state attractor basins. It demonstrates that geometric margin is a superior diagnostic for detecting these failures compared to output entropy, particularly as model scale increases.
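A hedged sketch of the two diagnostics being contrasted: output entropy computed from the next-token distribution versus a geometric margin measured from a hidden state to the two nearest attractor centroids. The centroid-based margin below is an assumed simplification of the paper's basin geometry.

```python
import numpy as np

def output_entropy(next_token_probs: np.ndarray) -> float:
    """Shannon entropy of the model's next-token distribution (baseline diagnostic)."""
    p = next_token_probs[next_token_probs > 0]
    return float(-(p * np.log(p)).sum())

def geometric_margin(hidden_state: np.ndarray, centroids: np.ndarray) -> float:
    """Distance gap between the two nearest attractor centroids (needs >= 2 centroids).

    A small margin means the hidden state sits near a basin boundary, the
    regime the paper argues entropy fails to flag as models scale.
    """
    dists = np.linalg.norm(centroids - hidden_state, axis=1)
    d1, d2 = np.sort(dists)[:2]
    return float(d2 - d1)
```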
This paper proposes Distribution-Aligned Adversarial Distillation (DisAAD), a method that uses a lightweight proxy model, only about 1% of the original model's size, to estimate uncertainty in black-box LLMs, achieving reliable quantification without access to internal parameters or multiple sampling.
This paper identifies and formalizes 'recorruption' in multimodal RAG, where adding accurate context causes models to abandon correct predictions due to attentional collapse (visual blindness and positional bias). The authors propose BAIR, a parameter-free inference-time framework that restores visual saliency and penalizes textual distractors, improving reliability across medical, fairness, and geospatial benchmarks.
A user discusses frustrations with the reliability and consistency of free AI models when used as educational tutors, questioning whether paid versions offer significantly better performance for learning technical concepts.
GPT-5.5 sets new state-of-the-art in benchmarks but struggles with hallucination; Kimi K2.6 leads open LLMs; also discusses AI's strain on climate pledges and strategic thinking in LLMs.
Rutgers researchers trace citation hallucination in LLMs to sparse field-specific neurons, showing causal intervention can suppress fake references.
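A sketch of what such a causal intervention can look like in practice: zeroing a small set of MLP neurons at inference time with a PyTorch forward hook. The layer path and neuron indices below are placeholders, not the neurons identified in the study.

```python
import torch

def suppress_neurons(mlp_module: torch.nn.Module, neuron_idx: list[int]):
    """Zero selected hidden units of an MLP block's output at inference time."""
    def hook(module, inputs, output):
        output[..., neuron_idx] = 0.0  # ablate the chosen units in place
        return output
    return mlp_module.register_forward_hook(hook)

# Usage (placeholder layer path and neuron indices):
# handle = suppress_neurons(model.model.layers[12].mlp, [101, 2048, 3077])
# ... generate ...
# handle.remove()
```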
A novice asks for recommendations on small language models and prompting strategies to build an employee note summarization engine under 2000 tokens, after experiencing hallucinations with Qwen2.5-7B-Instruct.
Researchers propose PRISM, a diagnostic benchmark that breaks down LLM hallucinations into four dimensions (missing knowledge, knowledge errors, reasoning errors, and instruction-following errors) across three generation stages (memory, instruction, reasoning), evaluating 24 LLMs to reveal trade-offs in mitigation strategies.
This paper investigates prompt-induced hallucination (PIH) in vision-language models through mechanistic analysis, identifying specific attention heads responsible for the models' tendency to favor textual prompts over visual evidence. The authors show that ablating these PIH heads reduces hallucinations by at least 40% without additional training, revealing model-specific mechanisms behind this failure mode.
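A sketch of inference-time head ablation in the spirit described: zeroing the per-head slices that feed the attention output projection via a forward pre-hook. Layer index, head indices, and head dimension are placeholders and depend on the model.

```python
import torch

def ablate_heads(o_proj: torch.nn.Linear, heads: list[int], head_dim: int):
    """Zero the concatenated per-head slices feeding the attention output projection."""
    def pre_hook(module, args):
        (hidden,) = args  # [batch, seq, num_heads * head_dim]
        hidden = hidden.clone()
        for h in heads:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,)
    return o_proj.register_forward_pre_hook(pre_hook)

# Usage (placeholder layer/head indices; head_dim is model-specific):
# handle = ablate_heads(model.model.layers[20].self_attn.o_proj, heads=[3, 7], head_dim=128)
```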
This paper presents causal evidence that hallucination in autoregressive language models results from early trajectory commitment governed by asymmetric attractor dynamics, using same-prompt bifurcation and activation patching experiments on Qwen2.5-1.5B to show that hallucinated trajectories diverge at the first token and exhibit strong causal asymmetry across model layers.
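A hedged sketch of activation patching between two runs of the same prompt: hidden states cached from one trajectory are spliced into the other at a chosen decoder layer. The module path follows a typical HuggingFace decoder layout and is an assumption, not the paper's exact setup.

```python
import torch

@torch.no_grad()
def run_with_patched_layer(model, input_ids, layer_idx: int, cached_hidden: torch.Tensor):
    """Re-run a prompt with one decoder layer's output replaced by activations
    cached from another run of the same prompt (simplified patching)."""
    layer = model.model.layers[layer_idx]  # assumed HF-style decoder layout

    def hook(module, inputs, output):
        # Decoder layers often return a tuple whose first element is the hidden states.
        if isinstance(output, tuple):
            return (cached_hidden,) + output[1:]
        return cached_hidden

    handle = layer.register_forward_hook(hook)
    try:
        logits = model(input_ids).logits
    finally:
        handle.remove()
    return logits
```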
A user documented a sequence in which Gemini detected a real $280M KelpDAO/AAVE crypto exploit mid-conversation, retracted it as a hallucination under user skepticism, then reconfirmed it once mainstream coverage caught up — illustrating how AI anti-hallucination overcorrection can cause models to retract accurate information.
Jerry Liu discusses challenges with using Vision Language Models for PDF parsing, particularly around ensuring text correctness and maintaining proper reading order while avoiding hallucinations.
DeepMind introduces FACTS Grounding, a comprehensive benchmark with 1,719 examples for evaluating how accurately large language models ground their responses in source material and avoid hallucinations. The benchmark includes a public dataset and an online Kaggle leaderboard tracking LLM performance on factual accuracy and grounding tasks.
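This is not the benchmark's own scoring code, but a minimal sketch of the recipe it automates: judge each claim in a response against the source document and report the fraction supported. The is_supported callable stands in for whatever judge model is used.

```python
def grounding_score(claims: list[str], source_doc: str, is_supported) -> float:
    """Fraction of a response's claims a judge marks as supported by the source.

    is_supported(claim, source_doc) -> bool is a placeholder judge (e.g. an
    NLI model or an LLM grader); this is not FACTS Grounding's own scorer.
    """
    if not claims:
        return 0.0
    return sum(bool(is_supported(c, source_doc)) for c in claims) / len(claims)
```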
TruthfulQA is a benchmark of 817 questions across 38 categories designed to measure whether language models generate truthful answers. The study found that the best model achieved only 58% truthfulness compared to 94% for humans, and larger models were generally less truthful—suggesting scaling alone is insufficient for improving truthfulness.