uncertainty

#uncertainty

Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

Hugging Face Daily Papers ↗ · 2026-05-28

This paper identifies harmful continuations in answer-correct long chain-of-thought training traces for LLM SFT, characterized by uncertainty-geometry mismatches, and proposes a lightweight boundary proxy method to remove them.

#uncertainty

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Hugging Face Daily Papers ↗ · 2026-05-28

The paper introduces SpatialUncertain, a benchmark to evaluate whether vision-language models recognize when they cannot answer spatial questions due to occlusion or perspective ambiguity, revealing overconfidence and poor abstention behavior.

#uncertainty

Managing Uncertainty in LLM-Generated Procedural Knowledge for Virtual Laboratory Planning

arXiv cs.AI ↗ · 2026-05-27

This paper presents a prototype framework for managing uncertainty in LLM-generated procedural knowledge for virtual laboratory planning, using structured domain representations to repair uncertain procedural steps.

#uncertainty

Scientists trained an AI model using an IBM quantum computer — and it answered questions correctly that the base model couldn't

Reddit r/artificial ↗ · 2026-05-26

Researchers used an IBM quantum computer to reduce uncertainty in an AI model, achieving the first demonstration of quantum enhancement in a pretrained large language model, allowing it to answer questions correctly where the base model failed.

#uncertainty

@rohanpaul_ai: New Google paper says LLMs should stop pretending certainty and instead clearly show when they are unsure. Hallucinatio…

X AI KOLs Following ↗ · 2026-05-25

A new Google paper argues that LLMs should focus on expressing uncertainty honestly rather than aiming for perfect factuality, proposing 'faithful uncertainty' to build trust.

#uncertainty

$ECUAS_n$: A family of metrics for principled evaluation of uncertainty-augmented systems

arXiv cs.AI ↗ · 2026-05-22

This paper proposes a family of metrics called $ECUAS_n$ for principled evaluation of uncertainty-augmented systems that output both predictions and uncertainty scores. The authors argue that existing evaluation approaches are inadequate and formulate these metrics as proper scoring rules for decision-making under uncertainty.

#uncertainty

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

arXiv cs.LG ↗ · 2026-05-20

The paper introduces the Bayesian Filtering Transformer (BFT), which incorporates uncertainty into Transformers via precision-weighted attention and Kalman update residuals, improving performance on sequential recommendation and noisy LLM fine-tuning.

#uncertainty

Not all uncertainty is alike: volatility, stochasticity, and exploration

arXiv cs.AI ↗ · 2026-05-20

This paper demonstrates that volatility and stochasticity, both sources of uncertainty, drive optimal exploration in opposite directions: volatility increases exploration while stochasticity suppresses it. The authors extend the Gittins index framework to Gaussian state-space bandits and introduce CAUSE, a closed-form exploration bonus that outperforms standard strategies.

#uncertainty

AI in medicine will fail on calibration long before it fails on eloquence.

Reddit r/artificial ↗ · 2026-05-18

The article argues that AI in medicine may fail due to poor calibration and inability to express uncertainty, rather than lack of eloquence, and calls for features that build trust.

#uncertainty

When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

arXiv cs.CL ↗ · 2026-05-15

This paper evaluates six open-weight LLMs on biomedical QA under conflicting evidence conditions, revealing accuracy drops and prediction flips, and proposes a conflict-aware abstention score that improves selective accuracy.

#uncertainty

@dotey: https://x.com/dotey/status/2055097242755706984

X AI KOLs Timeline ↗ · 2026-05-15

Senior developers often fail to communicate effectively with business teams because they overemphasize code complexity, whereas what business teams care about is eliminating uncertainty. The post suggests that developers ask "Can we try a faster approach?" to align both sides, and notes that although AI can write code quickly, humans remain responsible for the result.

#uncertainty

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

Hugging Face Daily Papers ↗ · 2026-04-17

TwinTrack is a post-hoc calibration framework for pancreatic cancer segmentation that aligns ensemble model probabilities with the empirical mean human response across multiple annotators, improving interpretability and calibration metrics on multi-rater benchmarks.

#uncertainty

Why language models hallucinate

OpenAI Blog ↗ · 2025-09-05

OpenAI publishes research explaining that language models hallucinate because standard training and evaluation procedures reward guessing over acknowledging uncertainty, and proposes that evaluation metrics should prioritize honesty about limitations over raw accuracy.
