confidence-calibration

Early-Token Confidence Predicts Reasoning Quality in Multi-Agent LLM Debate

arXiv cs.CL · 2026-06-10

This paper investigates whether early-token confidence signals from LLM decoding can predict reasoning quality in multi-agent debate systems, finding that confidence in the first few generated tokens is the strongest predictor of rubric-based essay scores.

The best AI “science critics” are also the most overconfident — a benchmark on calibration vs. skill

Reddit r/artificial · 2026-06-05

The article introduces the Refute benchmark, which tests LLMs on critiquing science paper summaries and measures their calibration. Results show that the best critic models are often the most overconfident when wrong.

A right answer from your agent doesn't mean it did the right thing

Reddit r/AI_Agents · 2026-06-01

The article discusses the pitfalls of evaluating AI agents solely based on their final answers, emphasizing the importance of inspecting intermediate steps, tool calls, and reasoning to catch confidently wrong outputs. It suggests using automated scoring and trace replays to measure and improve agent behavior.

Making LLMs tell you how confident they really are through probe-targeted fine-tuning [R]

Reddit r/MachineLearning · 2026-05-29

This research presents probe-targeted fine-tuning (LoRA) to make LLMs verbally express their internal confidence, achieving causal control over confidence outputs and demonstrating that models often know when they are right or wrong but fail to articulate it.

Confidence Calibration in Large Language Models

arXiv cs.AI · 2026-05-26

This paper analyzes the confidence calibration of 11 popular LLMs, finding that they are generally overconfident, especially on hard tasks, but underconfident on easy ones. It introduces LifeEval, a test for evaluating calibration across difficulty levels.
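
For readers new to the term, calibration is commonly measured with expected calibration error (ECE): predictions are grouped by stated confidence, and each group's accuracy is compared with its mean confidence. The sketch below is a generic equal-width-bin ECE, not LifeEval's procedure, which the summary does not describe. The function name and the default of 10 bins are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin by its confidence; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A model that states 0.9 confidence but is right three times out of four has a gap of roughly 0.15 in that bin. Comparing accuracy with mean confidence separately on hard and easy subsets would show the direction of the miscalibration, which ECE alone hides.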

MARGIN: Runtime Confidence Calibration for Multi-Agent Foundation Model Coordination

arXiv cs.LG · 2026-05-25

MARGIN is a runtime confidence calibration method for multi-agent foundation model systems. It learns per-agent calibration factors online and raises pairwise resolution from below chance to 70-89% on hard benchmarks, with no held-out data or retraining.

Expectation Consistency Loss: Rethink Confidence Calibration under Covariate Shift

arXiv cs.LG · 2026-05-22

This paper introduces the Expectation Consistency Loss (ECL), a theoretically grounded loss function for calibrating classifier confidence under covariate shift, derived from a necessary and sufficient condition called the Expectation Consistency Condition.

we gave an AI autonomy over real business decisions with real money for eight months. the thing we learned that surprised us most was not about capability.

Reddit r/ArtificialInteligence · 2026-05-17

After eight months of real-world deployment, PayWithLocus found that the hardest problem for their autonomous AI system is not capability but confidence: the AI executes confidently wrong decisions in novel situations, highlighting a metacognitive gap that current architectures don't address.

Confidence-Aware Alignment Makes Reasoning LLMs More Reliable

arXiv cs.AI · 2026-05-11

This paper introduces CASPO, a framework for aligning token-level confidence with step-wise logical correctness in large reasoning models using iterative Direct Preference Optimization. It also proposes Confidence-aware Thought (CaT) for dynamically pruning uncertain reasoning branches during inference to improve reliability and efficiency.

Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

arXiv cs.CL · 2026-05-11

This study presents a 33-model atlas analyzing domain-level metacognitive monitoring in frontier LLMs using MMLU benchmarks, revealing significant variations in confidence calibration across different knowledge domains that are obscured by aggregate metrics.

The First Token Knows: Single-Decode Confidence for Hallucination Detection

Hugging Face Daily Papers · 2026-05-06

This paper introduces a method for detecting hallucinations in large language models by leveraging the confidence of the first generated token, requiring only a single decode step.
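
As a rough illustration of the general idea, first-token confidence can be read off the logits of the first decoding step as the top softmax probability. This is a minimal sketch under stated assumptions, not the paper's method: the summary does not say how confidence is defined or thresholded, so the function names and the 0.5 threshold are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_confidence(first_step_logits):
    """Probability assigned to the most likely token at the first decode step."""
    return max(softmax(first_step_logits))


def flag_low_confidence(first_step_logits, threshold=0.5):
    """Flag an answer for review when first-token confidence is below threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return first_token_confidence(first_step_logits) < threshold
```

In practice the logits would come from the model's first generation step, and a threshold would need tuning on labeled examples.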
