uncertainty-calibration

#uncertainty-calibration

SAGE: Answer-Conditioned Uncertainty Targets for Verbal Uncertainty Alignment

arXiv cs.CL · 3d ago

SAGE proposes a group-level uncertainty target that constructs an answer-conditioned uncertainty geometry over sampled responses to improve verbal uncertainty alignment in LLMs, and introduces GUPO for training. Experiments across reasoning tasks show improved uncertainty ranking and reduced overconfidence.
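
The card does not say how SAGE's answer-conditioned target is built. As a rough illustration only, the sketch below uses a much simpler and commonly used proxy: sample a group of responses, treat the share of the group that reaches the same final answer as the target confidence for each response, and score the model's verbalized confidence against that target. The function names `agreement_targets` and `verbal_confidence_penalty` are hypothetical, and this is neither SAGE's target geometry nor GUPO.

```python
from collections import Counter
from typing import List, Sequence


def agreement_targets(answers: Sequence[str]) -> List[float]:
    """Hypothetical target: for each sampled response, the share of the group giving the same answer."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


def verbal_confidence_penalty(stated: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared gap between stated (verbalized) confidence and the group-derived target."""
    return sum((s - t) ** 2 for s, t in zip(stated, targets)) / len(stated)


# Toy group of four sampled answers to one question.
targets = agreement_targets(["42", "42", "41", "42"])  # [0.75, 0.75, 0.25, 0.75]
penalty = verbal_confidence_penalty([0.9, 0.9, 0.9, 0.9], targets)
```

Stating 0.9 on every response costs little on the majority answer but a lot on the minority one, which is the kind of overconfidence SAGE reports reducing.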

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

arXiv cs.LG · 5d ago

UNIQ introduces a conformal calibration method for offline reinforcement learning that adapts conservatism per state based on estimated uncertainty, improving over IQL on some D4RL benchmarks while remaining memory-efficient.
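
The card gives no formulas for UNIQ. One generic way to get state-dependent conservatism from conformal calibration is normalized split conformal prediction: score held-out value errors relative to a per-state uncertainty estimate, take a finite-sample-corrected quantile of those scores, and subtract that quantile times the state's uncertainty from the value estimate. The sketch below assumes the uncertainty estimate is the spread of a Q-ensemble (used both for the calibration scores and at decision time); that choice and all function names are hypothetical, not UNIQ's design.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # not enough calibration points for this miscoverage level
    return sorted(scores)[k - 1]


def normalized_scores(
    targets: Sequence[float], preds: Sequence[float], sigmas: Sequence[float]
) -> List[float]:
    """Absolute value error divided by the per-state uncertainty estimate (assumed > 0)."""
    return [abs(t - p) / s for t, p, s in zip(targets, preds, sigmas)]


def conservative_value(ensemble_q: Sequence[float], q_hat: float) -> float:
    """Hypothetical per-state lower bound: ensemble mean minus q_hat times ensemble spread."""
    return mean(ensemble_q) - q_hat * pstdev(ensemble_q)


# Toy calibration set of ten normalized scores, targeting 80% coverage.
q_hat = conformal_quantile([i / 10 for i in range(1, 11)], alpha=0.2)
```

Because the penalty scales with the ensemble's spread, a state where ensemble members agree keeps its value estimate, while a state with high disagreement is pushed down more.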

UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing

arXiv cs.LG · 2026-05-20

UCCI proposes a calibration-first router for LLM cascades that uses isotonic regression to map token-level margin uncertainty to error probability. On a production NER workload it reduces cost by 31% while maintaining a micro-F1 of 0.91, and it lowers expected calibration error (ECE) from 0.12 to 0.03.
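
A minimal sketch of a calibration-first router along the lines the card describes: fit an isotonic (monotone non-decreasing) map from an uncertainty score to the observed error rate with pool-adjacent-violators, measure expected calibration error with equal-width bins, and escalate when the expected cost of an error exceeds the cost of escalating. Several details are assumptions not taken from the paper: uncertainty is defined here as one minus the probability margin between the top two tokens, the larger model is treated as error-free, and `margin_uncertainty`, `route`, and the cost parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (breakpoints, values): start x and fitted value of each pooled block.
IsotonicModel = Tuple[List[float], List[float]]


def margin_uncertainty(top1: float, top2: float) -> float:
    """Hypothetical score: 1 minus the gap between the two most likely token probabilities."""
    return 1.0 - (top1 - top2)


def fit_isotonic(xs: Sequence[float], ys: Sequence[float]) -> IsotonicModel:
    """Non-decreasing least-squares fit of ys on xs via pool-adjacent-violators."""
    # Pool samples with identical x first so ties share one fitted value.
    grouped: Dict[float, Tuple[float, int]] = {}
    for x, y in zip(xs, ys):
        s, c = grouped.get(x, (0.0, 0))
        grouped[x] = (s + y, c + 1)
    blocks: List[List[float]] = []  # each block: [sum_y, count, start_x]
    for x in sorted(grouped):
        s, c = grouped[x]
        blocks.append([s, float(c), x])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s_last, c_last, _ = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += c_last
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def predict_isotonic(model: IsotonicModel, x: float) -> float:
    """Step-function lookup: value of the last block starting at or below x."""
    breakpoints, values = model
    result = values[0]  # inputs below the first breakpoint get the lowest value
    for start, value in zip(breakpoints, values):
        if x < start:
            break
        result = value
    return result


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: weighted mean |mean predicted prob - observed frequency|."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        bins[min(int(p * n_bins), n_bins - 1)].append(i)
    total = 0.0
    for idx in bins:
        if idx:
            conf = sum(probs[i] for i in idx) / len(idx)
            freq = sum(labels[i] for i in idx) / len(idx)
            total += len(idx) / len(probs) * abs(conf - freq)
    return total


def route(p_err: float, escalation_cost: float, error_cost: float) -> str:
    """Escalate when the expected cost of keeping a likely-wrong answer exceeds the escalation cost."""
    return "large" if p_err * error_cost > escalation_cost else "small"
```

In practice a library implementation such as scikit-learn's `IsotonicRegression` would replace the hand-written fit; it is spelled out here to show that the map only enforces monotonicity and otherwise follows the empirical error rate.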

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

arXiv cs.CL · 2026-05-13

This paper introduces Agent-BRACE, a method that decouples LLM agents into separate belief-state and policy models to handle long-horizon tasks in partially observable environments. By verbalizing its uncertainty about the environment state, it achieves significant performance improvements over baselines while keeping the context window size constant.

BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models

arXiv cs.AI · 2026-05-08

This paper introduces BitCal-TTS, a runtime controller that improves accuracy and reduces premature halting in quantized reasoning models by calibrating confidence signals during test-time scaling.
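
The card does not describe BitCal-TTS's controller. For illustration, the sketch below swaps in standard temperature scaling: fit a single temperature on held-out data by minimizing negative log-likelihood, then halt only when the temperature-scaled confidence clears a threshold. If quantization makes a model overconfident, a fitted temperature above 1 lowers its confidence and so delays halting. The grid search, the threshold, and the function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def fit_temperature(
    logit_rows: Sequence[Sequence[float]], labels: Sequence[int], grid: Sequence[float]
) -> Optional[float]:
    """Pick the temperature in `grid` that minimizes held-out negative log-likelihood."""
    best_t: Optional[float] = None
    best_nll = math.inf
    for t in grid:
        nll = -sum(log_softmax(row, t)[y] for row, y in zip(logit_rows, labels))
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t


def should_halt(logits: Sequence[float], temperature: float, threshold: float) -> bool:
    """Halt only if the temperature-scaled top-class probability reaches the threshold."""
    return math.exp(max(log_softmax(logits, temperature))) >= threshold
```

For logits `[2.0, 0.0]` the top-class probability is about 0.88 at temperature 1 but about 0.73 at temperature 2, so with a threshold of 0.85 the calibrated controller keeps scaling where the uncalibrated one would stop.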
