Tag
SAGE proposes a group-level uncertainty target, built from an answer-conditioned uncertainty geometry over sampled responses, to improve verbal uncertainty alignment in LLMs, and introduces GUPO as the training method. Experiments across reasoning tasks show improved uncertainty ranking and reduced overconfidence.
UNIQ introduces a conformal calibration method for offline reinforcement learning that adapts conservatism per-state based on uncertainty, improving over IQL on some D4RL benchmarks while maintaining memory efficiency.
UCCI proposes a calibration-first router for LLM cascades that uses isotonic regression to map token-level margin uncertainty to error probability, achieving a 31% cost reduction on a production NER workload while maintaining micro-F1=0.91 and reducing expected calibration error from 0.12 to 0.03.
Agent-BRACE decouples LLM agents into a belief-state model and a policy model to handle long-horizon tasks in partially observable environments. By verbalizing state uncertainty, it significantly improves performance over baselines while keeping the context window size constant.
BitCal-TTS is a runtime controller that calibrates confidence signals during test-time scaling, improving accuracy and reducing premature halting in quantized reasoning models.
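The calibration-first routing idea in the UCCI summary (isotonic regression mapping an uncertainty score to an error probability, which then decides whether a request escalates) can be illustrated with a minimal sketch. This is not UCCI's implementation: the function names `fit_isotonic`, `predict_error`, and `route`, the toy data, and the `max_error` threshold are all hypothetical, and the fit uses a plain pool-adjacent-violators pass written from scratch rather than any particular library.

```python
from bisect import bisect_left

def fit_isotonic(uncertainty, errors):
    """Fit a non-decreasing step function from uncertainty to error rate
    using pool-adjacent-violators. Hypothetical helper, not UCCI's code.
    Tied uncertainty values are not given special treatment here."""
    blocks = []  # each block: [sum of errors, count, largest uncertainty in block]
    for u, e in sorted(zip(uncertainty, errors)):
        blocks.append([float(e), 1, u])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n, hi = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = hi
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values

def predict_error(model, u):
    """Look up the calibrated error probability for uncertainty score u."""
    thresholds, values = model
    i = bisect_left(thresholds, u)          # first block whose upper edge is >= u
    return values[min(i, len(values) - 1)]  # clamp scores beyond the last block

def route(model, u, max_error=0.2):
    """Keep the cheap model's answer if calibrated error is low enough.
    The 0.2 threshold is an assumed value chosen for illustration."""
    return "small" if predict_error(model, u) <= max_error else "large"

# Toy calibration set: uncertainty scores and whether the small model was wrong (1) or right (0).
uncertainty = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]
errors = [0, 0, 1, 0, 0, 1, 1, 1]
model = fit_isotonic(uncertainty, errors)

for u in (0.08, 0.25, 0.70):
    print(u, round(predict_error(model, u), 3), route(model, u))
```

In this sketch the three middle points (0.20, 0.30, 0.45) are pooled into one block with an error rate of 1/3, so the fitted curve stays monotone even though the raw labels are not. A real router would fit on held-out data and choose `max_error` from a cost and quality budget.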