calibration

#calibration

Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text

arXiv cs.CL · yesterday

This paper traces the degradation of likelihood-based detectors of machine-generated text to a Simpson's paradox in token-score aggregation. It proposes a learned local calibration step that significantly improves detection performance across various models and datasets.

#calibration

Teaching AI models to say “I’m not sure”

MIT News — Artificial Intelligence · 2026-04-22

MIT CSAIL researchers introduce RLCR, a method using Brier scores in reinforcement learning to train AI models to output calibrated confidence estimates, significantly reducing overconfidence without sacrificing accuracy.

#calibration

The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

Hugging Face Daily Papers · 2026-04-18

This paper finds that on-policy distillation (OPD) in language models leads to severe overconfidence because of an information mismatch between training and deployment, and proposes CaOPD, a calibration-aware framework that improves both performance and confidence reliability.

#calibration

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

Hugging Face Daily Papers · 2026-04-17

TwinTrack is a post-hoc calibration framework for pancreatic cancer segmentation that aligns ensemble model probabilities with the empirical mean human response across multiple annotators, improving interpretability and calibration metrics on multi-rater benchmarks.

#calibration

Teaching models to express their uncertainty in words

OpenAI Blog · 2022-05-28

OpenAI researchers demonstrate that GPT-3 can learn to express calibrated uncertainty about its answers in natural language without using model logits, introducing the CalibratedMath benchmark suite to evaluate this capability. The approach shows robust generalization under distribution shift and represents the first evidence of models expressing well-calibrated verbal uncertainty about their own predictions.
