Tag
The paper proposes a verifiable label-free reward for training calibrated probabilistic forecasters using reinforcement learning, avoiding the calibration degradation that occurs when rewarding single outcomes. Applied to NFL win probability, a 7B model trained with this reward achieves calibration comparable to the betting market.