reflection-gap

Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL

arXiv cs.AI · 2026-06-15

LLM agents often mis-assess their own performance after observing environment feedback, a problem called the reflection gap. RefGRPO addresses this by augmenting RL with a free calibration bonus and dynamic scheduling, reducing underconfidence from 44.4% to 7.7% and improving task accuracy on text-to-SQL benchmarks.
