ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

arXiv cs.AI · 2026-05-18

This paper introduces ICRL, a framework that jointly trains a solver and a critic with reinforcement learning so that the solver internalizes critique guidance and can improve without external critique. It uses distribution calibration and role-wise group advantage estimation, and reports gains of 6 to 7 points over GRPO on agentic and mathematical reasoning tasks.
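
The summary does not spell out how role-wise group advantage estimation is computed. As a rough illustration only, the sketch below applies the familiar GRPO-style group normalization (reward minus group mean, divided by group standard deviation) separately to each role's rollouts, so solver and critic rewards are never compared on a shared scale. The function name `rolewise_group_advantages`, the `(role, reward)` input format, and the `eps` stabilizer are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def rolewise_group_advantages(samples, eps=1e-6):
    """Normalize rewards within each role (hypothetical helper, not the paper's code).

    samples: list of (role, reward) pairs from one prompt's rollout group.
    Returns one advantage per sample, in the same order as the input.
    """
    # Collect the rewards of each role so statistics are computed per role.
    rewards_by_role = defaultdict(list)
    for role, reward in samples:
        rewards_by_role[role].append(reward)

    # Per-role mean and population standard deviation.
    stats = {
        role: (mean(rewards), pstdev(rewards))
        for role, rewards in rewards_by_role.items()
    }

    # GRPO-style normalization, applied within the sample's own role group.
    return [
        (reward - stats[role][0]) / (stats[role][1] + eps)
        for role, reward in samples
    ]


if __name__ == "__main__":
    group = [("solver", 1.0), ("solver", 0.0), ("critic", 0.5), ("critic", 0.5)]
    print(rolewise_group_advantages(group))
```

In this toy group the two solver rollouts receive opposite-signed advantages, while the two critic rollouts, which have identical rewards, both receive zero. Whether ICRL normalizes this way, or adds further terms, would need to be checked against the paper itself.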
