Tag
MIT researchers propose Pedagogical RL, a new reinforcement learning method that uses a teacher model with privileged information and a spike-aware learnability reward to significantly improve sample efficiency and convergence speed over existing methods like GRPO and OPSD.
Humans are training teacher models to teach student models in a step-by-step manner, penalizing leaps, to improve model intelligence.