MIT introduces Pedagogical RL, a method that trains a teacher to produce trajectories that are learnable for a student by penalizing surprising steps, improving RL training efficiency.
The technique combines two mechanisms: spike-aware pedagogy rewards, which penalize the teacher for implausible jumps in a trajectory, and surprisal-gated imitation, in which the student learns easy (low-surprisal) tokens quickly and hard (high-surprisal) ones slowly.
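
The two mechanisms can be illustrated with a small sketch. This is not the authors' implementation: the summary gives no formulas, so the hinge-style spike penalty, the exponential gate, and every name and default value here (`spike_aware_reward`, `gated_imitation_loss`, `threshold`, `penalty`, `temperature`) are assumptions chosen only to show the idea. Surprisal is taken to be the negative log-probability the student assigns to each teacher token.

```python
import math

def surprisals(student_probs):
    """Per-token surprisal (-log p, in nats) of teacher tokens under the student."""
    return [-math.log(p) for p in student_probs]

def spike_aware_reward(task_reward, student_probs, threshold=2.0, penalty=0.5):
    """Teacher reward: task reward minus a penalty on surprisal spikes.

    Assumed form: only surprisal above `threshold` is penalized, so a
    trajectory with no implausible jumps keeps its full task reward.
    """
    excess = [max(0.0, s - threshold) for s in surprisals(student_probs)]
    return task_reward - penalty * sum(excess)

def gate(surprisal, temperature=1.0):
    """Assumed gate in (0, 1]: close to 1 for easy tokens, small for hard ones."""
    return math.exp(-surprisal / temperature)

def gated_imitation_loss(student_probs, temperature=1.0):
    """Student loss: per-token cross-entropy scaled by the surprisal gate.

    In a real training loop the gate would be treated as a constant
    (no gradient through it), so hard tokens simply get smaller updates.
    """
    s = surprisals(student_probs)
    return sum(gate(x, temperature) * x for x in s) / len(s)

# Hypothetical student probabilities for two teacher trajectories.
smooth = [0.9, 0.8, 0.85]   # every step is plausible to the student
spiky = [0.9, 0.01, 0.85]   # one step is a large jump

print(spike_aware_reward(1.0, smooth))
print(spike_aware_reward(1.0, spiky))
print(gated_imitation_loss(spiky))
```

Under these assumptions the smooth trajectory keeps its full task reward, while the spiky one is penalized for its single high-surprisal step, and the gate shrinks that step's contribution to the student's imitation loss.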