Tag
Introduces pedagogical RL, a paradigm in which privileged self-teachers are trained to generate rollouts that are both correct and easy to follow, and shows that training such teachers is a relatively easy RL problem.