Introduces pedagogical RL, a paradigm in which privileged self-teachers are trained to generate rollouts that are both correct and easy to follow, and shows that this is a relatively easy RL problem.
The author argues that while the 'bitter lesson' and 'no free lunch' intuitions are misleading in isolation, they provide the correct perspective when combined.