Noah Ziems expresses excitement about their recent work on pedagogical RL, which aims to transform data collection for complex agentic tasks such as coding.
The work introduces pedagogical RL, a paradigm in which privileged self-teachers are trained to generate rollouts that are both correct and easy to follow, and shows that training such teachers is a relatively easy RL problem.
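
One way to picture "correct and easy-to-follow" as a training signal is a reward that pays only for correct rollouts and pays more when a student without the privileged information finds the rollout likely. The sketch below is an illustration under assumptions, not the method from the work: the function name `pedagogical_reward`, the use of student log-probabilities as a proxy for "easy to follow", and the `followability_weight` parameter are all hypothetical.

```python
import math
from typing import Sequence


def pedagogical_reward(
    is_correct: bool,
    student_token_logprobs: Sequence[float],
    followability_weight: float = 1.0,
) -> float:
    """Hypothetical reward for a single teacher rollout (illustrative only).

    is_correct: whether the rollout solved the task (for coding, e.g. tests passed).
    student_token_logprobs: per-token log-probabilities that a student model,
        which does NOT see the privileged information, assigns to the rollout.
        Using these as a followability proxy is an assumption of this sketch.
    """
    if not is_correct:
        return 0.0  # incorrect rollouts earn nothing in this sketch
    if not student_token_logprobs:
        return 1.0  # correct, but nothing to score for followability
    mean_logprob = sum(student_token_logprobs) / len(student_token_logprobs)
    # exp(mean log-prob) is the geometric-mean token probability, in (0, 1].
    followability = math.exp(mean_logprob)
    return 1.0 + followability_weight * followability


# A correct rollout the student finds likely outranks a correct but surprising one.
easy = pedagogical_reward(True, [-0.1, -0.2])
hard = pedagogical_reward(True, [-2.0, -3.0])
print(easy > hard)
```

Gating on correctness first keeps the teacher from trading accuracy for simplicity; how the actual work balances the two is not stated in the summary above.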