A new paper from Meta, CMU, and other labs presents Self-play SWE-RL, a method in which coding agents train themselves by injecting bugs into real codebases and then fixing them. The authors report significant gains on SWE-bench benchmarks without relying on human-written tasks.
Tairan He, whose research has focused primarily on robotics, joined OpenAI after completing his PhD at CMU.
A CMU Advanced NLP lecture explains how reinforcement learning optimizes rewards assigned to whole outputs (for correctness, helpfulness, or safety), rather than the next-token prediction objective used in pretraining and supervised fine-tuning.
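The contrast can be illustrated with a toy sketch (not taken from the lecture). It assumes a "model" that is just a fixed probability table per output position (a real LM conditions each position on the prefix), a hypothetical whole-output reward `correctness_reward`, and REINFORCE as the policy-gradient estimator; all names here are illustrative. The next-token loss scores each token against a reference sequence, while the RL loss scales the log-probability of an entire sampled output by a single scalar reward.

```python
import math
from typing import Dict, List

# Toy model output: one probability table per output position (hypothetical numbers).
# Assumption: a real LM conditions each table on the prefix; this sketch does not.
Dist = Dict[str, float]

DISTS: List[Dist] = [
    {"2": 0.6, "3": 0.4},  # position 0: first operand
    {"+": 0.5, "*": 0.5},  # position 1: operator
    {"2": 0.7, "3": 0.3},  # position 2: second operand
]


def next_token_loss(dists: List[Dist], target: List[str]) -> float:
    """Pretraining/SFT-style loss: negative log-likelihood of a reference
    sequence, accumulated token by token against that reference."""
    return -sum(math.log(d[tok]) for d, tok in zip(dists, target))


def sequence_log_prob(dists: List[Dist], sample: List[str]) -> float:
    """log p(whole output) = sum of per-token log-probabilities."""
    return sum(math.log(d[tok]) for d, tok in zip(dists, sample))


def correctness_reward(tokens: List[str]) -> float:
    """Hypothetical whole-output reward: 1.0 if 'a op b' evaluates to 4, else 0.0.
    It inspects only the finished answer, never individual tokens."""
    a, op, b = int(tokens[0]), tokens[1], int(tokens[2])
    value = a + b if op == "+" else a * b
    return 1.0 if value == 4 else 0.0


def reinforce_loss(dists: List[Dist], sample: List[str], reward: float,
                   baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss: the scalar (reward - baseline) scales the
    log-probability of the entire sampled sequence. A gradient step on this loss
    raises the probability of above-baseline outputs and lowers the rest."""
    return -(reward - baseline) * sequence_log_prob(dists, sample)


if __name__ == "__main__":
    # Supervised objective: requires a specific reference sequence to score.
    print("NLL of reference 2+2:", next_token_loss(DISTS, ["2", "+", "2"]))
    # RL objective: outputs (sampled from the model in practice) are judged whole.
    for sample in (["2", "*", "2"], ["3", "+", "2"]):
        r = correctness_reward(sample)
        print("".join(sample), "reward", r,
              "loss", reinforce_loss(DISTS, sample, r, baseline=0.5))
```

Here both "2+2" and "2*2" earn a reward of 1.0 because only the final answer is judged, whereas the next-token loss trains toward one particular reference string. Practical RLHF pipelines usually add learned reward models and more stable estimators such as PPO; this sketch leaves those out.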