Tag
PopuLoRA is a population-based asymmetric self-play framework for RLVR (reinforcement learning with verifiable rewards) post-training of LLMs. Teacher and student LoRA adapters co-evolve to generate increasingly complex problems, which overcomes the self-calibration limitation of single-agent self-play.