population-based-training

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

arXiv cs.AI · 2026-05-19

PopuLoRA is a population-based asymmetric self-play framework for RLVR (reinforcement learning with verifiable rewards) post-training of LLMs. Teacher and student LoRA adapters co-evolve so that the generated problems grow increasingly complex, overcoming the self-calibration limitation of single-agent self-play.
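The co-evolution idea can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's method: each adapter is reduced to a single number (`difficulty` for a teacher, `skill` for a student), the logistic `solve_prob`, the `teacher_reward` that peaks at a 50% solve rate, the learning rate, and the replace-worst-with-mutated-best step are all hypothetical choices made for illustration. No LLM or LoRA library is involved.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Teacher:
    difficulty: float  # hypothetical stand-in for a teacher LoRA adapter


@dataclass
class Student:
    skill: float  # hypothetical stand-in for a student LoRA adapter


def solve_prob(skill: float, difficulty: float) -> float:
    """Assumed logistic chance that a student solves a teacher's problem."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def teacher_reward(p: float) -> float:
    """Assumed reward: 1.0 at a 50% solve rate, 0.0 when trivial or impossible."""
    return 1.0 - abs(2.0 * p - 1.0)


def co_evolve(n_teachers=4, n_students=4, generations=50, seed=0):
    rng = random.Random(seed)
    teachers = [Teacher(rng.uniform(0.0, 1.0)) for _ in range(n_teachers)]
    students = [Student(rng.uniform(0.0, 1.0)) for _ in range(n_students)]

    for _ in range(generations):
        scores = []
        for t in teachers:
            probs = [solve_prob(s.skill, t.difficulty) for s in students]
            scores.append(sum(teacher_reward(p) for p in probs) / len(probs))
            # Students gain the most from problems near the edge of their ability.
            for s, p in zip(students, probs):
                s.skill += 0.1 * teacher_reward(p)

        # Population step: the worst teacher is replaced by a mutated copy
        # of the best one, so problem difficulty can follow student skill.
        best = max(range(n_teachers), key=scores.__getitem__)
        worst = min(range(n_teachers), key=scores.__getitem__)
        if best != worst:
            mutated = teachers[best].difficulty + rng.gauss(0.0, 0.5)
            teachers[worst] = Teacher(mutated)

    return teachers, students


if __name__ == "__main__":
    final_teachers, final_students = co_evolve()
    print([round(t.difficulty, 2) for t in final_teachers])
    print([round(s.skill, 2) for s in final_students])
```

Because several teachers are scored against several students, a teacher whose problems are too easy or too hard for the whole student population loses out to one that is better matched. That is the intuition behind using a population rather than a single self-calibrating agent.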
