population-based-training


PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

arXiv cs.AI · 2026-05-19

PopuLoRA introduces a population-based, asymmetric self-play framework for RLVR (reinforcement learning with verifiable rewards) post-training of LLMs, in which teacher and student LoRA adapters co-evolve to generate increasingly complex problems. This addresses the self-calibration limitation of single-agent self-play.
