PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

arXiv cs.AI · cached 2026-05-19

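PopuLoRA proposes a population-based asymmetric self-play framework for post-training LLMs with reinforcement learning with verifiable rewards (RLVR). In this framework, teacher and student LoRA adapters co-evolve, generating increasingly complex problems and thereby overcoming the self-calibration limitation of single-agent self-play.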
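The co-evolution loop described above can be illustrated with a toy sketch. This is not the paper's implementation: each LoRA adapter is replaced by a single scalar (a teacher's problem difficulty, a student's skill), the verifier is replaced by a logistic solve probability, and the update rules, the function names `solve_prob` and `coevolve`, and all parameters are assumptions made for illustration only.

```python
import math
import random


def solve_prob(skill: float, difficulty: float) -> float:
    """Toy stand-in for a verifier: chance that a student solves a problem."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))


def coevolve(n_teachers=4, n_students=4, rounds=200, lr=0.1, seed=0):
    """Co-evolve toy teacher and student populations (hypothetical update rules)."""
    rng = random.Random(seed)
    # One scalar per population member stands in for a LoRA adapter.
    teachers = [rng.uniform(0.0, 1.0) for _ in range(n_teachers)]  # difficulty
    students = [rng.uniform(0.0, 1.0) for _ in range(n_students)]  # skill
    for _ in range(rounds):
        # Pair a random teacher with a random student from the populations.
        t = rng.randrange(n_teachers)
        s = rng.randrange(n_students)
        p = solve_prob(students[s], teachers[t])
        # Assumed student rule: the largest gain comes from problems
        # solved about half of the time; the gain is never negative.
        students[s] += lr * 4.0 * p * (1.0 - p)
        # Assumed teacher rule: raise difficulty when the problem was too
        # easy (p > 0.5), lower it when the problem was too hard (p < 0.5).
        teachers[t] += lr * (p - 0.5)
    return teachers, students


if __name__ == "__main__":
    teachers, students = coevolve()
    print("teacher difficulties:", [round(x, 2) for x in teachers])
    print("student skills:", [round(x, 2) for x in students])
```

Because teachers and students are paired across populations rather than one model playing against itself, a teacher's difficulty is adjusted against several students of differing skill. In this toy setting, that is a loose analogue of avoiding calibration to a single agent.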
