Tag
This paper introduces AIPO, a reinforcement learning framework that enhances LLM reasoning by allowing the model to actively consult collaborative agents during exploration to overcome capability boundaries.