Tag
This paper proposes the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which uses hybrid-mode reinforcement learning to collaboratively evolve a proposer, a solver, and a judge for deep research tasks. An 8B model trained with HOTE achieves state-of-the-art results and surpasses larger static models.