Tag
The paper proposes FBOS-RL, a feedback-driven, bi-objective synergistic reinforcement learning framework for LLM alignment and reasoning. It combines feedback-guided exploration with two mutually reinforcing training objectives, Exploitation-oriented Policy Alignment and Exploration-oriented Capability Cultivation, improving both training efficiency and the performance ceiling relative to GRPO.