capability-cultivation

FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning

arXiv cs.LG · 2026-05-21

The paper proposes FBOS-RL, a feedback-driven, bi-objective synergistic reinforcement learning framework for LLM alignment and reasoning. It improves training efficiency and raises the performance ceiling relative to GRPO by combining feedback-guided exploration with two mutually reinforcing training objectives: Exploitation-oriented Policy Alignment and Exploration-oriented Capability Cultivation.
