GD^2PO introduces a conflict-aware filtering mechanism that mitigates conflicts among multiple rewards in reinforcement learning for large language models, preventing signal cancellation and improving training efficiency.