Tag: #group-dynamic-optimization

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Hugging Face Daily Papers · 2d ago

GD^2PO introduces a conflict-aware filtering mechanism that mitigates multi-reward conflicts in reinforcement learning for large language models. It keeps the learning signals from different rewards from cancelling each other out, which speeds up training.
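The summary does not say how GD^2PO's filter actually works, so the following is only a hypothetical illustration of the problem it targets. It assumes a GRPO-style setup where each reward is normalized separately within a group of rollouts (a reading suggested by "reward-Decoupled" in the title) and the per-reward advantages are then summed. Under that assumption, a rollout that scores well on one reward and badly on another can end up with a net advantage of zero. The sign-disagreement filter here is an assumed stand-in, not the paper's method, and every function name (`group_normalize`, `decoupled_advantages`, `summed_advantage`, `conflict_mask`) is hypothetical.

```python
from statistics import mean, pstdev

def group_normalize(values, eps=1e-6):
    """Standardize one reward across a group of rollouts for the same prompt."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]

def decoupled_advantages(rewards):
    """rewards[k][i] is reward k for rollout i; each reward is normalized on its own (assumed)."""
    return [group_normalize(row) for row in rewards]

def summed_advantage(per_reward):
    """Naive aggregation: add the per-reward advantages of each rollout."""
    return [sum(col) for col in zip(*per_reward)]

def conflict_mask(per_reward, tol=1e-9):
    """Hypothetical filter (not GD^2PO's published rule): keep a rollout only
    if none of its per-reward advantages point in opposite directions."""
    mask = []
    for col in zip(*per_reward):
        has_pos = any(a > tol for a in col)
        has_neg = any(a < -tol for a in col)
        mask.append(not (has_pos and has_neg))
    return mask

# A group of 4 rollouts scored by two rewards (e.g. correctness and format).
rewards = [[1, 1, 0, 0],
           [0, 1, 1, 0]]
adv = decoupled_advantages(rewards)
print([round(a, 3) for a in summed_advantage(adv)])  # → [0.0, 2.0, 0.0, -2.0]
print(conflict_mask(adv))                            # → [False, True, False, True]
```

Rollouts 0 and 2 each win on one reward and lose on the other, so their summed advantages cancel to zero and they contribute no gradient. The mask flags exactly those rollouts. Dropping them is only one possible response to a conflict; down-weighting them or handling each reward's gradient separately are equally plausible alternatives.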
