conflict-aware-filtering


GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Hugging Face Daily Papers · 6d ago

GD^2PO introduces a conflict-aware filtering mechanism that mitigates conflicts among multiple rewards in reinforcement learning for large language models, preventing signal cancellation and improving training efficiency.
