grouped-policy-optimization

GAGPO: Generalized Advantage Grouped Policy Optimization

arXiv cs.AI · yesterday

GAGPO is a critic-free reinforcement learning (RL) method that replaces the learned critic with a non-parametric grouped value proxy, using it for step-level credit assignment in multi-turn agentic tasks. It outperforms strong baselines on ALFWorld and WebShop.
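
The summary does not say how the grouped value proxy is built or how it feeds into the advantage. The sketch below shows one plausible reading, under stated assumptions, and is not the paper's algorithm. It assumes that a group of rollouts is sampled for the same task, that each step has a hashable state key (hypothetical, e.g. the observation text), and that V(s) is the mean discounted return-to-go over every visit to s in the group. Standard generalized advantage estimation (GAE) then uses that proxy in place of a critic. All function names (`returns_to_go`, `grouped_value_proxy`, `gae_with_proxy`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: a trajectory is a list of (state_key, reward)
# pairs, where state_key is any hashable summary of the agent's state.
Trajectory = List[Tuple[Hashable, float]]


def returns_to_go(rewards: List[float], gamma: float) -> List[float]:
    """Discounted return from each step to the end of the episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def grouped_value_proxy(group: List[Trajectory], gamma: float) -> Dict[Hashable, float]:
    """Assumed non-parametric V(s): mean return-to-go over all visits to s in the group."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for traj in group:
        rtg = returns_to_go([r for _, r in traj], gamma)
        for (key, _), ret in zip(traj, rtg):
            totals[key] += ret
            counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}


def gae_with_proxy(
    traj: Trajectory, values: Dict[Hashable, float], gamma: float, lam: float
) -> List[float]:
    """Standard GAE recursion, with the critic replaced by the grouped proxy."""
    adv = [0.0] * len(traj)
    running = 0.0
    for t in reversed(range(len(traj))):
        key, reward = traj[t]
        # The value after the final step is taken to be 0 (episode ends).
        v_next = values[traj[t + 1][0]] if t + 1 < len(traj) else 0.0
        delta = reward + gamma * v_next - values[key]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


# Usage: two rollouts share the start state "s0"; only the first succeeds.
traj_a: Trajectory = [("s0", 0.0), ("s1", 1.0)]
traj_b: Trajectory = [("s0", 0.0), ("s2", 0.0)]
values = grouped_value_proxy([traj_a, traj_b], gamma=1.0)
adv_a = gae_with_proxy(traj_a, values, gamma=1.0, lam=1.0)
adv_b = gae_with_proxy(traj_b, values, gamma=1.0, lam=1.0)
print(values)        # {'s0': 0.5, 's1': 1.0, 's2': 0.0}
print(adv_a, adv_b)  # [0.5, 0.0] [-0.5, 0.0]
```

In this toy case the shared first step gets a positive advantage in the successful rollout and a negative one in the failed rollout, because both are compared against the same group-average value of "s0". This is the kind of step-level signal a critic would otherwise have to learn. With `gamma = lam = 1` the GAE values reduce to return minus V(s), which makes the numbers easy to check by hand.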
