discrete-policy


Guidance Contrastive Token Credit Assignment for Discrete Policy Optimization

Hugging Face Daily Papers · 6d ago

This paper introduces Guidance Contrastive Policy Optimization (GCPO), an algorithm for per-token credit assignment in reinforcement learning. It derives token-level credit by contrasting the model's predictions under positive and negative prompts, and the authors report that it consistently outperforms GRPO and DAPO baselines on text-to-image generation and chain-of-thought reasoning benchmarks.
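The summary does not state GCPO's actual formula, so the sketch below is only an illustration of the general idea of contrast-based token credit, not the paper's method. It assumes per-token log-probabilities of the sampled tokens are available under a positive and a negative prompt, and it spreads a single sequence-level advantage across tokens in proportion to a softmax over the log-probability gap. The function names, the `temperature` parameter, and the softmax weighting are all hypothetical choices made for this example.

```python
import math


def contrastive_token_weights(logp_pos, logp_neg, temperature=1.0):
    """Hypothetical per-token weights from the positive/negative log-prob gap.

    Assumption (not from the paper): tokens the positive prompt favours much
    more than the negative prompt receive more credit. Weights are a softmax
    over the gap, rescaled so that they average to 1 across the sequence.
    """
    if len(logp_pos) != len(logp_neg):
        raise ValueError("log-prob sequences must have the same length")
    if not logp_pos:
        return []
    gaps = [(p - n) / temperature for p, n in zip(logp_pos, logp_neg)]
    peak = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp(g - peak) for g in gaps]
    total = sum(exps)
    return [len(gaps) * e / total for e in exps]


def token_advantages(sequence_advantage, logp_pos, logp_neg, temperature=1.0):
    """Distribute one sequence-level advantage over tokens (illustrative only)."""
    weights = contrastive_token_weights(logp_pos, logp_neg, temperature)
    return [sequence_advantage * w for w in weights]


if __name__ == "__main__":
    # Made-up log-probabilities for a three-token sample.
    logp_pos = [-0.5, -2.0, -0.1]
    logp_neg = [-0.6, -0.3, -2.5]
    print(token_advantages(1.0, logp_pos, logp_neg))
```

Because the weights average to 1, the per-token advantages average back to the sequence-level advantage, so a uniform gap reduces to ordinary sequence-level credit where every token shares the same advantage.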
