group-relative-policy-optimization

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

Hugging Face Daily Papers · 2026-06-13

The SAGA framework uses frozen multimodal large language models (MLLMs) to provide attribute-aware supervision for vision encoders via Group Relative Policy Optimization (GRPO), improving zero-shot image retrieval by 3–6 points on fine-grained benchmarks.

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Hugging Face Daily Papers · 2026-06-02

The S2L-PO framework uses smaller models as natural explorers to enhance policy diversity in GRPO for training large language models. It achieves faster convergence and improves accuracy on mathematical reasoning benchmarks while reducing rollout compute.
