prompt-reweighting

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

arXiv cs.LG · 2026-05-26

This paper introduces CurveRL, a principled, distribution-aware prompt reweighting approach for reinforcement learning with verifiable rewards (RLVR). To improve LLM reasoning, it weights prompts by the rank and density of their pass rates rather than by the absolute values, and the paper reports that it consistently outperforms GRPO and other baselines.
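
To make the contrast between rank-based and absolute-value weighting concrete, here is a minimal Python sketch. It is not CurveRL's actual weighting rule, which this summary does not state. The function names and the weight shape `u * (1 - u)` are hypothetical choices made only for illustration, and the density component mentioned above is not modeled. The sketch shows one property that any purely rank-based weight has: it is unchanged when the pass rates are passed through a strictly increasing transform, whereas a weight computed from the absolute pass rates is not.

```python
import numpy as np


def _normalize(w):
    """Scale weights to mean 1; leave them unchanged if they are all zero."""
    m = w.mean()
    return w / m if m > 0 else w


def midrank_quantiles(pass_rates):
    """Map each pass rate to its mid-rank quantile in (0, 1) within the batch."""
    p = np.asarray(pass_rates, dtype=float)
    # For each entry, count batch entries strictly below it and equal to it.
    below = (p[None, :] < p[:, None]).sum(axis=1)
    equal = (p[None, :] == p[:, None]).sum(axis=1)
    return (below + 0.5 * equal) / p.size


def rank_weights(pass_rates):
    """Hypothetical rank-based weight: largest near the batch median."""
    u = midrank_quantiles(pass_rates)
    return _normalize(u * (1.0 - u))


def absolute_weights(pass_rates):
    """Hypothetical absolute-value weight: largest near a pass rate of 0.5."""
    p = np.asarray(pass_rates, dtype=float)
    return _normalize(p * (1.0 - p))


if __name__ == "__main__":
    p = np.array([0.1, 0.2, 0.5, 0.9])   # illustrative per-prompt pass rates
    q = p ** 2                           # strictly increasing transform on [0, 1]
    print("rank-based, original:   ", rank_weights(p))
    print("rank-based, transformed:", rank_weights(q))
    print("absolute, original:     ", absolute_weights(p))
    print("absolute, transformed:  ", absolute_weights(q))
```

Under these assumptions the two rank-based rows are identical, while the two absolute-value rows differ, because squaring the pass rates changes their values but not their order.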
