ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

Hugging Face Daily Papers 05/01/26, 12:00 AM Papers

Summary

This paper introduces ResRL, a method to boost LLM reasoning by decoupling semantic distributions between positive and negative responses through negative sample projection. It aims to maintain generation diversity while improving performance on various benchmarks.

Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning of Large Language Models (LLMs) but usually exhibits limited generation diversity due to the over-incentivization of positive rewards. Although methods like Negative Sample Reinforcement (NSR) mitigate this issue by upweighting penalty from negative samples, they may suppress the semantic distributions shared between positive and negative responses. To boost reasoning ability without losing diversity, this paper proposes negative sample projection Residual Reinforcement Learning (ResRL) that decouples similar semantic distributions among positive and negative responses. We theoretically link Lazy Likelihood Displacement (LLD) to negative-positive head-gradient interference and derive a single-forward proxy that upper-bounds representation alignment to guide conservative advantage reweighting. ResRL then projects negative-token hidden representations onto an SVD-based low-rank positive subspace and uses projection residuals to modulate negative gradients, improving reasoning while preserving diversity and outperforming strong baselines on average across twelve benchmarks spanning Mathematics, Code, Agent Tasks, and Function Calling. Notably, ResRL surpasses NSR on mathematical reasoning by 9.4% in Avg@16 and 7.0% in Pass@128. Code is available at https://github.com/1229095296/ResRL.git.
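The projection step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the uncentered SVD, the fixed rank, and the choice to scale each negative advantage linearly by its residual ratio are all assumptions made for illustration. The idea shown is only that a negative token whose hidden state lies mostly inside the positive subspace (small residual) receives a weaker penalty than one that lies mostly outside it.

```python
import numpy as np


def positive_subspace(pos_hidden: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x rank) for the top-`rank` directions of the
    positive-token hidden states (n_pos x d). Hypothetical helper."""
    # Rows of vt are orthonormal directions in hidden space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(pos_hidden, full_matrices=False)
    return vt[:rank].T


def residual_ratio(neg_hidden: np.ndarray, basis: np.ndarray,
                   eps: float = 1e-8) -> np.ndarray:
    """Per-token fraction of the hidden-state norm that lies outside
    the positive subspace (0 = fully inside, 1 = fully outside)."""
    projection = neg_hidden @ basis @ basis.T
    residual = neg_hidden - projection
    return np.linalg.norm(residual, axis=1) / (
        np.linalg.norm(neg_hidden, axis=1) + eps
    )


def reweight_negative_advantages(neg_adv: np.ndarray,
                                 neg_hidden: np.ndarray,
                                 basis: np.ndarray) -> np.ndarray:
    """Scale each negative-token advantage by its residual ratio.
    The linear scaling is an assumption, not the paper's formula."""
    return neg_adv * residual_ratio(neg_hidden, basis)
```

For example, with a positive subspace spanning the first two axes of a 4-dimensional hidden space, a negative token at (1, 1, 0, 0) keeps essentially none of its penalty, one at (0, 0, 1, 0) keeps all of it, and one at (3, 0, 4, 0) keeps 80% of it.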


Cached at: 05/08/26, 08:04 AM


Source: https://huggingface.co/papers/2605.00380 Published on May 1

Submitted by zihan (https://huggingface.co/lin1111987) on May 7

Abstract

ResRL improves LLM reasoning by decoupling semantic distributions between positive and negative responses through negative sample projection, maintaining diversity while outperforming existing methods on multiple benchmarks.



Get this paper in your agent:

hf papers read 2605.00380

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

ExpRL: Exploratory RL for LLM Mid-Training

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

Learning to Refine Hidden States for Reliable LLM Reasoning
