CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Hugging Face Daily Papers 05/27/26, 12:00 AM Papers

reasoning contrastive-reflection language-models self-improvement non-parametric efficiency interpretability

Summary

Contrastive Reflection (CORE) is a non-parametric algorithm that compares successful and unsuccessful reasoning traces to generate concise, interpretable insights. This lets language models self-improve faster and more efficiently, using fewer training samples and rollouts than existing methods.
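The summary describes the learning signal only at a high level. The following is a minimal Python sketch of how such a contrastive-reflection loop could be wired up. It is not the authors' implementation. It assumes a binary verifiable reward and treats the model calls as hypothetical callables (`rollout`, `verify`, `reflect`). The pairing rule (first success against first failure per problem), the per-problem rollout count `k`, and the global rollout budget are illustrative choices.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Trace:
    problem: str
    reasoning: str
    answer: str
    reward: float  # assumed binary: 1.0 if the verifier accepts the answer, else 0.0

@dataclass
class InsightMemory:
    insights: List[str] = field(default_factory=list)

    def add(self, insight: str) -> None:
        # Skip empty strings and exact duplicates so the stored context stays compact.
        if insight and insight not in self.insights:
            self.insights.append(insight)

Rollout = Callable[[str, List[str]], Tuple[str, str]]  # hypothetical: (problem, insights) -> (reasoning, answer)
Verifier = Callable[[str, str], float]                 # hypothetical verifiable reward
Reflector = Callable[[Trace, Trace], str]              # hypothetical LLM call: (success, failure) -> insight

def contrastive_pairs(traces: List[Trace]) -> List[Tuple[Trace, Trace]]:
    """Pair every successful trace with every failed trace for the same problem."""
    wins = [t for t in traces if t.reward > 0]
    losses = [t for t in traces if t.reward <= 0]
    return [(w, l) for w in wins for l in losses]

def core_step(problem: str, rollout: Rollout, verify: Verifier,
              reflect: Reflector, memory: InsightMemory, k: int = 4) -> int:
    """Sample k attempts conditioned on current insights, reflect on one contrast, return rollouts used."""
    traces = []
    for _ in range(k):
        reasoning, answer = rollout(problem, memory.insights)
        traces.append(Trace(problem, reasoning, answer, verify(problem, answer)))
    pairs = contrastive_pairs(traces)
    if pairs:  # all-correct or all-wrong batches offer no contrast and are skipped
        success, failure = pairs[0]
        memory.add(reflect(success, failure))
    return k

def train(problems: List[str], rollout: Rollout, verify: Verifier,
          reflect: Reflector, budget: int, k: int = 4) -> Tuple[InsightMemory, int]:
    """Process problems in order until the next step would exceed the rollout budget."""
    memory, used = InsightMemory(), 0
    for problem in problems:
        if used + k > budget:
            break
        used += core_step(problem, rollout, verify, reflect, memory, k)
    return memory, used

# Toy usage with deterministic stubs (all hypothetical): sampled answers alternate "4", "5".
_answers = itertools.cycle(["4", "5"])
memory, used = train(
    ["2+2?", "What is 2+2?", "Third problem"],
    rollout=lambda problem, insights: ("stub reasoning", next(_answers)),
    verify=lambda problem, answer: 1.0 if answer == "4" else 0.0,
    reflect=lambda ok, bad: f"Prefer the path that yields {ok.answer}, not {bad.answer}.",
    budget=8,
)
```

In this sketch, a problem whose rollouts are all correct or all incorrect yields no contrast and adds no insight. This page does not describe how CORE itself selects, merges, or prunes insights.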

Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both parametric (e.g., RLVR) and non-parametric (e.g., prompt optimization) approaches to doing so typically require hundreds of training samples and thousands of model rollouts, making them expensive in the best case and intractable in the worst. To address this challenge, we introduce Contrastive Reflection (CORE), a non-parametric learning algorithm that compares past reasoning traces to generate insights: short natural-language descriptions of reasoning strategies and constraints that capture differences between successful and unsuccessful problem attempts. Across four reasoning tasks, we demonstrate that CORE enables more rapid improvement than both parametric (GRPO) and non-parametric (GEPA, episodic RAG, and MemRL) methods, while using fewer rollouts. Under fixed rollout budgets with as few as five training samples, we then show that CORE also achieves performance gains comparable to or greater than those of each baseline. Finally, we highlight how CORE is also substantially more context-efficient than non-parametric baselines, requiring fewer prompt tokens while storing learned knowledge as compact, interpretable natural-language insights. Our results therefore suggest that distilling contrasts between successful and unsuccessful reasoning traces into abstract and useful insights can provide a more efficient and interpretable route to model self-improvement than weight updates, prompt optimization, or direct reuse of stored reasoning traces.
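The context-efficiency claim concerns how much prompt text stored knowledge consumes at inference time. The sketch below is a rough illustration only. It assumes that insights and retrieved traces are both simply prepended to the prompt, and it compares the resulting prompt sizes. The `build_prompt` format, the example strings, and the whitespace word count used as a token proxy are all hypothetical and do not reproduce the paper's setup.

```python
from typing import List

def build_prompt(problem: str, stored: List[str], header: str) -> str:
    """Prepend stored knowledge (insights or whole traces) to the problem as a bulleted list."""
    lines = [header] + [f"- {item}" for item in stored] + ["", f"Problem: {problem}"]
    return "\n".join(lines)

def approx_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. A real comparison would use the model's tokenizer.
    return len(text.split())

# Hypothetical knowledge learned from an earlier problem about speed.
insights = [
    "Convert all times to the unit the question asks for before dividing.",
    "Check that the final unit matches the unit requested.",
]
stored_traces = [
    "Step 1: the cyclist rides 30 km in 120 minutes. Step 2: convert 120 minutes to 2 hours. "
    "Step 3: divide 30 km by 2 hours. Step 4: the answer is 15 km/h. Verifier: correct.",
    "Step 1: the cyclist rides 30 km in 120 minutes. Step 2: divide 30 by 120 to get 0.25. "
    "Step 3: report 0.25 km/h. Verifier: incorrect, minutes were never converted to hours.",
]

problem = "A train travels 120 km in 90 minutes. What is its speed in km/h?"
insight_prompt = build_prompt(problem, insights, "Insights:")
trace_prompt = build_prompt(problem, stored_traces, "Similar past solutions:")
print(approx_tokens(insight_prompt), approx_tokens(trace_prompt))
```

In this toy setup, the saving comes from storing one short sentence per lesson instead of a full worked solution. Actual savings would depend on the tokenizer, the number of stored items, and the retrieval settings.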

Cached at: 06/08/26, 07:17 PM

Paper page - CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Source: https://huggingface.co/papers/2605.28742

Abstract

Contrastive Reflection (CORE) improves language model reasoning by analyzing the differences between successful and unsuccessful attempts and distilling them into concise, interpretable insights. These insights enable faster and more efficient self-improvement than traditional parametric and non-parametric approaches.

Similar Articles

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

CORE: Conflict-Oriented Reasoning for General Multimodal Manipulation Detection
