Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

Hugging Face Daily Papers 05/04/26, 12:00 AM Papers

reasoning distillation multi-teacher chain-of-thought step-wise decoding

Summary

CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories through predictive perplexity scoring and beam search, enabling efficient distillation of large reasoning models with high-quality outputs and generalized performance.

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.

Cached at: 05/18/26, 10:25 AM

Source: https://huggingface.co/papers/2605.02290

Abstract

Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
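The abstract does not spell out the algorithm, so the following is only a minimal sketch of what step-wise multi-teacher decoding with perplexity-based scoring and beam search could look like. Everything here is an assumption made for illustration: the `Proposer` and `Scorer` interfaces, the `<end>` marker, the toy teachers, and the use of plain log-perplexity averaged over all teachers as a stand-in for the paper's "predictive perplexity" score, whose exact definition is not given on this page.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces (not taken from the paper):
# a proposer maps a partial trajectory to one candidate next step;
# a scorer returns per-token log-probabilities for a candidate step.
Proposer = Callable[[Sequence[str]], str]
Scorer = Callable[[Sequence[str], str], List[float]]
END = "<end>"  # hypothetical end-of-reasoning marker


def log_perplexity(token_logprobs: Sequence[float]) -> float:
    """Log of the standard perplexity: negative mean token log-probability."""
    return -sum(token_logprobs) / len(token_logprobs)


def collaborative_beam_decode(
    proposers: Sequence[Proposer],
    scorers: Sequence[Scorer],
    beam_width: int = 2,
    max_steps: int = 8,
) -> List[Tuple[List[str], float]]:
    """Build trajectories step by step; lower cumulative cost is better."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    for _ in range(max_steps):
        candidates: Dict[Tuple[str, ...], float] = {}

        def keep_best(key: Tuple[str, ...], cost: float) -> None:
            # Deduplicate identical trajectories, keeping the cheaper one.
            if key not in candidates or cost < candidates[key]:
                candidates[key] = cost

        for steps, cost in beams:
            if steps and steps[-1] == END:
                keep_best(tuple(steps), cost)  # finished beams carry over
                continue
            for propose in proposers:
                step = propose(steps)
                # Assumed scoring rule: average log-perplexity over teachers.
                step_cost = sum(
                    log_perplexity(score(steps, step)) for score in scorers
                ) / len(scorers)
                keep_best(tuple(steps) + (step,), cost + step_cost)

        ranked = sorted(candidates.items(), key=lambda kv: kv[1])
        beams = [(list(key), cost) for key, cost in ranked[:beam_width]]
        if all(steps[-1] == END for steps, _ in beams):
            break
    return beams


def make_toy_teacher(tag: str) -> Tuple[Proposer, Scorer]:
    """Toy stand-in for a teacher model, used only to exercise the loop."""

    def propose(steps: Sequence[str]) -> str:
        return END if len(steps) >= 2 else f"{tag}{len(steps)}"

    def score(steps: Sequence[str], step: str) -> List[float]:
        # Toy preference: steps starting with "a" (and END) look likely.
        logprob = -0.1 if step == END or step.startswith("a") else -1.0
        return [logprob] * 3

    return propose, score
```

With two toy teachers tagged "a" and "b" and a beam width of 2, the lowest-cost trajectory returned is `a0`, `a1`, `<end>`. In a real setting the proposers and scorers would wrap actual teacher LRMs, and the scoring rule would follow the paper's definition rather than this placeholder.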

Get this paper in your agent:

hf papers read 2605.02290

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information

CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

LoRi: Low-Rank Distillation for Implicit Reasoning

OpenCoF: Learning to Reason Through Video Generation
