提示级蒸馏：一种高效推理的非参数化模型微调替代方案

Hugging Face Daily Papers 2026/06/02 00:00 论文

reasoning chain-of-thought fine-tuning prompt-level-distillation teacher-student efficiency interpretability

摘要

提示级蒸馏（PLD）从教师模型中提取推理模式，转化为结构化指令用于学生模型的系统提示，在不增加微调开销的情况下提升推理任务性能。

高级推理通常需要思维链提示，虽然准确，但会带来高昂的延迟和大量测试时推理成本。标准的替代方案是微调小型模型，但这往往牺牲可解释性，同时带来显著的资源和运维开销。为解决这些局限，我们引入了提示级蒸馏（PLD）。我们从教师模型中提取显式推理模式，并将其组织成结构化指令列表，用于学生模型的系统提示。在Gemma-3 4B上评估，PLD将StereoSet的宏F1分数从57%提升至90.0%，Contract-NLI从67%提升至83%，同时将LogiQA准确率提升至70%。在Mistral Small 3.1上获得类似结果，展示了跨架构的泛化能力，使这些紧凑模型能够以可忽略的延迟开销匹配前沿性能。这些富有表现力的指令使决策过程透明化，允许对逻辑进行完全人工验证，使该方法非常适合法律、金融、内容审核等受监管行业，以及高容量用例和边缘设备。

查看原文

查看缓存全文

缓存时间: 2026/06/16 11:31

Paper page - Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Source: https://huggingface.co/papers/2602.21103

摘要

Prompt-Level Distillation 从教师模型中提取推理模式，以增强学生模型性能，同时保持可解释性并降低延迟。

高级推理通常需要 Chain-of-Thought prompting (https://huggingface.co/papers?q=Chain-of-Thought%20prompting)，该方法准确但延迟过高且测试时推理成本巨大。标准的替代方案——fine-tuning (https://huggingface.co/papers?q=fine-tuning) 小模型，往往牺牲可解释性，同时引入大量资源和运营开销。为解决这些限制，我们引入了 Prompt-Level Distillation (https://huggingface.co/papers?q=Prompt-Level%20Distillation) (PLD)。我们从 Teacher model (https://huggingface.co/papers?q=Teacher%20model) 中提取显式推理模式，并将其组织成结构化指令列表，作为 Student model (https://huggingface.co/papers?q=Student%20model) 的 System Prompt (https://huggingface.co/papers?q=System%20Prompt)。使用 Gemma-3 4B 进行评估，PLD 将 StereoSet 上的 Macro F1 scores (https://huggingface.co/papers?q=Macro%20F1%20scores) 从 57% 提升至 90.0%，Contract-NLI 从 67% 提升至 83%，同时将 LogiQA (https://huggingface.co/papers?q=LogiQA) 准确率提升至 70%。在 Mistral Small 3.1 上的类似结果证明了 cross-architecture generalizability (https://huggingface.co/papers?q=cross-architecture%20generalizability)，使这些紧凑型模型能够以极低的延迟开销媲美前沿性能。这些表达性指令使决策过程透明化，允许逻辑的完全人工验证，使其成为法律、金融和内容审核等受监管行业，以及高容量用例和边缘设备的理想选择。

View arXiv page (https://arxiv.org/abs/2602.21103)View PDF (https://arxiv.org/pdf/2602.21103)Add to collection (https://huggingface.co/login?next=%2Fpapers%2F2602.21103)

Get this paper in your agent:

hf papers read 2602\.21103

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2602.21103 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2602.21103 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2602.21103 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollection (https://huggingface.co/new-collection)to link it from this page.

提示级蒸馏：一种高效推理的非参数化模型微调替代方案

Paper page - Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

摘要

Models citing this paper0

Datasets citing this paper0

Spaces citing this paper0

Collections including this paper0

相似文章

通过混合策略蒸馏进行推理压缩

通过混合层蒸馏和关键信息的逐步注意力改进小模型的推理能力

用于LLM推理的自适应教师暴露自蒸馏方法

授之以渔而非授之以鱼：面向多模态策略优化的特权引导式蒸馏

教师令牌何时可靠？基于位置加权的在线策略自蒸馏方法在推理中的应用

提交意见反馈