Tag
An educational overview of knowledge distillation, covering its history, core concepts such as softmax and temperature, the main types of distillation, scaling laws, and practical examples including DeepSeek-R1.
Sign-Gated On-Policy Distillation (SG-OPD) enhances standard on-policy distillation by using a binary verifier as a trust signal for teacher supervision, improving performance on competition-level math reasoning benchmarks.
Z-Reward is a teacher-student framework that decouples complex reasoning from efficient reward deployment for text-to-image training. It achieves 89.6% human preference accuracy with a 27B teacher and 88.6% with a 9B student, outperforming prior methods.
Prompt-Level Distillation (PLD) extracts reasoning patterns from teacher models into structured instructions for student model system prompts, improving performance on reasoning tasks without fine-tuning overhead.
This paper proposes a principled offline reasoning distillation framework that corrects teacher-student distribution drift, improving reasoning accuracy on math benchmarks without requiring online rollouts.
This paper introduces TESSY, a teacher-student cooperative framework for fine-tuning reasoning models. It generates on-policy SFT data by splitting generation into capability tokens (from the teacher) and style tokens (from the student), which addresses the catastrophic forgetting that arises when training on off-policy teacher data.
OpenAI proposes Teacher–Student Curriculum Learning (TSCL), a framework in which a Teacher algorithm automatically selects subtasks for a Student that is learning a complex task. The Teacher chooses subtasks based on the slope of the Student's learning curve and revisits subtasks to prevent forgetting. The approach matches or surpasses hand-crafted curricula on decimal addition and Minecraft navigation tasks, and it solves tasks that direct training could not.