temperature

#temperature

Rethinking the Role of Temperature in Large Language Model Distillation

arXiv cs.LG · 4d ago

Reexamines the role of temperature in large language model distillation. The paper finds that raising the temperature benefits forward KL divergence more than reverse KL, so that simple KL-based methods match state-of-the-art distillation approaches at higher temperatures.
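For readers unfamiliar with the terms in this summary, the following is a minimal sketch of the standard textbook definitions of temperature-scaled softmax and of forward versus reverse KL between a teacher and a student distribution. It is not the paper's implementation: the function names (`softmax_t`, `kl`, `distill_losses`) and the example logits are hypothetical, chosen only for illustration, and any extra loss scaling the paper may use is not reproduced here.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(z / T). Larger T gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distill_losses(teacher_logits, student_logits, temperature=1.0):
    """Return forward KL(teacher || student) and reverse KL(student || teacher),
    with the same temperature applied to both sets of logits."""
    p = softmax_t(teacher_logits, temperature)  # teacher distribution
    q = softmax_t(student_logits, temperature)  # student distribution
    return {"forward_kl": kl(p, q), "reverse_kl": kl(q, p)}


# Illustrative logits only (not taken from the paper).
teacher_logits = [4.0, 1.0, 0.5, -2.0]
student_logits = [2.5, 2.0, 0.0, -1.0]

for t in (1.0, 2.0, 4.0):
    print(t, distill_losses(teacher_logits, student_logits, t))
```

Forward KL weights the log-ratio by the teacher's probabilities, while reverse KL weights it by the student's, so changing how flat each distribution is can affect the two objectives differently.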


#temperature

Consistently Informative Soft-Label Temperature for Knowledge Distillation

arXiv cs.LG · 2026-05-21

Proposes CIST, a method that assigns separate sample-wise adaptive temperatures to teacher and student in knowledge distillation, producing consistently informative soft labels and relaxing rigid logit-scale matching. Experiments on vision and language tasks show consistent improvements over standard KD.


