reverse-kl

Rethinking the Role of Temperature in Large Language Model Distillation

arXiv cs.LG · 4d ago

This paper reexamines the role of temperature in large language model distillation. It finds that temperature benefits forward KL divergence more than reverse KL, so that at higher temperatures simple KL-based methods can match state-of-the-art distillation approaches.
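
To make the terms in the summary concrete, here is a minimal sketch of temperature-scaled softmax and the two KL directions. It assumes the common distillation convention in which forward KL is KL(teacher || student) and reverse KL is KL(student || teacher), and that the same temperature is applied to both models' logits. The function names and the toy logits are hypothetical, and the paper's exact loss and training setup are not given here, so this does not reproduce its experiments.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_kls(teacher_logits, student_logits, temperature=1.0):
    """Return (forward_kl, reverse_kl) under the assumed convention.

    forward_kl = KL(teacher || student)
    reverse_kl = KL(student || teacher)
    """
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    return kl_divergence(p, q), kl_divergence(q, p)


if __name__ == "__main__":
    # Hypothetical toy logits over a three-token vocabulary.
    teacher = [4.0, 1.0, 0.0]
    student = [1.0, 1.0, 1.0]
    for t in (1.0, 2.0, 4.0):
        fwd, rev = distillation_kls(teacher, student, temperature=t)
        print(f"T={t}: forward KL={fwd:.4f}, reverse KL={rev:.4f}")
```

Raising the temperature flattens both distributions before the divergence is computed, which is the knob the paper studies. The two directions generally give different values for the same pair of distributions, which is why a temperature change can affect them differently.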
