forward-kl


Rethinking the Role of Temperature in Large Language Model Distillation

arXiv cs.LG · 4d ago

This paper reexamines the role of temperature in large language model distillation. It finds that temperature benefits forward KL divergence more than reverse KL, so that at higher temperatures simple KL methods can match state-of-the-art distillation approaches.
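To make the two objectives named in the summary concrete, here is a minimal sketch of temperature-scaled forward and reverse KL between a teacher and a student distribution. It uses only the textbook definitions and is not the paper's implementation. The function names, the example logits, and the choice to apply the same temperature to both teacher and student are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def distillation_kls(teacher_logits, student_logits, temperature=1.0):
    """Return (forward KL, reverse KL) at the given temperature.

    Assumption: the same temperature is applied to teacher and student.
    Forward KL is KL(teacher || student); reverse KL is KL(student || teacher).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl(p, q), kl(q, p)

# Hypothetical logits over a four-token vocabulary, for illustration only.
teacher_logits = [4.0, 1.0, 0.5, -2.0]
student_logits = [2.5, 2.0, 0.0, -1.0]

for t in (1.0, 2.0, 4.0):
    fwd, rev = distillation_kls(teacher_logits, student_logits, t)
    print(f"T={t}: forward KL={fwd:.4f}, reverse KL={rev:.4f}")
```

Running the loop shows how each divergence changes as the temperature rises for these particular logits; it says nothing about training dynamics, which is where the paper's claim applies.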
