teacher-exposure


Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

Hugging Face Daily Papers · 2026-05-12

Adaptive Teacher Exposure for Self-Distillation (ATESD) improves LLM reasoning by dynamically adjusting how much of the reference reasoning the teacher shows the student during training, using a learnable policy controller and a discounted learning-progress reward. Experiments on math benchmarks show consistent improvements over existing self-distillation and RL baselines.
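The summary names three components: a variable amount of reference reasoning shown to the student, a learnable policy controller, and a discounted learning-progress reward. The sketch below shows one way these could fit together. It does not reproduce the paper's formulation. The discrete exposure levels, the softmax policy with a REINFORCE-style update, the reward defined as a discounted sum of loss decreases, and every name (`ExposureController`, `discounted_progress`, `truncate_reference`) are assumptions made for illustration.

```python
import math
import random


def discounted_progress(losses, gamma=0.9):
    """Assumed reward: sum over t of gamma^t * (loss_t - loss_{t+1})."""
    return sum((gamma ** t) * (losses[t] - losses[t + 1])
               for t in range(len(losses) - 1))


def truncate_reference(steps, fraction):
    """Return the prefix of reference reasoning steps exposed to the student."""
    k = round(fraction * len(steps))
    return steps[:k]


class ExposureController:
    """Hypothetical controller: softmax policy over discrete exposure fractions."""

    def __init__(self, levels=(0.0, 0.25, 0.5, 0.75, 1.0), lr=0.1, seed=0):
        self.levels = levels
        self.logits = [0.0] * len(levels)  # uniform policy at the start
        self.lr = lr
        self.baseline = 0.0                # running mean reward, reduces variance
        self.rng = random.Random(seed)

    def probs(self):
        m = max(self.logits)               # subtract max for numerical stability
        exps = [math.exp(x - m) for x in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        """Sample the index of an exposure level from the current policy."""
        return self.rng.choices(range(len(self.levels)), weights=self.probs())[0]

    def update(self, action, reward):
        """REINFORCE step: move logits along advantage * grad of log pi(action)."""
        advantage = reward - self.baseline
        p = self.probs()
        for i in range(len(self.logits)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[i] += self.lr * advantage * grad
        self.baseline += 0.1 * (reward - self.baseline)


if __name__ == "__main__":
    controller = ExposureController()
    reference = ["step 1", "step 2", "step 3", "step 4"]
    action = controller.sample()
    shown = truncate_reference(reference, controller.levels[action])
    # Placeholder student losses; a real loop would measure these after training
    # the student on the exposed prefix.
    reward = discounted_progress([1.0, 0.8, 0.7])
    controller.update(action, reward)
    print(shown, controller.probs())
```

In this sketch, an exposure level that is followed by faster loss reduction gains probability mass, so the controller drifts toward the amount of teacher guidance that currently helps the student most.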
