teacher-exposure

Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

Hugging Face Daily Papers · 2026-05-12

Adaptive Teacher Exposure for Self-Distillation (ATESD) improves LLM reasoning by dynamically adjusting how much of the reference reasoning the teacher shows the student during training, using a learnable policy controller and a discounted learning-progress reward. Experiments on math benchmarks show consistent improvements over existing self-distillation and RL baselines.
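To make the idea concrete, here is a minimal sketch of how an exposure controller with a discounted learning-progress reward could be wired up. It is not the paper's implementation: the summary does not specify the controller architecture, the exposure granularity, or the exact reward formula. The discrete `EXPOSURE_LEVELS`, the softmax bandit with a REINFORCE-style update, the discount `gamma`, and all function and class names below are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical discrete choices: fraction of the reference reasoning shown.
EXPOSURE_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]


def truncate_reference(reference_steps, exposure):
    """Return the prefix of reference reasoning steps the teacher reveals."""
    n_shown = int(round(exposure * len(reference_steps)))
    return reference_steps[:n_shown]


def discounted_progress(losses, gamma=0.9):
    """Assumed reward: sum over k of gamma**k * (losses[k] - losses[k + 1]).

    Positive when the student's loss falls after training at a given exposure.
    """
    return sum(
        (gamma ** k) * (losses[k] - losses[k + 1])
        for k in range(len(losses) - 1)
    )


class ExposureController:
    """Softmax policy over exposure levels (an illustrative stand-in)."""

    def __init__(self, n_levels, lr=0.1, seed=0):
        self.logits = [0.0] * n_levels
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        # Draw the index of an exposure level from the current policy.
        return self.rng.choices(range(len(self.logits)), weights=self.probs())[0]

    def update(self, action, reward):
        # REINFORCE step: gradient of log softmax is one_hot(action) - probs.
        p = self.probs()
        for i in range(len(self.logits)):
            grad = (1.0 if i == action else 0.0) - p[i]
            self.logits[i] += self.lr * reward * grad
```

In a training loop under these assumptions, one would sample a level with `controller.sample()`, build the teacher's context with `truncate_reference(steps, EXPOSURE_LEVELS[level])`, run a few student updates while recording losses, and then call `controller.update(level, discounted_progress(losses))`, so that exposure levels yielding faster learning are chosen more often.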
