entropy-guidance

Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning

arXiv cs.AI · 2026-05-14

The paper proposes EGRSD and CL-EGRSD, two on-policy self-distillation methods that weight token-level supervision by the teacher's entropy to improve the accuracy-length tradeoff of LLM reasoning. Both are evaluated on Qwen3-4B and Qwen3-8B.
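To make the idea of entropy-weighted token-level supervision concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not give the weighting function or the divergence, so this sketch assumes a linear weight `1 - H / log(V)` that down-weights tokens where the teacher distribution is high-entropy, and a forward KL divergence from teacher to student. The function name `entropy_weighted_distill_loss` and all helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """Forward KL(p || q); q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def entropy_weighted_distill_loss(teacher_logits, student_logits):
    """Weighted mean of per-token KL terms.

    teacher_logits, student_logits: lists of per-token logit lists over the
    same vocabulary (size >= 2).
    ASSUMPTION: weight = 1 - H(teacher) / log(V), clamped to [0, 1]. The
    paper's actual weighting scheme may differ.
    """
    vocab_size = len(teacher_logits[0])
    max_entropy = math.log(vocab_size)  # entropy of the uniform distribution
    weighted_sum = 0.0
    weight_total = 0.0
    weights = []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_logits)
        p_student = softmax(s_logits)
        w = 1.0 - entropy(p_teacher) / max_entropy
        w = min(max(w, 0.0), 1.0)  # guard against floating-point drift
        weights.append(w)
        weighted_sum += w * kl_divergence(p_teacher, p_student)
        weight_total += w
    # If every teacher token is maximally uncertain, no token is supervised.
    loss = weighted_sum / weight_total if weight_total > 0.0 else 0.0
    return loss, weights


if __name__ == "__main__":
    teacher = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # confident token, uniform token
    student = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
    loss, weights = entropy_weighted_distill_loss(teacher, student)
    print("loss:", loss)
    print("weights:", weights)
```

Under this assumed scheme, a token where the teacher is uniform over the vocabulary receives weight close to zero, so the student is not pushed toward a target the teacher itself is unsure about, while confident teacher tokens keep nearly full weight.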
