entropy-guidance


Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning

arXiv cs.AI · 2026-05-14

The paper proposes EGRSD and CL-EGRSD, two on-policy self-distillation methods that weight token-level supervision by teacher entropy to improve the accuracy-length tradeoff of LLM reasoning. Both are evaluated on Qwen3-4B and Qwen3-8B.
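
To make "weighting token-level supervision by teacher entropy" concrete, here is a minimal sketch of one way such a loss could look. It is not the paper's formulation: the weight `1 - H(teacher) / log(vocab_size)`, the use of KL divergence as the per-token loss, and all function names are assumptions chosen for illustration. The only idea taken from the summary is that each token's contribution depends on the teacher's entropy at that position.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(teacher, student):
    """KL(teacher || student) over a shared vocabulary.

    Assumes student probabilities are positive wherever the teacher's are.
    """
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


def entropy_weighted_loss(teacher_dists, student_dists):
    """Weighted average of per-token KL terms.

    ASSUMPTION (illustrative, not from the paper): each token is weighted by
    1 - H(teacher) / log(vocab_size), so positions where the teacher is
    uncertain contribute less supervision.
    """
    total, weight_sum = 0.0, 0.0
    for teacher, student in zip(teacher_dists, student_dists):
        max_entropy = math.log(len(teacher))
        # Clamp at zero to absorb floating-point error near uniform teachers.
        weight = max(0.0, 1.0 - entropy(teacher) / max_entropy)
        total += weight * kl_divergence(teacher, student)
        weight_sum += weight
    return total / weight_sum if weight_sum > 0.0 else 0.0


# Two token positions over a toy 4-word vocabulary.
teacher_dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident teacher: weight close to 1
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain teacher: weight close to 0
]
student_dists = [
    [0.70, 0.10, 0.10, 0.10],
    [0.70, 0.10, 0.10, 0.10],
]
loss = entropy_weighted_loss(teacher_dists, student_dists)
```

In this toy case the second position has a uniform teacher distribution, so its weight is essentially zero and the loss is driven almost entirely by the first, confident position.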
