training-free-diagnostics

Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why

Hugging Face Daily Papers · 2026-05-11

This paper introduces a training-free diagnostic framework for analyzing per-token distillation signals in reasoning models. It finds that distillation guidance is more beneficial on incorrect rollouts, and that its benefit depends on student capacity and task context.
