Trajectory-Refined Distillation (TRD) addresses prefix failure in on-policy distillation for LLMs by correcting student rollouts at the trajectory level before distillation, consistently outperforming prior baselines across benchmarks.