token-reweighting

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

arXiv cs.LG · 3d ago

Introduces FiRe-OPD, an on-policy distillation method for LLMs that first filters out low-quality trajectories and then applies soft token-level reweighting to emphasize informative tokens. The summary reports improved performance in strong-to-weak, single-teacher, and multi-teacher settings.
