token-reweighting


Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

arXiv cs.LG · 3d ago

Introduces FiRe-OPD, an on-policy distillation method for LLMs that filters out low-quality trajectories and then softly reweights tokens to emphasize the informative ones, reporting improved performance in strong-to-weak, single-teacher, and multi-teacher settings.
