mixed-policy


Near-Future Policy Optimization

Hugging Face Daily Papers · 2026-04-22

Proposes Near-Future Policy Optimization (NPO), a mixed-policy RL method that accelerates convergence by learning from a later checkpoint of the same training run, boosting Qwen3-VL-8B-Instruct performance from 57.88 to 62.84.
