mixed-policy

Near-Future Policy Optimization

Hugging Face Daily Papers · 2026-04-22

Proposes Near-Future Policy Optimization (NPO), a mixed-policy RL method that accelerates convergence by learning from a later checkpoint of the same training run, boosting Qwen3-VL-8B-Instruct performance from 57.88 to 62.84.
