Tag
ProcessThinker introduces a practical post-training pipeline that provides step-level process rewards without training an explicit process reward model. It uses rollout-based rewards to give dense credit assignment for multi-step reasoning in multimodal LLMs, consistently improving performance on video benchmarks.
MiniMax is rolling out its M3 model on the API, featuring a 1,000,000 token context window.
Introduces Multi-Rollout On-Policy Distillation (MOPD), a method that conditions the teacher on both successful and failed peer rollouts to provide denser token-level supervision for language model post-training, improving performance across multiple benchmarks.
OpenAI is rolling out GPT-Image-2, a new image generation model. This appears to be a significant update to their image generation capabilities.