rollout

#rollout

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

arXiv cs.CL · 5d ago

ProcessThinker introduces a practical post-training pipeline that provides step-level process rewards without training an explicit process reward model. It uses rollout-based rewards to give dense credit assignment for multi-step reasoning in multimodal LLMs, consistently improving performance on video benchmarks.

MiniMax M3 is starting to roll out on the API

Reddit r/singularity · 2026-06-01

MiniMax is rolling out its M3 model on the API, with a 1,000,000-token context window.

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

arXiv cs.LG · 2026-05-14

Introduces Multi-Rollout On-Policy Distillation (MOPD), a method that conditions the teacher on both successful and failed peer rollouts to provide denser token-level supervision for language model post-training, improving performance across multiple benchmarks.

GPT-Image-2 is rolling out

Reddit r/singularity · 2026-04-20

OpenAI is rolling out GPT-Image-2, a new image generation model that appears to be a significant update to its image generation capabilities.
