@vllm_project: Today we're excited to introduce vime — a simple, stable, and efficient RL framework for LLM post-training in the vLLM …

X AI KOLs Following 06/09/26, 05:22 PM Tools

llm reinforcement-learning post-training vllm open-source framework

Summary

vime is a new open-source RL framework for LLM post-training, built on slime's training design and vLLM's inference engine, providing a simple, stable, and efficient pipeline within the vLLM ecosystem.

Today we're excited to introduce vime — a simple, stable, and efficient RL framework for LLM post-training in the vLLM ecosystem. Built on slime's proven training design and powered by vLLM inference, vime brings another strong option to the growing vLLM post-training ecosystem. Our goal isn't a one-size-fits-all framework. We want users with different needs to find the right vLLM-ecosystem choice for their workflows—whether that's vime, NeMo RL, OpenRLHF, verl, or others. More choice. More interoperability. More innovation. Learn more: https://vllm.ai/blog/2026-06-09-announcing-vime… #LLM #RLHF #PostTraining #vLLM


Cached at: 06/10/26, 05:53 PM


Announcing vime: A Simple, Stable, and Efficient RL Framework for LLMs

Source: https://vllm.ai/blog/2026-06-09-announcing-vime

We are excited to introduce vime, an LLM post-training framework within the vLLM ecosystem. Built on slime’s training stack and data-generation design, vime connects Megatron and vLLM into a single RL pipeline so distributed training and inference can run reliably under one unified architecture.

slime has proven itself as a strong engineering paradigm for RL post-training: open, lightweight, and efficient. vime brings the vLLM ecosystem to slime, pairing slime’s training stack with vLLM’s inference strengths into a simple, stable, and efficient main pipeline—delivering stable train-inference alignment, flexible deployment modes, and full-stack GPU support.

Our Vision

RL frameworks with both battle-tested credibility and open-source DNA have always been rare. slime, validated on models like GLM, stands out as a representative: open, lightweight, concise, and efficient. Yet it does not natively integrate with the vLLM backend. vLLM, meanwhile, is the most active inference engine in the community, combining cutting-edge techniques with a multi-platform ecosystem and rapid iteration.

vime’s mission is to connect slime’s training design with vLLM’s inference strengths into one simple, stable, and efficient pipeline. Developers should not have to trade off between a single hardware stack, training stability, and inference performance.

Positioning

The vLLM community supports a broad set of LLM post-training frameworks, including (in alphabetical order) NeMo RL, OpenRLHF, verl, and others. We built vime to seamlessly bring slime’s proven training paradigm into the vLLM ecosystem, offering a production-ready bridge that aligns both projects’ rapid release cycles.

We hope that users with different needs can find the right vLLM-ecosystem choice for their workflows. The vLLM community will continue to support vLLM integrations across the broader post-training ecosystem.

Architecture Overview

vime adopts slime’s three-stage, decoupled train-inference design, with the key difference being that the rollout backend is replaced by vLLM:

Training (Megatron): The main training loop, responsible for parameter updates and synchronizing weights to the rollout side.
Rollout (vLLM + Router): Inference sampling, producing training samples with reward or verifier signals.
Data Buffer: Connects the training and rollout sides, managing prompt injection and custom rollout logic.
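The three roles above can be sketched as a toy loop. This is only an illustration of the decoupled shape, not vime's code: `DataBuffer`, `rollout`, `train_step`, and `run` are hypothetical names, the rollout function stands in for vLLM sampling, and the training step stands in for a Megatron update followed by weight sync.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float


class DataBuffer:
    """Hypothetical buffer that sits between the rollout and training sides."""

    def __init__(self, prompts):
        self.prompts = deque(prompts)
        self.samples = []

    def next_prompts(self, n):
        # Hand out up to n prompts for the rollout side to sample from.
        return [self.prompts.popleft() for _ in range(min(n, len(self.prompts)))]

    def add(self, samples):
        self.samples.extend(samples)

    def drain(self):
        # Return everything collected so far and reset the buffer.
        batch, self.samples = self.samples, []
        return batch


def rollout(prompts, weights_version):
    # Stand-in for vLLM sampling plus a reward or verifier call (toy reward).
    return [Sample(p, f"answer-v{weights_version}", float(len(p) % 2)) for p in prompts]


def train_step(batch, weights_version):
    # Stand-in for a Megatron parameter update; the returned version models
    # the updated weights being synchronized back to the rollout side.
    return weights_version + 1


def run(prompts, batch_size=2):
    buffer = DataBuffer(prompts)
    version = 0
    history = []
    while buffer.prompts:
        buffer.add(rollout(buffer.next_prompts(batch_size), version))
        batch = buffer.drain()
        version = train_step(batch, version)
        history.append((version, len(batch)))
    return history
```

Because the two sides only meet at the buffer, the rollout stand-in could be swapped for another backend without touching the training step, which is the property the decoupled design relies on.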

vime connects Megatron training with vLLM-powered rollout through a decoupled data buffer.

Key Capabilities

Easy to use: The parameter system inherits slime and Megatron conventions, with vLLM-side arguments passed through using the --vllm- prefix. The default rollout entry point is vime.rollout.vllm_rollout.
Stable train-inference alignment: Across typical Dense and MoE scenarios, train_rollout_logprob_abs_diff stays within a controllable range over long runs. For MoE, R3 (routing replay) further reduces train-inference mismatch.
Algorithm and model coverage: RL algorithms such as GRPO and PPO, plus models including Qwen3 Dense/MoE and GLM-4.5, ship with end-to-end examples and CI-verified paths.
Multi-hardware support: At the framework level, training resources, rollout resources, and cluster topology are abstracted uniformly, making it easier to reuse the same RL pipeline across different hardware backends as support evolves with the vLLM ecosystem.
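To make the prefix pass-through idea concrete, here is a minimal sketch of how prefixed flags could be separated from the rest of a command line. It is an assumption about the general pattern, not vime's actual parser: `split_vllm_args` is a hypothetical helper, and the flag names in the usage lines are illustrative only.

```python
def split_vllm_args(argv, prefix="--vllm-"):
    """Split argv into (vLLM-side args with the prefix stripped, other args).

    Hypothetical helper: values that follow a flag stay with that flag.
    """
    vllm_args, other_args = [], []
    target = other_args
    for token in argv:
        if token.startswith("--"):
            if token.startswith(prefix):
                target = vllm_args
                token = "--" + token[len(prefix):]
            else:
                target = other_args
        target.append(token)
    return vllm_args, other_args


# Illustrative flags only; check the vime docs for the real argument names.
argv = ["--lr", "1e-6", "--vllm-max-model-len", "4096", "--vllm-enforce-eager", "--seed", "1"]
vllm_side, training_side = split_vllm_args(argv)
```

Here `vllm_side` holds the flags meant for the inference engine with the prefix removed, and `training_side` holds everything else unchanged.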

Validation and Benchmarks

For Qwen3-30B-A3B with 8-GPU colocate, dapo-math-17k, and GRPO, GB200 mean step time is about 147 seconds, while H200 mean step time is about 252 seconds. Under the same framework, GB200 end-to-end step speed is about 1.72x that of H200.
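As a quick check of the arithmetic, the speedup is the ratio of the two mean step times. Using the rounded figures quoted above gives roughly 1.71x, which is in line with the reported 1.72x; the small gap presumably comes from rounding the step times.

```python
# Mean step times as quoted in the text (rounded values).
gb200_step_s = 147.0
h200_step_s = 252.0

# Step speed is inversely proportional to step time.
speedup = h200_step_s / gb200_step_s
print(f"GB200 vs H200 step speed: {speedup:.2f}x")  # prints 1.71x
```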

Qwen3-30B-A3B vime step speed on GB200 and H200.

We also validated train-inference consistency and end-to-end functionality on representative workloads across hardware.

Qwen3-4B on A100

For Qwen3-4B on A100 with GRPO, 4 training + 4 inference non-colocate, and gsm8k, vime’s train_rollout_logprob_abs_diff stays stable around 0.011 throughout training. The baseline drifts continuously to around 0.77 as training progresses, while vime delivers more stable train-inference alignment.
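The post does not define how train_rollout_logprob_abs_diff is computed. Going by the name, a plausible reading is the mean absolute difference between the log-probabilities that the training engine and the rollout engine assign to the same sampled tokens. The sketch below assumes that reading; `logprob_abs_diff` and its optional mask are hypothetical.

```python
def logprob_abs_diff(train_logprobs, rollout_logprobs, mask=None):
    """Mean |train - rollout| log-prob over tokens where mask is 1.

    Assumed definition for illustration; the exact metric in vime may differ.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if mask is None:
        mask = [1] * len(train_logprobs)
    total = sum(abs(t - r) * m for t, r, m in zip(train_logprobs, rollout_logprobs, mask))
    count = sum(mask)
    return total / count if count else 0.0


# Toy values: per-token log-probs of the same response under each engine.
train = [-1.00, -2.00, -0.50]
rollout = [-1.01, -2.02, -0.50]
diff = logprob_abs_diff(train, rollout)
```

Under this reading, a value near 0.011 means the two engines agree closely on each sampled token, while a value drifting toward 0.77 means training is optimizing against probabilities that differ substantially from the ones the samples were drawn from.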

Qwen3-4B vime versus baseline training behavior.

Qwen3-30B-A3B MoE with R3

For Qwen3-30B-A3B MoE on A100 with 4 training GPUs, 4 inference GPUs, dapo-math-17k, and EP=4, enabling vime’s R3 routing replay reduces the logprob diff from roughly 0.019 to roughly 0.013, markedly reducing MoE train-inference mismatch.
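The post names R3 only as routing replay and does not describe its mechanics. One common reading of routing replay is that the expert choices made during rollout are recorded and reused in the training forward pass, so both sides send each token through the same experts even when their router logits differ slightly. The toy sketch below assumes that reading; `top_k_experts` and `route` are hypothetical names and the logits are made up.

```python
def top_k_experts(router_logits, k):
    """Indices of the k largest router logits, highest first."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    return order[:k]


def route(router_logits, k, replay=None):
    """Pick experts for one token; reuse recorded rollout choices if given."""
    if replay is not None:
        return list(replay)
    return top_k_experts(router_logits, k)


# The same token seen by two engines with slightly different numerics.
rollout_logits = [0.10, 0.52, 0.50, 0.49]
train_logits = [0.10, 0.52, 0.49, 0.50]

rollout_choice = route(rollout_logits, k=2)                        # experts 1 and 2
train_free = route(train_logits, k=2)                              # experts 1 and 3
train_replayed = route(train_logits, k=2, replay=rollout_choice)   # experts 1 and 2
```

Without replay, a tiny logit difference flips the second expert and the token takes a different path in training than it did in rollout. With replay, the paths match, which is the kind of mismatch reduction the logprob diff figures above reflect.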

R3 routing replay reduces train-inference mismatch for Qwen3-30B-A3B MoE.

Qwen3-30B-A3B MoE on GB200

For Qwen3-30B-A3B MoE on GB200 with 8-GPU colocate and dapo-math-17k, vime and the baseline have closely aligned raw_reward curves. Both keep train_rollout_logprob_abs_diff stable around 0.018, with no sustained baseline-side drift.

Qwen3-30B-A3B MoE on GB200 shows stable alignment in colocated training and rollout.

GLM-4.5-Air on GB200

For GLM-4.5-Air on GB200 with GRPO, 8-GPU colocate, and dapo-math-17k, raw_reward trends upward over 100 steps with a mean of about 0.56. train_rollout_logprob_abs_diff stays in the 0.02-0.03 range, with a mean of about 0.028, indicating solid train-inference alignment.

GLM-4.5-Air on GB200 maintains stable logprob alignment while reward improves.

Roadmap

vime is still evolving rapidly, with a roadmap focused on three areas:

Deeper vLLM integration: Continuously adopting new vLLM capabilities such as Router, PD disaggregation, FP8, and multi-model serving.
Multi-hardware expansion: Extending backends along vLLM’s hardware plugin system so vime runs efficiently on more accelerators and cluster configurations.
Training efficiency and algorithms: Fully asynchronous pipelines, train-inference mismatch correction, Agentic RL for multi-turn tool calling and multi-agent settings, and fast follow-up on new architectures such as MoE and VLM.

Quick Start

The getting-started path is similar to slime: configure Megatron training resources and vLLM rollout resources, prepare checkpoints and data, then launch train.py or train_async.py.

Docs: Quick Start
Examples: The scripts/ and examples/ directories cover scenarios such as Qwen3-4B, Qwen3-30B-A3B MoE, and GLM-4.5-Air.

vime is maintained by the vLLM community, open-sourced under Apache 2.0, and built on the shoulders of projects like slime, Megatron-LM, and vLLM.

Code and docs: github.com/vllm-project/vime
Contributing: Issues and PRs are welcome. Pre-commit keeps the code style consistent.
Feedback: Share your experience, performance data, and feature suggestions on GitHub.

A simple architecture, stable behavior, and efficient performance: vime aims to pave the main pipeline for RL post-training for more developers. Join us and help bring this pipeline to more scenarios.

Acknowledgments

**Contributors:** Ao Shen, kaiyuan, princepride, Dakai An, knlnguyen1802, gcanlin, SamitHuang, and Meihan-chen.

We are grateful to the maintainers of the slime, Megatron-LM, and vLLM projects for their pioneering work. We would also like to thank Kaichao You, Roger Wang, Hongsheng Liu, and Xiyuan Wang for their support of and contributions to organizing the vime project.
