@PyTorch: While SGLang provided Day-0 support for DeepSeek-V4, the collaboration between the @lmsysorg and @NVIDIAAI engineering …

Summary

SGLang provided Day-0 support for DeepSeek-V4, and collaboration between the LMSYS and NVIDIA engineering teams has since delivered up to a 5x throughput increase in production, as shown on the public SemiAnalysis InferenceX dashboard.

While SGLang provided Day-0 support for DeepSeek-V4, the collaboration between the @lmsysorg and @NVIDIAAI engineering teams has taken its production performance to the next level. According to the public SemiAnalysis InferenceX dashboard, the GB300 disaggregated lane (DeepSeek-V4 Pro, FP4, 8K/1K) saw a 5x throughput increase, rising from ~2,200 to ~11,200 tok/s/GPU at identical interactivity levels. These updates sustain high throughput much deeper into the interactivity ranges most deployments target, while also driving a 2.9x lift on the Blackwell Ultra aggregated lane. The full technical breakdown is in the comments on the original post.
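As a quick sanity check on the headline figure, the ratio can be worked out from the two approximate throughput numbers quoted above. This is a minimal sketch: the function name `speedup` is hypothetical, and the inputs are the rounded values from the post, not exact dashboard readings.

```python
def speedup(before: float, after: float) -> float:
    """Return the throughput ratio after/before."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


# Approximate per-GPU throughput (tok/s/GPU) as quoted in the post.
baseline_tok_s = 2200.0
optimized_tok_s = 11200.0

ratio = speedup(baseline_tok_s, optimized_tok_s)
print(f"{ratio:.2f}x")  # prints "5.09x"
```

The result is slightly above 5, which is consistent with the post rounding the gain to 5x.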

Cached at: 06/24/26, 03:57 AM

