hopper

#hopper

Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper

Reddit r/LocalLLaMA · 2026-06-08

This post gives tips and benchmarks for reaching nearly 200 tokens per second of inference with DeepSeek V4 Flash using vLLM on a dual-GH200 workstation, highlighting a quantized checkpoint from Canada-Quant and tensor-parallelism optimizations.

