concurrent-requests

Generating 1,000 tokens per second with Qwen3.6 27B on a V100

Reddit r/LocalLLaMA ↗ · 2026-05-25

On a V100 GPU, the Qwen3.6 27B model reached a total generation speed of 1,000 tokens per second across 128 concurrent requests, compared with 80 t/s for a single user.
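
To put these figures in perspective, the arithmetic can be sketched as below. This assumes the 1,000 t/s figure is the aggregate across all 128 requests and that throughput is shared evenly between them; the source states neither explicitly. The function name `batching_summary` is hypothetical and only used for illustration.

```python
def batching_summary(aggregate_tps: float, concurrency: int, single_user_tps: float) -> dict:
    """Derive per-request speed and batching gain from the reported figures.

    Assumption: aggregate_tps is the combined rate of all concurrent
    requests, split evenly among them.
    """
    per_request_tps = aggregate_tps / concurrency          # speed each client sees
    aggregate_gain = aggregate_tps / single_user_tps       # total throughput vs. one user
    per_request_slowdown = single_user_tps / per_request_tps  # cost paid by each client
    return {
        "per_request_tps": per_request_tps,
        "aggregate_gain": aggregate_gain,
        "per_request_slowdown": per_request_slowdown,
    }


# Figures taken from the summary above.
summary = batching_summary(aggregate_tps=1000.0, concurrency=128, single_user_tps=80.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions, each of the 128 requests would see roughly 7.8 t/s, so batching trades about a 10x drop in per-request speed for a 12.5x gain in total throughput.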
