8-16 MI50s Minimax M3 @19 tps TG (peak)
Summary
Reports a peak token-generation (TG) throughput of 19 tokens per second for the Minimax M3 model running on 8 to 16 AMD MI50 GPUs.
Similar Articles
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)
Benchmark results for running Qwen 3.6 27B on AMD MI50 GPUs using a custom vLLM fork, achieving 52.8 tokens/s token generation (TG) and 1569 tokens/s prompt processing (PP) without quantization or MTP, demonstrating usability for agentic tasks on 2018 hardware.
@0xSero: Minimax-M3 running on 4x RTX Pro 6000s - 800k context - 4x concurrency at 250k - 70-120 tok/s - 2000 tok/s prefill no c…
Minimax-M3 is demonstrated running on 4x RTX Pro 6000 GPUs with 800k context and 4x concurrency at 250k, achieving 70-120 tok/s generation and 2000 tok/s prefill using 376 GB of VRAM in MXFP4 format.
80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP
A user shares a configuration for achieving over 80 tokens per second with Qwen3.6 35B A3B on a 12GB VRAM GPU using llama.cpp and Multi-Token Prediction (MTP). The post includes benchmark results and specific command-line parameters to optimize performance.
@QuixiAI: @Kimi_Moonshot K2.6 running on my mi300x, 56 tps (single request). I will run a throughput test
Kimi K2.6 achieves 56 tokens per second for a single request on the user's MI300X hardware; the user plans a follow-up throughput test.
@populartourist: Unsloth Qwen3.6 27B Q6_K doing over 100 t/s with MTP on RTX 5090. That's coming up from 45-50 t/s without MTP. That's i…
Unsloth Qwen3.6 27B Q6_K achieves over 100 tokens per second with MTP on RTX 5090, up from 45-50 t/s without MTP.