hardware-benchmark

#hardware-benchmark

@0xSero: Minimax-M3 running on 4x RTX Pro 6000s - 800k context - 4x concurrency at 250k - 70-120 tok/s - 2000 tok/s prefill no c…

X AI KOLs Following · 2026-06-14

Minimax-M3 is shown running on 4x RTX Pro 6000 GPUs with up to 800k context, or 4 concurrent requests at 250k context each. It reaches 70-120 tok/s generation and 2000 tok/s prefill while using 376GB of VRAM with the model in MXFP4 format.

#hardware-benchmark

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

Hugging Face Daily Papers · 2026-05-28

This paper investigates the performance gap in batch-1 LLM decode for physical AI systems, finding that faster memory bandwidth does not proportionally reduce latency due to launch overheads, and that quantization efficiency varies significantly across hardware.
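The claim that faster memory bandwidth does not proportionally reduce batch-1 decode latency can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's model: each decoded token streams all weights from memory once, then pays a fixed, non-overlapped cost per kernel launch. The function name `decode_step_latency_s` and every number are hypothetical, chosen only to show the effect.

```python
# Toy latency model for batch-1 LLM decode.
# ASSUMPTION (illustrative only): per-token latency is the time to stream all
# weights once plus a serialized, non-overlapped kernel-launch overhead.
# All names and numbers here are hypothetical, not from the paper.

def decode_step_latency_s(weight_bytes: float, bandwidth_bytes_per_s: float,
                          num_kernel_launches: int, launch_overhead_s: float) -> float:
    """Return per-token latency in seconds: memory-streaming time + launch time."""
    memory_time = weight_bytes / bandwidth_bytes_per_s
    launch_time = num_kernel_launches * launch_overhead_s
    return memory_time + launch_time

# Hypothetical workload: 4 GB of weights, 600 kernel launches per token,
# 5 microseconds of overhead per launch.
WEIGHTS = 4e9
KERNELS = 600
OVERHEAD = 5e-6

slow = decode_step_latency_s(WEIGHTS, 200e9, KERNELS, OVERHEAD)  # 200 GB/s memory
fast = decode_step_latency_s(WEIGHTS, 400e9, KERNELS, OVERHEAD)  # 400 GB/s memory

print(f"200 GB/s: {slow * 1e3:.1f} ms/token")
print(f"400 GB/s: {fast * 1e3:.1f} ms/token")
print(f"speedup from 2x bandwidth: {slow / fast:.2f}x")
```

With these made-up values, doubling bandwidth cuts per-token latency from 23 ms to 13 ms, a speedup of about 1.77x rather than 2x, because the 3 ms launch term does not shrink as bandwidth grows.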
