@0xSero: GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45…

X AI KOLs Timeline 04/21/26, 12:25 PM Models

Summary

A quantized 478B-parameter GLM-5.1 model runs on 4×RTX Pro 6000 GPUs via SGLang, delivering 370k-token context at up to 45 tok/s decode and 1340 tok/s prefill, and is demoed driving Figma.

GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45.6 tok/s decode (gen) - 1340 tok/s prefill I could get 2x decode if I limit to 64k context (100 tok/s) In this video it operates Figma (:

Original Article Export to Word Export to PDF

View Cached Full Text

Cached at: 04/21/26, 04:24 PM

@0xSero: GLM-5.1-478B-NVFP4 Running on: - 4x RTX Pro 6000 - Sglang - 370,000 max tokens (1.75x full context) - p10 27.7 | p90 45…

Similar Articles

@0xSero: Finally GLM-5.1-505B-REAP-NVFP4 45 tokens/s decode 1350 tokens/s prefill 32% prune This was the hardest I ever worked t…

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

@iotcoi: Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents ~200 tokens/sec max decode 136t/s average on a single tiny G…

@outsource_: NEW GLM+ QWEN 18B RUNS ON CONSUMER GPU IT BEATS 35B MoE AT HALF THE VRAM @KyleHessling1 just dropped the healed Qwopus-…

@rumgewieselt: Now its getting crazy ... 3x 1080 Ti (Pascal, 33GB VRAM) Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently

Submit Feedback

Similar Articles

@0xSero: Finally GLM-5.1-505B-REAP-NVFP4 45 tokens/s decode 1350 tokens/s prefill 32% prune This was the hardest I ever worked t…

Gemma 4 26B Hits 600 Tok/s on One RTX 5090

@iotcoi: Qwen3.6-27B-FP8 + Dflash + DDTree, 256k context, 10 agents ~200 tokens/sec max decode 136t/s average on a single tiny G…

@outsource_: NEW GLM+ QWEN 18B RUNS ON CONSUMER GPU IT BEATS 35B MoE AT HALF THE VRAM @KyleHessling1 just dropped the healed Qwopus-…
A new 18B merged quantized model, Qwopus-GLM-18B-GGUF, outperforms 35B MoE models while using half the VRAM and running on consumer GPUs.

@rumgewieselt: Now its getting crazy ... 3x 1080 Ti (Pascal, 33GB VRAM) Qwen 3.6 27B MTP with 196K TurboQuant ~28-30 t/s consistently