NVIDIA has released GLM-5.1-NVFP4, an NVFP4-quantized version of ZAI's GLM-5.1 model (754B total parameters, 40B activated), available on Hugging Face under the MIT license.
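For a rough sense of what the quantization buys, the sketch below estimates weight storage for the 754B-parameter model. It assumes every weight is stored in NVFP4 (4-bit values plus one 8-bit scale per 16-value block, about 4.5 bits per weight) and ignores the per-tensor scale; real checkpoints usually keep some layers in higher precision, so treat the result as a lower bound. The function name `nvfp4_weight_bytes` is illustrative, not part of any library.

```python
def nvfp4_weight_bytes(num_params: float,
                       block_size: int = 16,
                       scale_bits: int = 8) -> float:
    """Approximate bytes needed to store num_params weights in NVFP4.

    Assumption: all weights are quantized; each weight takes 4 bits and
    every block of `block_size` weights shares one `scale_bits`-bit scale.
    """
    bits_per_weight = 4 + scale_bits / block_size  # 4.5 bits with defaults
    return num_params * bits_per_weight / 8


TOTAL_PARAMS = 754e9  # total parameter count quoted above

nvfp4_gb = nvfp4_weight_bytes(TOTAL_PARAMS) / 1e9
bf16_gb = TOTAL_PARAMS * 2 / 1e9  # 16-bit baseline, 2 bytes per weight

print(f"NVFP4 (estimate): {nvfp4_gb:.1f} GB")
print(f"BF16 baseline:    {bf16_gb:.1f} GB")
```

Under these assumptions the estimate is roughly 424 GB of weights versus about 1,508 GB in BF16, before counting the KV cache and activations.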
Developer @0xSero ran inference on a GLM-5.1-505B variant produced with NVFP4 quantization and 32% pruning, reaching 45 tokens/s decode and 1,350 tokens/s prefill.
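To put the two throughput figures in practical terms, here is a minimal sketch that converts them into per-request latency. It assumes a single request processed at the quoted rates with no batching or queueing, and that the rates hold regardless of context length, which is unlikely to be exactly true in practice. The helper `estimate_latency_s` and the example token counts are hypothetical.

```python
def estimate_latency_s(prompt_tokens: int,
                       output_tokens: int,
                       prefill_tps: float = 1350.0,
                       decode_tps: float = 45.0) -> tuple[float, float, float]:
    """Return (prefill_s, decode_s, total_s) for one request.

    Assumption: constant throughput, i.e. the prompt is processed at
    prefill_tps and every output token is generated at decode_tps.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s, prefill_s + decode_s


# Hypothetical request: 13,500-token prompt, 450-token reply.
prefill_s, decode_s, total_s = estimate_latency_s(13_500, 450)
print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s, total: {total_s:.1f} s")
```

With these numbers a 13,500-token prompt and a 450-token reply each take about 10 seconds, which shows that decode time dominates for all but very long prompts: one output token costs as much time as 30 prompt tokens.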