@lmsysorg: NVIDIA just released an NVFP4 checkpoint of GLM-5.2 from @Zai_org, a 744B MoE (40B active) for reasoning & coding. Day-…

X AI KOLs Following 06/26/26, 07:53 AM Models

nvidia glm-5-2 nvfp4 quantization moe reasoning coding

Summary

NVIDIA released an NVFP4 quantized checkpoint of GLM-5.2, a 744B MoE model (40B active) optimized for reasoning and coding, with day-0 support in SGLang.

NVIDIA just released an NVFP4 checkpoint of GLM-5.2 from @Zai_org, a 744B MoE (40B active) for reasoning & coding. Day-0 support is live in SGLang! @nvidia

- NVFP4 quantization via NVIDIA Model Optimizer: frontier-class reasoning at a fraction of the memory
- Sparse attention with IndexShare indexer for efficient long-context
- Ready to serve on Blackwell / Grace Blackwell, run it now with SGLang!
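To put the "fraction of the memory" claim in rough numbers, here is a back-of-the-envelope sketch of weight storage for a 744B-parameter model. It is an estimate under stated assumptions, not a measurement of the released checkpoint: it assumes a BF16 baseline (16 bits per weight) and treats NVFP4 as 4-bit values plus one 8-bit scale per 16-value block (about 4.5 bits per weight), applied to every parameter. A real checkpoint may keep some layers in higher precision, and the KV cache and activations are not counted. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 744e9  # total parameter count quoted in the post

# Assumption: BF16 baseline at 16 bits per weight.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)

# Assumption: 4-bit values plus one 8-bit scale per 16-value block,
# i.e. 4 + 8/16 = 4.5 effective bits per weight, for all parameters.
nvfp4_gb = weight_memory_gb(TOTAL_PARAMS, 4 + 8 / 16)

print(f"BF16  weights: {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
print(f"Ratio: {bf16_gb / nvfp4_gb:.2f}x smaller")
```

Under these assumptions the weights shrink from about 1,488 GB to about 418.5 GB, which is why a 4-bit checkpoint of a model this size can be served on far fewer GPUs than a 16-bit one.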


Cached at: 06/28/26, 12:01 PM


Cookbook:
