@lmsysorg: NVIDIA just released an NVFP4 checkpoint of GLM-5.2 from @Zai_org, a 744B MoE (40B active) for reasoning & coding. Day-…


Summary

NVIDIA released an NVFP4-quantized checkpoint of GLM-5.2, a 744B-parameter mixture-of-experts (MoE) model with 40B active parameters, built for reasoning and coding. SGLang supports it from day 0.
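To put "a fraction of the memory" in rough numbers, here is a back-of-the-envelope sketch of weight storage at different precisions. It assumes every one of the 744B parameters is stored in the named format, and that NVFP4 costs about 4.5 bits per weight (4-bit values plus one 8-bit scale per 16-element block). Real checkpoints usually keep some layers at higher precision, and the estimate ignores KV cache and activations, so treat the figures as illustrative rather than as the actual checkpoint size. The function names are hypothetical.

```python
# Rough weight-memory estimate; illustrative only, not the real checkpoint size.
TOTAL_PARAMS = 744e9  # total parameter count quoted in the post

# Assumption: NVFP4 stores 4-bit values plus one 8-bit scale per 16 weights.
NVFP4_BITS = 4 + 8 / 16  # about 4.5 bits per weight

# Bits per weight for each storage format being compared.
FORMATS = {"BF16": 16.0, "FP8": 8.0, "NVFP4": NVFP4_BITS}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def estimate(params: float = TOTAL_PARAMS) -> dict:
    """Map each format name to its estimated weight storage in GB."""
    return {name: weight_gb(params, bits) for name, bits in FORMATS.items()}


if __name__ == "__main__":
    for name, gb in estimate().items():
        print(f"{name:>6}: ~{gb:,.0f} GB")
```

Under these assumptions the NVFP4 weights come to a little over half the FP8 footprint and under a third of BF16, which is the kind of saving the post refers to.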


Cached at: 06/28/26, 12:01 PM

NVIDIA just released an NVFP4 checkpoint of GLM-5.2 from @Zai_org, a 744B MoE (40B active) for reasoning & coding. Day-0 support is live in SGLang! @nvidia

NVFP4 quantization via NVIDIA Model Optimizer: frontier-class reasoning at a fraction of the memory

Sparse attention with IndexShare indexer for efficient long-context

Ready to serve on Blackwell / Grace Blackwell, run it now with SGLang!

Cookbook:

Similar Articles

nvidia/GLM-5.2-NVFP4

Hugging Face Models Trending

NVIDIA released GLM-5.2-NVFP4, a version of Z.AI's GLM-5.2 MoE language model quantized with Model Optimizer for inference on NVIDIA Blackwell GPUs.

zai-org/GLM-5.2-FP8

Hugging Face Models Trending

Z.AI releases GLM-5.2, a flagship open-source model with a 1M-token context window, improved coding capabilities, and a new IndexShare sparse attention architecture that reduces FLOPs by 2.9x at 1M context.