Tag
We're the first to run the full GLM-5.2 (753B FP8) on RTX 4090s by porting sparse-attention kernels to Ada GPUs, bringing a frontier open-weights model to commodity hardware.
A turnkey Docker setup to serve the GLM-5.2-NVFP4-REAP-469B model on 4× RTX PRO 6000 Blackwell GPUs using vLLM, with detailed instructions and configuration options.
This technical report presents Kwai Keye-VL-2.0, an open-source Mixture-of-Experts multimodal foundation model designed for long-video understanding and agentic intelligence, which uses DeepSeek Sparse Attention and cross-modal distillation to achieve state-of-the-art performance among models of similar scale.