deepseek-sparse-attention

@totheagi: We're the first to make the full GLM-5.2 (FP8) run on RTX 4090s. GLM-5.2 is the new 753B SOTA open-weights model, and i…

X AI KOLs Timeline · 2d ago

We're the first to run the full GLM-5.2 (753B, FP8) on RTX 4090s. Porting the sparse-attention kernels to Ada GPUs lets a frontier open-weights model run on commodity hardware.

@0xSero: Rejoice fellow 6000 enjoyers. We have GLM at home

X AI KOLs Following · 3d ago

A turnkey Docker setup to serve the GLM-5.2-NVFP4-REAP-469B model on 4× RTX PRO 6000 Blackwell GPUs using vLLM, with detailed instructions and configuration options.

Kwai Keye-VL-2.0 Technical Report

Hugging Face Daily Papers · 2026-06-09

This technical report presents Kwai Keye-VL-2.0, an open-source Mixture-of-Experts multimodal foundation model designed for long-video understanding and agentic intelligence. It uses DeepSeek Sparse Attention and cross-modal distillation to achieve state-of-the-art performance among models of similar scale.
