@eliebakouch: the new sparse attention method introduced with this model is basically a combination of components from existing ones.…


Summary

Meituan introduces LongCat-2.0, a 1.6T-parameter MoE model with roughly 48B active parameters and a 1M-token context length. It features a new LongCat Sparse Attention (LSA) method that combines components from existing sparse attention techniques.

the new sparse attention method introduced with this model is basically a combination of components from existing ones. let's go over each sparse attention method and what they keep from them:

- deepseek sparse attention (DSA): they keep the top-k indexer, this is the basis of all the following sparse attention methods
- DSA + index sharing (glm5.2): they share the index over multiple layers
- minimax sparse attention (and also NSA from deepseek): they do block-level top-k indexing -> they add another top-k at the token level to select precisely each token in the selected block
- compressed sparse attention (from deepseek V4) and also NSA: they keep the sliding window -> they also add a sink token iirc

tl;dr: they do block-level indexing, then token-level on the selected blocks + add a sliding window and sink component with a 50/50 budget split. they share the token-level top-k across layers
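
The tl;dr above can be sketched as a small selection routine. This is a rough illustration and not Meituan's implementation: the function names, the use of the max token score as the block score, the default sizes, and the way the 50/50 split is divided between the sink and the window are all assumptions made for the example. The tweet only says the token-level top-k is shared across layers, so reusing the whole selection for a group of layers is also a simplification.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, in ascending index order."""
    k = min(k, len(scores))
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(best)


def select_tokens(token_scores, query_pos, budget,
                  block_size=4, num_blocks=2, num_sink=1):
    """Pick which key positions one query attends to (hypothetical sketch).

    token_scores[i] stands in for an indexer score of key i for this query.
    """
    n = query_pos + 1                   # causal: only keys 0..query_pos
    local_budget = budget // 2          # assumed 50/50 split: sink + window
    sparse_budget = budget - local_budget

    # Fixed part: sink token(s) at the start, sliding window at the end.
    sink = set(range(min(num_sink, n)))
    window_len = max(local_budget - len(sink), 0)
    window = set(range(max(n - window_len, 0), n))
    fixed = sink | window

    # Stage 1: block-level top-k. Block score = max token score (assumption).
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    block_scores = [max(token_scores[i] for i in b) for b in blocks]
    chosen_blocks = top_k_indices(block_scores, num_blocks)

    # Stage 2: token-level top-k, restricted to the selected blocks.
    candidates = [i for b in chosen_blocks for i in blocks[b] if i not in fixed]
    cand_scores = [token_scores[i] for i in candidates]
    picked = {candidates[j] for j in top_k_indices(cand_scores, sparse_budget)}

    return sorted(fixed | picked)


def share_across_layers(score_fn, query_pos, budget, num_layers, share_every=2):
    """Recompute the selection every `share_every` layers and reuse it between.

    score_fn(layer) is a hypothetical callable returning that layer's scores.
    """
    selections, current = [], None
    for layer in range(num_layers):
        if layer % share_every == 0:
            current = select_tokens(score_fn(layer), query_pos, budget)
        selections.append(current)
    return selections
```

With 16 keys and a budget of 8, this keeps one sink token, a 3-token window, and 4 tokens picked from the two best-scoring blocks. Layers that share an index skip the indexer entirely, which is where the saving from index sharing would come from.
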

Cached at: 06/30/26, 07:42 AM


Meituan LongCat (@Meituan_LongCat): Introducing LongCat-2.0 🐱 1.6T parameters · MoE with ~48B active · 1M context The full model behind Owl Alpha on @OpenRouter — now available.

Built for agentic coding from the ground up: ◆ LongCat Sparse Attention (LSA) — scales efficiently for 1M-context tokens ◆

Similar Articles

MiniMax Sparse Attention

Hugging Face Daily Papers

MiniMax Sparse Attention introduces a blockwise sparse attention mechanism that achieves significant speedups for ultra-long-context LLMs, reducing per-token attention compute by 28.4x at 1M context with wall-clock speedups of 14.2x for prefill and 7.6x for decoding on H800 GPUs. The method is accompanied by an open-source inference kernel and a publicly released multimodal model.