sparse-decoding

EntmaxKV: Support-Aware Decoding for Entmax Attention

arXiv cs.LG · 2026-05-22

EntmaxKV introduces a support-aware sparse decoding framework for entmax attention that reduces KV-cache memory traffic by exploiting sparsity before loading pages, achieving significant speedups on long-context benchmarks while maintaining output quality.
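
To make the idea of attention "support" concrete, here is a minimal sketch using sparsemax, the alpha = 2 member of the entmax family. Unlike softmax, it assigns exactly zero weight to low-scoring tokens, so any KV-cache page whose tokens all fall outside the support contributes nothing to the output. The helper `pages_in_support`, the page size, and the scores are hypothetical and chosen only for illustration. This sketch computes the full score vector first; how EntmaxKV identifies the support before loading pages is not described in the summary above and is not reproduced here.

```python
def sparsemax(scores):
    """Sparsemax (entmax with alpha = 2): Euclidean projection onto the simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # The condition holds for a prefix of the sorted scores;
        # the last j that satisfies it is the support size.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    return [max(s - tau, 0.0) for s in scores]


def pages_in_support(probs, page_size):
    """Hypothetical helper: indices of pages holding any token with nonzero weight."""
    return sorted({i // page_size for i, p in enumerate(probs) if p > 0.0})


# Illustrative attention scores for 8 cached tokens, stored in 2 pages of 4.
scores = [2.0, 1.5, -1.0, -2.0, 0.1, -0.5, -3.0, -1.2]
probs = sparsemax(scores)
needed = pages_in_support(probs, page_size=4)
print(probs)   # weights sum to 1; most entries are exactly 0.0
print(needed)  # pages that would actually have to be read
```

With these scores only the first two tokens receive nonzero weight, so only page 0 is in the support and the second page's keys and values would never need to be read.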
