sparse-decoding

EntmaxKV: Support-Aware Decoding for Entmax Attention

arXiv cs.LG · 2026-05-22

EntmaxKV introduces a support-aware sparse decoding framework for entmax attention that reduces KV-cache memory traffic by exploiting sparsity before loading pages, achieving significant speedups on long-context benchmarks while maintaining output quality.
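
To make the idea in the summary concrete, here is a minimal sketch, not the paper's method. It uses sparsemax (the alpha = 2 member of the entmax family) to show that entmax-style attention assigns exactly zero weight to some tokens, so KV-cache pages containing only zero-weight tokens never need their values loaded. The function names, the page size, and the toy scores are all hypothetical. The sketch also assumes the attention scores are already known; how EntmaxKV determines the support before loading pages is not described in the summary above.

```python
def sparsemax(scores):
    """Sparsemax (entmax with alpha = 2): projects scores onto the
    probability simplex and can return exact zeros."""
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The support condition holds for k = 1..k* and fails afterwards,
        # so the last k that passes defines the threshold tau.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
    return [max(s - tau, 0.0) for s in scores]


def pages_in_support(weights, page_size):
    """Indices of the (hypothetical) KV pages holding at least one
    token with nonzero attention weight."""
    return sorted({i // page_size for i, w in enumerate(weights) if w > 0.0})


# Toy attention scores for 12 cached tokens, stored in pages of 4 tokens.
scores = [3.0, 0.1, 0.2, 0.0, 0.1, 0.0, 0.2, 0.1, 2.5, 0.3, 0.0, 0.1]
weights = sparsemax(scores)
pages = pages_in_support(weights, page_size=4)

print(weights[0], weights[8])  # 0.75 0.25, every other weight is exactly 0.0
print(pages)                   # [0, 2], page 1 can be skipped entirely
```

With softmax every token would receive a strictly positive weight, so no page could be skipped without approximation. With a sparse mapping such as sparsemax, skipping pages outside the support leaves the attention output unchanged.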
