decoder-only


Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

Hugging Face Daily Papers · 2026-05-07

The paper introduces SPEED, a layer-asymmetric KV visibility policy that reduces long-context inference costs by processing prompt tokens only in lower layers during prefill while maintaining full-depth attention during decoding.


River-LLM: Large Language Model Seamless Exit Based on KV Share

Hugging Face Daily Papers · 2026-04-20

River-LLM proposes a training-free early-exit framework for decoder-only LLMs that uses KV sharing to eliminate KV-cache gaps, achieving a 1.71–2.16× speedup without quality loss.
