Tag
The paper introduces SPEED, a layer-asymmetric KV-visibility policy that reduces long-context inference costs by running prompt tokens through only the lower layers during prefill while retaining full-depth attention during decoding.
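To make the mechanism concrete, here is a minimal, self-contained PyTorch sketch of such a layer-asymmetric prefill/decode loop. It illustrates the general idea only, not SPEED's actual implementation: `ToyLayer`, `N_LOWER`, `prefill`, and `decode_step` are invented names, and layer norms, multi-head attention, and the LM head are omitted.

```python
import torch
import torch.nn as nn

D, N_LAYERS, N_LOWER = 64, 8, 4  # N_LOWER: layers that see prompt tokens

class ToyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(D, 3 * D)
        self.out = nn.Linear(D, D)

    def forward(self, x, cache):
        # x: (seq, D); cache holds the running K/V tensors for this layer
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        cache["k"] = torch.cat([cache["k"], k]) if cache["k"] is not None else k
        cache["v"] = torch.cat([cache["v"], v]) if cache["v"] is not None else v
        scores = q @ cache["k"].T / D**0.5
        # causal mask: each new token attends to cached positions up to its own
        t_q, t_k = q.shape[0], cache["k"].shape[0]
        mask = torch.arange(t_k) <= (t_k - t_q) + torch.arange(t_q)[:, None]
        att = scores.masked_fill(~mask, float("-inf")).softmax(-1)
        return x + self.out(att @ cache["v"])

layers = [ToyLayer() for _ in range(N_LAYERS)]
caches = [{"k": None, "v": None} for _ in range(N_LAYERS)]

def prefill(prompt_emb):
    # Prompt tokens traverse only the lower layers, so the upper-layer
    # caches stay empty -- this is the asymmetric visibility policy.
    h = prompt_emb
    for layer, cache in zip(layers[:N_LOWER], caches[:N_LOWER]):
        h = layer(h, cache)

def decode_step(tok_emb):
    # Decoded tokens run full depth; in the upper layers they can attend
    # only to previously decoded tokens, since no prompt KV was written there.
    h = tok_emb
    for layer, cache in zip(layers, caches):
        h = layer(h, cache)
    return h

prefill(torch.randn(16, D))           # 16 prompt tokens, lower layers only
out = decode_step(torch.randn(1, D))  # one decode step, all layers
```

Under this policy the prefill cost scales with `N_LOWER` rather than `N_LAYERS`, while upper-layer caches grow only with the (typically much shorter) generated sequence.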
River-LLM proposes a training-free early-exit framework for decoder-only LLMs that uses KV-sharing to fill the KV-cache gaps left in skipped upper layers by early-exited tokens, achieving a 1.71–2.16× speedup without quality loss.
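The core mechanical question for any early-exit decoder is what to write into the KV caches of the layers a token skips, since later tokens that run full depth will attend at that position. The sketch below shows one common KV-sharing scheme, projecting the exit-layer hidden state through each skipped layer's own KV weights; this is an assumption about the general technique rather than River-LLM's confirmed design, and `ToyLayer`, `confidence`, and the exit threshold are hypothetical.

```python
import torch
import torch.nn as nn

D, N_LAYERS = 64, 8

class ToyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.qkv = nn.Linear(D, 3 * D)
        self.out = nn.Linear(D, D)

    def kv(self, x):
        # Project a hidden state through this layer's K/V weights only
        _, k, v = self.qkv(x).chunk(3, dim=-1)
        return k, v

    def forward(self, x, ks, vs):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        ks.append(k); vs.append(v)
        K, V = torch.cat(ks), torch.cat(vs)
        att = (q @ K.T / D**0.5).softmax(-1)  # single-token decode: no mask needed
        return x + self.out(att @ V)

layers = [ToyLayer() for _ in range(N_LAYERS)]
k_cache = [[] for _ in range(N_LAYERS)]
v_cache = [[] for _ in range(N_LAYERS)]

def confidence(h):
    # Placeholder exit signal; a real system would use e.g. the softmax
    # margin of an early LM head.
    return h.norm().item()

def decode_step(tok_emb, threshold):
    h = tok_emb
    for i, layer in enumerate(layers):
        h = layer(h, k_cache[i], v_cache[i])
        if confidence(h) > threshold and i < N_LAYERS - 1:
            # Early exit: the skipped layers still need a KV entry for this
            # position, or later full-depth tokens would hit a cache gap.
            # KV-sharing fills it from the exit-layer hidden state.
            for j in range(i + 1, N_LAYERS):
                k, v = layers[j].kv(h)
                k_cache[j].append(k); v_cache[j].append(v)
            break
    return h

out = decode_step(torch.randn(1, D), threshold=5.0)  # low threshold forces an exit
```

Directly copying the exit layer's K/V into the upper caches is the other common variant; either way, every position ends up with exactly one cache entry per layer, so subsequent full-depth tokens never encounter a gap.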