decoder-only

#decoder-only

PARTREP: Learning What to Repeat for Decoder-only LLMs

arXiv cs.CL · 2d ago

PartRep proposes a selective prompt repetition method for decoder-only LLMs. Instead of appending a full copy of the prompt, it appends only the most informative tokens, selected by their negative log-likelihood (NLL). This reduces KV-cache size and prefill FLOPs while retaining most of the accuracy gains of full repetition across multiple benchmarks.
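A minimal sketch of NLL-based partial repetition, assuming that "most informative" means "highest NLL" and that the selected tokens are appended in their original prompt order. The helper names, the top-k budget `k`, and the example NLL values are hypothetical illustrations, not PartRep's actual selection rule or API.

```python
from typing import List, Sequence


def select_repeat_tokens(tokens: Sequence[str], nll: Sequence[float], k: int) -> List[str]:
    """Return the k tokens with the highest NLL, kept in their original prompt order.

    Assumption: "most informative" is read as "highest negative log-likelihood".
    """
    if len(tokens) != len(nll):
        raise ValueError("tokens and nll must have the same length")
    k = max(0, min(k, len(tokens)))
    # Rank positions by NLL, highest first; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-nll[i], i))
    # Re-sort the chosen positions so the repeated span reads in prompt order.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


def build_partial_repeat_prompt(tokens: Sequence[str], nll: Sequence[float], k: int) -> List[str]:
    """Original prompt followed by the selected subset (hypothetical helper)."""
    return list(tokens) + select_repeat_tokens(tokens, nll, k)


if __name__ == "__main__":
    prompt = ["The", "capital", "of", "Mongolia", "is"]
    scores = [2.1, 5.3, 0.4, 7.9, 0.6]  # made-up NLL values for illustration
    extended = build_partial_repeat_prompt(prompt, scores, k=2)
    print(extended)
    print(len(extended), "tokens vs", 2 * len(prompt), "for full repetition")
```

With `k=2` the extended prompt has 7 tokens instead of the 10 that full repetition would produce, which is where the KV-cache and prefill savings come from: the extra cost scales with `k` rather than with the prompt length.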



Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models

arXiv cs.CL · 3d ago

This paper studies how to recover input token sequences from the last-layer hidden states of decoder-only language models by optimizing continuous input embeddings. Content words are recovered almost perfectly, while high-frequency function words account for most failures; the method reaches an exact-match rate of up to 97.5%.
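The general idea of gradient-based inversion (optimize a continuous embedding until the model's output matches the observed hidden state, then snap it to the nearest vocabulary embedding) can be sketched on a toy problem. This is a deliberately simplified illustration: the "model" here is an assumed position-wise linear map `W`, and the vocabulary, `W`, learning rate, and step count are all made up. Real decoder-only models are nonlinear and mix positions through attention, so the paper's setting is considerably harder.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def toy_forward(w: Matrix, z: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a model: hidden state = W @ embedding, per position."""
    return matvec(w, z)


def invert_position(w: Matrix, target: Sequence[float], steps: int = 500, lr: float = 0.1) -> Vector:
    """Gradient descent on ||W z - target||^2 over a continuous embedding z."""
    z = [0.0] * len(w[0])
    w_t = transpose(w)
    for _ in range(steps):
        residual = [h - t for h, t in zip(toy_forward(w, z), target)]
        grad = [2.0 * g for g in matvec(w_t, residual)]  # gradient of the squared error
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


def nearest_token(z: Sequence[float], vocab: Dict[str, Vector]) -> str:
    """Project the optimized embedding back onto the closest vocabulary entry."""
    return min(vocab, key=lambda tok: sum((a - b) ** 2 for a, b in zip(z, vocab[tok])))


def invert_sequence(w: Matrix, hidden_states: List[Vector], vocab: Dict[str, Vector]) -> List[str]:
    return [nearest_token(invert_position(w, h), vocab) for h in hidden_states]


# Toy example values (made up for illustration).
W: Matrix = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]]
VOCAB: Dict[str, Vector] = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "sat": [0.0, 0.0, 1.0],
    "mat": [1.0, 1.0, 0.0],
}

if __name__ == "__main__":
    observed = [toy_forward(W, VOCAB[tok]) for tok in ["cat", "sat", "mat"]]
    print(invert_sequence(W, observed, VOCAB))
```

The final nearest-neighbour projection is what turns a continuous optimum into discrete tokens; when several vocabulary embeddings lie close to the optimized vector, this step is where recovery can go wrong.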



Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

Hugging Face Daily Papers · 2026-05-07

The paper introduces SPEED, a layer-asymmetric KV visibility policy that reduces long-context inference costs by processing prompt tokens only in lower layers during prefill while maintaining full-depth attention during decoding.
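To make the cost argument concrete, here is a bookkeeping sketch of a layer-asymmetric KV policy. It assumes that prompt tokens run only through the lowest `prefill_depth` layers (so only those layers cache prompt KV), while every generated token runs through all layers and caches KV at every depth. The class and parameter names are hypothetical, and the model sizes in the usage note are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerAsymmetricKV:
    num_layers: int
    prefill_depth: int  # hypothetical name: lower layers that see prompt KV

    def __post_init__(self) -> None:
        if not 0 < self.prefill_depth <= self.num_layers:
            raise ValueError("prefill_depth must be in (0, num_layers]")

    def visible_positions(self, layer: int, prompt_len: int, generated_len: int) -> List[int]:
        """Cached positions a decoding step can attend to at `layer` (0 = lowest)."""
        generated = list(range(prompt_len, prompt_len + generated_len))
        if layer < self.prefill_depth:
            return list(range(prompt_len)) + generated
        # Assumption: upper layers hold no prompt KV, only KV of generated tokens.
        return generated

    def prefill_token_layers(self, prompt_len: int) -> int:
        """Token-layer passes during prefill (full depth would be prompt_len * num_layers)."""
        return prompt_len * self.prefill_depth

    def kv_entries(self, prompt_len: int, generated_len: int) -> int:
        """Per-layer KV entries summed over all layers after `generated_len` decoded tokens."""
        lower = self.prefill_depth * (prompt_len + generated_len)
        upper = (self.num_layers - self.prefill_depth) * generated_len
        return lower + upper


if __name__ == "__main__":
    policy = LayerAsymmetricKV(num_layers=32, prefill_depth=8)
    print(policy.prefill_token_layers(1000), "vs", 1000 * 32)
    print(policy.kv_entries(1000, 10), "vs", 32 * (1000 + 10))
```

Under these assumptions, a 32-layer model with `prefill_depth=8` and a 1000-token prompt does 8000 token-layer passes at prefill instead of 32000, and after 10 decoded tokens holds 8320 KV entries instead of 32320. Prompt information still reaches the upper layers indirectly, through the residual stream of tokens that attended to it in the lower layers.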



River-LLM: Large Language Model Seamless Exit Based on KV Share

Hugging Face Daily Papers · 2026-04-20

River-LLM proposes a training-free early-exit framework for decoder-only LLMs that uses KV sharing to eliminate KV-cache gaps, achieving a 1.71–2.16× speedup without quality loss.
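The "KV-cache gap" problem can be illustrated with a toy cache. When a token exits early, the layers it skipped never compute its KV, so a later token that runs those layers finds missing entries. The sketch below fills the skipped layers by reusing the exit layer's KV. This filling rule is an assumption chosen for illustration, not necessarily River-LLM's exact sharing scheme, and the class name and `(key, value)` float pairs are hypothetical stand-ins for real tensors.

```python
from typing import Dict, Iterable, List, Tuple

KV = Tuple[float, float]  # toy stand-in for one token's (key, value) at one layer


class EarlyExitKVCache:
    """Per-layer KV cache for a model whose tokens may exit before the last layer."""

    def __init__(self, num_layers: int, share: bool = True) -> None:
        self.num_layers = num_layers
        self.share = share
        self.layers: List[Dict[int, KV]] = [{} for _ in range(num_layers)]

    def write(self, pos: int, computed: List[KV]) -> None:
        """Store KV for token `pos`; computed[l] is its KV at layer l before it exited."""
        if not 1 <= len(computed) <= self.num_layers:
            raise ValueError("a token must run at least one layer and at most num_layers")
        for layer, kv in enumerate(computed):
            self.layers[layer][pos] = kv
        if self.share:
            # Assumption: skipped layers reuse the exit layer's KV (same object, no copy).
            exit_kv = computed[-1]
            for layer in range(len(computed), self.num_layers):
                self.layers[layer][pos] = exit_kv

    def gaps(self, positions: Iterable[int]) -> List[Tuple[int, int]]:
        """(layer, pos) pairs a later full-depth token would find missing."""
        positions = list(positions)
        return [(layer, pos) for layer in range(self.num_layers)
                for pos in positions if pos not in self.layers[layer]]


if __name__ == "__main__":
    for share in (False, True):
        cache = EarlyExitKVCache(num_layers=4, share=share)
        cache.write(0, [(1.0, 1.0)] * 4)               # token 0 runs all 4 layers
        cache.write(1, [(2.0, 2.0), (3.0, 3.0)])       # token 1 exits after layer 1
        print("share" if share else "naive", cache.gaps([0, 1]))
```

Without sharing, token 1 leaves gaps at layers 2 and 3; with sharing, those slots point at the exit-layer entry, so no extra memory is allocated and no recomputation is needed.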

