redundancy

Probing the Prompt KV Cache: Where It Becomes Dispensable

arXiv cs.CL · 3d ago

This paper systematically investigates when, and which parts of, the prompt KV cache become dispensable during LLM decoding. It shows that the redundancy primarily involves chat-template scaffolding rather than task content, and that replacement with neutral filler preserves accuracy.
