redundancy

Probing the Prompt KV Cache: Where It Becomes Dispensable

arXiv cs.CL · 3d ago

This paper systematically investigates when the prompt KV cache becomes dispensable during LLM decoding, and which parts of it do. It shows that the redundancy lies primarily in chat template scaffolding rather than in task content, and that replacement with neutral filler preserves accuracy.
