Tag
PartRep proposes a selective prompt repetition method for decoder-only LLMs that appends only the most informative tokens (selected via NLL) instead of the full prompt, reducing KV cache and prefill FLOPs while retaining most of the accuracy gains across multiple benchmarks.