prompt-repetition


PARTREP: Learning What to Repeat for Decoder-only LLMs

arXiv cs.CL · 2d ago

PartRep is a selective prompt-repetition method for decoder-only LLMs. Instead of appending a second copy of the full prompt, it appends only the most informative tokens, selected by negative log-likelihood (NLL). This reduces KV cache size and prefill FLOPs while retaining most of the accuracy gains of prompt repetition across multiple benchmarks.
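The selection step can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that "most informative" means the tokens with the highest per-token NLL, that those NLL values have already been computed by some scoring model, and that the selected tokens are appended in their original order. The function names and the parameter `k` are hypothetical.

```python
from typing import List, Sequence


def select_tokens_to_repeat(
    tokens: Sequence[str], nll: Sequence[float], k: int
) -> List[str]:
    """Return the k highest-NLL tokens, kept in their original order.

    Assumption: higher NLL is used as a proxy for "more informative".
    """
    if len(tokens) != len(nll):
        raise ValueError("tokens and nll must have the same length")
    k = max(0, min(k, len(tokens)))
    # Rank positions by NLL, highest first; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-nll[i], i))
    # Restore prompt order so the appended tokens read left to right.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


def build_partially_repeated_prompt(
    tokens: Sequence[str], nll: Sequence[float], k: int
) -> List[str]:
    """Append only the selected tokens instead of a full second copy."""
    return list(tokens) + select_tokens_to_repeat(tokens, nll, k)


# Illustrative values only; these NLL scores are made up.
prompt = ["What", "is", "the", "capital", "of", "France", "?"]
scores = [0.5, 0.1, 0.1, 3.2, 0.2, 4.0, 0.3]
repeated = build_partially_repeated_prompt(prompt, scores, k=2)
```

With `k=2` the sketch appends 2 tokens rather than all 7, which is where the savings in KV cache and prefill compute would come from under these assumptions.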
