prompt-repetition


PARTREP: Learning What to Repeat for Decoder-only LLMs

arXiv cs.CL · 2d ago

PartRep is a selective prompt-repetition method for decoder-only LLMs. Instead of appending a full copy of the prompt, it appends only the most informative tokens, selected by negative log-likelihood (NLL). This reduces KV cache size and prefill FLOPs while retaining most of the accuracy gains of repetition across multiple benchmarks.
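To make the selection step concrete, here is a minimal sketch of NLL-based partial repetition. It is not the paper's implementation. It assumes that per-token NLL scores are already available from some scoring pass, that "most informative" means highest NLL, and that the repeated tokens keep their original order. The function names and the `budget` parameter are hypothetical.

```python
from typing import List, Sequence


def select_tokens_to_repeat(
    tokens: Sequence[str],
    nll: Sequence[float],
    budget: int,
) -> List[str]:
    """Pick the `budget` tokens with the highest NLL, in original order.

    Assumption: a higher NLL marks a more informative token. The summary
    only says selection is "via NLL", so this criterion is illustrative.
    """
    if len(tokens) != len(nll):
        raise ValueError("tokens and nll must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")

    # Rank positions by NLL, highest first, and keep the top `budget`.
    ranked = sorted(range(len(tokens)), key=lambda i: nll[i], reverse=True)
    kept_positions = sorted(ranked[:budget])  # restore prompt order

    return [tokens[i] for i in kept_positions]


def build_partial_repetition(
    tokens: Sequence[str],
    nll: Sequence[float],
    budget: int,
) -> List[str]:
    """Return the prompt followed by the selected subset of its tokens."""
    return list(tokens) + select_tokens_to_repeat(tokens, nll, budget)


if __name__ == "__main__":
    prompt = ["What", "is", "the", "capital", "of", "France", "?"]
    scores = [0.5, 0.2, 0.1, 3.1, 0.2, 4.0, 0.9]  # made-up NLL values
    print(build_partial_repetition(prompt, scores, budget=2))
```

With a budget of 2, the sequence grows from 7 to 9 tokens rather than the 14 that full repetition would need, which is where the KV cache and prefill savings come from in this sketch.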
