prefill-only-finetuning

PreFT: Prefill-only finetuning for efficient inference

arXiv cs.LG · 17h ago

PreFT applies adapters only while processing prefill (prompt) tokens and drops them during decode, which increases throughput for multi-adapter serving with minimal performance loss.
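
As a rough illustration of the idea in the summary, the sketch below applies a low-rank, LoRA-style update to a single linear projection for prompt tokens and uses only the base weights for generated tokens. It is a toy under stated assumptions, not the paper's implementation: the adapter form (LoRA-style), the helper names `matvec`, `linear`, and `project_tokens`, and the single-layer setup are all hypothetical. In a real transformer the prefill pass would also populate the KV cache that later decode steps attend to, which this sketch leaves out.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def linear(
    x: Vector,
    base_w: Matrix,
    lora_a: Optional[Matrix] = None,
    lora_b: Optional[Matrix] = None,
) -> Vector:
    """Base projection, plus a low-rank update B(Ax) when an adapter is given.

    The LoRA-style adapter form is an assumption made for illustration.
    """
    y = matvec(base_w, x)
    if lora_a is not None and lora_b is not None:
        delta = matvec(lora_b, matvec(lora_a, x))
        y = [yi + di for yi, di in zip(y, delta)]
    return y


def project_tokens(
    prompt: List[Vector],
    generated: List[Vector],
    base_w: Matrix,
    lora_a: Matrix,
    lora_b: Matrix,
) -> Tuple[List[Vector], List[Vector]]:
    """Adapter on for prefill (prompt) tokens, adapter off for decode tokens."""
    prefill_out = [linear(x, base_w, lora_a, lora_b) for x in prompt]
    decode_out = [linear(x, base_w) for x in generated]  # base weights only
    return prefill_out, decode_out


if __name__ == "__main__":
    base_w = [[1.0, 0.0], [0.0, 1.0]]  # identity base projection
    lora_a = [[1.0, 1.0]]              # rank-1 down-projection
    lora_b = [[1.0], [0.0]]            # rank-1 up-projection
    pre, dec = project_tokens([[1.0, 2.0]], [[1.0, 2.0]], base_w, lora_a, lora_b)
    print(pre, dec)
```

Because the decode path in this sketch touches only `base_w`, requests that use different adapters could share the same decode computation, which is one plausible reading of where the multi-adapter throughput gain comes from.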
