multi-adapter-serving

PreFT: Prefill-only finetuning for efficient inference

arXiv cs.LG

PreFT (prefill-only finetuning) applies adapters only while processing prefill tokens and drops the adapters during decoding, which raises throughput in multi-adapter serving with minimal loss in task performance.
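
The prefill/decode split described in the summary can be illustrated with a toy sketch. This is not the paper's implementation: the summary does not say which adapter type is used, so a LoRA-style low-rank update is assumed here, and the names ToyLoRALinear, forward, and phase are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


class ToyLoRALinear:
    """Toy linear layer with an assumed LoRA-style adapter: W x + B (A x)."""

    def __init__(self, base, lora_a, lora_b):
        self.base = base      # shared base weights, shape d_out x d_in
        self.lora_a = lora_a  # per-adapter down-projection, shape r x d_in
        self.lora_b = lora_b  # per-adapter up-projection, shape d_out x r

    def forward(self, x, phase):
        out = matvec(self.base, x)
        # Assumption: the adapter is applied only to prefill tokens.
        # Decode tokens use the shared base weights alone.
        if phase == "prefill":
            out = add(out, matvec(self.lora_b, matvec(self.lora_a, x)))
        return out


layer = ToyLoRALinear(
    base=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [-0.5]],
)

prefill_out = layer.forward([2.0, 4.0], "prefill")  # base plus adapter
decode_out = layer.forward([2.0, 4.0], "decode")    # base only
print(prefill_out, decode_out)
```

Because the decode branch in this sketch touches only the shared base weights, decode steps for requests that use different adapters could in principle be batched through the same matrix multiply, which is one plausible reading of the throughput gain the summary mentions.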
