Tag
This paper introduces Code2LoRA, a hypernetwork-based method that generates adapters for code language models to address the challenges posed by software evolution.
This paper explores parameter-efficient fine-tuning (PEFT) as a compact substrate for persistent personal models. It studies scaling up, down, and out, and introduces MinT for managing adapters.
The author benchmarks serving 1,000 LoRA adapters on a single GPU with vLLM. The results show that the number of active adapters and the shape of the traffic are the real bottlenecks, and the author offers recommendations for tuning max_loras.
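A toy model can show why traffic shape matters as much as the number of adapters. This sketch is not vLLM's scheduler. It assumes that each request needs its adapter resident in one of max_loras GPU slots and that a full slot set evicts the least recently used adapter. count_adapter_loads is a hypothetical helper, and the slot model is a simplifying assumption.

```python
from collections import OrderedDict


def count_adapter_loads(requests, max_loras):
    """Toy LRU model (an assumption, not vLLM internals): count adapter loads.

    requests: iterable of adapter IDs, one per incoming request.
    max_loras: number of adapter slots assumed to be resident at once.
    """
    slots = OrderedDict()  # adapter_id -> None, ordered least to most recently used
    loads = 0
    for adapter_id in requests:
        if adapter_id in slots:
            slots.move_to_end(adapter_id)  # hit: mark as most recently used
            continue
        loads += 1  # miss: the adapter must be loaded into a slot
        if len(slots) >= max_loras:
            slots.popitem(last=False)  # evict the least recently used adapter
        slots[adapter_id] = None
    return loads


# Same request volume and 8 slots, but different traffic shapes.
round_robin = [i % 1000 for i in range(10_000)]  # cycles through all 1,000 adapters
hot_few = [i % 4 for i in range(10_000)]         # concentrated on 4 hot adapters

print(count_adapter_loads(round_robin, max_loras=8))
print(count_adapter_loads(hot_few, max_loras=8))
```

With 8 slots, the round-robin stream misses on every request and triggers 10,000 loads. The stream concentrated on 4 adapters triggers only 4. Under this simplified model, the cost depends on how many adapters are active within the slot window, not on how many exist.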