Tag
The paper introduces Latent Cache Flow (LCF), a method for efficient model-to-model communication that exchanges compressed KV caches instead of text, reducing adapter size and enabling cross-context communication.
The article introduces LLiMba, a 3B-parameter model adapted from Qwen2.5 for Sardinian using continued pretraining and supervised fine-tuning on a single consumer GPU. It evaluates several LoRA configurations and finds that adapter capacity significantly affects performance and factual accuracy in low-resource language adaptation.