adapter-efficiency


Latent Cache Flow: Model-to-Model Communication Without Text

arXiv cs.LG · 2026-05-25

The paper introduces Latent Cache Flow (LCF), a method in which models communicate by exchanging compressed KV caches instead of text, reducing adapter size and enabling cross-context communication.



LLiMba: Sardinian on a Single GPU -- Adapting a 3B Language Model to a Vanishing Romance Language

arXiv cs.CL · 2026-05-12

The article introduces LLiMba, a 3B-parameter model adapted from Qwen2.5 for Sardinian using continued pretraining and supervised fine-tuning on a single consumer GPU. It evaluates several LoRA configurations and finds that adapter capacity significantly affects both performance and factual accuracy in low-resource language adaptation.

