Introduces Latent Context Language Models (LCLMs), which encode 16 tokens as 1 latent token to improve accuracy and speed while reducing memory usage.
This paper presents Latent Context Language Models (LCLMs), a family of encoder-decoder compressors for handling long contexts efficiently. Built through architecture search and large-scale pretraining, LCLMs outperform traditional KV cache methods in accuracy, speed, and memory usage.
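To make the 16-to-1 ratio concrete, here is a minimal back-of-the-envelope sketch in Python. It is not the paper's implementation. The function names, the padding of a final partial block, and the KV cache size formula (keys plus values, per layer, per KV head) are assumptions made for illustration only.

```python
def latent_length(num_tokens: int, ratio: int = 16) -> int:
    """Latent tokens needed to cover num_tokens context tokens.

    Assumption: a final partial block is padded up to one full latent token.
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return (num_tokens + ratio - 1) // ratio  # integer ceiling division


def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence (hypothetical model shape).

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    context = 32_000  # illustrative context length, not taken from the paper
    latents = latent_length(context)
    full = kv_cache_bytes(context, num_layers=32, num_kv_heads=8, head_dim=128)
    small = kv_cache_bytes(latents, num_layers=32, num_kv_heads=8, head_dim=128)
    print(f"{context} tokens -> {latents} latent tokens")
    print(f"context KV cache shrinks by a factor of {full / small:.1f}")
```

Because the cache size in this sketch is linear in sequence length, the context portion shrinks by roughly the compression ratio. The cost of running the encoder and any uncompressed tokens are not modeled here.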