@Pavel_Izmailov: New paper: Latent Context Language Models (LCLMs)! Idea: encode 16 tokens as 1 latent token, and have the LLM work on t…
Summary
Introduces Latent Context Language Models (LCLMs), which encode every 16 tokens as 1 latent token and run the LLM on the latent tokens, giving a general-purpose model with a better performance, speed, and memory usage frontier.
Cached at: 06/10/26, 09:57 PM
New paper: Latent Context Language Models (LCLMs)!
Idea: encode 16 tokens as 1 latent token, and have the LLM work on top of the latent tokens. Result: general-purpose model with much better performance / speed / memory usage frontier. https://t.co/ldsBOVkmFF
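To make the 16-to-1 idea concrete, here is a minimal sketch of what compressing a context by a factor of 16 does to sequence length. It is an illustration only: the post does not describe the encoder, so mean-pooling is used here as a stand-in for the learned compressor, and the names compress, latent_length, and CHUNK are hypothetical.

```python
from typing import List

CHUNK = 16  # tokens per latent token, the ratio stated in the post


def compress(embeddings: List[List[float]], chunk: int = CHUNK) -> List[List[float]]:
    """Mean-pool each run of `chunk` token embeddings into one latent vector.

    Assumption: mean-pooling is a placeholder, not the paper's encoder.
    A trailing partial chunk is pooled over the tokens it actually has.
    """
    latents = []
    for start in range(0, len(embeddings), chunk):
        group = embeddings[start:start + chunk]
        dim = len(group[0])
        latents.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return latents


def latent_length(num_tokens: int, chunk: int = CHUNK) -> int:
    """Number of latent tokens needed for num_tokens (ceiling division)."""
    return -(-num_tokens // chunk)


# Toy input: 40 token embeddings of width 2.
tokens = [[float(i), 1.0] for i in range(40)]
latents = compress(tokens)
```

Under this sketch the downstream model sees len(latents) positions instead of len(tokens), which is where any speed and memory savings would come from; how much accuracy survives depends on the real encoder, which this example does not model.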
Similar Articles
End-to-End Context Compression at Scale
This paper presents Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that efficiently handle long contexts through architectural search and large-scale pretraining, outperforming traditional KV cache methods in accuracy, speed, and memory usage.
@samhogan: RLMs pretty much solved context btw You can shove tens of millions of tokens into a good RLM harness and it just works.…
A developer claims that Recursive Language Models (RLMs) have effectively solved long-context handling: with a good RLM harness, inputs of tens of millions of tokens reportedly just work.
@JulieKallini: Fast Byte Latent Transformer is accepted to ICML 2026! Byte-level LMs promise to free us from subword tokenizers, but d…
The Fast Byte Latent Transformer (BLT-D) has been accepted to ICML 2026, introducing a text diffusion method for parallel byte-level decoding to overcome the speed limitations of traditional byte-level language models.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
This paper introduces a framework for token-level influence attribution in large language models by learning orthogonal latent spaces with sparse autoencoders, enabling precise identification of training data tokens that jointly influence predictions, with applications in high-stakes domains like healthcare.
Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History
This paper introduces Engram, an open-source bi-temporal memory engine for LLM agents that retrieves a compact context slice (∼9.6k tokens) to outperform the full-history baseline (79k tokens) by 10.4 accuracy points on LongMemEval, using a hybrid read path fusing dense, lexical, graph, and temporal signals.