Language Models Need Sleep
Summary
This paper proposes a sleep-like consolidation mechanism for transformer models that uses fast weights and recurrent passes to improve long-context processing while maintaining inference speed.
Source: https://huggingface.co/papers/2605.26099
Abstract
Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.
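The wake/sleep cycle described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the learned local rule, so a plain delta rule stands in for it, and the class name `ToySleepModel`, its methods, and the `lr` step size are all hypothetical. The sketch only shows the control flow: context accumulates in a cache while awake, N offline passes fold it into a fast-weight matrix during sleep, and the cache is then cleared.

```python
class ToySleepModel:
    """Toy illustration of wake/sleep consolidation (hypothetical names)."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # assumed step size; not taken from the paper
        # Persistent fast weights, initialised to zero.
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        # Stand-in for the key-value cache: a list of (key, value) pairs.
        self.kv_cache = []

    def wake_step(self, key, value):
        # Awake: only accumulate context; the fast weights are untouched.
        self.kv_cache.append((key, value))

    def recall(self, key):
        # Read from the fast weights alone: W @ key.
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]

    def sleep(self, n_passes):
        # Asleep: n_passes offline passes over the accumulated context.
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = self.recall(key)
                err = [v - p for v, p in zip(value, pred)]
                # Delta rule (assumption): W += lr * outer(err, key).
                for i in range(self.dim):
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err[i] * key[j]
        # The cache is cleared only after consolidation.
        self.kv_cache.clear()


def recall_after(n_passes):
    model = ToySleepModel(dim=2)
    model.wake_step([1.0, 0.0], [1.0, -1.0])
    model.sleep(n_passes)
    return model.recall([1.0, 0.0])


short_sleep = recall_after(1)
long_sleep = recall_after(4)
```

In this toy setting, a longer sleep brings the recalled value closer to the stored one, and `recall` costs the same no matter how many passes were run. That mirrors, in a very loose way, the abstract's claim that extra computation moves to sleep while wake-time latency is preserved.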
Similar Articles
Language Models Need Sleep
This paper introduces a sleep-like consolidation mechanism for Transformer-based LLMs that periodically converts recent context into persistent fast weights in SSM blocks, clearing the KV cache to improve long-horizon reasoning without increasing inference latency.
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
This paper introduces a 'Sleep' paradigm for large language models that enables continual learning through memory consolidation and dreaming phases, allowing models to distill short-term knowledge into long-term parameters and self-improve without human supervision.
@omarsar0: Language models need "sleep"
A paper explores letting language model agents 'sleep' to reset internal state and improve performance on long-horizon tasks, addressing context length scaling issues.
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Proposes Memory-Efficient Looped Transformer (MELT), a novel recurrent LLM architecture that decouples reasoning depth from memory consumption by sharing a single KV cache across loops and using chunk-wise training with interpolated transition and attention-aligned distillation.
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
The paper introduces Mela, a memory-augmented transformer architecture inspired by human memory consolidation, featuring a Hierarchical Memory Module that improves long-context language modeling performance.