Mela: Test-Time Memory Consolidation based on Transformation Hypothesis
Summary
The paper introduces Mela, a memory-augmented transformer architecture inspired by human memory consolidation, featuring a Hierarchical Memory Module that improves long-context language modeling performance.
Cached at: 05/12/26, 07:31 AM
Source: https://huggingface.co/papers/2605.10537
Abstract
A memory-augmented transformer architecture called Mela incorporates hierarchical memory modules inspired by human memory consolidation processes, enabling improved long-context language modeling through multi-granularity memory representations.
Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage established neuroscientific theories of memory consolidation and cross-frequency coupling to propose the Hierarchical Memory Module (HMM), a neural memory architecture composed of two functionally distinct sub-modules that operate at different update frequencies. Inspired by the transformation hypothesis, the low-frequency sub-module produces high-level representations that capture abstract, gist-level knowledge, while the high-frequency sub-module produces fine-grained representations that preserve richer episodic detail. The final memory output is dynamically reconstructed as a context-dependent combination of both representations, analogous to the reconstructive nature of human memory retrieval. We integrate HMM into a Transformer-based language decoder to form Mela, a family of memory-augmented language models that perform online memory consolidation at test time. To further exploit the multi-granularity memory representations produced by HMM, we introduce MemStack, a method that distributes different levels of memory features across the early layers of the decoder without introducing additional tokens. Experiments on language modeling demonstrate that Mela outperforms Transformer baselines across all model sizes. Moreover, with the pretrained context length fixed at 4K, Mela maintains performance on significantly longer contexts, whereas Transformer baselines degrade rapidly beyond their training length. Extensive ablation studies validate the contribution of each component and provide guidance for practical configuration.
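The two-frequency idea in the abstract can be illustrated with a toy sketch. This is not the paper's HMM: the abstract gives no update rules, so the exponential-moving-average writes, the fixed `slow_period`, the sigmoid gate, and every name below (`ToyHierarchicalMemory`, `write`, `read`) are assumptions chosen only to show a fast state updated every step, a slow state updated every few steps, and a query-dependent blend of the two at read time.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def ema_update(state, target, rate):
    """Move `state` toward `target` by a fraction `rate` (exponential moving average)."""
    return [(1.0 - rate) * s + rate * t for s, t in zip(state, target)]


class ToyHierarchicalMemory:
    """Hypothetical two-timescale memory; an illustration, not the paper's module."""

    def __init__(self, dim, slow_period=4, fast_rate=0.5, slow_rate=0.1):
        self.fast = [0.0] * dim   # high-frequency state: updated on every write
        self.slow = [0.0] * dim   # low-frequency state: updated every `slow_period` writes
        self.slow_period = slow_period
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate
        self.steps = 0
        self.slow_updates = 0

    def write(self, x):
        """Absorb one input vector; consolidate into the slow state periodically."""
        self.fast = ema_update(self.fast, x, self.fast_rate)
        self.steps += 1
        if self.steps % self.slow_period == 0:
            # Assumed consolidation rule: the slow state drifts toward the fast state.
            self.slow = ema_update(self.slow, self.fast, self.slow_rate)
            self.slow_updates += 1

    def read(self, query):
        """Return a query-dependent convex combination of the fast and slow states."""
        # Assumed gate: favour whichever state is more similar to the query.
        gate = 1.0 / (1.0 + math.exp(-(dot(query, self.fast) - dot(query, self.slow))))
        return [gate * f + (1.0 - gate) * s for f, s in zip(self.fast, self.slow)]


memory = ToyHierarchicalMemory(dim=2)
for _ in range(8):
    memory.write([1.0, 0.0])
blended = memory.read([1.0, 0.0])
```

After eight writes with `slow_period=4`, the slow state has been updated twice while the fast state has been updated eight times, and `blended` lies between the two states in each coordinate. A learned module would replace the hand-set rates and gate with trained parameters.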
Similar Articles
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Proposes Memory-Efficient Looped Transformer (MELT), a novel recurrent LLM architecture that decouples reasoning depth from memory consumption by sharing a single KV cache across loops and using chunk-wise training with interpolated transition and attention-aligned distillation.
@dair_ai: // Memory as a Model // The paper augments any LLM with a separate trained memory model that stores, retrieves, and int…
MeMo introduces a modular memory model that augments any LLM to store, retrieve, and integrate new knowledge without retraining or catastrophic forgetting. It outperforms RAG-based methods on benchmarks like BrowseComp-Plus, NarrativeQA, and MuSiQue.
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure
H-Mem is a novel memory mechanism for LLM-based agents that uses a hybrid structure combining a temporal and semantic tree with a knowledge graph to model memory evolution and improve retrieval, achieving state-of-the-art performance on QA benchmarks.
T-Mem: Memory That Anticipates, Not Archives
T-Mem is a new long-term conversational memory architecture that enables both descriptive and associative recall, covering scenarios where query and memory share surface features and those where they are connected by latent semantic arcs. It reaches state-of-the-art on the LoCoMo and LoCoMo-Plus benchmarks.
MemTrain: Self-Supervised Context Memory Training
MemTrain proposes a self-supervised training framework that uses masked reconstruction and intermediate memory recall proxy tasks on Wikipedia corpora to enhance LLM agents' context memory, achieving up to 17.67 point gains on downstream memory-intensive QA benchmarks.