# HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
Source: [https://arxiv.org/html/2604.16839](https://arxiv.org/html/2604.16839)
Jinchang Zhu1,∗,a, Jindong Li1,∗, Cheng Zhang2,∗, Jiahong Liu3, Menglin Yang1,†,b
1 The Hong Kong University of Science and Technology (Guangzhou)  2 Jilin University  3 The Chinese University of Hong Kong
a jzhu997@connect.hkust-gz.edu.cn  b menglinyang@hkust-gz.edu.cn
∗ Equal contribution. † Corresponding author.

###### Abstract

Long-term memory is a critical challenge for Large Language Model agents, as fixed context windows cannot preserve coherence across extended interactions. Existing memory systems represent conversation history as unstructured embedding vectors, retrieving information through semantic similarity. This paradigm fails to capture the associative structure of human memory, wherein related experiences progressively strengthen interconnections through repeated co-activation. Inspired by cognitive neuroscience, we identify three mechanisms central to biological memory: association, consolidation, and spreading activation, which remain largely absent in current research. To bridge this gap, we propose HeLa-Mem, a bio-inspired memory architecture that models memory as a dynamic graph with Hebbian learning dynamics. HeLa-Mem employs a dual-level organization: (1) an episodic memory graph that evolves through co-activation patterns, and (2) a semantic memory store populated via Hebbian Distillation, wherein a Reflective Agent identifies densely connected memory hubs and distills them into structured, reusable semantic knowledge. This dual-path design leverages both semantic similarity and learned associations, mirroring the episodic-semantic distinction in human cognition. Experiments on LoCoMo demonstrate superior performance across four question categories while using significantly fewer context tokens (Figure [1](https://arxiv.org/html/2604.16839#S1.F1)). Code is available on [GitHub](https://github.com/ReinerBRO/HeLa-Mem).


## 1 Introduction

Large language models have demonstrated remarkable capabilities in language understanding and generation, enabling increasingly sophisticated interactive agents (Yang et al., [2024](https://arxiv.org/html/2604.16839#bib.bib20); Liu et al., [2026](https://arxiv.org/html/2604.16839#bib.bib26)). However, sustaining coherent behavior over long time horizons remains a fundamental challenge (Zhang et al., [2025](https://arxiv.org/html/2604.16839#bib.bib1)). Due to their reliance on fixed-length context windows, LLMs struggle to maintain consistent representations of past interactions as dialogues extend or span multiple sessions. This limitation often leads to fragmented memory (Wu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib2); Hu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib3); Liang et al., [2025](https://arxiv.org/html/2604.16839#bib.bib4); Liu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib25)), resulting in factual inconsistencies, diminished personalization, and unstable agent behavior (Li et al., [2026](https://arxiv.org/html/2604.16839#bib.bib27)). Addressing long-term memory coherence is therefore essential for LLM agents operating in settings that require persistent user adaptation, multi-session knowledge retention, or stable persona maintenance.

![Refer to caption](https://arxiv.org/html/2604.16839v1/x1.png)
Figure 1: Performance vs. token efficiency on LoCoMo, averaged across GPT-4o-mini and GPT-4o. HeLa-Mem achieves strong performance with fewer tokens, landing in the upper-left ideal region.

Current memory mechanisms for LLM agents can be broadly categorized into three methodological paradigms. Knowledge-organization methods, such as A-Mem (Xu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib12)), structure memory into interconnected semantic networks to enable adaptive management. Retrieval mechanism-oriented approaches, exemplified by MemoryBank (Zhong et al., [2024](https://arxiv.org/html/2604.16839#bib.bib14)), integrate semantic retrieval with memory forgetting curves for long-term updating. Architecture-driven methods, including MemGPT (Packer et al., [2023](https://arxiv.org/html/2604.16839#bib.bib11)), employ hierarchical memory structures with explicit read and write operations to dynamically manage limited context windows. Despite their demonstrated effectiveness, these approaches are typically developed in isolation, each prioritizing a single dimension (memory structure, retrieval strategy, or update mechanism) while largely overlooking their mutual interaction and joint contribution to long-term coherence.

More fundamentally, this component-wise optimization often overlooks a critical aspect of long-term coherence: the dynamic evolution of memory structure. Human memory is not a static database where items are stored and retrieved in isolation; rather, it is a dynamic system where connections are continuously reorganized by experience. For example, a topic discussed today might trigger a memory from a month ago, not necessarily because they share surface-level keywords, but because they are part of the same evolving narrative arc. Current systems, by treating storage and retrieval as separate static processes, fail to capture this evolving connectivity, leading to agents that "remember" facts but lack the "continuity" of a developing relationship.

To better understand how long-term coherence can be maintained, we draw on a fundamental principle of biological memory: Hebbian learning. In biological systems, experiences that are repeatedly co-activated gradually develop stronger associations, a phenomenon often summarized as "neurons that fire together wire together" (illustrated in Figure [2](https://arxiv.org/html/2604.16839#S1.F2)). This associative organization allows related memories to be efficiently reactivated through spreading activation and supports the gradual consolidation of episodic experiences into more stable semantic knowledge. Together, association, consolidation, and spreading activation form a tightly coupled memory process that enables biological systems to maintain coherent representations over extended time scales, capabilities that remain largely absent from current artificial memory designs.

Building on this perspective, we propose HeLa-Mem (Hebbian Learning associative Memory), a unified memory architecture for LLM agents. HeLa-Mem represents conversational history as a dynamic graph with Hebbian learning dynamics and operates through coordinated mechanisms of online association, reflective consolidation, and dual-path retrieval, establishing a unified memory management framework that captures both fine-grained details and high-level patterns.

The primary contributions of our work are:

- • We propose HeLa-Mem, a bio-inspired framework that utilizes an Online Encoding & Association mechanism to model conversation history as a dynamic Hebbian graph, where co-activated memories strengthen connections to capture latent context.
- • We introduce a Reflective Consolidation framework using Hebbian Distillation, which identifies hub clusters and transforms them into structured semantic knowledge, preventing graph explosion while retaining key information.
- • We implement a Dual-Path Retrieval strategy that leverages spreading activation to traverse Hebbian edges, achieving the best average rank (1.25) across all question categories.
- • Comprehensive experiments on the LoCoMo benchmark validate HeLa-Mem's effectiveness, achieving superior performance across four categories while using significantly fewer context tokens.

![Refer to caption](https://arxiv.org/html/2604.16839v1/figures/NeuronA_B.jpg)
Figure 2: Conceptual illustration of Hebbian learning in associative memory. Two memory nodes (Neuron A and Neuron B) representing distinct experiences (e.g., a daytime event and a nighttime event) develop strengthened synaptic connections when co-activated through shared context. This "neurons that fire together, wire together" principle forms the theoretical foundation of HeLa-Mem's dynamic memory graph.

## 2 Related Work

### 2.1 Memory for LLM Agents

![Refer to caption](https://arxiv.org/html/2604.16839v1/x2.png)
Figure 3: The architectural overview of HeLa-Mem. The framework consists of three modules: (1) Hebbian Association for dynamic graph construction (Section [3.2](https://arxiv.org/html/2604.16839#S3.SS2)); (2) Reflective Consolidation for semantic knowledge distillation (Section [3.3](https://arxiv.org/html/2604.16839#S3.SS3)); and (3) Retrieval and Response using a Dual-Path strategy (Section [3.4](https://arxiv.org/html/2604.16839#S3.SS4)).

Existing Large Language Models face fundamental challenges in handling complex scenarios requiring long-term coherence (Hu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib3)). Advancements in memory systems addressing this problem can be broadly grouped into three categories.

Knowledge-organization methods focus on capturing and structuring intermediate reasoning states. Think-in-Memory (Liu et al., [2023](https://arxiv.org/html/2604.16839#bib.bib13)) stores evolving chains-of-thought, enabling consistency through continual updates. A-Mem (Xu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib12)) organizes knowledge into an interconnected note network that spans sessions.

Retrieval mechanism-oriented approaches, pioneered by RAG (Lewis et al., [2020](https://arxiv.org/html/2604.16839#bib.bib8)), enrich the model with external memory libraries. MemoryBank (Zhong et al., [2024](https://arxiv.org/html/2604.16839#bib.bib14)) logs conversations, events, and user traits in a vector database and refreshes them using a forgetting-curve schedule. Generative Agents (Park et al., [2023](https://arxiv.org/html/2604.16839#bib.bib10)) keep memories in natural language and add a reflection loop for relevance filtering. EmotionalRAG (Huang et al., [2024](https://arxiv.org/html/2604.16839#bib.bib18)) retrieves memory entries by combining semantic similarity with the agent's emotional state.

Architecture-driven designs alter the core control flow to manage context explicitly. MemGPT (Packer et al., [2023](https://arxiv.org/html/2604.16839#bib.bib11)) adopts an OS-like hierarchy with dedicated read/write calls. SCM (Liang et al., [2023](https://arxiv.org/html/2604.16839#bib.bib17)) introduces dual buffers and a memory controller that gates selective recall. Mem0 (Chhikara et al., [2025](https://arxiv.org/html/2604.16839#bib.bib16)) dynamically extracts and consolidates salient information for scalable long-term memory. MemoryOS (Kang et al., [2025](https://arxiv.org/html/2604.16839#bib.bib15)) introduces a three-tier hierarchical storage with short-term, mid-term, and long-term memory units, employing segment-page organization for dynamic updating.

### 2.2 Hebbian Learning in Neural Networks

Hebbian learning, summarized as "neurons that fire together wire together" (Hebb, [2005](https://arxiv.org/html/2604.16839#bib.bib5)), is a foundational principle in neuroscience describing how synaptic connections strengthen through correlated activity (see Figure [2](https://arxiv.org/html/2604.16839#S1.F2)). Formally, given activation states $x_i$ and $x_j$ of two neurons, the connection weight $w_{ij}$ is updated as $\Delta w_{ij} = \eta \cdot x_i \cdot x_j$, where $\eta$ is the learning rate. This principle has been applied in Hopfield networks (Hopfield, [1982](https://arxiv.org/html/2604.16839#bib.bib6)), which demonstrate how recurrent neural networks with symmetric connections can function as associative memories, storing and retrieving patterns through energy minimization. More recently, Ramsauer et al. ([2020](https://arxiv.org/html/2604.16839#bib.bib7)) show that modern Hopfield networks with continuous states are mathematically equivalent to the attention mechanism in Transformers, revealing deep connections between biological memory principles and contemporary deep learning architectures. In the context of LLM agents, Hebbian dynamics offer a principled approach to capture latent associations between memories that may not be apparent from semantic similarity alone.

## 3 HeLa-Mem Architecture

Inspired by the synaptic plasticity of the human brain, HeLa-Mem models conversation history as a dynamic associative graph rather than a static log. Our design is guided by three neuroscience intuitions: (1) Association over Isolation: memories that co-occur should wire together, forming latent pathways beyond simple semantic similarity; (2) Active Consolidation: frequently accessed memory clusters should solidify into stable knowledge, similar to sleep-based consolidation; and (3) Spreading Retrieval: recalling one memory should naturally trigger related concepts through established synaptic routes.

Based on these principles (Figure [3](https://arxiv.org/html/2604.16839#S2.F3)), HeLa-Mem operates through a continuous cognitive lifecycle:

- • Online Encoding & Association: Conversation turns are encoded into the Episodic Memory Graph, where a Hebbian Learning mechanism dynamically strengthens connections between co-activated memories.
- • Reflective Memory Agent: Upon reaching associative thresholds, this agent identifies hub nodes and applies Hebbian Distillation to consolidate them into stable semantic knowledge, preventing noise accumulation.
- • Dual-Path Retrieval: During query time, queries activate both specific episodic details and broader semantic knowledge through spreading activation.

### 3.1 Memory Storage

#### 3.1.1 Episodic Memory Graph

Conversation turns are stored as nodes in a weighted graph. Each node contains the original text, a dense embedding, the timestamp, extracted keywords, and the speaker role. Edges between nodes represent associative connections, with weights indicating the strength of association. Initially, consecutive turns are connected with small weights; these weights evolve through Hebbian learning.
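For concreteness, a minimal Python sketch of such a node-and-edge store is shown below. The field names mirror the description above (text, embedding, timestamp, keywords, speaker role), while the class names, the dictionary-based edge store, and the initial edge weight are illustrative assumptions rather than the released implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EpisodicNode:
    """One conversation turn stored as a node in the episodic memory graph."""
    node_id: int
    text: str               # original turn text
    embedding: list[float]  # dense embedding of the turn
    timestamp: datetime     # when the turn occurred
    keywords: set[str]      # extracted keywords
    speaker: str            # speaker role, e.g. "user" or "assistant"


class EpisodicGraph:
    """Weighted associative graph over conversation turns."""

    def __init__(self, init_weight: float = 0.1):
        self.nodes: dict[int, EpisodicNode] = {}
        # Undirected edges keyed as (i, j) with i < j; the value is the association strength.
        self.weights: dict[tuple[int, int], float] = {}
        self.init_weight = init_weight

    def add_turn(self, node: EpisodicNode) -> None:
        # Consecutive turns start with a small associative weight;
        # these weights later evolve through Hebbian learning.
        if self.nodes:
            prev_id = max(self.nodes)
            lo, hi = sorted((prev_id, node.node_id))
            self.weights[(lo, hi)] = self.init_weight
        self.nodes[node.node_id] = node
```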

#### 3.1.2 Semantic Memory Store

The semantic level stores distilled knowledge extracted from episodic memories: specifically, we apply Hebbian Distillation on hub-centered clusters in the episodic graph to produce distilled semantic records with traceable evidence links to their source turns.

- • User Model: Stable user characteristics (e.g., "enjoys outdoor activities") with confidence scores and supporting evidence.
- • Factual Memory: Extracted facts with absolute timestamps, such as event dates and relationships.
- • Agent Knowledge: The agent's established persona, preferences, and behavioral patterns.

This serves as long-term memory that persists beyond conversation windows.
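The distilled records themselves can be kept in a flat store; a minimal schema sketch is given below, with the category labels and field names chosen to match the three entry types listed above (they are our own naming, not a documented format).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SemanticRecord:
    """One distilled entry in the semantic memory store (illustrative schema)."""
    category: str              # "user_model", "factual", or "agent_knowledge"
    statement: str             # e.g. "enjoys outdoor activities"
    confidence: float          # confidence score assigned during distillation
    evidence_turns: list[int] = field(default_factory=list)  # links back to source episodic turns
    event_date: Optional[str] = None  # absolute timestamp for factual memories, when available
```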

### 3.2 Online Encoding & Association

Synaptic efficacy in the brain is not fixed; it is plastic, evolving based on activity patterns. We emulate this dynamic through Hebbian learning to capture latent associations that semantic embeddings alone might miss.

Following the neuroscience principle that "neurons that fire together wire together," edge weights strengthen when memories are co-activated during retrieval:

$$w_{ij}^{(t+1)} = \underbrace{(1-\lambda)\cdot w_{ij}^{(t)}}_{\text{synaptic decay}} + \underbrace{\eta\cdot\mathbb{I}\left(v_i, v_j \in \mathcal{K}_t\right)}_{\text{active reinforcement}}, \tag{1}$$

where $\lambda$ is the decay rate, $\eta$ is the learning rate, and $\mathbb{I}(\cdot)$ is an indicator that the pair $(v_i, v_j)$ is co-activated in the current retrieval set $\mathcal{K}_t$. This dynamic allows frequently correlated memories to strengthen while unused connections fade over time.
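The update in Eq. (1) is straightforward to apply once edges are stored explicitly. The sketch below assumes a dictionary of undirected edges as in Section 3.1; the learning-rate default follows Section 4.1, while the decay default and the function name are illustrative.

```python
from itertools import combinations


def hebbian_update(weights: dict[tuple[int, int], float],
                   coactivated: set[int],
                   eta: float = 0.02,      # learning rate (Section 4.1)
                   lam: float = 0.005) -> None:
    """Eq. (1): decay every edge, then reinforce co-activated pairs in place."""
    # Synaptic decay: every edge fades by a factor (1 - lambda).
    for edge in weights:
        weights[edge] *= (1.0 - lam)
    # Active reinforcement: each pair retrieved together gains eta.
    for i, j in combinations(sorted(coactivated), 2):
        weights[(i, j)] = weights.get((i, j), 0.0) + eta
```

Called once per retrieval step with the ids of the nodes in $\mathcal{K}_t$, this lets frequently co-retrieved memories accumulate weight while idle edges decay.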

### 3.3 Reflective Memory Agent

While associative learning happens during active retrieval, long-term memory maintenance relies on active consolidation. To prevent memory overload and crystallize important information, we introduce a Reflective Agent that mimics the brain's sleep-based consolidation process through Hebbian Distillation.

The Reflective Agent monitors the graph's structural evolution to manage the memory lifecycle (see Figure [4](https://arxiv.org/html/2604.16839#S3.F4)), analogous to sleep-based memory consolidation in the brain.

Hub Detection. Nodes that have accumulated high total edge weight through Hebbian learning are identified as hubs. Specifically, a node $v_i$ is flagged for consolidation if its associative strength exceeds a threshold $\delta_{hub}$:

$$D(v_i) = \sum_{j\in\mathcal{N}(i)} w_{ij} > \delta_{hub}. \tag{2}$$

Upon detecting that a node's accumulated weight exceeds the threshold $\delta_{hub}$, the agent triggers Hebbian Distillation. To capture the full context, the agent retrieves the hub node along with its strongly connected neighbors. The LLM synthesizes this cluster of related memories to identify common themes and causal relationships, abstracting them into declarative semantic entries. These distilled records are stored in the Semantic Memory Store, effectively compressing repetitive episodic details into stable, generalizable knowledge.

Adaptive Forgetting. This process is triggered when a node's status falls below critical retention thresholds. A memory is flagged for removal only if it simultaneously satisfies three criteria: (1) its total edge weight is below $\delta_{prune}$ (indicating structural irrelevance), (2) its inactive duration exceeds $\delta_{age}$ (indicating temporal dormancy), and (3) it has zero recent access. This strict compound criterion ensures that the system selectively removes noise while preserving strong, albeit older, associations.
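Both lifecycle decisions reduce to threshold tests on simple graph statistics. The sketch below illustrates Eq. (2) and the three-way pruning criterion; since the paper does not report numerical values for $\delta_{hub}$, $\delta_{prune}$, or $\delta_{age}$, they are left as parameters, and the function names are our own.

```python
from datetime import datetime, timedelta
from typing import Optional


def associative_strength(node_id: int,
                         weights: dict[tuple[int, int], float]) -> float:
    """D(v_i): total Hebbian edge weight incident to a node (Eq. 2)."""
    return sum(w for (i, j), w in weights.items() if node_id in (i, j))


def is_hub(node_id: int, weights: dict[tuple[int, int], float],
           delta_hub: float) -> bool:
    # Hubs exceed the associative-strength threshold and trigger Hebbian Distillation.
    return associative_strength(node_id, weights) > delta_hub


def should_forget(node_id: int, weights: dict[tuple[int, int], float],
                  last_access: Optional[datetime], recent_access_count: int,
                  now: datetime, delta_prune: float,
                  delta_age: timedelta) -> bool:
    """A node is pruned only if all three retention criteria fail simultaneously."""
    weak = associative_strength(node_id, weights) < delta_prune       # structural irrelevance
    dormant = last_access is None or (now - last_access) > delta_age  # temporal dormancy
    unused = recent_access_count == 0                                 # zero recent access
    return weak and dormant and unused
```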

### 3.4 Dual-Path Retrieval

Human memory retrieval is rarely a single-step lookup; it is a spreading activation process where one thought triggers another. HeLa-Mem adopts a dual-path retrieval strategy to emulate this interaction between direct recall and associative spreading.

Given a query, retrieval proceeds in two stages.

Base Activation. Each episodic node receives an initial score combining embedding similarity, temporal decay, and keyword overlap:

$$S_{base}(v_i) = \left(\text{sim}(\mathbf{q}, \mathbf{e}_i) + \alpha\cdot\text{keyword\_match}\right)\cdot\gamma(v_i), \tag{3}$$

where $\text{sim}(\cdot,\cdot)$ denotes cosine similarity between the query embedding $\mathbf{q}$ and node embedding $\mathbf{e}_i$, $\gamma(v_i)=\exp(-\Delta t/\tau)$ is the temporal decay factor with time constant $\tau$, and $\alpha$ controls the bonus for keyword matches.
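A direct reading of Eq. (3) can be sketched as follows. The binary keyword-overlap bonus is an assumption (the exact form of the keyword match is not specified), the default $\alpha = 0.7$ echoes the keyword weight reported in Appendix D, and $\tau = 60$ days follows Section 4.1.

```python
import math
from datetime import datetime


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def base_activation(query_emb: list[float], query_keywords: set[str],
                    node_emb: list[float], node_keywords: set[str],
                    node_time: datetime, now: datetime,
                    alpha: float = 0.7, tau_days: float = 60.0) -> float:
    """Eq. (3): (similarity + alpha * keyword_match) * temporal decay."""
    keyword_match = 1.0 if query_keywords & node_keywords else 0.0  # assumed binary overlap bonus
    dt_days = (now - node_time).total_seconds() / 86400.0
    gamma = math.exp(-dt_days / tau_days)                           # gamma(v_i) = exp(-dt / tau)
    return (cosine(query_emb, node_emb) + alpha * keyword_match) * gamma
```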

![Refer to caption](https://arxiv.org/html/2604.16839v1/x3.png)
Figure 4: Hebbian memory graph showing the Reflective Agent's dual role. Hub nodes (red, high degree) are candidates for Hebbian Distillation, consolidating related memories into structured semantic knowledge. Isolated nodes (gray, low degree with dashed circles) are candidates for Adaptive Forgetting, maintaining memory efficiency.

Spreading Activation. High-scoring nodes propagate activation through Hebbian edges:

$$S(v_j) = S_{base}(v_j) + \beta\sum_{i\in\mathcal{N}(j)} S_{base}(v_i)\cdot w_{ij}, \tag{4}$$

where $\mathcal{N}(j)$ denotes the neighbors of node $v_j$ in the memory graph, $w_{ij}$ is the Hebbian edge weight, and $\beta$ controls the spreading activation strength. This enables retrieval of memories that are semantically distant from the query but strongly associated with initially activated content, which is particularly beneficial for multi-hop reasoning.

Dual-Path Ranking. The final retrieval set is constructed by combining two ranked lists:

$$\mathcal{R}_{final} = \underbrace{\text{Top-}k(S_{base})}_{\text{base path}} \cup \underbrace{\text{Top-}m(S \mid v \notin \text{Top-}k)}_{\text{flip path}}, \tag{5}$$

where the base path selects the top-$k$ nodes by $S_{base}$, and the flip path promotes up to $m$ additional nodes that rank highest by the spreading-augmented score $S$ but were not already selected. This dual-path approach ensures retrieval of both semantically relevant memories and associatively linked memories that spreading activation surfaces. Semantic memory entries are also retrieved and merged to form the final context.
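Combining Eqs. (4) and (5), spreading activation and the dual-path merge can be sketched as below, assuming base scores have already been computed per node. The defaults $k = 10$ and $\beta = 0.1$ follow Section 4.1, $m = 3$ follows Appendix D, and the function names are our own.

```python
def spread_activation(base_scores: dict[int, float],
                      weights: dict[tuple[int, int], float],
                      beta: float = 0.1) -> dict[int, float]:
    """Eq. (4): S(v_j) = S_base(v_j) + beta * sum over neighbors of S_base(v_i) * w_ij."""
    spread = dict(base_scores)
    for (i, j), w in weights.items():
        # Undirected edges propagate activation in both directions.
        spread[j] = spread.get(j, 0.0) + beta * base_scores.get(i, 0.0) * w
        spread[i] = spread.get(i, 0.0) + beta * base_scores.get(j, 0.0) * w
    return spread


def dual_path_retrieve(base_scores: dict[int, float],
                       weights: dict[tuple[int, int], float],
                       k: int = 10, m: int = 3, beta: float = 0.1) -> list[int]:
    """Eq. (5): top-k nodes by base score, plus up to m extra nodes promoted by spreading."""
    base_path = sorted(base_scores, key=base_scores.get, reverse=True)[:k]
    spread = spread_activation(base_scores, weights, beta)
    flip_path = [v for v in sorted(spread, key=spread.get, reverse=True)
                 if v not in base_path][:m]
    return base_path + flip_path
```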

### 3.5 Response Generation

The LLM generates responses using the integrated context (episodic memories and semantic knowledge) along with a system prompt that establishes the conversational role. The detailed prompt structure is provided in Appendix [A](https://arxiv.org/html/2604.16839#A1).

## 4 Experiments

### 4.1 Experimental Settings

Dataset. We conduct experiments on the LoCoMo benchmark (Maharana et al., [2024](https://arxiv.org/html/2604.16839#bib.bib22)), specifically designed for assessing long-term conversational memory capabilities. It consists of ultra-long dialogues averaging 300 turns and about 9K tokens per conversation. Questions span multiple categories to systematically evaluate memory abilities.

Evaluation Metrics. Following prior work on long-term conversational memory (Maharana et al., [2024](https://arxiv.org/html/2604.16839#bib.bib22); Xu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib12)), we employ standard F1 and BLEU-1 scores to evaluate performance.

Compared Methods. We compare HeLa-Mem with representative memory methods including LoCoMo (Native), ReadAgent (Lee et al., [2024](https://arxiv.org/html/2604.16839#bib.bib21)), MemoryBank (Zhong et al., [2024](https://arxiv.org/html/2604.16839#bib.bib14)), MemGPT (Packer et al., [2023](https://arxiv.org/html/2604.16839#bib.bib11)), A-Mem (Xu et al., [2025](https://arxiv.org/html/2604.16839#bib.bib12)), Mem0 (Chhikara et al., [2025](https://arxiv.org/html/2604.16839#bib.bib16)), LightMem (Fang et al., [2025](https://arxiv.org/html/2604.16839#bib.bib23)), and MemoryOS (Kang et al., [2025](https://arxiv.org/html/2604.16839#bib.bib15)). Baseline results for GPT-4o-mini, GPT-4o, and Qwen2.5-3b are reported from Xu et al. ([2025](https://arxiv.org/html/2604.16839#bib.bib12)); results for Qwen2.5-14b are reported from Yan et al. ([2025](https://arxiv.org/html/2604.16839#bib.bib24)). MemoryOS results marked with † are reproduced by us.

Implementation Details. We evaluate HeLa-Mem across four backbone LLMs: GPT-4o-mini, GPT-4o (Achiam et al., [2023](https://arxiv.org/html/2604.16839#bib.bib19)), Qwen2.5-14b, and Qwen2.5-3b (Yang et al., [2024](https://arxiv.org/html/2604.16839#bib.bib20)). For HeLa-Mem, the time decay constant is set to $\tau = 60$ days. Episodic retrieval uses $k = 10$, and semantic retrieval uses $k = 5$. The Hebbian learning rate is $\eta = 0.02$, the edge decay rate $\lambda = 0.995$, the spreading activation strength $\beta = 0.1$, and the spreading threshold $\theta = 0.6$.
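For reference, the hyperparameters listed above can be collected into a single configuration; the key names below are our own and do not correspond to any released configuration file.

```python
# Hyperparameters reported in Section 4.1, gathered into one illustrative config.
HELA_MEM_CONFIG = {
    "backbones": ["gpt-4o-mini", "gpt-4o", "qwen2.5-14b", "qwen2.5-3b"],
    "time_decay_tau_days": 60,       # temporal decay constant tau
    "episodic_top_k": 10,            # episodic retrieval k
    "semantic_top_k": 5,             # semantic retrieval k
    "hebbian_learning_rate": 0.02,   # eta
    "edge_decay_rate": 0.995,        # lambda
    "spreading_strength": 0.1,       # beta
    "spreading_threshold": 0.6,      # theta
}
```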

### 4.2 Main Results

Table [1](https://arxiv.org/html/2604.16839#S4.T1) presents the detailed performance breakdown across four different LLM backbones: GPT-4o-mini, GPT-4o, Qwen2.5-14b, and Qwen2.5-3b. Results show that HeLa-Mem consistently outperforms baselines across varying model sizes and capabilities.

Table 1: Experimental results on the LoCoMo dataset for QA tasks across four categories (Multi-Hop, Temporal, Open-Domain, and Single-Hop) using different methods. Results are reported as F1 and BLEU-1 (%) scores. Best performance per model is marked in bold. Missing baselines are marked with "-". Token Length (↓): lower values are better; †: reproduced results.

| Model | Method | Multi-Hop F1 | Multi-Hop BLEU | Temporal F1 | Temporal BLEU | Open-Domain F1 | Open-Domain BLEU | Single-Hop F1 | Single-Hop BLEU | Token Length (↓) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-mini | LoCoMo | 25.02 | 19.75 | 18.41 | 14.77 | 12.04 | 11.16 | 40.36 | 29.05 | 16,910 |
| GPT-4o-mini | ReadAgent | 9.15 | 6.48 | 12.60 | 8.87 | 5.31 | 5.12 | 9.67 | 7.66 | 643 |
| GPT-4o-mini | MemoryBank | 5.00 | 4.77 | 9.68 | 6.99 | 5.56 | 5.94 | 6.61 | 5.16 | 432 |
| GPT-4o-mini | MemGPT | 26.65 | 17.72 | 25.52 | 19.44 | 9.15 | 7.44 | 41.04 | 34.34 | 16,977 |
| GPT-4o-mini | A-Mem | 27.02 | 20.09 | 45.85 | 36.67 | 12.14 | 12.00 | 44.65 | 37.06 | 2,520 |
| GPT-4o-mini | MemoryOS† | 38.39 | 29.52 | 41.58 | 35.99 | 23.75 | 17.17 | 45.86 | 40.70 | 2,000 |
| GPT-4o-mini | HeLa-Mem | **40.14** | **31.26** | **47.29** | **41.28** | **29.70** | **23.45** | **51.89** | **46.25** | 1,010 |
| GPT-4o | LoCoMo | 28.00 | 18.47 | 9.09 | 5.78 | 16.47 | 14.80 | **61.56** | **54.19** | 16,910 |
| GPT-4o | ReadAgent | 14.61 | 9.95 | 4.16 | 3.19 | 8.84 | 8.37 | 12.46 | 10.29 | 805 |
| GPT-4o | MemoryBank | 6.49 | 4.69 | 2.47 | 2.43 | 6.43 | 5.30 | 8.28 | 7.10 | 569 |
| GPT-4o | MemGPT | 30.36 | 22.83 | 17.29 | 13.18 | 12.24 | 11.87 | 60.16 | 53.35 | 16,987 |
| GPT-4o | A-Mem | 32.86 | 23.76 | 39.41 | 31.23 | 17.10 | 15.84 | 48.43 | 42.97 | 1,216 |
| GPT-4o | MemoryOS† | **40.23** | **31.89** | 43.57 | 33.55 | 20.58 | 15.85 | 43.85 | 39.03 | 2,000 |
| GPT-4o | HeLa-Mem | 39.12 | 29.82 | **50.79** | **44.54** | **24.38** | **19.24** | 49.69 | 44.08 | 1,036 |
| Qwen2.5-14b | A-Mem | 22.09 | 15.28 | 27.19 | 22.05 | 13.49 | 10.74 | 33.75 | 30.04 | 1,300 |
| Qwen2.5-14b | Mem0 | 31.73 | 24.82 | 28.96 | 26.24 | 15.03 | 11.28 | 42.58 | 35.15 | - |
| Qwen2.5-14b | MemoryOS | **38.19** | **29.26** | 32.24 | 27.86 | 20.27 | 15.94 | 46.33 | 41.62 | - |
| Qwen2.5-14b | LightMem | 25.45 | 19.61 | 32.03 | 27.70 | 15.81 | 11.81 | 34.92 | 31.22 | - |
| Qwen2.5-14b | HeLa-Mem | 36.59 | 27.02 | **36.08** | **29.91** | **24.22** | **20.23** | **49.95** | **45.15** | 944 |
| Qwen2.5-3b | LoCoMo | 4.61 | 4.29 | 3.11 | 2.71 | 4.55 | 5.97 | 7.03 | 5.69 | 16,910 |
| Qwen2.5-3b | ReadAgent | 2.47 | 1.78 | 3.01 | 3.01 | 5.57 | 5.22 | 3.25 | 2.51 | 776 |
| Qwen2.5-3b | MemoryBank | 3.60 | 3.39 | 1.72 | 1.97 | 6.63 | 6.58 | 4.11 | 3.32 | 298 |
| Qwen2.5-3b | MemGPT | 5.07 | 4.31 | 2.94 | 2.95 | 7.04 | 7.10 | 7.26 | 5.52 | 16,961 |
| Qwen2.5-3b | A-Mem | 18.23 | 11.94 | 24.32 | 19.74 | **16.48** | **14.31** | 23.63 | 19.23 | 1,300 |
| Qwen2.5-3b | MemoryOS† | 19.20 | **14.84** | 20.85 | 16.05 | 13.57 | 10.86 | 25.65 | 18.78 | 2,000 |
| Qwen2.5-3b | HeLa-Mem | **20.12** | 14.59 | **24.79** | **21.35** | 12.24 | 10.24 | **29.51** | **25.91** | 1,072 |

Detailed Analysis (GPT-4o-mini). Focusing on GPT-4o-mini as a representative case, HeLa-Mem demonstrates significant advantages. In Multi-hop reasoning, it achieves 40.14%, outperforming MemoryOS (38.39%) and A-Mem (27.02%). This validates the Hebbian graph's ability to bridge disparate information pieces through learned associations.

For Temporal tasks, HeLa-Mem scores 47.29%, surpassing MemoryOS (41.58%). The preservation of absolute timestamps during distillation allows accurate grounding of relative time expressions. In Open Domain questions, it reaches 29.70%, providing useful semantic context even for topics outside the main conversation flow. HeLa-Mem leads in Single-hop tasks (51.89%), demonstrating that the hierarchical retrieval approach remains effective for straightforward factual queries.

Token Efficiency. Notably, HeLa-Mem achieves these results using only ~1,010 tokens on average. This efficiency stems from the selective nature of Hebbian retrieval, which surfaces only the most strongly associated memories without the computational overhead of processing full context windows.

Robustness Across Backbones. To validate stability, Table [2](https://arxiv.org/html/2604.16839#S4.T2) summarizes the averaged performance across all backbones. HeLa-Mem achieves the best Average Rank of 1.25, significantly surpassing MemoryOS (2.25). This confirms that the theoretical advantages of Hebbian dynamics translate into robust empirical gains regardless of the underlying LLM's scale.

Table 2: Averaged results across three backbone LLMs (GPT-4o-mini, GPT-4o, Qwen2.5-3b). Avg Rank is computed across all categories; lower is better (†: reproduced results).

| Method | Multi-Hop F1 | Multi-Hop BLEU | Temporal F1 | Temporal BLEU | Open F1 | Open BLEU | Single F1 | Single BLEU | Avg Rank |
|---|---|---|---|---|---|---|---|---|---|
| LoCoMo | 19.21 | 14.17 | 10.20 | 7.75 | 11.02 | 10.64 | 36.32 | 29.64 | 5.00 |
| ReadAgent | 8.74 | 6.07 | 6.59 | 5.02 | 6.57 | 6.24 | 8.46 | 6.82 | 6.38 |
| MemoryBank | 5.03 | 4.28 | 4.62 | 3.80 | 6.21 | 5.94 | 6.33 | 5.19 | 6.88 |
| MemGPT | 20.69 | 14.95 | 15.25 | 11.86 | 9.48 | 8.80 | 36.15 | 31.07 | 4.50 |
| A-Mem | 26.04 | 18.60 | 36.53 | 29.21 | 15.24 | 14.05 | 38.90 | 33.09 | 3.00 |
| MemoryOS† | 32.61 | 25.42 | 35.33 | 28.53 | 19.30 | 14.63 | 38.45 | 32.84 | 2.25 |
| HeLa-Mem | 33.13 | 25.22 | 40.96 | 35.72 | 22.11 | 17.64 | 43.70 | 38.75 | 1.25 |

![Refer to caption](https://arxiv.org/html/2604.16839v1/x4.png)
Figure 5: Dual-Path Retrieval for multi-hop reasoning. Given a query requiring both "career influence" and "meeting location," the semantic path retrieves Turn 89 (career context). The Hebbian path then propagates activation through learned associations (edge weight 0.52) to retrieve Turn 15 (location context), which semantic similarity alone would miss. The baseline without Spreading Activation cannot bridge these memories.

### 4.3 Ablation Study

To understand the contribution of each component in HeLa-Mem, we conduct ablation experiments by removing key modules. Table [3](https://arxiv.org/html/2604.16839#S4.T3) presents the results on GPT-4o-mini.

Table 3: Ablation study on the LoCoMo benchmark. Results show F1 / BLEU-1 scores (%).

| Variant | Multi-Hop F1 | Multi-Hop BLEU | Temporal F1 | Temporal BLEU | Open F1 | Open BLEU | Single F1 | Single BLEU | Avg |
|---|---|---|---|---|---|---|---|---|---|
| HeLa-Mem (Full) | 36.04 | 26.56 | 46.23 | 40.48 | 29.50 | 23.55 | 45.04 | 39.80 | 34.74 |
| w/o Forgetting | 36.71 | 27.95 | 46.50 | 40.91 | 30.58 | 24.45 | 45.24 | 40.01 | 34.28 |
| w/o Spreading Activation | 33.88 | 25.57 | 44.36 | 39.62 | 27.76 | 22.28 | 43.34 | 38.33 | 32.19 |
| w/o Reflective Agent | 30.17 | 22.38 | 42.19 | 36.92 | 24.51 | 19.83 | 40.46 | 34.07 | 29.87 |

Effect of Reflective Agent. Removing the Reflective Memory Agent causes the largest performance drop (34.74% → 29.87%), with Multi-hop reasoning suffering most severely (36.04% → 30.17%). This confirms that the meta-cognitive component is essential for identifying high-degree hub nodes and triggering Hebbian Distillation, which consolidates related episodic memories into structured semantic knowledge.

Effect of Spreading Activation. Disabling spreading activation leads to a notable performance decline (34.74% → 32.19%), particularly affecting Multi-hop reasoning (36.04% → 33.88%). This validates our dual-path design: without spreading activation, the system degrades to a single semantic path, failing to retrieve memories that are semantically distant from the query but strongly associated through Hebbian connections, which is crucial for multi-hop reasoning that requires bridging disparate pieces of information. This underscores the value of leveraging historically learned pathways rather than relying solely on static semantic similarity.

Effect of Adaptive Forgetting. Interestingly, removing the forgetting mechanism shows minimal impact on the current LoCoMo benchmark. We attribute this to the limited conversation length (~300 turns), which does not yet saturate the memory capacity. However, forgetting is critical for scalability in reliable agent deployment. Without it, the memory store would grow unboundedly, inevitably increasing retrieval costs and introducing noise from obsolete information. This mechanism ensures that the system's performance remains stable regardless of conversation duration, acting as a garbage collection process for irrelevant associations.

### 4.4 Reflective Agent: Memory Lifecycle Management

Figure [4](https://arxiv.org/html/2604.16839#S3.F4) illustrates the structure of the Hebbian memory graph after encoding a multi-session conversation. The graph comprises 23 episodic memory nodes, where edge thickness reflects the association strength accumulated through co-activation during retrieval. Node appearance encodes the lifecycle status assigned by the Reflective Agent:

Hub Nodes (Red, Solid). Ten nodes exhibit degree $\geq 10$, indicating dense connectivity within the graph. These nodes tend to occupy central positions, as they serve as anchors connecting multiple conversation threads. The Reflective Agent identifies such high-degree nodes and applies Hebbian Distillation to consolidate their associated episodic clusters into stable semantic entries. For instance, the node with degree 17 in Figure [4](https://arxiv.org/html/2604.16839#S3.F4) links several temporally dispersed discussions, making it a natural candidate for knowledge extraction.

Isolated Nodes (Gray, Dashed). Four nodes exhibit degree $< 4$ and show no recent access activity. Their peripheral positions and weak integration suggest limited relevance to the ongoing narrative. The Adaptive Forgetting mechanism flags these nodes for removal, thereby controlling graph growth and reducing retrieval noise over extended conversations. The remaining nine nodes (blue) have moderate connectivity and are retained in the episodic store.

This visualization confirms that Hebbian learning enables automated lifecycle management without manual annotation (see Appendix [B](https://arxiv.org/html/2604.16839#A2) for the edge-weight heatmap).

### 4.5 Case Study: Trace Analysis of Associative Recall

We analyze the retrieval process for a multi-hop query: "Where did you first meet the person who influenced your career choice?" (see Figure [5](https://arxiv.org/html/2604.16839#S4.F5)).

Historical Context. The entities "Dr. Sarah" (Person) and "Adoption Support Conference" (Location) appeared together in Session 1, and were subsequently co-activated in Sessions 39 and 61. Through Hebbian learning, these repeated co-occurrences accumulated a strong associative weight of $w_{89,15} \approx 0.52$ between the career advice memory (Turn 89) and the meeting event (Turn 15). This accumulated weight reflects the frequency and recency of co-activation across the conversation history.

Experimental Trace. The baseline method identifies Turn 89 ("Dr. Sarah encouraged…") as the top candidate due to high semantic similarity (0.82) but fails to retrieve Turn 15 ("Met Dr. Sarah at…") because its low similarity score of 0.35 falls below the retrieval threshold. This results in a "semantic trap" where the model knows the person but not the location.

In contrast, HeLa-Mem utilizes the Hebbian path. Spreading activation from the retrieved Turn 89 propagates through the learned edge (weight 0.52) to Turn 15. The final retrieval score for Turn 15 is dynamically updated:

$$S_{total} = \underbrace{0.35}_{\text{Semantic}} + \underbrace{0.36}_{\text{Hebbian}} \approx \mathbf{0.71} \tag{6}$$

This score boost, derived strictly from historical association, promotes Turn 15 into the active context. By retrieving both the cue (Person) and the target (Location), HeLa-Mem correctly synthesizes the answer: "At the adoption support conference." This demonstrates how Hebbian associations complement semantic retrieval for complex reasoning.

## 5 Conclusion

We introduce HeLa-Mem, a bio-inspired memory architecture that models conversation history as a dynamic graph driven by Hebbian learning principles. Unlike static context windows, HeLa-Mem mimics the brain's plasticity, where "neurons that fire together, wire together," enabling the spontaneous emergence of associative pathways for retrieval. Building on this foundation, our Reflective Agent distills transient episodes into structured semantic knowledge, while Adaptive Forgetting ensures long-term scalability. Experiments on the LoCoMo benchmark demonstrate that this synergy between associative retention and semantic consolidation yields superior performance across diverse question types and robustness across diverse LLM backbones. These findings suggest that incorporating neuro-symbolic dynamics offers a promising direction for evolving static LLMs into lifelong learning agents.

## 6 Limitations

While HeLa-Mem effectively models long-term memory consolidation, it faces a "cold start" challenge: Hebbian weights require sufficient interaction history to accumulate, meaning the benefits of associative retrieval are less pronounced in early conversation stages. Future work may explore initializing Hebbian edges using semantic similarity as a prior, allowing the graph structure to bootstrap before sufficient co-occurrences accumulate. Additionally, the quality of both Semantic Memory and Hub Detection relies on the capabilities of the underlying LLM; hallucinations or reasoning errors during the distillation process could propagate into long-term storage, potentially affecting future retrieval accuracy.

## 7 Ethical Considerations

We use the publicly available LoCoMo benchmark and do not collect any private user data. The proposed memory architecture is intended to enhance the consistency of LLM agents. However, we acknowledge that long-term memory systems could potentially reinforce biases present in the underlying LLM if not carefully monitored. The distilled semantic memories should be treated with the same caution as standard LLM generations regarding accuracy and bias.

## Acknowledgments

This work was partially supported by the Guangdong Provincial Natural Science Foundation General Program (Grant No. 2026A1515012118).

## References

- J. Achiam, S. Adler, S. Agarwal, et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- Chhikara et al. (2025). Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413.
- J. Fang, X. Deng, H. Xu, Z. Jiang, Y. Tang, Z. Xu, S. Deng, Y. Yao, M. Wang, S. Qiao, H. Chen, and N. Zhang (2025). LightMem: Lightweight and efficient memory-augmented generation. arXiv preprint arXiv:2510.18866.
- D. O. Hebb (2005). The Organization of Behavior: A Neuropsychological Theory. Psychology Press.
- J. J. Hopfield (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.
- Y. Hu et al. (2025). Memory in the age of AI agents. arXiv preprint arXiv:2512.13564.
- L. Huang, H. Lan, Z. Sun, C. Shi, and T. Bai (2024). Emotional RAG: Enhancing role-playing agents through emotional retrieval. In 2024 IEEE International Conference on Knowledge Graph (ICKG), pp. 120–127.
- J. Kang, M. Ji, Z. Zhao, and T. Bai (2025). Memory OS of AI agent. arXiv preprint arXiv:2506.06326.
- K. Lee, X. Chen, H. Furuta, J. Canny, and I. Fischer (2024). A human-inspired reading agent with gist memory of very long contexts. arXiv preprint arXiv:2402.09727.
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.
- J. Li, Y. Fu, J. Liu, L. Cao, W. Ji, M. Yang, I. King, and M. Yang (2026). Discrete tokenization for multimodal LLMs: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–24.
- J. Liang, H. Li, C. Li, J. Zhou, S. Jiang, Z. Wang, C. Ji, Z. Zhu, R. Liu, T. Ren, J. Fu, S. Ng, X. Liang, M. Liu, and B. Qin (2025). AI meets brain: Memory systems from cognitive neuroscience to autonomous agents. arXiv preprint arXiv:2512.23343.
- X. Liang, B. Wang, H. Huang, S. Wu, P. Wu, L. Lu, Z. Ma, and Z. Li (2023). SCM: Enhancing large language model with self-controlled memory framework. arXiv preprint arXiv:2304.13343.
- J. Liu, Z. Qiu, Z. Li, Q. Dai, W. Yu, J. Zhu, M. Hu, M. Yang, T. Chua, and I. King (2025). A survey of personalized large language models: Progress and future directions. arXiv preprint arXiv:2502.11528.
- J. Liu, W. Yu, Q. Dai, Z. Li, J. Zhu, M. Yang, T. Chua, and I. King (2026). PerFit: Exploring personalization shifts in representation space of LLMs. In The Fourteenth International Conference on Learning Representations.
- L. Liu, X. Yang, Y. Shen, B. Hu, Z. Zhang, J. Gu, and G. Zhang (2023). Think-in-Memory: Recalling and post-thinking enable LLMs with long-term memory. arXiv preprint arXiv:2311.08719.
- A. Maharana, D. Lee, S. Tulyakov, M. Bansal, F. Barbieri, and Y. Fang (2024). Evaluating very long-term conversational memory of LLM agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 13851–13870.
- C. Packer, V. Fang, S. Patil, K. Lin, S. Wooders, and J. Gonzalez (2023). MemGPT: Towards LLMs as operating systems.
- J. S. Park, J. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein (2023). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1–22.
- H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, T. Adler, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve, V. Greiff, D. Kreil, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter (2020). Hopfield networks is all you need. arXiv preprint arXiv:2008.02217.
- Y. Wu, S. Liang, C. Zhang, Y. Wang, Y. Zhang, H. Guo, R. Tang, and Y. Liu (2025). From human memory to AI memory: A survey on memory mechanisms in the era of LLMs. arXiv preprint arXiv:2504.15965.
- W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang (2025). A-Mem: Agentic memory for LLM agents. arXiv preprint arXiv:2502.12110.
- B. Yan, C. Li, H. Qian, S. Lu, and Z. Liu (2025). General agentic memory via deep research. arXiv preprint arXiv:2511.18423.
- A. Yang et al. (2024). Qwen2.5 technical report. arXiv preprint arXiv:2412.15115.
- Z. Zhang, Q. Dai, X. Bo, C. Ma, R. Li, X. Chen, J. Zhu, Z. Dong, and J. Wen (2025). A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems, 43(6).
- W. Zhong, L. Guo, Q. Gao, H. Ye, and Y. Wang (2024). MemoryBank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, pp. 19724–19731.

## Appendix A: LLM Prompts

This appendix provides the core LLM prompts used in the HeLa-Mem system.

### A.1 Hebbian Distillation

The Reflective Agent uses the following prompt to extract structured knowledge from memory clusters identified as hubs.

Prompt 1: Semantic Memory Extraction Prompt

    System: You are a knowledge extraction engine analyzing conversation memories.
    Extract ONLY factual information with direct evidence.
    Output concise, structured entries.

    User: Analyze the following memory cluster and extract:

    1. USER CHARACTERISTICS:
    - Observable traits (with evidence)
    - Content preferences (with evidence)
    - Interaction patterns

    2. FACTUAL INFORMATION:
    - Events with dates and locations
    - Stated preferences
    - Mentioned relationships

    Format: Concise bullet points with supporting evidence.

    Memory Cluster: {conversation}

### A.2 Response Generation

The system uses the following prompt to generate responses using retrieved episodic and semantic memories.

Prompt 2: Response Generation Prompt

    System: You are an AI assistant with access to conversation history.
    Answer questions concisely using the provided context.
    For dates, use format "15 July 2023".

    User:
    <EPISODIC MEMORIES>
    {episodic_context}

    <SEMANTIC KNOWLEDGE>
    {semantic_knowledge}

    <USER CHARACTERISTICS>
    {user_model}

    Question: {query}

    Provide an extremely concise answer using concrete entities.
    Output only the answer content, without labels.

## Appendix B: Hebbian Weight Visualization

Figure [6](https://arxiv.org/html/2604.16839#A2.F6) shows the Hebbian edge weight matrix for the first 20 memory nodes. Stronger weights (darker colors) indicate associations formed through co-activation. The matrix exhibits both local associations near the diagonal and cross-topic connections between distant nodes, demonstrating that Hebbian learning captures semantic relationships beyond temporal adjacency. Nodes with red borders have high total connectivity, making them candidates for Hebbian Distillation.

![Refer to caption](https://arxiv.org/html/2604.16839v1/x5.png)
Figure 6: Hebbian edge weight matrix for the first 20 memory nodes. Stronger weights (darker colors) indicate more frequent co-activation. Nodes with red borders have high total connectivity across multiple memories.

## Appendix C: Dataset Statistics

We utilize the LoCoMo benchmark, focusing on the long-context conversation split. Table 4 provides the detailed statistics of the 10 conversations used in our experiments, while Table [5](https://arxiv.org/html/2604.16839#A3.T5) details the distribution across different question categories.

Table 4: LoCoMo dataset overview.

| Metric | Value |
|---|---|
| Number of Conversations | 10 |
| Avg. Turns per Conversation | ~300 |
| Avg. Tokens per Conversation | ~9,000 |
| Total Question-Answer Pairs | 1,986 |

Table 5: Distribution of question types in the evaluation set.

| Category | Count | Percentage |
|---|---|---|
| Single-hop | 841 | 42.3% |
| Multi-hop | 282 | 14.2% |
| Temporal | 321 | 16.2% |
| Open-domain | 96 | 4.8% |
| Adversarial | 446 | 22.5% |
| Total | 1,986 | 100.0% |

## Appendix D: Additional Benchmark on LongMemEval-S

We additionally evaluate HeLa-Mem on LongMemEval-S, a 500-item long-term conversational memory benchmark. We use GPT-4o-mini as both the backbone model and the LLM judge. For retrieval, HeLa-Mem uses the top-15 episodic memories and top-5 semantic memories (20 total). The best setting on this benchmark uses learning rate $\eta = 0.02$, decay rate $\lambda = 0.995$, spreading activation strength $\beta = 0.1$, spreading threshold $\theta = 0.4$, keyword weight 0.7, and max flipped items $m = 3$. In Table [7](https://arxiv.org/html/2604.16839#A4.T7), Single denotes the merged single-hop group combining Single-User, Single-Asst, and Single-Pref. Baseline numbers are reported from Fang et al. ([2025](https://arxiv.org/html/2604.16839#bib.bib23)) under the same total retrieval budget.

Table 6: Overall accuracy on LongMemEval-S.

| Method | ACC (%) |
|---|---|
| LangMem | 37.20 |
| MemoryOS | 44.80 |
| Mem0 | 53.61 |
| FullText | 56.80 |
| NaiveRAG | 61.00 |
| A-MEM | 62.60 |
| HeLa-Mem | 65.40 |

Table 7: Category-wise accuracy on LongMemEval-S.

| Method | Temporal | Multi-Session | Knowledge-Update | Single |
|---|---|---|---|---|
| LangMem | 15.79 | 20.30 | 66.67 | 55.13 |
| MemoryOS | 32.33 | 31.06 | 48.72 | 64.74 |
| Mem0 | 40.15 | 46.21 | 70.12 | 62.82 |
| FullText | 31.58 | 45.45 | 76.92 | 78.21 |
| NaiveRAG | 39.85 | 48.48 | 67.95 | 85.90 |
| A-MEM | 47.36 | 48.87 | 64.11 | 84.62 |
| HeLa-Mem | 50.38 | 57.14 | 78.21 | 78.85 |

The category-wise values are not directly averaged to obtain the overall ACC because the category sizes are unequal.

HeLa-Mem achieves the best overall accuracy and the best performance on the three reasoning-intensive categories: Temporal, Multi-Session, and Knowledge-Update.

## Appendix E: LLM Usage Statement

We use publicly available large language model tools as writing assistants to check grammar and polish a small number of sentences. All technical content, claims, and contributions are conceived, written, and verified by the authors. For schematic figures, several icons or visual elements are refined with the assistance of LLM-based design tools, while the figure layout, semantics, and interpretation are fully determined by the authors. Since this paper involves LLM-related research, all model usage that affects experiments, analysis, or results is explicitly documented in the Experiments section. No other parts of the manuscript are generated or substantively rewritten by an LLM.
