T-Mem: Memory That Anticipates, Not Archives

arXiv cs.CL Papers

Summary

T-Mem is a new long-term conversational memory architecture that enables both descriptive and associative recall, covering scenarios where query and memory share surface features and those where they are connected by latent semantic arcs. It reaches state-of-the-art on the LoCoMo and LoCoMo-Plus benchmarks.

arXiv:2606.15405v1 Announce Type: new Abstract: Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user. Current LLM-backed long-term conversational memory, however, is reachability-bounded by the similarity between a query and stored content, both lexical and dense-vector. The approach is effective when query and memory share surface features such as wording or named entities (we call this descriptive). But it misses another, equally valuable class of cases, where query and memory do not share surface features and are tied only by a latent semantic arc (associative). On this regime prevailing long-term memory systems collectively fail. Covering this other half is what allows an assistant, for the first time, to actively draw on past dialogue as a semantic asset. On the memory side, this is the engineering counterpart of what cognitive science calls episodic future thinking: rehearsing past experience for the future contexts under which it will need to be found. We call these write-time rehearsals triggers. We propose T-Mem, the first long-term conversational memory architecture that covers both descriptive and associative recall. At each of two evidence granularities, single facts and full exchanges, T-Mem instantiates one descriptive trigger family and one associative trigger family, so that every memory remains reachable from both surface-similar and relevance-bound queries. As empirical validation, T-Mem reaches state-of-the-art on both LoCoMo and LoCoMo-Plus.

Cached at: 06/16/26, 11:47 AM

# T-Mem: Memory That Anticipates, Not Archives
Source: [https://arxiv.org/html/2606.15405](https://arxiv.org/html/2606.15405)
Weidong Guo, Dakai Wang, Zixuan Wang, Hui Liu, Yu Xu

Tencent

{weidongguo, dkkatzewang, zzixuanwang, pvopliu, henrysxu}@tencent.com

###### Abstract

Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user. Current LLM-backed long-term conversational memory, however, is reachability-bounded by the similarity between a query and stored content, both lexical and dense-vector. The approach is effective when query and memory share surface features such as wording or named entities (we call this descriptive). But it misses another, equally valuable class of cases, where query and memory do not share surface features and are tied only by a latent semantic arc (associative). On this regime prevailing long-term memory systems collectively fail. Covering this other half is what allows an assistant, for the first time, to actively draw on past dialogue as a semantic asset. On the memory side, this is the engineering counterpart of what cognitive science calls episodic future thinking: rehearsing past experience for the future contexts under which it will need to be found. We call these write-time rehearsals triggers. We propose T-Mem, the first long-term conversational memory architecture that covers both descriptive and associative recall. At each of two evidence granularities, single facts and full exchanges, T-Mem instantiates one descriptive trigger family and one associative trigger family, so that every memory remains reachable from both surface-similar and relevance-bound queries. As empirical validation, T-Mem reaches state-of-the-art on both LoCoMo and LoCoMo-Plus.


Our source code will be released upon acceptance.

## 1 Introduction

External long-term memory modules built around large language models let a conversational assistant reconnect each new turn with past content: when the user later uses similar wording, names the same entities, or anchors the same time and place, the corresponding stored memory should be retrieved correctly. We call this regime descriptive recall. Flat retrieval-augmented generation (Lewis et al., [2020](https://arxiv.org/html/2606.15405#bib.bib7)), graph- and hypergraph-structured memory (Edge et al., [2024](https://arxiv.org/html/2606.15405#bib.bib3); Yue et al., [2026](https://arxiv.org/html/2606.15405#bib.bib15)), agentic hierarchical memory (Chhikara et al., [2025](https://arxiv.org/html/2606.15405#bib.bib2); Xu et al., [2025](https://arxiv.org/html/2606.15405#bib.bib13)), and OS-style memory kernels (Packer et al., [2023](https://arxiv.org/html/2606.15405#bib.bib10); Li et al., [2025](https://arxiv.org/html/2606.15405#bib.bib8); Wang and Chen, [2025](https://arxiv.org/html/2606.15405#bib.bib12)) all share the same retrieval recipe: project query and stored content into a single similarity space (lexical BM25 or dense embedding) and take the nearest top-$K$. These systems differ widely in structure but occupy the same region of the retrieval design space (Figure [1](https://arxiv.org/html/2606.15405#S1.F1), descriptive half).

![Refer to caption](https://arxiv.org/html/2606.15405v1/x1.png)

Figure 1: Trigger-design space across granularity (item/scene) and orientation (descriptive/associative). Prior systems cluster on the descriptive (right) half; T-Mem instantiates one trigger per quadrant.

Long-horizon dialogue, however, hosts another, equally common kind of query–memory relation: query and target share no surface form; what binds them is a latent semantic arc such as a causal link or the continuation of a shared situation. We call this regime associative recall, after the cue-bound retrieval long studied in human associative memory (Anderson and Bower, [2014](https://arxiv.org/html/2606.15405#bib.bib1)). As an example, a month ago the user mentioned, “a colleague on my team has a serious seafood allergy”; today the user asks, “where should we take the team for dinner tonight?” The two messages share no surface form, yet the first is precisely what the second must draw on. Users in long-running conversations rarely re-raise an old topic with the same wording they used weeks earlier; they revisit it through indirect cues triggered by a new situation (Xu et al., [2022b](https://arxiv.org/html/2606.15405#bib.bib29); Maharana et al., [2024](https://arxiv.org/html/2606.15405#bib.bib9)). Similarity search, by construction, can only converge along lexical or semantic distance; a target whose surface form has already drifted away can never be reached from the same neighbourhood.

Unfolding this observation gives us an orientation axis, orientation $\in$ {descriptive, associative}. The axis does not stand alone: the evidence on which a query lands also has two natural granularities. Some queries ask about a single fact (e.g., “which week was Calvin’s Tokyo concert?”) and are answered at the item layer. Others ask about a coherent stretch of dialogue (e.g., “what else did we cover when we last discussed Calvin’s tour?”) and are answered at the scene layer. The granularity axis applies on both sides of orientation and is therefore orthogonal to it, opening a $2 \times 2$ retrieval space (Figure [1](https://arxiv.org/html/2606.15405#S1.F1)). Prevailing memory systems concentrate in Quadrant I (item $\times$ descriptive), with recent graph-based systems extending toward Quadrant IV (scene $\times$ descriptive); Quadrants II and III, the associative half, remain a structural blind spot.

We propose T-Mem, the first long-term conversational memory architecture that covers all four quadrants. T-Mem organises memory at two evidence granularities, scene (a coherent stretch of dialogue) and item (an atomic fact extracted from a scene). On top of this evidence layer, T-Mem instantiates one trigger family per quadrant: Entity (Q I), Bridge (Q II), Horizon (Q III), and Scene (Q IV). Bridge and Horizon are precisely the two families that fill in the associative blind spot identified above. All four families are precomputed offline by a memory-construction LLM at write time and stored alongside their host item or scene. This decouples what makes a memory reachable from what counts as evidence at answer time: when a query no longer reaches the host through similarity, it can still locate the host via one of its triggers. As a result, every memory remains reachable both from descriptively similar and from associatively relevant queries.
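The quadrant-to-family assignment described above can be written down as a small lookup table. The sketch below is only an illustration of that mapping; the enum and function names (`Granularity`, `Orientation`, `TRIGGER_FAMILY`, `families`) are hypothetical and do not come from the paper or its code.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Granularity(Enum):
    ITEM = "item"
    SCENE = "scene"


class Orientation(Enum):
    DESCRIPTIVE = "descriptive"
    ASSOCIATIVE = "associative"


# Quadrant -> trigger family, following the assignment stated in the text.
# All identifiers here are illustrative, not taken from the T-Mem codebase.
TRIGGER_FAMILY: Dict[Tuple[Granularity, Orientation], str] = {
    (Granularity.ITEM, Orientation.DESCRIPTIVE): "Entity",    # Q I
    (Granularity.ITEM, Orientation.ASSOCIATIVE): "Bridge",    # Q II
    (Granularity.SCENE, Orientation.ASSOCIATIVE): "Horizon",  # Q III
    (Granularity.SCENE, Orientation.DESCRIPTIVE): "Scene",    # Q IV
}


def families(orientation: Orientation) -> List[str]:
    """Return the trigger families on one side of the orientation axis, sorted by name."""
    return sorted(f for (_, o), f in TRIGGER_FAMILY.items() if o is orientation)
```

Calling `families(Orientation.ASSOCIATIVE)` returns the two families that the text identifies as filling the blind spot, Bridge and Horizon.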

Our contributions are summarized as follows:

(i) We show that prevailing long-term conversational memory systems share a single similarity-based retrieval recipe and therefore cover only the descriptive half of a granularity $\times$ orientation design space (Q I and Q IV); associative recall (Q II and Q III) is a structural blind spot of this recipe.

\(ii\) We propose T\-Mem, a long\-term conversational memory architecture that closes this blind spot at the indexing layer: on top of a scene–item evidence layer, T\-Mem instantiates one write\-time trigger family per quadrant of the design space, bringing all four quadrants under one system\.

\(iii\) Empirically, T\-Mem reaches state\-of\-the\-art on both LoCoMo \(80\.26%\) and LoCoMo\-Plus \(74\.81%\), and tightens the cross\-benchmark gap from the 28–50 pp range of prevailing systems to 5\.45 pp\.

## 2 Related Work

Long-term conversational memory has converged on four broad design families (Zhang et al., [2024](https://arxiv.org/html/2606.15405#bib.bib25)). We organise prior work along the two axes used in Section [1](https://arxiv.org/html/2606.15405#S1): granularity (item vs. scene) and orientation (descriptive vs. associative).

### 2.1 Long-Term Memory Architectures

(i) Flat RAG. Retrieval-augmented generation (Lewis et al., [2020](https://arxiv.org/html/2606.15405#bib.bib7)) chunks and dense-indexes the dialogue stream and retrieves by query embedding; it is the baseline on which all later families build.

(ii) Graph-, hypergraph-, and temporal-graph structured memory. A second family adds explicit structure: knowledge graphs for query-focused summarisation (GraphRAG, Edge et al., [2024](https://arxiv.org/html/2606.15405#bib.bib3), and variants, Guo et al., [2025](https://arxiv.org/html/2606.15405#bib.bib5); Gutiérrez et al., [2025](https://arxiv.org/html/2606.15405#bib.bib6)), temporal graphs for agent memory (Zep, Rasmussen et al., [2025](https://arxiv.org/html/2606.15405#bib.bib11)), and hyperedges over a topic–episode–fact hierarchy (HyperMem, Yue et al., [2026](https://arxiv.org/html/2606.15405#bib.bib15)).

(iii) Agentic hierarchical memory. A third family condenses each session into fine-grained notes and consolidates them incrementally: forgetting-curve user facts (MemoryBank, Zhong et al., [2024](https://arxiv.org/html/2606.15405#bib.bib14)), production-oriented fact layers (Mem0, Chhikara et al., [2025](https://arxiv.org/html/2606.15405#bib.bib2)), dynamic re-linking across notes (A-MEM, Xu et al., [2025](https://arxiv.org/html/2606.15405#bib.bib13)), with lightweight (Fang et al., [2025](https://arxiv.org/html/2606.15405#bib.bib4)) and autonomous-augmentation (Salama et al., [2025](https://arxiv.org/html/2606.15405#bib.bib21); Huang et al., [2026](https://arxiv.org/html/2606.15405#bib.bib22)) variants.

(iv) OS-style memory kernels. A fourth family frames memory as hierarchical storage with explicit scheduling and paging policies, with MemGPT (Packer et al., [2023](https://arxiv.org/html/2606.15405#bib.bib10)), MemOS (Li et al., [2025](https://arxiv.org/html/2606.15405#bib.bib8)), and MIRIX (Wang and Chen, [2025](https://arxiv.org/html/2606.15405#bib.bib12)) as representatives.

Together these families occupy Quadrants I and IV of Figure[1](https://arxiv.org/html/2606.15405#S1.F1): item\-only systems \(flat RAG, agentic stacks, most OS\-kernels\) sit in I, structured memory extends toward IV\. Quadrants II and III remain systematically under\-served\.

### 2.2 Cognitive-Cue Memory Access

A separate line draws on classical associative memory (Anderson and Bower, [2014](https://arxiv.org/html/2606.15405#bib.bib1)), where recall is cue-driven: a current utterance activates a temporally distant episode through learned associations rather than surface similarity. This aligns with what cognitive science calls episodic future thinking (Suddendorf and Corballis, [2007](https://arxiv.org/html/2606.15405#bib.bib19); Addis and Schacter, [2008](https://arxiv.org/html/2606.15405#bib.bib20)), in which past experience is rehearsed for future cues. LoCoMo (Maharana et al., [2024](https://arxiv.org/html/2606.15405#bib.bib9)) and LoCoMo-Plus (Li et al., [2026](https://arxiv.org/html/2606.15405#bib.bib16)) operationalise this future-oriented view on the evaluation side; T-Mem instantiates it on the retrieval side, with the Bridge and Horizon Triggers occupying Quadrants II and III respectively.

### 2.3 Persona and Profile Memory

A complementary body of work treats per-speaker profiles as a first-class ingredient of long-form dialogue. Persona-grounded benchmarks (Zhang et al., [2018](https://arxiv.org/html/2606.15405#bib.bib26); Mazaré et al., [2018](https://arxiv.org/html/2606.15405#bib.bib27)) established that speakers carry structured attributes retrievable across turns, and long-horizon variants (Xu et al., [2022a](https://arxiv.org/html/2606.15405#bib.bib28), [b](https://arxiv.org/html/2606.15405#bib.bib29); Jang et al., [2023](https://arxiv.org/html/2606.15405#bib.bib30)) argue that such profiles should aggregate across distant sessions. MemoryBank (Zhong et al., [2024](https://arxiv.org/html/2606.15405#bib.bib14)) and Mem0 (Chhikara et al., [2025](https://arxiv.org/html/2606.15405#bib.bib2)) instantiate this view inside agentic memory systems. We treat this functionality as complementary, not as a substitute for fine-grained recall: T-Mem includes a per-speaker Persona that fills the profile-coverage gap left by sparse trigger output.

![Refer to caption](https://arxiv.org/html/2606.15405v1/x2.png)

Figure 2: Framework of T-Mem. The construction pipeline segments scenes, assigns topics, extracts items, and instantiates one trigger family per quadrant (colours of scene nodes indicate topic membership). The retrieval cascade performs top-down topic $\to$ scene $\to$ item selection with associative-trigger augmentation.

## 3 Approach

We design T-Mem as a memory architecture organised by retrieval capability rather than by storage form, with each component allocated to one quadrant of the cue–memory space of §[1](https://arxiv.org/html/2606.15405#S1). Figure [2](https://arxiv.org/html/2606.15405#S2.F2) gives an overview; the section proceeds in three passes: a typed memory structure, a four-stage construction pipeline, and a top-down retrieval cascade.

Table 1: A worked example: two scenes on different days are grouped under one topic, anchoring an item (Gina’s store expansion) that depends on both. The Persona row aggregates speaker attributes that no single item carries on its own.

### 3.1 Memory Structure

T-Mem operates over five kinds of objects (Table [1](https://arxiv.org/html/2606.15405#S3.T1) gives a worked instance). We introduce them one by one before giving a unified summary.

- scenes ($\mathcal{V}^{S}$): cohesive exchanges segmented from the dialogue stream; handed to the QA LLM as evidence at answer time.
- items ($\mathcal{V}^{I}$): atomic facts extracted from scenes, anchored via $\mathcal{E}^{SI}$ to one or more host scenes; also handed to the QA LLM as evidence. An atomic item attaches to a single source scene, while a connected item attaches to every source scene it draws from.
- topic labels ($\mathcal{V}^{T}$): a multi-label tagging produced by a lightweight topic module, connected to scenes via $\mathcal{E}^{TS}$. Topic labels are used only to scope item extraction and to pre-filter the scene pool at retrieval; they are never read by the QA LLM.
- four trigger families ($\mathcal{T}^{\mathrm{Ent}}, \mathcal{T}^{\mathrm{Brg}}$ on items; $\mathcal{T}^{\mathrm{Scn}}, \mathcal{T}^{\mathrm{Hor}}$ on scenes): one per quadrant of the design space of §[1](https://arxiv.org/html/2606.15405#S1); they participate in retrieval only.
- Persona ($\mathcal{X}$): a per-speaker summary of standing traits, aggregated across the full conversation. It is injected as ambient context at answer time outside the retrieval channel, appended after the retrieved scenes and items rather than competing for the retrieval budget.

Putting these five kinds of objects together, T\-Mem stores conversational memory as the typed tuple

$$
\mathcal{M} \;=\; \bigl(\,\mathcal{V}^{T} \cup \mathcal{V}^{S} \cup \mathcal{V}^{I},\;\; \mathcal{E}^{TS} \cup \mathcal{E}^{SI},\;\; \mathcal{T}^{\mathrm{Ent}} \cup \mathcal{T}^{\mathrm{Brg}} \cup \mathcal{T}^{\mathrm{Scn}} \cup \mathcal{T}^{\mathrm{Hor}},\;\; \mathcal{X}\,\bigr). \tag{1}
$$

Three design commitments shape this object: evidence layers are kept type-segregated so that scenes and items can each be retrieved at their own granularity; topic labels are kept off the QA channel so that pre-filtering does not contaminate evidence; and triggers are kept off the evidence path so that “how a memory is reached” is decoupled from “what is reached”.
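One way to picture the typed tuple is as a container with one field per component. The following is a minimal sketch under stated assumptions: the class and field names (`Scene`, `Item`, `Memory`, `add_item`, and so on) are hypothetical, the trigger store is simplified to plain strings, and the toy instance only mimics the shape of Table 1 with placeholder contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Scene:
    scene_id: str
    title: str        # title and summary feed the lexical and dense indices
    summary: str
    turns: List[str]  # raw turn sequence: the evidence shown to the QA LLM
    narrative: str    # third-person narrative: input to the scene-level trigger extractors


@dataclass
class Item:
    item_id: str
    text: str
    kind: str  # "atomic" or "connected"


@dataclass
class Memory:
    topics: Set[str] = field(default_factory=set)                    # V^T
    scenes: Dict[str, Scene] = field(default_factory=dict)           # V^S
    items: Dict[str, Item] = field(default_factory=dict)             # V^I
    topic_scene: Set[Tuple[str, str]] = field(default_factory=set)   # E^TS
    scene_item: Set[Tuple[str, str]] = field(default_factory=set)    # E^SI
    # family name -> host id -> trigger texts (retrieval only, never shown as evidence)
    triggers: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)
    persona: Dict[str, str] = field(default_factory=dict)            # X: speaker -> summary

    def add_scene(self, scene: Scene, topic_labels: List[str]) -> None:
        self.scenes[scene.scene_id] = scene
        for label in topic_labels:
            self.topics.add(label)
            self.topic_scene.add((label, scene.scene_id))

    def add_item(self, item: Item, scene_ids: List[str]) -> None:
        # An atomic item has exactly one host scene; a connected item may have several.
        if item.kind == "atomic" and len(scene_ids) != 1:
            raise ValueError("an atomic item attaches to exactly one source scene")
        missing = [s for s in scene_ids if s not in self.scenes]
        if missing:
            raise KeyError(f"unknown scenes: {missing}")
        self.items[item.item_id] = item
        for s in scene_ids:
            self.scene_item.add((s, item.item_id))

    def host_scenes(self, item_id: str) -> List[str]:
        return sorted(s for s, i in self.scene_item if i == item_id)


# Toy instance shaped like Table 1; scene contents and the topic label are placeholders.
memory = Memory()
memory.add_scene(Scene("s1", "day 1", "placeholder", ["..."], "placeholder"), ["store"])
memory.add_scene(Scene("s2", "day 2", "placeholder", ["..."], "placeholder"), ["store"])
memory.add_item(Item("i1", "Gina's store expansion", "connected"), ["s1", "s2"])
```

Keeping `triggers` in its own field, separate from `scenes` and `items`, mirrors the third design commitment: a trigger can point at a host without ever becoming evidence itself.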

### 3.2 Memory Construction

$\mathcal{M}$ is produced by a four-stage pipeline: scene segmentation, topic assignment, item extraction, and trigger instantiation. The order is load-bearing, since each stage materialises a structural commitment that no later stage can otherwise recover.

#### 3.2.1 Scenes and Topics

Session boundaries in long\-term dialogue are an artefact of data collection rather than event closure: a session can mix several events, and one event can span several sessions\. T\-Mem therefore segments scenes by event closure, using a lightweight boundary\-detector LLM that scans a sliding turn buffer and emits a scene whenever the current event closes\.

Each scene node carries four fields with three distinct read\-paths:

- title and summary: consumed by the lexical and dense indices;
- raw turn sequence: the evidence handed to the QA LLM at answer time;
- third-person narrative: the input read by the Scene and Horizon trigger extractors.

Scenes that belong to the same recurring subject can be scattered far apart along the dialogue stream, leaving an extractor that sees one scene at a time no way to recover their cross\-scene logical links\. T\-Mem therefore processes scenes in arrival order: for each new scene, a lightweight topic module decides which existing topic labels it should be admitted to, and opens a new label whenever none fits\. Topic labels grow from the data, and a scene can be admitted to several at once\.
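The arrival-order topic assignment can be sketched as a simple loop. In the sketch below the paper's lightweight topic module is replaced by two caller-supplied functions, `fits` and `new_label`, which are hypothetical stand-ins (the real decisions are made by a model); the toy scene texts are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple


def assign_topics(
    scenes: List[Tuple[str, str]],
    fits: Callable[[str, str], bool],
    new_label: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Assign topic labels to scenes in arrival order.

    scenes:    (scene_id, text) pairs, in the order the scenes arrive.
    fits:      stand-in for the topic module's decision "does this scene belong to this label?".
    new_label: stand-in for naming a fresh topic when no existing label fits.
    """
    labels: List[str] = []
    assignment: Dict[str, List[str]] = {}
    for scene_id, text in scenes:
        # A scene may be admitted to several existing labels at once.
        admitted = [label for label in labels if fits(label, text)]
        if not admitted:
            # No existing label fits: open a new one, so labels grow from the data.
            label = new_label(text)
            labels.append(label)
            admitted = [label]
        assignment[scene_id] = admitted
    return assignment


# Toy stand-ins: a label "fits" if it occurs as a token; a new label is the first token.
toy_fits = lambda label, text: label in text.lower().split()
toy_new_label = lambda text: text.lower().split()[0]

result = assign_topics(
    [("s1", "tour dates for Calvin"),
     ("s2", "store expansion plans"),
     ("s3", "store opening during the tour")],
    toy_fits,
    toy_new_label,
)
```

In this toy run the third scene is admitted to both earlier labels, which is the multi-label behaviour the paragraph describes.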

#### 3.2.2 Item Extraction

Conversational queries split into two classes: those that interrogate a fact in isolation, and those that interrogate the relation between two facts. A single granularity of items cannot serve both. Queries of the first kind are best served by a unit that distills the fact away from its scene. Queries of the second kind are best served by a unit that preserves the cross-scene logical link before retrieval ever sees it. For every topic label $v^{T} \in \mathcal{V}^{T}$ we therefore invoke the extractor LLM once and obtain two complementary item types in the same response. An atomic item is linked via $\mathcal{E}^{SI}$ to its single source scene, while a connected item is linked via $\mathcal{E}^{SI}$ to every source scene it draws from.

#### 3.2.3 Trigger Instantiation

The four trigger families do not play equivalent roles: Entity and Scene serve the descriptive half of the design space \(already covered by similarity search\), while Bridge and Horizon serve the associative half \(unreachable by similarity search and identified as the blind spot in §[1](https://arxiv.org/html/2606.15405#S1)\)\.

Entity Trigger (Q I). Names the item with a superordinate concept. Entity and Bridge are produced jointly within a single prompt call (one $N$-trigger generation per item), not in separate stages.

Bridge Trigger \(Q II\)\.Projects the item onto a situation in which knowing this item would matter, even when the surface form is far away \(e\.g\., an item about “an allergy” projects to choosing a restaurant for a team dinner\)\. Each trigger comes with a one\-clause rationale\.

Scene Trigger \(Q IV\)\.Describes the current scene along four orthogonal attributes \(situation, object, event, and emotion\), one sentence each\.

Horizon Trigger \(Q III\)\.Projects the same scene onto a set of forward\-looking dimensions, so that the scene can still be reached when a future query approaches it from a related but different situation\.

### 3.3 Indexing

The two index types serve the two retrieval axes of §[1](https://arxiv.org/html/2606.15405#S1): Node Indices support the descriptive axis along which a query restates a host scene or item, and Trigger\-aware Indices support the associative axis along which a query reaches a host through a learned link\.

##### Node Indices\.

Every node is registered into a shared BM25 corpus and a per\-type dense table\.

##### Trigger\-aware Indices\.

Each trigger family surfaces host nodes on behalf of the query, not the triggers themselves. Each item exposes three views (concept-only, bridge-only, joint = concept ∥ bridge ∥ rationale), independently encoded; the cosine score is the NaN-aware max across views and is attributed to the host item. Scene-level triggers follow the analogous multi-view scheme over scene attributes and Horizon channels.
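The multi-view scoring can be illustrated in a few lines. The sketch below assumes that "NaN-aware max" means a missing or undefined view is skipped rather than poisoning the maximum (the behaviour of functions such as NumPy's `nanmax`); that reading, the function names, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns NaN when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return float("nan")
    return dot / (norm_a * norm_b)


def host_score(query_vec: Sequence[float],
               views: Dict[str, Optional[Sequence[float]]]) -> float:
    """Score a host item as the max cosine over its available views.

    A view that is absent (None) or undefined (NaN) is ignored. If no view
    yields a score, the result is NaN so the caller can drop the host.
    """
    scores = []
    for vec in views.values():
        score = cosine(query_vec, vec) if vec is not None else float("nan")
        if not math.isnan(score):
            scores.append(score)
    return max(scores) if scores else float("nan")


# Toy vectors: the bridge view aligns with the query, the joint view is missing.
score = host_score(
    [1.0, 0.0],
    {"concept": [0.0, 1.0], "bridge": [1.0, 0.0], "joint": None},
)
```

Taking the maximum rather than the mean lets a single well-matched view, here the bridge view, surface the host even when the other views are unrelated to the query.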

Algorithm 1: T-Mem online retrieval (sketch).

1: Input: query $q$; top-$K$ budgets $k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}}$

2: Output: scenes $\mathcal{R}_S$, items $\mathcal{R}_I$, persona $X$

3: $\mathcal{R}_T \leftarrow \mathrm{top}_{k^{\mathrm{T}}}\,\mathrm{RRF}\bigl(q, \mathcal{V}^{T}\bigr)$ ▷ topic prefilter

4: $\mathcal{C}_2 \leftarrow \mathcal{E}^{TS}(\mathcal{R}_T) \cup \mathrm{SceneCue}(q)$ ▷ + Scn, Hor

5: $\mathcal{R}_S \leftarrow \mathrm{top}_{k^{\mathrm{S}}}\,\mathrm{RRF}\bigl(q, \mathcal{C}_2\bigr)$

6: $\mathcal{C}_3 \leftarrow \mathcal{E}^{SI}(\mathcal{R}_S) \cup \mathrm{TrigRecall}(q; \tau)$ ▷ + Ent, Brg

7: $\mathcal{R}_I \leftarrow \mathrm{top}_{k^{\mathrm{I}}}\,\mathrm{RRF}\bigl(q, \mathcal{C}_3\bigr)$

8: $X \leftarrow \mathrm{Persona}\bigl(\mathrm{speaker}(q)\bigr)$ ▷ ambient

9: return $(\mathcal{R}_S,\, \mathcal{R}_I,\, X)$

Table 2: LoCoMo results (LLM-as-judge accuracy, %). Rows marked with $\dagger$ reuse HyperMem’s own QA pipeline, which deviates from the official LoCoMo pipeline (see Appendix [E](https://arxiv.org/html/2606.15405#A5)); all other rows follow the official pipeline, with numbers for MIRIX, Mem0, Zep, Memobase, MemU, Supermemory and MemOS taken from Li et al. ([2025](https://arxiv.org/html/2606.15405#bib.bib8)).

Table 3: LoCoMo vs. LoCoMo-Plus (LLM-as-judge accuracy, %). Gap is the drop from LoCoMo to LoCoMo-Plus. Numbers for Mem0, SeCom, A-Mem, GPT-4o and Gemini-2.5-Pro are taken from Li et al. ([2026](https://arxiv.org/html/2606.15405#bib.bib16)); the adversarial category is excluded following common LoCoMo practice.

### 3.4 Retrieval

Given a query, T-Mem feeds the QA LLM through a top-down topic $\to$ scene $\to$ item cascade, scoring candidates at each layer with reciprocal rank fusion (RRF) over the lexical and dense rankings of §[3.3](https://arxiv.org/html/2606.15405#S3.SS3):

$$
\mathrm{RRF}(d) \;=\; \sum_{m=1}^{M} \frac{1}{k_{0} + \mathrm{rank}_{m}(d)}, \tag{2}
$$

where $M$ is the per-call number of fused ranklists and $k_{0}$ is the smoothing constant. Algorithm [1](https://arxiv.org/html/2606.15405#alg1) sketches the per-layer flow.
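Equation (2) translates directly into code. The sketch below is a generic RRF implementation, not the authors' code; the default `k0 = 60.0` is a commonly used value and an assumption here (the value used by T-Mem is not given in this section), and treating a document absent from a ranklist as contributing nothing is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf(ranklists: Sequence[Sequence[Hashable]], k0: float = 60.0) -> Dict[Hashable, float]:
    """Reciprocal rank fusion: sum 1 / (k0 + rank_m(d)) over the M ranklists.

    Ranks are 1-based. A document missing from a ranklist adds nothing for
    that list (an assumption; the text does not specify this case).
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in ranklists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k0 + rank)
    return dict(scores)


def top_k(scores: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Return the k highest-scoring documents, breaking ties by name for determinism."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [doc for doc, _ in ordered[:k]]


# Two toy ranklists standing in for the lexical (BM25) and dense rankings, so M = 2.
lexical = ["a", "b", "c"]
dense = ["b", "c", "a"]
fused = rrf([lexical, dense])
best_two = top_k(fused, 2)
```

Document `b` is ranked second and first, so it scores 1/62 + 1/61 and edges out `a`, which is ranked first and third.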

##### Cascade ordering and trigger bypass\.

The top\-down ordering is the structural invariant of the cascade: each upper layer is coarser by construction, so pushing the cheapest cut to the front lets the scene and item layers each operate on a small fraction of the corpus\. Crucially, scenes and items reached through any of the four triggers are not gated by the topic prefilter \(Stage 1 of Algorithm[1](https://arxiv.org/html/2606.15405#alg1)\)\. The prefilter is, by construction, a similarity\-based neighbourhood test \(BM25 \+ dense over topic labels\), while the Bridge and Horizon Triggers \(§[3\.2\.3](https://arxiv.org/html/2606.15405#S3.SS2.SSS3)\) are designed precisely to surface on cues that fall outside any such neighbourhood\. Gating them by surviving topics would re\-impose the same similarity\-only retrieval regime that §[1](https://arxiv.org/html/2606.15405#S1)sets T\-Mem against\.
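The trigger bypass is easiest to see in a toy run of the cascade. The sketch below follows the steps of Algorithm 1 under several assumptions: `scene_cue` and `trig_recall` are stubbed callables (the threshold $\tau$ is assumed to live inside `trig_recall`), a single token-overlap ranker stands in for the BM25 and dense rankings, `k0 = 60.0` is an assumed default, and all ids and texts are invented around the paper's seafood-allergy example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Ranker = Callable[[str, List[str]], List[str]]  # (query, candidate ids) -> ranked ids


def rrf_top(query: str, candidates: Iterable[str], rankers: List[Ranker],
            k: int, k0: float = 60.0) -> List[str]:
    """Fuse the rankers' orderings with RRF and keep the k best candidates."""
    pool = sorted(set(candidates))
    scores = {c: 0.0 for c in pool}
    for ranker in rankers:
        for rank, c in enumerate(ranker(query, pool), start=1):
            scores[c] += 1.0 / (k0 + rank)
    return sorted(pool, key=lambda c: (-scores[c], c))[:k]


def retrieve(query: str, speaker: str, topics: List[str],
             topic_scene: Dict[str, List[str]], scene_item: Dict[str, List[str]],
             rankers: List[Ranker], scene_cue: Callable[[str], List[str]],
             trig_recall: Callable[[str], List[str]], persona: Dict[str, str],
             k_t: int, k_s: int, k_i: int) -> Tuple[List[str], List[str], str]:
    r_t = rrf_top(query, topics, rankers, k_t)                # line 3: topic prefilter
    c2 = {s for t in r_t for s in topic_scene.get(t, [])}
    c2 |= set(scene_cue(query))                               # line 4: trigger hits skip the prefilter
    r_s = rrf_top(query, c2, rankers, k_s)                    # line 5
    c3 = {i for s in r_s for i in scene_item.get(s, [])}
    c3 |= set(trig_recall(query))                             # line 6: trigger hits skip the prefilter
    r_i = rrf_top(query, c3, rankers, k_i)                    # line 7
    return r_s, r_i, persona.get(speaker, "")                 # lines 8-9: persona is ambient


def overlap_ranker(texts: Dict[str, str]) -> Ranker:
    """Toy lexical ranker: order candidates by tokens shared with the query."""
    def rank(query: str, candidates: List[str]) -> List[str]:
        q = set(query.lower().split())
        return sorted(candidates,
                      key=lambda c: (-len(q & set(texts[c].lower().split())), c))
    return rank


texts = {
    "dining": "dinner restaurant team", "health": "health allergy",
    "s1": "colleague seafood allergy", "s2": "team dinner last friday",
    "i1": "colleague has a serious seafood allergy",
    "i2": "team dinner was at a pizza place",
}
scenes_out, items_out, persona_out = retrieve(
    query="where should we take the team for dinner tonight", speaker="user",
    topics=["dining", "health"],
    topic_scene={"dining": ["s2"], "health": ["s1"]},
    scene_item={"s1": ["i1"], "s2": ["i2"]},
    rankers=[overlap_ranker(texts)],
    scene_cue=lambda q: [],        # no scene-level trigger fires in this toy case
    trig_recall=lambda q: ["i1"],  # stub: a Bridge trigger on i1 matches the query
    persona={"user": "placeholder persona"},
    k_t=1, k_s=1, k_i=2,
)
```

With `k_t=1` the topic prefilter keeps only the toy `dining` topic, so the allergy scene `s1` never enters the scene pool. The allergy item `i1` still reaches the final item list because the stubbed trigger recall adds it after the prefilter has run.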

## 4 Experiments

### 4.1 Experimental Setup

##### Benchmarks\.

We evaluate on two long-term-memory benchmarks. LoCoMo (Maharana et al., [2024](https://arxiv.org/html/2606.15405#bib.bib9)) comprises multi-session conversations spanning weeks to months, with four question types: Single-hop (recover a single fact stated once in the dialogue), Multi-hop (combine facts spanning two or more sessions), Temporal (resolve when something happened), and Open-domain (answer using a speaker’s persona traits or world knowledge beyond the dialogue text). LoCoMo-Plus (Li et al., [2026](https://arxiv.org/html/2606.15405#bib.bib16)) extends LoCoMo with a Cognitive subset that probes the associative axis of §[1](https://arxiv.org/html/2606.15405#S1): it injects probes whose cue and answer item are bound only by narrative or causal cues rather than by lexical or semantic proximity, so that the cue item lies far beyond any realistic top-$K$ under standard similarity retrievers. Its Cognitive subset is the only new contribution, and we report its score as the LoCoMo-Plus number throughout §[4](https://arxiv.org/html/2606.15405#S4).

##### Baselines\.

On LoCoMo (Table [2](https://arxiv.org/html/2606.15405#S3.T2)) we compare against Mem0 (Chhikara et al., [2025](https://arxiv.org/html/2606.15405#bib.bib2)), Zep (Rasmussen et al., [2025](https://arxiv.org/html/2606.15405#bib.bib11)), Memobase ([https://www.memobase.io/](https://www.memobase.io/)), MemU ([https://github.com/NevaMind-AI/memU](https://github.com/NevaMind-AI/memU)), Supermemory ([https://supermemory.ai/](https://supermemory.ai/)), MIRIX (Wang and Chen, [2025](https://arxiv.org/html/2606.15405#bib.bib12)), MemOS (Li et al., [2025](https://arxiv.org/html/2606.15405#bib.bib8)) and HyperMem (Yue et al., [2026](https://arxiv.org/html/2606.15405#bib.bib15)). On LoCoMo-Plus (Table [3](https://arxiv.org/html/2606.15405#S3.T3)) we compare against the memory systems Mem0, SeCom (Pan et al., [2025](https://arxiv.org/html/2606.15405#bib.bib17)), A-Mem (Xu et al., [2025](https://arxiv.org/html/2606.15405#bib.bib13)), MemOS and HyperMem, plus two closed-source LLM reference baselines (GPT-4o, Gemini-2.5-Pro), in line with Li et al. ([2026](https://arxiv.org/html/2606.15405#bib.bib16)). The captions of Table [2](https://arxiv.org/html/2606.15405#S3.T2) and Table [3](https://arxiv.org/html/2606.15405#S3.T3) state which numbers are taken from prior work and which were run by us.

##### Implementation Details\.

The dense encoder is bge-m3 (Xiao et al., [2024](https://arxiv.org/html/2606.15405#bib.bib18)). The three top-$K$ budgets $(k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}})$ and the trigger-union size follow the defaults swept in §[4.5](https://arxiv.org/html/2606.15405#S4.SS5). The memory-construction LLM is GPT-4.1-mini in all our runs. For answer generation and LLM-as-judge we follow each benchmark’s official protocol: LoCoMo uses GPT-4o-mini for both roles, with all LoCoMo numbers averaged over three independent runs to align with the protocol of Li et al. ([2025](https://arxiv.org/html/2606.15405#bib.bib8)); LoCoMo-Plus uses GPT-4o for answer generation and Gemini-2.5-Flash as judge under the binary memory-awareness protocol of Li et al. ([2026](https://arxiv.org/html/2606.15405#bib.bib16)). Full hyperparameters are in Appendix [E](https://arxiv.org/html/2606.15405#A5); prompt templates are in Appendix [B](https://arxiv.org/html/2606.15405#A2).

### 4.2 Main Results on LoCoMo

T\-Mem reaches 80\.26% overall LLM\-as\-judge accuracy on LoCoMo, 3\.25 percentage points \(pp\) above the strongest baseline HyperMem, and is the global maximum on five of the six columns of the main block \(Open\-domain narrowly loses to MemOS by 0\.69 pp\)\. Token\-level F1 \(51\.96\) corroborates this ranking\.

The four question\-type columns line up with the design points of §[3](https://arxiv.org/html/2606.15405#S3)\. Single\-hop and Temporal correspond to the scene–item evidence layer, the Entity route of the item\-level Triggers, and the Scene Trigger\. Multi\-hop corresponds to the Bridge route of the item\-level Triggers and the topic\-label pre\-filter, which together feed the candidate pool with cross\-scene evidence beyond the reach of a single similarity hit\. Open\-domain corresponds to queries about a speaker’s standing traits, where T\-Mem’s Persona channel complements the scene–item evidence\.

| Configuration | LoCoMo (%) | $\Delta$ | LoCoMo-Plus (%) | $\Delta$ |
| --- | --- | --- | --- | --- |
| T-Mem | 80.26 | – | 74.81 | – |
| w/o SL | 75.52 | −4.74 | 71.82 | −2.99 |
| w/o IC | 77.40 | −2.86 | 74.31 | −0.50 |
| w/o ET + BT | 78.83 | −1.43 | 74.56 | −0.25 |
| w/o PS | 78.72 | −1.54 | 73.82 | −0.99 |
| w/o TF | 78.83 | −1.43 | 72.57 | −2.24 |
| w/o ST | 79.92 | −0.34 | 70.07 | −4.74 |
| w/o HT | 80.18 | −0.08 | 62.34 | **−12.47** |
| w/o ST + HT | 79.86 | −0.40 | 52.62 | **−22.19** |

Table 4: Component ablations on LoCoMo and LoCoMo-Plus (Overall LLM-as-judge accuracy, %). All eight ablations are run on the same memory system across both benchmarks.

### 4.3 Main Results on LoCoMo-Plus

LoCoMo-Plus (Li et al., [2026](https://arxiv.org/html/2606.15405#bib.bib16)) most directly tests the Quadrant-II / III associative triggers.

T-Mem narrows the LoCoMo $\to$ LoCoMo-Plus Gap to only 5.45 pp (Table [3](https://arxiv.org/html/2606.15405#S3.T3)), roughly five times tighter than the strongest prior memory system HyperMem, and nearly an order of magnitude tighter than the Mem0 / SeCom / A-Mem cluster.
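The Gap figure follows from the two headline scores, and the "five times" and "order of magnitude" comparisons can be checked against the 28–50 pp range quoted in the contributions. The short calculation below uses only those numbers; mapping the two ends of that range to specific baseline systems is not asserted here, and the variable names are illustrative.

```python
# Headline accuracies reported for T-Mem (percent).
locomo = 80.26
locomo_plus = 74.81

# Gap = drop from LoCoMo to LoCoMo-Plus, in percentage points.
gap = round(locomo - locomo_plus, 2)

# Range of gaps quoted for prevailing systems in contribution (iii), in pp.
prior_gap_range = (28.0, 50.0)

# How many times wider the prior gaps are than T-Mem's gap.
ratios = tuple(round(g / gap, 1) for g in prior_gap_range)

print(f"T-Mem gap: {gap} pp; prior gaps are {ratios[0]}x to {ratios[1]}x wider")
```

The lower end of the range gives a ratio of about 5 and the upper end about 9, consistent with the two comparisons in the sentence above.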

We attribute this contraction to the associative triggers, with the scene\-level Horizon trigger carrying the bulk of the LoCoMo\-Plus gain \(ablation in §[4\.4](https://arxiv.org/html/2606.15405#S4.SS4); the item\-level Bridge trigger is hard to register on this benchmark, see Limitations\)\. As a counterpoint, the closed\-source LLMs in the top block of Table[3](https://arxiv.org/html/2606.15405#S3.T3)still leave a Gap of roughly 45 pp under full\-context input, indicating that backbone capacity alone cannot close the associative\-axis disconnect of LoCoMo\-Plus\.

![Refer to caption](https://arxiv.org/html/2606.15405v1/x3.png)

Figure 3: Component ablations on LoCoMo and LoCoMo\-Plus\. Each axis is one switch; radii are within\-benchmark normalised so smaller radius means greater impact, with the most\-affected switch on each dataset anchoring at the centre\.

![Refer to caption](https://arxiv.org/html/2606.15405v1/x4.png)

Figure 4: Hyperparameter sensitivity of T\-Mem on LoCoMo Overall \(%\)\. Each panel sweeps one of $\{P, S, I, R\} = \{k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}}, R\}$ \(topic / scene / item / item\-trigger union top\-$K$\) while holding the others at their defaults $(15, 5, 15, 10)$; the default operating point is marked with a red square\.

### 4\.4Ablation Study

Table [4](https://arxiv.org/html/2606.15405#S4.T4) and Figure [3](https://arxiv.org/html/2606.15405#S4.F3) report Overall LLM\-as\-judge accuracy across eight ablations on both benchmarks\. The ablations fall into a descriptive\-axis group \(scene / item evidence layer, Entity \+ Bridge triggers, Persona, topic prefilter\) and an associative\-axis group \(the two scene\-level triggers, Scene and Horizon, removed individually and jointly\), and the asymmetry between the two columns carries the argument\. In table order, the eight ablations are SL \(Scene Layer\), IC \(Item Channel\), ET\+BT \(Entity \+ Bridge Triggers\), PS \(Persona\), TF \(Topic\-label Filter\), ST \(Scene Trigger\), HT \(Horizon Trigger\), and ST \+ HT \(both scene\-level triggers removed together\)\. All “w/o X” labels below correspond directly to rows of Table [4](https://arxiv.org/html/2606.15405#S4.T4)\.

The top block \(SL / IC / ET\+BT / PS / TF\) shows a strong contrast between the two benchmarks: drops of 1\.4–4\.7 pp on LoCoMo, but only 0\.25–2\.99 pp on LoCoMo\-Plus\. This asymmetry is rooted in the LoCoMo\-Plus format itself, which is dominated by scene\-level associative evidence; item\-level components, including the Bridge trigger, are hard to register on this benchmark even when retrieved and supplied to the QA prompt\.

The bottom block \(ST / HT, individually and jointly\) inverts the picture: all three configurations move Overall by under half a point on LoCoMo, but cut it by up to 22\.19 pp on LoCoMo\-Plus, with the ST \+ HT drop exceeding every descriptive\-axis drop on that benchmark by more than 7×\. These scene\-level triggers therefore sit at the bottom of LoCoMo’s effect ranking and at the top of LoCoMo\-Plus’s, consistent with their role in §[3](https://arxiv.org/html/2606.15405#S3)\. This asymmetry, silent on LoCoMo and decisive on LoCoMo\-Plus, is the empirical form of §[1](https://arxiv.org/html/2606.15405#S1)’s diagnosis that a system optimised for LoCoMo is implicitly optimised for staying inside the similarity neighbourhood\. A benchmark built along the similarity axis alone cannot, by construction, register the cost of removing the associative axis\.
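
The asymmetry described above can be re-derived from the accuracies in Table 4. The following is a minimal sketch that only redoes that arithmetic; the variable and function names are illustrative and are not taken from any released T-Mem code.

```python
# Overall accuracy (%) copied from Table 4, as (LoCoMo, LoCoMo-Plus).
FULL = (80.26, 74.81)
ABLATIONS = {
    "SL": (75.52, 71.82),
    "IC": (77.40, 74.31),
    "ET+BT": (78.83, 74.56),
    "PS": (78.72, 73.82),
    "TF": (78.83, 72.57),
    "ST": (79.92, 70.07),
    "HT": (80.18, 62.34),
    "ST+HT": (79.86, 52.62),
}
DESCRIPTIVE = ("SL", "IC", "ET+BT", "PS", "TF")
ASSOCIATIVE = ("ST", "HT", "ST+HT")


def drops(names, column):
    """Accuracy lost (pp) when each named switch is removed."""
    return {n: round(FULL[column] - ABLATIONS[n][column], 2) for n in names}


locomo_desc = drops(DESCRIPTIVE, 0)
plus_desc = drops(DESCRIPTIVE, 1)
locomo_assoc = drops(ASSOCIATIVE, 0)
plus_assoc = drops(ASSOCIATIVE, 1)

# How much larger the joint ST + HT drop is than the largest descriptive drop.
ratio = plus_assoc["ST+HT"] / max(plus_desc.values())
print(locomo_assoc, plus_assoc, round(ratio, 2))
```

With these inputs, every associative-axis drop on LoCoMo stays below 0.5 pp, while the joint ST + HT drop on LoCoMo-Plus comes out at roughly 7.4 times the largest descriptive-axis drop (2.99 pp).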

### 4\.5Hyperparameter Analysis

We sweep the four budgets defined in §[3\.4](https://arxiv.org/html/2606.15405#S3.SS4) \(Figure [4](https://arxiv.org/html/2606.15405#S4.F4)\): topic top\-$K$ \($k^{\mathrm{T}}$\), scene top\-$K$ \($k^{\mathrm{S}}$\), item top\-$K$ \($k^{\mathrm{I}}$\), and the item\-trigger union top\-$K$ \($R$\)\. We vary one budget at a time with the others fixed; the QA configuration matches the main rows of Table [2](https://arxiv.org/html/2606.15405#S3.T2)\.

Two of the four sweeps are monotone\-saturating, two are sweet\-spot\-shaped: topic top\-$K$ saturates by 15 \(with no measurable further gain at 20\) and item top\-$K$ at 15, while scene top\-$K$ and the item\-trigger union top\-$K$ peak at 5 and 10 respectively\. The default operating point $(k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}}, R) = (15, 5, 15, 10)$ sits at the saturation knee or sweet spot of each sweep, indicating that T\-Mem is not finely tuned to a narrow operating regime\.
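
A one-at-a-time sweep of this kind can be sketched as follows. Only the default operating point (15, 5, 15, 10) and the topic value 20 come from the text; the remaining grid values, the dictionary keys, and the `one_at_a_time` helper are hypothetical placeholders.

```python
# Default budgets from the text: (k_T, k_S, k_I, R) = (15, 5, 15, 10).
DEFAULTS = {"k_topic": 15, "k_scene": 5, "k_item": 15, "R": 10}

# Hypothetical grids: apart from the defaults and topic K = 20,
# these values are NOT taken from the paper.
GRIDS = {
    "k_topic": [5, 10, 15, 20],
    "k_scene": [1, 3, 5, 10],
    "k_item": [5, 10, 15, 20],
    "R": [5, 10, 15, 20],
}


def one_at_a_time(defaults, grids):
    """Yield (swept_name, config) pairs, changing a single budget per config."""
    for name, values in grids.items():
        for value in values:
            yield name, {**defaults, name: value}


configs = list(one_at_a_time(DEFAULTS, GRIDS))
for name, cfg in configs[:4]:
    print(name, cfg)
```

Each generated configuration differs from the defaults in at most one budget, which is what lets each panel of the sensitivity figure be read independently of the others.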

![Refer to caption](https://arxiv.org/html/2606.15405v1/x5.png)

Figure 5: Token usage versus accuracy on LoCoMo and LoCoMo\-Plus\. The $x$\-axis is average input tokens per query \(log scale\)\. Token\-usage numbers for systems other than T\-Mem, HyperMem, and MemOS on LoCoMo\-Plus are taken from the HyperMem paper \(Yue et al\., [2026](https://arxiv.org/html/2606.15405#bib.bib15)\)\.

### 4\.6Efficiency Analysis

A retrieval\-augmented memory architecture must justify its added cost\. Figure [5](https://arxiv.org/html/2606.15405#S4.F5) contrasts each system’s input\-token usage against its overall LLM\-as\-judge accuracy on both benchmarks \(log\-scaled $x$\-axis\)\. T\-Mem stands out on both: at a token budget below HyperMem’s, it reaches higher accuracy on LoCoMo and on LoCoMo\-Plus alike\. The lower\-token cluster \(Mem0, Zep, MemOS\) pays for its smaller token budget with a clear accuracy deficit on LoCoMo and collapses on LoCoMo\-Plus\. Under the CoT\-QA configuration, T\-Mem likewise leads HyperMem at the same token budget, so the lead persists across both QA pipelines\.

## 5Conclusion

Current long\-term conversational memory systems cover only one mode of recall, the one driven by similarity between a query and stored content; the other mode, where query and memory are bound by latent semantic association, is structurally left out\. We propose T\-Mem, which places one trigger family per quadrant of the $2 \times 2$ recall design space, so that every memory remains reachable from both descriptively similar and associatively relevant queries\. The empirical payoff is visible across both LoCoMo and LoCoMo\-Plus, where T\-Mem reaches state\-of\-the\-art and collapses the cross\-benchmark gap into a single\-digit margin\. This is the engineering form of a long\-standing claim from cognitive science: *a long\-term memory system earns its adaptive value not by archiving the dialogue stream faithfully, but by anticipating, at write time, the future cues under which its contents will need to be reached\.*

## Limitations

The writing pipeline \(scene segmentation, item extraction, four\-family trigger instantiation, and Persona summarisation\) relies on a memory\-construction LLM strong enough to follow structured\-output instructions reliably; behaviour under substantially weaker or fully local LLMs is uncharacterised\. Memory is also built offline, leaving incremental update and consolidation, as well as reinforcement\-learning memory management \(Yan et al\., [2025](https://arxiv.org/html/2606.15405#bib.bib23); Wang et al\., [2025](https://arxiv.org/html/2606.15405#bib.bib24)\), to future work\. Finally, LoCoMo\-Plus’s Cognitive subset is to our knowledge the only public benchmark that directly probes the associative axis we target; its cue is itself a short dialogue, which structurally favours scene\-granularity recall and gives the item\-level Bridge trigger no lever\. A fact\-level cognitive benchmark with a single\-fact cue would round out the other half of the evidence\.

## References

- D\. R\. Addis and D\. L\. Schacter \(2008\) Constructive episodic simulation: temporal distance and detail of past and future events modulate hippocampal engagement\. Hippocampus 18 \(2\), pp\. 227–237\. External Links: [Document](https://dx.doi.org/10.1002/hipo.20405) Cited by: [§2\.2](https://arxiv.org/html/2606.15405#S2.SS2.p1.1)\.
- J\. R\. Anderson and G\. H\. Bower \(2014\) Human associative memory\. Psychology Press\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p2.1), [§2\.2](https://arxiv.org/html/2606.15405#S2.SS2.p1.1)\.
- P\. Chhikara, D\. Khant, S\. Aryan, T\. Singh, and D\. Yadav \(2025\) Mem0: building production\-ready AI agents with scalable long\-term memory\. CoRR abs/2504\.19413\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1), [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- D\. Edge, H\. Trinh, N\. Cheng, J\. Bradley, A\. Chao, A\. Mody, S\. Truitt, and J\. Larson \(2024\) From local to global: a graph RAG approach to query\-focused summarization\. CoRR abs/2404\.16130\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p2.1)\.
- J\. Fang, X\. Deng, H\. Xu, Z\. Jiang, Y\. Tang, Z\. Xu, S\. Deng, Y\. Yao, M\. Wang, S\. Qiao, H\. Chen, and N\. Zhang \(2025\) LightMem: lightweight and efficient memory\-augmented generation\. CoRR abs/2510\.18866\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1)\.
- Z\. Guo, L\. Xia, Y\. Yu, T\. Ao, and C\. Huang \(2025\) LightRAG: simple and fast retrieval\-augmented generation\. In Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, pp\. 10746–10761\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p2.1)\.
- B\. J\. Gutiérrez, Y\. Shu, W\. Qi, S\. Zhou, and Y\. Su \(2025\) From RAG to memory: non\-parametric continual learning for large language models\. In Proceedings of the 42nd International Conference on Machine Learning \(ICML 2025\), Vancouver, BC, Canada\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p2.1)\.
- W\. Huang, Z\. Wang, H\. Lin, S\. Wang, B\. Xu, Q\. Li, B\. Zhu, L\. Yang, and C\. Qin \(2026\) AMA: adaptive memory via multi\-agent collaboration\. CoRR abs/2601\.20352\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1)\.
- J\. Jang, M\. Boo, and H\. Kim \(2023\) Conversation chronicles: towards diverse temporal and relational dynamics in multi\-session conversations\. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, pp\. 13584–13606\. Cited by: [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.
- P\. Lewis, E\. Perez, A\. Piktus, F\. Petroni, V\. Karpukhin, N\. Goyal, H\. Küttler, M\. Lewis, W\. Yih, T\. Rocktäschel, S\. Riedel, and D\. Kiela \(2020\) Retrieval\-augmented generation for knowledge\-intensive NLP tasks\. In Advances in Neural Information Processing Systems 33 \(NeurIPS 2020\)\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p1.1)\.
- Y\. Li, W\. Guo, L\. Zhang, R\. Xu, M\. Huang, H\. Liu, L\. Xu, Y\. Xu, and J\. Liu \(2026\) LoCoMo\-Plus: beyond\-factual cognitive memory evaluation framework for LLM agents\. CoRR abs/2602\.10715\. Cited by: [Appendix C](https://arxiv.org/html/2606.15405#A3.p1.1), [Appendix D](https://arxiv.org/html/2606.15405#A4.p1.1), [Figure 6](https://arxiv.org/html/2606.15405#A5.F6), [§2\.2](https://arxiv.org/html/2606.15405#S2.SS2.p1.1), [Table 3](https://arxiv.org/html/2606.15405#S3.T3), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px1.p1.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px3.p1.2), [§4\.3](https://arxiv.org/html/2606.15405#S4.SS3.p1.1)\.
- Z\. Li, S\. Song, C\. Xi, H\. Wang, C\. Tang, S\. Niu, D\. Chen, J\. Yang, C\. Li, Q\. Yu, J\. Zhao, Y\. Wang, P\. Liu, Z\. Lin, P\. Wang, J\. Huo, T\. Chen, K\. Chen, K\. Li, et al\. \(2025\) MemOS: a memory OS for AI system\. CoRR abs/2507\.03724\. Cited by: [Figure 6](https://arxiv.org/html/2606.15405#A5.F6), [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p4.1), [Table 2](https://arxiv.org/html/2606.15405#S3.T2), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px3.p1.2)\.
- A\. Maharana, D\. Lee, S\. Tulyakov, M\. Bansal, F\. Barbieri, and Y\. Fang \(2024\) Evaluating very long\-term conversational memory of LLM agents\. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), ACL 2024, Bangkok, Thailand, pp\. 13851–13870\. Cited by: [Appendix C](https://arxiv.org/html/2606.15405#A3.p1.1), [§1](https://arxiv.org/html/2606.15405#S1.p2.1), [§2\.2](https://arxiv.org/html/2606.15405#S2.SS2.p1.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px1.p1.1)\.
- P\. Mazaré, S\. Humeau, M\. Raison, and A\. Bordes \(2018\) Training millions of personalized dialogue agents\. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp\. 2775–2779\. Cited by: [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.
- C\. Packer, V\. Fang, S\. G\. Patil, K\. Lin, S\. Wooders, and J\. E\. Gonzalez \(2023\) MemGPT: towards LLMs as operating systems\. CoRR abs/2310\.08560\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p4.1)\.
- Z\. Pan, Q\. Wu, H\. Jiang, X\. Luo, H\. Cheng, D\. Li, Y\. Yang, C\. Lin, H\. V\. Zhao, L\. Qiu, and J\. Gao \(2025\) On memory construction and retrieval for personalized conversational agents\. In Proceedings of the 13th International Conference on Learning Representations \(ICLR 2025\)\. Cited by: [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- P\. Rasmussen, P\. Paliychuk, T\. Beauvais, J\. Ryan, and D\. Chalef \(2025\) Zep: a temporal knowledge graph architecture for agent memory\. CoRR abs/2501\.13956\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p2.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- R\. Salama, J\. Cai, M\. Yuan, A\. Currey, M\. Sunkara, Y\. Zhang, and Y\. Benajiba \(2025\) MemInsight: autonomous memory augmentation for LLM agents\. CoRR abs/2503\.21760\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1)\.
- T\. Suddendorf and M\. C\. Corballis \(2007\) The evolution of foresight: what is mental time travel, and is it unique to humans? Behavioral and Brain Sciences 30 \(3\), pp\. 299–313\. External Links: [Document](https://dx.doi.org/10.1017/S0140525X07001975) Cited by: [§2\.2](https://arxiv.org/html/2606.15405#S2.SS2.p1.1)\.
- Y\. Wang and X\. Chen \(2025\) MIRIX: multi\-agent memory system for LLM\-based agents\. CoRR abs/2507\.07957\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p4.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- Y\. Wang, R\. Takanobu, Z\. Liang, Y\. Mao, Y\. Hu, J\. McAuley, and X\. Wu \(2025\) Mem\-$\alpha$: learning memory construction via reinforcement learning\. CoRR abs/2509\.25911\. Cited by: [Limitations](https://arxiv.org/html/2606.15405#Sx1.p1.1)\.
- S\. Xiao, Z\. Liu, P\. Zhang, N\. Muennighoff, D\. Lian, and J\. Nie \(2024\) C\-Pack: packed resources for general Chinese embeddings\. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval \(SIGIR 2024\)\. Cited by: [Appendix E](https://arxiv.org/html/2606.15405#A5.SS0.SSS0.Px4.p1.6), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px3.p1.2)\.
- J\. Xu, A\. Szlam, and J\. Weston \(2022a\) Beyond goldfish memory: long\-term open\-domain conversation\. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), Dublin, Ireland, pp\. 5180–5197\. Cited by: [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.
- W\. Xu, Z\. Liang, K\. Mei, H\. Gao, J\. Tan, and Y\. Zhang \(2025\) A\-MEM: agentic memory for LLM agents\. CoRR abs/2502\.12110\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- X\. Xu, Z\. Gou, W\. Wu, Z\. Niu, H\. Wu, H\. Wang, and S\. Wang \(2022b\) Long time no see\! Open\-Domain conversation with long\-term persona memory\. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, pp\. 2639–2650\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p2.1), [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.
- S\. Yan, X\. Yang, Z\. Huang, E\. Nie, Z\. Ding, Z\. Li, X\. Ma, J\. Bi, K\. Kersting, J\. Z\. Pan, H\. Schütze, V\. Tresp, and Y\. Ma \(2025\) Memory\-R1: enhancing large language model agents to manage and utilize memories via reinforcement learning\. CoRR abs/2508\.19828\. Cited by: [Limitations](https://arxiv.org/html/2606.15405#Sx1.p1.1)\.
- J\. Yue, C\. Hu, J\. Sheng, Z\. Zhou, W\. Zhang, T\. Liu, L\. Guo, and Y\. Deng \(2026\) HyperMem: hypergraph memory for long\-term conversations\. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics \(ACL 2026\)\. Cited by: [§1](https://arxiv.org/html/2606.15405#S1.p1.1), [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p2.1), [Figure 5](https://arxiv.org/html/2606.15405#S4.F5), [§4\.1](https://arxiv.org/html/2606.15405#S4.SS1.SSS0.Px2.p1.1)\.
- S\. Zhang, E\. Dinan, J\. Urbanek, A\. Szlam, D\. Kiela, and J\. Weston \(2018\) Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), Melbourne, Australia, pp\. 2204–2213\. Cited by: [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.
- Z\. Zhang, X\. Bo, C\. Ma, R\. Li, X\. Chen, Q\. Dai, J\. Zhu, Z\. Dong, and J\. Wen \(2024\) A survey on the memory mechanism of large language model based agents\. CoRR abs/2404\.13501\. Cited by: [§2](https://arxiv.org/html/2606.15405#S2.p1.1)\.
- W\. Zhong, L\. Guo, Q\. Gao, H\. Ye, and Y\. Wang \(2024\) MemoryBank: enhancing large language models with long\-term memory\. In Proceedings of the 38th AAAI Conference on Artificial Intelligence \(AAAI 2024\), pp\. 19724–19731\. Cited by: [§2\.1](https://arxiv.org/html/2606.15405#S2.SS1.p3.1), [§2\.3](https://arxiv.org/html/2606.15405#S2.SS3.p1.1)\.

## Appendix AAlgorithms

Algorithms [2](https://arxiv.org/html/2606.15405#alg2)–[5](https://arxiv.org/html/2606.15405#alg5) together expand the entire T\-Mem pipeline into stage\-by\-stage pseudocode: the online retrieval cascade \(§[3\.4](https://arxiv.org/html/2606.15405#S3.SS4), including the trigger\-recall calls invoked therein\), offline construction \(§[3\.2](https://arxiv.org/html/2606.15405#S3.SS2)\), and offline indexing \(§[3\.3](https://arxiv.org/html/2606.15405#S3.SS3)\)\. Algorithm [2](https://arxiv.org/html/2606.15405#alg2) in particular spells out the internals of the $\mathrm{SceneCue}(\cdot)$ and $\mathrm{TrigRecall}(\cdot)$ calls abstracted in Algorithm [1](https://arxiv.org/html/2606.15405#alg1) of the main paper\. Algorithm [4](https://arxiv.org/html/2606.15405#alg4) expands Stage 2 of the construction pipeline \(the $\mathrm{TopicLLM}$ call in Algorithm [3](https://arxiv.org/html/2606.15405#alg3)\) into its three subroutines: batched matching, new\-topic creation, and topic update\.

Algorithm 2: T\-Mem online retrieval \(full\): the $\mathrm{SceneCue}$ and $\mathrm{TrigRecall}$ calls of Algorithm [1](https://arxiv.org/html/2606.15405#alg1) are unfolded inline\.

Input: query $q$; budgets $k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}}$; item\-trigger gate $\tau = 0.85$

Output: retrieval context handed to the QA LLM

1. **Stage 1: topic\-label filtering**
2. $\mathcal{R}_T \leftarrow \mathrm{top}_{k^{\mathrm{T}}}\bigl(\mathrm{RRF}(q, \mathcal{V}^{T})\bigr)$ ▷ Eq\. [2](https://arxiv.org/html/2606.15405#S3.E2) over BM25, dense
3. $\mathcal{P}_S \leftarrow \{v^{S} : (v^{T}, v^{S}) \in \mathcal{E}^{TS},\ v^{T} \in \mathcal{R}_T\}$
4. **Stage 2: scene selection \(similarity $\cup$ associative\)**
5. ▷ $\mathrm{SceneCue}(q)$ unfolded:
6. $r^{\mathrm{dlg}}, r^{\mathrm{scn}}, r^{\mathrm{hor}} \leftarrow \mathrm{Cos}(q, \mathbf{e}^{\mathrm{dlg}}),\ \mathrm{Cos}(q, \mathbf{e}^{\mathrm{scn}}),\ \max_{s} \mathrm{Cos}(q, \mathbf{e}^{\mathrm{hor}}_{s})$
7. $\mathcal{A}_S \leftarrow \mathrm{top}\bigl(\mathrm{RRF}(r^{\mathrm{dlg}}, r^{\mathrm{scn}}, r^{\mathrm{hor}})\bigr)$ ▷ Eq\. [2](https://arxiv.org/html/2606.15405#S3.E2) with $M = 3$ $\to$ host scenes
8. $\mathcal{C}_2 \leftarrow \mathcal{P}_S \cup \mathcal{A}_S$
9. $\mathcal{R}_S \leftarrow \mathrm{top}_{k^{\mathrm{S}}}\bigl(\mathrm{RRF}(q, \mathcal{C}_2)\bigr)$
10. **Stage 3: item selection \(scene\-bound $\cup$ trigger\-reached\)**
11. $\mathcal{P}_I \leftarrow \{v^{I} : (v^{S}, v^{I}) \in \mathcal{E}^{SI},\ v^{S} \in \mathcal{R}_S\}$
12. ▷ $\mathrm{TrigRecall}(q; \tau)$ unfolded:
13. **for** $v^{I} \in \mathcal{V}^{I}$ **do**
    1. $s_{v^{I}} \leftarrow \mathrm{nanmax}\bigl(\mathrm{Cos}(q, \mathbf{e}^{\mathrm{con}}_{v^{I}}),\ \mathrm{Cos}(q, \mathbf{e}^{\mathrm{brg}}_{v^{I}}),\ \mathrm{Cos}(q, \mathbf{e}^{\mathrm{joint}}_{v^{I}})\bigr)$
14. **end for**
15. $\mathcal{A}_I \leftarrow \{v^{I} : s_{v^{I}} \geq \tau\}$ ▷ hard cosine gate $\to$ host items
16. $\mathcal{C}_3 \leftarrow \mathcal{P}_I \cup \mathcal{A}_I$
17. $\mathcal{R}_I \leftarrow \mathrm{top}_{k^{\mathrm{I}}}\bigl(\mathrm{RRF}(q, \mathcal{C}_3)\bigr)$
18. **Persona \+ answer generation**
19. $X \leftarrow \mathrm{Persona}(\mathrm{speaker}(q))$
20. **return** $\mathrm{QA\_LLM}\bigl(q,\ \mathcal{R}_S,\ \mathcal{R}_I,\ X\bigr)$
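
As a rough illustration of the SceneCue step in Algorithm 2, the sketch below fuses three cosine rankings (raw dialogue, Scene trigger, and best Horizon trigger per scene) with reciprocal rank fusion. It assumes the common form score = Σ 1/(k + rank) with k = 60, since Eq. 2 is not reproduced in this appendix; the function names, the toy two-dimensional embeddings, and the `top` cut-off are likewise assumptions made for the example.

```python
import numpy as np


def cosine(query, matrix):
    """Cosine similarity between one query vector and each row of matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q


def rrf(score_lists, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over the ranked views."""
    fused = np.zeros(len(score_lists[0]))
    for scores in score_lists:
        order = np.argsort(-np.asarray(scores))       # best score first
        ranks = np.empty(len(order), dtype=int)
        ranks[order] = np.arange(1, len(order) + 1)   # 1-based ranks
        fused += 1.0 / (k + ranks)
    return fused


def scene_cue(query, e_dlg, e_scn, e_hor, top=5):
    """Rank scenes by fusing dialogue, Scene-trigger and Horizon-trigger views."""
    r_dlg = cosine(query, e_dlg)
    r_scn = cosine(query, e_scn)
    # Each scene may hold several Horizon triggers; keep its best match.
    r_hor = np.array([cosine(query, h).max() for h in e_hor])
    fused = rrf([r_dlg, r_scn, r_hor])
    return np.argsort(-fused)[:top].tolist()


# Toy data: three scenes with 2-d embeddings (illustrative only).
query = np.array([1.0, 0.0])
e_dlg = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.2]])
e_scn = np.array([[1.0, 0.1], [1.0, 1.0], [0.0, 1.0]])
e_hor = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 1.0]]),
    np.array([[0.5, 1.0]]),
]
host_scenes = scene_cue(query, e_dlg, e_scn, e_hor, top=2)
print(host_scenes)
```

In this toy setup scene 0 has zero dialogue similarity to the query and still ranks first, because both its Scene trigger and one of its Horizon triggers match. The sketch covers only the cue step; the union with the topic-filtered pool and the final re-ranking are left out.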

Algorithm 3: T\-Mem offline construction\.

Input: dialogue stream $X = (x_t)_{t=1}^{T}$

Output: memory $\mathcal{M} = (\mathcal{V}^{T} \cup \mathcal{V}^{S} \cup \mathcal{V}^{I},\ \mathcal{E}^{TS} \cup \mathcal{E}^{SI},\ \mathcal{T}^{*},\ \mathcal{X})$

1. $\mathcal{V}^{S}, \mathcal{V}^{T}, \mathcal{V}^{I} \leftarrow \varnothing$; $\mathcal{E}^{TS}, \mathcal{E}^{SI} \leftarrow \varnothing$; $H \leftarrow [\,]$
2. **Stage 1: scene segmentation**
3. **for** $x_t \in X$ **do**
    1. append $x_t$ to buffer $H$
    2. **if** $\mathrm{BoundaryLLM}(H) = \mathrm{close}$ **then**
        1. emit $v^{S} = \mathrm{ScenePack}(H)$; $\mathcal{V}^{S} \leftarrow \mathcal{V}^{S} \cup \{v^{S}\}$
        2. $H \leftarrow [\,]$
    3. **end if**
4. **end for**
5. **Stage 2: topic assignment**
6. **for** $v^{S}_{\text{new}} \in \mathcal{V}^{S}$ \(in arrival order\) **do**
    1. $\mathcal{A} \leftarrow \mathrm{TopicLLM}(v^{S}_{\text{new}},\ \mathcal{V}^{T})$ ▷ batched per\-label admission
    2. **if** $\mathcal{A} = \varnothing$ **then**
        1. open new $v^{T}$ from $v^{S}_{\text{new}}$; $\mathcal{V}^{T} \leftarrow \mathcal{V}^{T} \cup \{v^{T}\}$; add $(v^{T}, v^{S}_{\text{new}})$ to $\mathcal{E}^{TS}$
    3. **else**
        1. **for** $v^{T} \in \mathcal{A}$ **do**
            1. add $(v^{T}, v^{S}_{\text{new}})$ to $\mathcal{E}^{TS}$
            2. refresh metadata of $v^{T}$ over its scenes
        2. **end for**
    4. **end if**
7. **end for**
8. **Stage 3: item extraction**
9. **for** $v^{T} \in \mathcal{V}^{T}$ **do**
    1. $\mathcal{V}^{S}_{t} \leftarrow \{v^{S} : (v^{T}, v^{S}) \in \mathcal{E}^{TS}\}$
    2. $(\mathcal{I}^{\mathrm{atom}}, \mathcal{I}^{\mathrm{conn}}) \leftarrow \mathrm{ItemLLM}(v^{T}, \mathcal{V}^{S}_{t})$
    3. **for** $v^{I} \in \mathcal{I}^{\mathrm{atom}}$ **do**
        1. $\mathcal{V}^{I} \leftarrow \mathcal{V}^{I} \cup \{v^{I}\}$; add single $(v^{S}_{\mathrm{src}}(v^{I}), v^{I})$ to $\mathcal{E}^{SI}$
    4. **end for**
    5. **for** $v^{I} \in \mathcal{I}^{\mathrm{conn}}$ **do**
        1. $\mathcal{V}^{I} \leftarrow \mathcal{V}^{I} \cup \{v^{I}\}$; add edges $\{(v^{S}, v^{I}) : v^{S} \in \mathcal{V}^{S}_{\mathrm{src}}(v^{I})\}$ to $\mathcal{E}^{SI}$
    6. **end for**
10. **end for**
11. **Stage 4: trigger instantiation**
12. **for** $v^{I} \in \mathcal{V}^{I}$ **do**
    1. $(\mathcal{T}^{\mathrm{Ent}}_{v^{I}}, \mathcal{T}^{\mathrm{Brg}}_{v^{I}}, r_{v^{I}}) \leftarrow \mathrm{ItemTrigLLM}(v^{I})$ ▷ Q I, Q II \+ rationale
13. **end for**
14. **for** $v^{S} \in \mathcal{V}^{S}$ **do**
    1. $(\mathcal{T}^{\mathrm{Scn}}_{v^{S}}, \mathcal{T}^{\mathrm{Hor}}_{v^{S}}) \leftarrow \mathrm{SceneTrigLLM}(v^{S})$ ▷ Q IV, Q III
15. **end for**
16. **Persona \(per speaker $u$, in parallel\)**
17. $\mathcal{X} \leftarrow \{\,(u, \mathrm{PersonaLLM}(\{x_t : \mathrm{spk}(x_t) = u\}))\,\}_{u}$
18. **return** $\mathcal{M}$
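
The buffer-and-emit loop of Stage 1 in Algorithm 3 can be sketched as below. This is a minimal sketch under stated assumptions: the boundary detector is passed in as a plain callable standing in for BoundaryLLM, ScenePack is reduced to keeping the raw turns, and flushing a trailing unclosed buffer is an added assumption that the pseudocode does not specify. `toy_boundary` is a hypothetical rule used only to make the example run.

```python
from typing import Callable, Iterable, List

Turn = str
Scene = List[Turn]


def segment_scenes(stream: Iterable[Turn],
                   boundary_closed: Callable[[Scene], bool]) -> List[Scene]:
    """Buffer turns and emit a scene whenever the boundary detector closes it."""
    scenes: List[Scene] = []
    buffer: Scene = []
    for turn in stream:
        buffer.append(turn)
        if boundary_closed(buffer):
            scenes.append(buffer)  # ScenePack(H) reduced to the raw turns
            buffer = []
    # Assumption: a trailing, unclosed buffer is flushed as a final scene.
    if buffer:
        scenes.append(buffer)
    return scenes


def toy_boundary(buffer: Scene) -> bool:
    """Hypothetical stand-in for BoundaryLLM: close on a farewell turn."""
    return buffer[-1].lower().startswith("bye")


turns = ["Hi!", "I adopted a dog.", "Bye for now.", "Back again.", "How is the dog?"]
scenes = segment_scenes(turns, toy_boundary)
print(scenes)
```

Passing the detector as a callable keeps the control flow testable without any model call; a real boundary model would replace `toy_boundary` with the same signature.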

Algorithm 4: T\-Mem topic assignment: the $\mathrm{TopicLLM}$ call of Algorithm [3](https://arxiv.org/html/2606.15405#alg3) \(Stage 2\) unfolded into three subroutines\.

Input: new scene $v^{S}$; existing topic pool $\mathcal{V}^{T}$; batch size $b$

Output: updated topic pool $\mathcal{V}^{T}$ and edge set $\mathcal{E}^{TS}$

1. **if** $\mathcal{V}^{T} = \varnothing$ **then**
    1. $v^{T} \leftarrow \mathrm{CreateNewTopic}(\{v^{S}\})$
    2. $\mathcal{V}^{T} \leftarrow \mathcal{V}^{T} \cup \{v^{T}\}$; add $(v^{T}, v^{S})$ to $\mathcal{E}^{TS}$
    3. **return**
2. **end if**
3. **Match phase \(batched\)\.**
4. $\mathcal{A} \leftarrow \varnothing$
5. **for each** batch $\mathcal{B} \subseteq \mathcal{V}^{T}$ of size $\leq b$ **do**
    1. $\mathcal{A} \leftarrow \mathcal{A} \cup \mathrm{MatchBatch}(v^{S}, \mathcal{B})$ ▷ LLM admits $v^{S}$ to multiple topics
6. **end for**
7. **if** $\mathcal{A} = \varnothing$ **then**
    1. $v^{T} \leftarrow \mathrm{CreateNewTopic}(\{v^{S}\})$
    2. $\mathcal{V}^{T} \leftarrow \mathcal{V}^{T} \cup \{v^{T}\}$; add $(v^{T}, v^{S})$ to $\mathcal{E}^{TS}$
    3. **return**
8. **end if**
9. **Update phase\.**
10. **for each** $v^{T} \in \mathcal{A}$ **do**
    1. $v^{T} \leftarrow \mathrm{UpdateTopic}(v^{T}, v^{S})$
    2. add $(v^{T}, v^{S})$ to $\mathcal{E}^{TS}$
11. **end for**
12. **Subroutine** $\mathrm{MatchBatch}(v^{S}, \mathcal{B})$:
    1. ask the matcher LLM, for each $v^{T} \in \mathcal{B}$, whether $v^{S}$ continues the same specific event thread as $v^{T}$;
    2. **return** the subset for which the LLM answers true\.
13. **Subroutine** $\mathrm{CreateNewTopic}(\{v^{S}\})$:
    1. ask an extractor LLM for a specific title and a keyword list from $v^{S}$;
    2. store $\mathrm{summary} = \mathrm{title} \times 2 \,\|\, \mathrm{keywords}$ for the BM25 index of §[3\.3](https://arxiv.org/html/2606.15405#S3.SS3)\.
14. **Subroutine** $\mathrm{UpdateTopic}(v^{T}, v^{S})$:
    1. ask an updater LLM to fold $v^{S}$ into $v^{T}$ while keeping the title’s specific identity stable; merge new keywords into the existing list and recompute $\mathrm{summary}$\.
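
The control flow of Algorithm 4 (batched matching, then either new-topic creation or per-topic update) can be sketched as follows. The three LLM subroutines are replaced by injected callables, and the `Topic` class, the `toy_*` helpers, and the batch size of 8 are all hypothetical stand-ins; only the match/create/update structure and the "title twice, then keywords" summary follow the pseudocode.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    title: str
    keywords: List[str]
    scenes: List[str] = field(default_factory=list)

    @property
    def summary(self) -> str:
        # Title repeated twice, then keywords, as in CreateNewTopic.
        return " ".join([self.title, self.title, *self.keywords])


def batches(items, size):
    for start in range(0, len(items), size):
        yield items[start:start + size]


def assign_topic(scene, topics, match_batch, create_topic, update_topic, b=8):
    """Admit scene to every matching topic, or open a new topic if none match."""
    admitted = []
    for batch in batches(topics, b):
        admitted.extend(match_batch(scene, batch))
    if not admitted:  # also covers the empty-pool case
        topic = create_topic(scene)
        topic.scenes.append(scene)
        topics.append(topic)
        return [topic]
    for topic in admitted:
        update_topic(topic, scene)
        topic.scenes.append(scene)
    return admitted


# Hypothetical keyword-overlap stand-ins for the matcher / extractor / updater LLMs.
def toy_match(scene, batch):
    words = set(scene.lower().split())
    return [t for t in batch if words & set(t.keywords)]


def toy_create(scene):
    words = scene.lower().split()
    return Topic(title=words[0], keywords=words)


def toy_update(topic, scene):
    for word in scene.lower().split():
        if word not in topic.keywords:
            topic.keywords.append(word)


topics: List[Topic] = []
for s in ["dog adoption", "dog training", "marathon plans"]:
    assign_topic(s, topics, toy_match, toy_create, toy_update)
print([(t.title, t.scenes) for t in topics])
```

Here the second scene is folded into the existing "dog" topic while the third opens a new one. A scene may be admitted to several topics at once, which is why `assign_topic` returns a list.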

Algorithm 5: T\-Mem offline indexing\.

Input: memory $\mathcal{M}$ with node sets $\mathcal{V}^{T}, \mathcal{V}^{S}, \mathcal{V}^{I}$ and trigger sets $\mathcal{T}^{\mathrm{Ent}}, \mathcal{T}^{\mathrm{Brg}}, \mathcal{T}^{\mathrm{Scn}}, \mathcal{T}^{\mathrm{Hor}}$; encoder $\mathrm{Enc}(\cdot)$

Output: BM25 corpus $\mathcal{B}$, dense tables $\mathcal{D}^{\bullet}$, multi\-view trigger indices

1. **Node\-level lexical and dense**
2. **for** $\bullet \in \{T, S, I\}$ and $v \in \mathcal{V}^{\bullet}$ **do**
    1. register $\mathrm{Concat}_{\mathrm{w}}(v)$ into shared BM25 corpus $\mathcal{B}$ ▷ per\-type weighted field concat
    2. $\mathcal{D}^{\bullet}[v] \leftarrow \mathrm{Enc}(v)$
3. **end for**
4. **Item\-level multi\-view trigger index**
5. **for** $v^{I} \in \mathcal{V}^{I}$ with $(\mathcal{T}^{\mathrm{Ent}}_{v^{I}}, \mathcal{T}^{\mathrm{Brg}}_{v^{I}}, r_{v^{I}})$ **do**
    1. $\mathbf{e}^{\mathrm{con}}_{v^{I}} \leftarrow \mathrm{Enc}(\mathcal{T}^{\mathrm{Ent}}_{v^{I}})$; $\mathbf{e}^{\mathrm{brg}}_{v^{I}} \leftarrow \mathrm{Enc}(\mathcal{T}^{\mathrm{Brg}}_{v^{I}})$; $\mathbf{e}^{\mathrm{joint}}_{v^{I}} \leftarrow \mathrm{Enc}(\mathcal{T}^{\mathrm{Ent}}_{v^{I}} \,\|\, \mathcal{T}^{\mathrm{Brg}}_{v^{I}} \,\|\, r_{v^{I}})$
    2. store $(v^{I}, \mathbf{e}^{\mathrm{con}}, \mathbf{e}^{\mathrm{brg}}, \mathbf{e}^{\mathrm{joint}})$; empty fields written as $\mathrm{NaN}$ sentinel
6. **end for**
7. **Scene\-level multi\-view trigger index**
8. **for** $v^{S} \in \mathcal{V}^{S}$ with $(\mathcal{T}^{\mathrm{Scn}}_{v^{S}}, \mathcal{T}^{\mathrm{Hor}}_{v^{S}})$ **do**
    1. $\mathbf{e}^{\mathrm{dlg}}_{v^{S}} \leftarrow \mathrm{Enc}(\mathrm{dialogue}(v^{S}))$; $\mathbf{e}^{\mathrm{scn}}_{v^{S}} \leftarrow \mathrm{Enc}(\mathcal{T}^{\mathrm{Scn}}_{v^{S}})$; $\mathbf{e}^{\mathrm{hor}}_{v^{S}} \leftarrow \{\mathrm{Enc}(s) : s \in \mathcal{T}^{\mathrm{Hor}}_{v^{S}}\}$
9. **end for**
10. **return** $\mathcal{B}$, $\{\mathcal{D}^{\bullet}\}$, item\-trigger index, scene\-trigger index
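
The NaN sentinel written by the item-level index pairs naturally with the nanmax gate of Algorithm 2: a missing trigger view produces a NaN cosine that is skipped rather than treated as a low score. The sketch below illustrates this with NumPy, assuming unit-normalised embeddings so that a dot product equals cosine similarity; the helper names and the toy two-dimensional vectors are assumptions for the example, and only the gate τ = 0.85 comes from the text.

```python
import numpy as np

TAU = 0.85                    # item-trigger gate from Algorithm 2
NAN_ROW = [np.nan, np.nan]    # sentinel for an item lacking a trigger view


def unit_rows(rows):
    """Normalise each row to unit length; NaN rows stay NaN."""
    m = np.asarray(rows, dtype=float)
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def trig_recall(query, e_con, e_brg, e_joint, tau=TAU):
    """Return indices of items whose best available trigger view clears tau."""
    q = query / np.linalg.norm(query)
    # Shape (3, n_items); a NaN row in a view yields a NaN cosine for that item.
    views = np.stack([e_con @ q, e_brg @ q, e_joint @ q])
    # nanmax skips NaN entries. This toy data keeps at least one view per item;
    # an item with all three views missing would need separate handling.
    best = np.nanmax(views, axis=0)
    return np.flatnonzero(best >= tau).tolist()


query = np.array([1.0, 0.0])
e_con = unit_rows([[1, 0], [0, 1], NAN_ROW])      # item 2 has no Entity trigger
e_brg = unit_rows([NAN_ROW, [2, 1], [1, 1]])      # item 0 has no Bridge trigger
e_joint = unit_rows([[1, 1], [0, 1], [1, 2]])
hits = trig_recall(query, e_con, e_brg, e_joint)
print(hits)
```

Item 1 clears the gate only through its Bridge view (cosine about 0.89), and item 0 is unaffected by its missing Bridge field; item 2 peaks at about 0.71 and is filtered out.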

## Appendix BPrompts

For completeness, we reproduce the matched LoCoMo QA prompt \(Figure [6](https://arxiv.org/html/2606.15405#A5.F6)\) at the end of the appendix; for the QA\-pipeline decomposition of §[E](https://arxiv.org/html/2606.15405#A5), we additionally reproduce HyperMem’s 7\-step CoT QA prompt \(Figure [7](https://arxiv.org/html/2606.15405#A5.F7)\), so that the two QA prompts underlying the matched and the † rows of Table [2](https://arxiv.org/html/2606.15405#S3.T2) can be read side by side\.

## Appendix CCase studies

We illustrate T\-Mem’s behaviour with five representative cases drawn from real test instances: four from LoCoMo \(covering the single\-hop, multi\-hop, temporal, and open\-domain question types of Maharana et al\., [2024](https://arxiv.org/html/2606.15405#bib.bib9)\) and one from LoCoMo\-Plus \(covering the cognitive cue\-continuation regime of Li et al\., [2026](https://arxiv.org/html/2606.15405#bib.bib16)\)\. Each case shows the recalled evidence, the query, the gold answer, T\-Mem’s prediction, and the mechanism behind the correct call\. The five cases are collected at the end of the appendix as Figures [8](https://arxiv.org/html/2606.15405#A5.F8)–[12](https://arxiv.org/html/2606.15405#A5.F12): single\-hop \(Figure [8](https://arxiv.org/html/2606.15405#A5.F8)\), temporal \(Figure [9](https://arxiv.org/html/2606.15405#A5.F9)\), multi\-hop \(Figure [10](https://arxiv.org/html/2606.15405#A5.F10)\), open\-domain \(Figure [11](https://arxiv.org/html/2606.15405#A5.F11)\), and cognitive \(Figure [12](https://arxiv.org/html/2606.15405#A5.F12)\)\.

## Appendix D Terminology: “trigger query” vs. trigger

Our *triggers* (Entity, Bridge, Scene, Horizon; §[3](https://arxiv.org/html/2606.15405#S3)) are memory-side indexing objects attached at write time. The phrase “trigger query” in Li et al. ([2026](https://arxiv.org/html/2606.15405#bib.bib16)) refers, by contrast, to a query-side probe issued after a time gap. The two usages name opposite ends of the same recall regime (the stored memory and the later probe) and are not in conflict.

## Appendix E Reproducibility

##### Decomposing the two QA pipelines\.

The 15.72-pp accuracy gap between HyperMem’s reported 92.73 and our re-run of HyperMem under the official LoCoMo pipeline (77.01) is driven jointly by two confounded factors bundled inside HyperMem’s own QA pipeline. The first is a stronger answer-generation LLM (GPT-4.1-mini vs. GPT-4o-mini); the second is a 7-step CoT QA prompt that encourages long, exhaustive answers, in contrast to the official LoCoMo prompt, which caps answers at 5–6 words. The two prompts can be read side by side in Figures [6](https://arxiv.org/html/2606.15405#A5.F6) and [7](https://arxiv.org/html/2606.15405#A5.F7); the full HyperMem template is available in their released code. To disentangle the two factors we freeze HyperMem’s retrieval output on all 1,540 LoCoMo questions and vary only the QA stage along these two axes. This defines three configurations: (A) HyperMem’s reported setup (GPT-4.1-mini + 7-step CoT); (B) the official LoCoMo setup (GPT-4o-mini + official prompt); and (C) a controlled rerun that swaps only the answer-generation LLM, keeping the 7-step CoT prompt (GPT-4o-mini + 7-step CoT). Mean prediction length is 76.95 tokens under (A), 45.71 under (C), and 5.87 under (B). Swapping the model alone ((A) to (C)) therefore shortens answers by a factor of about 1.7, swapping the prompt alone ((C) to (B)) by a further factor of about 7.8, and the two effects stack multiplicatively to roughly 13×. Crucially, both factors push answers longer in exactly the way that an LLM-as-judge rewards and a token-level F1 metric penalizes, which is why the † rows in Table [2](https://arxiv.org/html/2606.15405#S3.T2) simultaneously show a ∼16-pp accuracy inflation and a ∼30-pp F1 collapse against the matched-pipeline rows.
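The arithmetic behind this decomposition can be checked directly from the reported numbers. The sketch below only recomputes ratios from the figures quoted above; attributing the (A) to (C) step to the model and the (C) to (B) step to the prompt follows the three configurations as defined in the text, and the variable names are ours.

```python
# Numbers quoted in the paragraph above.
reported_acc, rerun_acc = 92.73, 77.01      # HyperMem accuracy: reported vs. official-pipeline re-run
len_a, len_c, len_b = 76.95, 45.71, 5.87    # mean prediction length in tokens under (A), (C), (B)

gap_pp = round(reported_acc - rerun_acc, 2)
model_factor = len_a / len_c    # (A) -> (C): swap the LLM, keep the 7-step CoT prompt
prompt_factor = len_c / len_b   # (C) -> (B): swap the prompt, keep GPT-4o-mini
total_factor = len_a / len_b    # (A) -> (B): both changes together

print(f"accuracy gap:  {gap_pp} pp")          # 15.72 pp
print(f"model swap:    x{model_factor:.2f}")  # x1.68
print(f"prompt swap:   x{prompt_factor:.2f}") # x7.79
print(f"combined:      x{total_factor:.2f}")  # x13.11
```

On these numbers the prompt change accounts for most of the length difference, with the model change contributing a smaller factor on top.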

##### Worked case — temporal disambiguation\.

The mechanism by which the 7-step CoT prompt converts retrieval output into LLM-judge gains is most cleanly visible on temporal questions whose evidence supports several plausible answers. Figure [13](https://arxiv.org/html/2606.15405#A5.F13) contrasts the two pipelines on the same (retrieval, question) pair.
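The flip side of this mechanism, the F1 collapse of long answers, can be illustrated with a simplified token-level F1. The sketch below is an assumption-laden toy: the tokenizer (lowercased alphanumeric runs) is not necessarily the one used by the official LoCoMo scorer, and the verbose prediction is a hypothetical string written for this example, not a system output.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Simplified tokenizer: lowercased runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a prediction and a gold answer (bag-of-tokens overlap)."""
    pred, ref = Counter(tokens(prediction)), Counter(tokens(gold))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


gold = "last week of May 2023"
concise = "last week of May 2023"
# Hypothetical verbose answer: it contains the gold span but adds many extra tokens.
verbose = (
    "Calvin's concert in Tokyo took place in the last week of May 2023 "
    "during his tour, and he also performed in Tokyo in August 2023"
)

print(f"concise F1: {token_f1(concise, gold):.3f}")
print(f"verbose F1: {token_f1(verbose, gold):.3f}")
```

Both answers contain the gold span, so a judge that checks for a matching fact can accept either, but the verbose answer's precision falls with every extra token and drags its F1 down with it.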

##### Takeaway\.

Numbers obtained with the 7-step CoT prompt are not directly comparable with prior LoCoMo numbers tied to the 5–6-word official prompt. We therefore report T-Mem under both: 80.26 in the matched-pipeline rows of Table [2](https://arxiv.org/html/2606.15405#S3.T2), and 93.70 in the † rows. T-Mem leads under both.

##### Hyperparameters\.

T-Mem runs the same memory store and the same retrieval cascade on both LoCoMo and LoCoMo-Plus. The dense encoder is `bge-m3` (Xiao et al., [2024](https://arxiv.org/html/2606.15405#bib.bib18)) and the reranker is `bge-reranker-v2-m3` (§[4.1](https://arxiv.org/html/2606.15405#S4.SS1)). Final QA budgets: $(k^{\mathrm{T}}, k^{\mathrm{S}}, k^{\mathrm{I}}) = (15, 5, 15)$ on both benchmarks; the item-trigger union top-$K$ is $10$. The item-trigger hard cosine gate is $0.85$. The RRF smoothing constant is $k_{0} = 60$ throughout. QA decoding uses temperature $0$. All retrieval components are CPU-bound on the client side, so a single workstation with no local GPU is sufficient to reproduce both end-to-end evaluations.
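For readers unfamiliar with two of these knobs, the sketch below shows a generic hard cosine gate and standard reciprocal rank fusion (RRF) using the constants reported above. It is a minimal illustration under our own assumptions: how T-Mem wires these pieces into its retrieval cascade is described in the main text, not here, and the function names, the tie-breaking rule, and the toy rankings are hypothetical.

```python
import math
from collections import defaultdict

RRF_K0 = 60          # RRF smoothing constant reported above
COSINE_GATE = 0.85   # item-trigger hard cosine gate reported above


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def gate(query_vec, candidates, threshold=COSINE_GATE):
    """Hard gate: keep only candidate ids whose vector clears the cosine threshold."""
    return [cid for cid, vec in candidates.items() if cosine(query_vec, vec) >= threshold]


def rrf(rankings, k0=RRF_K0):
    """Standard RRF: score(d) = sum over rankings of 1 / (k0 + rank(d)), ranks starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k0 + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


kept = gate([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [1.0, 1.0], "d": [0.0, 1.0]})
fused = rrf([["m1", "m2", "m3"], ["m2", "m3", "m1"]])
print(kept)   # ['a', 'b']
print(fused)  # ['m2', 'm1', 'm3']
```

With $k_0 = 60$ the contribution of a single rank position changes slowly, so an item ranked consistently well across channels tends to beat an item that tops only one of them.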

**LoCoMo QA prompt (matched configuration)**

**Role:** You are an intelligent memory assistant tasked with retrieving accurate information from episodic memories.

**Context:** You have access to episodic memories from conversations between two speakers. These memories contain timestamped information that may be relevant to answering the question.

**Instructions:**

- Carefully analyze all provided episodic memories from both speakers.
- Pay special attention to the timestamps to determine the answer.
- If the question asks about a specific event or fact, look for direct evidence in the memories.
- If the memories contain contradictory information, prioritize the most recent memory.
- Convert relative time references (e.g., “last year”, “two months ago”) to specific dates / months / years based on the memory timestamp; ignore the relative reference in your final answer.
- If the original memory explicitly mentions an exact day of the week (e.g., “Monday”), include that weekday in your answer.
- Focus only on the content of the episodic memories from both speakers. Do not confuse character names mentioned in memories with the actual users who created those memories.
- The answer should be less than 5–6 words.

**Approach (Think step by step):**

- Examine all memories whose timestamps and content are related to the question.
- If the answer requires calculation (e.g., converting relative time references), show your work; otherwise extract the answer directly.
- Formulate a precise, concise answer based solely on the evidence in the memories, including the weekday if it is explicitly mentioned in the original memory.
- Double-check that the final answer is specific and avoids vague time references.

**Input:** `{context}`, `{question}`.

**Output:** short answer string.

Figure 6: Prompt template for LoCoMo QA generation (matched-row systems), reproduced verbatim. The LoCoMo judge prompt (CORRECT/WRONG grader) is reused verbatim from Li et al. ([2025](https://arxiv.org/html/2606.15405#bib.bib8)); the LoCoMo-Plus QA and judge prompts follow the official protocol of Li et al. ([2026](https://arxiv.org/html/2606.15405#bib.bib16)).

**HyperMem 7-step CoT QA prompt (`ANSWER_PROMPT_NEMORI_COT`)**

**Role:** You are an intelligent memory assistant tasked with retrieving accurate information from episodic memories.

**Context:** You have access to episodic memories from conversations between two speakers. These memories contain timestamped information that may be relevant to answering the question.

**Instructions:** Synthesize information from all relevant memories to provide a comprehensive and accurate answer. You MUST follow a structured Chain-of-Thought process to ensure no details are missed. Actively look for connections between people, places, and events to build a complete picture; synthesize information from different memories to answer the user’s question. It is CRITICAL that you move beyond simple fact extraction and perform logical inference. When the evidence strongly suggests a connection, you must state that connection. Do not dismiss reasonable inferences as “speculation”.

**Critical requirements:**

- never omit specific names: use “Amy’s colleague Rob” not “a colleague”.
- always include exact numbers, amounts, prices, percentages, dates, times.
- preserve frequencies exactly: “every Tuesday and Thursday” not “twice a week”.
- maintain all proper nouns and entities as they appear.

**Response format (7 steps):**

- **Step 1: Relevant memories extraction.** List each memory that relates to the question, with its timestamp.
- **Step 2: Key information identification.** Extract all specific details: names, numbers / quantities, dates / times, frequencies, and other entities (brands, products, etc.).
- **Step 3: Cross-memory linking.** Identify entities that appear in multiple memories and link related information; make reasonable inferences when entities are strongly connected, listing shared entities, explicit connections, and inferred facts.
- **Step 4: Time-reference calculation.** If applicable, convert each relative time reference to its calculated actual time.
- **Step 5: Contradiction check.** If multiple memories contain different information, describe the conflict and explain which is most recent / reliable.
- **Step 6: Detail-verification checklist.** Verify that all person names, locations, exact numbers, frequencies, dates / times, and proper nouns from the relevant memories appear in the answer.
- **Step 7: Answer formulation.** Explain how you are combining the information to answer the question.

**Final Answer:** Provide the concise answer with all specific details preserved.

**Input:** `{context}`, `{question}`.

**Output:** 7-step CoT trace followed by a final answer.

Figure 7: HyperMem’s QA prompt (`ANSWER_PROMPT_NEMORI_COT`), reproduced verbatim. It contrasts with the matched LoCoMo QA prompt of Figure [6](https://arxiv.org/html/2606.15405#A5.F6) along two axes: (i) a 7-step CoT scaffold that requires enumerating every candidate at Step 2, and (ii) no length cap (vs. the official prompt’s 5–6-word cap). §[E](https://arxiv.org/html/2606.15405#A5) decomposes the resulting accuracy gap.

**Case 1: single-hop; scene-level Trigger recall**

**Evidence at write time:** “John: We were lucky to find a lovely greenhouse venue for a smaller, more intimate gathering.”

**Query:** “What type of venue did John and his girlfriend choose for their wedding ceremony?”

**Gold answer:** Greenhouse

**T-Mem prediction:** Greenhouse venue

**Why correct:** The query’s surface phrase “wedding ceremony venue” shares no strong lexical anchor with the host scene’s wording, and the topic-label prefilter alone does not surface this scene. T-Mem’s scene-level Triggers, fired by the cue’s wedding / intimate-gathering / venue-choice associative signal, recall the host scene independently of the prefilter, after which the QA LLM extracts “greenhouse” from the recalled evidence.

Figure 8: Single-hop case: scene-level Trigger recall surfaces a host scene that the topic-label prefilter alone misses.

**Case 2: temporal; scene-level Trigger recall + time anchor**

**Evidence at write time:** (host scene timestamped Sep 4, 2022) “James: Yesterday, when we were at the theater … I asked her to become my girlfriend, and she agreed.”

**Query:** “When did James ask Samantha to be his girlfriend?”

**Gold answer:** September 3, 2022

**T-Mem prediction:** September 3, 2022

**Why correct:** The host scene is recalled by scene-level Triggers (activated by the cue’s propose / theater / girlfriend associative signal) rather than by the topic prefilter. The QA LLM then resolves the relative phrase “yesterday” against the recalled scene’s timestamp to produce the absolute date.

Figure 9: Temporal case: scene-level Trigger recall combined with a recalled scene timestamp resolves a relative time reference.

**Case 3: multi-hop**

**Evidence at write time:** (two turns within one recalled scene)

- “Caroline: I’ve known these friends for 4 years, since I moved from my home country.”
- “Caroline: This necklace is super special to me, a gift from my grandma in my home country, Sweden.”

**Query:** “Where did Caroline move from 4 years ago?”

**Gold answer:** Sweden

**T-Mem prediction:** From her home country, Sweden.

**Why correct:** No single turn states “moved from Sweden” verbatim. T-Mem’s scene-level recall surfaces the host scene where the two turns coexist; the QA LLM bridges “home country 4 years ago” with the later “home country, Sweden” to recover the answer.

Figure 10: Multi-hop case: two turns of a recalled scene must be composed to answer a single query.

**Case 4: open-domain**

**Evidence at write time:** (host scene timestamped Dec 26, 2023) “Evan: Got married last week…”

**Query:** “Which major holiday season coincides with Evan’s wedding?”

**Gold answer:** Christmas

**T-Mem prediction:** Christmas season

**Why correct:** The recalled evidence does not mention Christmas. T-Mem retrieves the host scene whose timestamp anchors the wedding to the week ending Dec 25, 2023; the QA LLM combines the absolute scene date with general world knowledge to identify the matching holiday season.

Figure 11: Open-domain case: a recalled scene timestamp combined with general world knowledge yields an answer not present in the evidence text.

**Case 5: cognitive; Horizon Trigger**

**Cue dialogue at write time:** “Evan: I’m saving all my vacation days for one big trip because I want to spend a month exploring Japan’s countryside. Sam: That sounds like an unforgettable cultural experience.”

**Trigger query (six months later):** “I’ve been spending my weekends visiting local historical sites, and it’s surprisingly rich without needing a passport.”

**T-Mem prediction:** (B’s continuation, excerpt) “That sounds amazing, Evan! … I remember how excited you were about exploring Japan’s countryside, and it’s wonderful that you’re discovering cultural richness right in your backyard …”

**Why correct:** The trigger query carries no surface anchor to “Japan” or to travel commitment. T-Mem’s Horizon Trigger, which at write time registered Evan’s stated travel commitment as a forward-looking signal, recalls the host scene under the cue’s implicit “exploration without travelling far” reframing. The QA LLM then explicitly references the recalled commitment, which the judge cites as the ground for the *correct* verdict.

Figure 12: Cognitive case (LoCoMo-Plus): a Horizon-Trigger channel surfaces a temporally distant scene whose connection to the trigger query is associative rather than lexical.

**Worked case: temporal disambiguation**

**Question:** When was Calvin’s concert in Tokyo?

**Gold:** last week of May 2023.

**(A) 7-step CoT, 480 chars:** “Calvin’s concert in Tokyo took place during his tour with Frank Ocean from May 24–30, 2023, where he described the experience as unreal with an insanely energetic crowd. He also performed at a Tokyo music festival around August 21, 2023 … Additionally, on August 14, 2023, Calvin had a concert in Tokyo with an engaged audience…”

**(B) Official LoCoMo, 24 chars:** “August 14, 2023 (Monday).”

**Mechanism:** The retrieved evidence supports several real Tokyo performance dates; (A) enumerates all of them under the Step 2 key-information clause and is credited as soon as any one matches the gold; (B), capped at 5–6 words by the official prompt, must commit to a single date and selects the August one, which is factually a real performance but not the one the gold labels. The accuracy difference here reflects the answer format, not which system understood the question better.

Figure 13: Temporal worked case. Same retrieval, same question; only the QA pipeline differs.
