Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

arXiv cs.AI Papers

Summary

Introduces Infini Memory, a maintainable text-based persistent memory architecture for LLM agents that uses topic-structured documents and iterative retrieval to improve long-term memory usage, achieving 64.7% on MemoryAgentBench.

arXiv:2606.10677v1 Announce Type: new Abstract: Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and memory maintenance difficult. We propose Infini Memory, a maintainable text-based persistent memory architecture that treats agent memory as topic-structured documents. Each topic document serves as a semantic unit for collecting related evidence, preserving metadata, and revising facts over time. New observations are first staged in a buffer and periodically consolidated into coherent textual contexts. At inference time, an agentic retrieval procedure lets the LLM read memory through iterative tool calls rather than a single retrieval step. On MemoryAgentBench, Infini Memory achieves 64.7% overall score. Ablations show that topic-structured maintenance and iterative evidence inspection improve complementary aspects of long-term memory use.

Cached at: 06/10/26, 06:16 AM

# Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory
Source: [https://arxiv.org/html/2606.10677](https://arxiv.org/html/2606.10677)
Suozhao Ji¹, Baodong Wu¹,∗, Zehao Wang², Lei Xia¹, Qingping Li¹, Ruisong Wang¹, Wenbo Ding², Zhenhua Zhu², Boxun Li¹, Guohao Dai³,¹, Yu Wang²,∗

¹Infinigence AI, ²Tsinghua University, ³Shanghai Jiaotong University

{jisuozhao, wubaodong, xialei, liqingping, wangruisong, liboxun}@infini-ai.com; {wangzeha24@mails, ding.wenbo@sz, zhuzhenhua@mail, yu-wang@mail}.tsinghua.edu.cn; daiguohao@sjtu.edu.cn

∗Corresponding authors

Code: [https://github.com/infinigence/Infini-Memory](https://github.com/infinigence/Infini-Memory)

###### Abstract

Long-term LLM agents need persistent memory that can track changing facts and provide relevant evidence across sessions. Existing memory systems often store observations as isolated records, summaries, or indexed fragments, which makes evidence aggregation, fact revision, and memory maintenance difficult. We propose Infini Memory, a maintainable text-based persistent memory architecture that treats agent memory as topic-structured documents. Each topic document serves as a semantic unit for collecting related evidence, preserving metadata, and revising facts over time. New observations are first staged in a buffer and periodically consolidated into coherent textual contexts. At inference time, an agentic retrieval procedure lets the LLM read memory through iterative tool calls rather than a single retrieval step. On MemoryAgentBench, Infini Memory achieves a 64.7% overall score. Ablations show that topic-structured maintenance and iterative evidence inspection improve complementary aspects of long-term memory use.


## 1 Introduction

LLM agents increasingly operate over long horizons spanning many sessions, but an LLM’s context window only bounds how much input the model can attend to in a single forward pass. Increasing its length exposes more history to the model, but does not by itself specify which information should be retained, how stale information should be revised, or how related evidence should be organized for future use Packer et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib13)); Sumers et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib17)).

To address this gap, recent agent systems introduce memory mechanisms that retain information beyond the immediate prompt, including prior interactions, tool observations, task facts, and user preferences Yao et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib21)); Park et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib12)); Shinn et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib15)). Memory has been studied as a component of language-agent architectures and long-term conversational agents Zhong et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib22)); Chhikara et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib1)); Xu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib6)); Sumers et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib17)), with a parallel line of work introducing benchmarks for multi-turn memory capabilities Maharana et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib9)); Wu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib20)); Hu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib5)). In this paper, we use *persistent memory* to refer to an external, editable memory state that survives across interaction sessions and can be written, updated, and queried by an agent during inference.

These memory systems typically maintain information outside the model and retrieve relevant content back into the prompt at inference time. Existing designs include text memories with summary-based dialogue memories Tan et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib25)), vector retrieval Park et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib12)); Chhikara et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib1)), hierarchical memory managers Packer et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib13)); Kang et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib26)); Fang et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib2)), and knowledge-graph memory layers Rasmussen et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib14)); Gutiérrez et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib4)); Shu et al. ([2026](https://arxiv.org/html/2606.10677#bib.bib16)). These systems demonstrate the usefulness of external memory, but they also exhibit four recurring failure modes (Figure [1](https://arxiv.org/html/2606.10677#S1.F1)). When memory is represented mainly as independent retrievable items, evidence about the same user, task, or event can be distributed across many small records (memory fragmentation). When newer observations contradict older ones, append-style storage can leave both versions available for retrieval (memory conflict). When long histories are compressed into summaries, temporal order and source cues may be weakened (compression loss). Standard retrieval methods based on vector similarity, keyword matching, or fixed top-$k$ procedures may return isolated fragments rather than enough evidence for temporally grounded reasoning (insufficient retrieval). We use *fragments* to refer to such retrieved pieces: evidence that may be relevant to a query but lacks the local context needed to resolve it.

![Refer to caption](https://arxiv.org/html/2606.10677v1/x1.png)

Figure 1: Four recurring challenges in long-term agent memory maintenance: (1) related evidence is scattered across isolated records; (2) old and new versions of the same fact coexist without reconciliation; (3) summarization loses temporal, source, and contextual cues; (4) single-shot retrieval returns fragments insufficient for multi-hop reasoning.

These observations motivate treating persistent memory as a lifecycle maintenance problem. A memory module for long-term agents should support three coupled operations. First, it should *write* durable information from interactions into the memory state. Second, it should *maintain* that state by grouping related evidence so that facts about the same user, task, or event are no longer scattered, reconciling new observations against contradicted older ones, and preserving temporal and source cues even as content is condensed. Third, it should *read* memory by retrieving evidence with sufficient local context rather than isolated fragments.

We propose Infini Memory, a maintainable persistent memory architecture that represents external memory as topic\-structured documents\. Each topic document groups related evidence under a shared subject and carries entry\-level metadata that retains temporal and source cues as content is rewritten\. The system separates frequent writes from less frequent structural consolidation: new memory candidates are first appended to a buffer document, then periodically rewritten, split, updated, and merged into topic documents\. At query time, the agent retrieves from this document library and expands local context around matching evidence\. Retrieval over this library can use lexical indexing over plain\-text documents rather than a vector or graph database backend\.

The main contributions of this work are as follows:

- We introduce a document-based persistent memory architecture for long-term LLM agents. It organizes memory as topic documents and maintains them through buffered writing and periodic consolidation, avoiding a mandatory dependency on vector or graph databases.
- We present an agentic retrieval strategy in which the LLM controls a multi-step search process over structured memory tools. The strategy supports iterative evidence inspection, local context expansion, and answer-oriented evidence assembly.
- We evaluate Infini Memory on MemoryAgentBench, where its agentic retrieval variant achieves 64.7% overall, and use controlled variants to analyze maintenance and retrieval choices.

## 2 Related Work

### 2.1 Persistent Memory Representation and Maintenance

Recent work on LLM agents has increasingly treated memory as an external state that must be written, updated, and retrieved across interactions. Early persistent-memory systems mainly extend the effective context available to an LLM. MemGPT introduces virtual context management, in which an agent moves information between limited in-context memory and external storage through explicit control operations Packer et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib13)). MemoryBank stores long-term user memories and updates them over time with mechanisms inspired by the Ebbinghaus forgetting curve Zhong et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib22)). Mem0 further develops this line by dynamically extracting, consolidating, and retrieving salient information from conversations, with a graph-based variant for relational structure Chhikara et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib1)). These systems show that persistent memory is useful for long-term interaction, but they often store memory as compact entries, summaries, or indexed fragments, which can make later revision and evidence reconstruction difficult when information is distributed across many interactions.

A second line of work focuses on structured memory organization. Graph-based and associative approaches, such as HippoRAG-v2, use non-parametric memory structures to support factual, associative, and sense-making retrieval Gutiérrez et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib4)). REMem represents episodic memory with a hybrid graph over time-aware gists and facts, targeting recollection and reasoning over event histories Shu et al. ([2026](https://arxiv.org/html/2606.10677#bib.bib16)). A-MEM proposes an agentic memory system inspired by the Zettelkasten method, where each memory is stored as an atomic note with structured attributes and dynamically generated links to related memories Xu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib6)). These methods improve memory organization beyond flat retrieval, but they often rely on atomic notes, embeddings, or graph structures as the main substrate.

LightMem is also relevant because it separates memory processing into stages: sensory filtering, topic-aware short-term consolidation, and offline long-term update Fang et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib2)). Infini Memory shares the motivation of reducing online maintenance overhead, but differs in emphasizing plain-text topic documents and explicit consolidation operations. This makes its design more aligned with systems where interpretability, editable state, and infrastructure simplicity are important.

### 2.2 Retrieval and Evaluation for Long-Term Memory Agents

Retrieval is a key difficulty for long-term memory because relevant evidence may be scattered across many interactions. Standard retrieval pipelines usually rely on vector similarity, keyword matching, or a fixed top-$k$ procedure. These methods are efficient, but they may return isolated fragments rather than enough evidence for temporally grounded reasoning or contradiction resolution. Recent systems therefore move toward more active retrieval procedures. REMem, for example, uses an agentic retriever with curated tools to iteratively retrieve and reason over episodic memory graphs Shu et al. ([2026](https://arxiv.org/html/2606.10677#bib.bib16)). A-MEM introduces agency mainly in memory construction and organization, dynamically creating notes, attributes, and links when new memories arrive Xu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib6)). Infini Memory extends this direction to the read path over structured text memory: the LLM can iteratively choose memory tools, inspect intermediate results, expand local context, and assemble evidence before answering.

Benchmarks for long-term memory have also shifted from static long-context understanding toward interactive memory evaluation. LoCoMo evaluates very long-term conversational memory over multi-session dialogues with question answering, event summarization, and multimodal dialogue generation tasks Maharana et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib9)). LongMemEval focuses on long-term interactive memory for chat assistants and evaluates information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and abstention Wu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib20)). These benchmarks are useful for testing long-context recall and temporally grounded dialogue understanding, but they do not fully isolate the operational abilities required by memory agents that incrementally store, revise, and retrieve information.

MemoryAgentBench is more directly aligned with the goals of this work. It evaluates memory agents through incremental multi-turn interactions and identifies four core competencies: accurate retrieval, test-time learning, long-range understanding, and selective forgetting Hu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib5)). These competencies match the main design questions addressed by Infini Memory: whether the system can retrieve relevant evidence, acquire new information during deployment, integrate long-range context, and revise outdated memory. We therefore use MemoryAgentBench as the main evaluation setting, while interpreting results as benchmark-level evidence rather than as a complete characterization of all long-term memory use cases.

## 3 Memory Design

### 3.1 Infini Memory Design Overview

Infini Memory represents persistent memory as a library of *topic documents*, where a topic denotes a maintenance scope that groups entries handled together by later operations (routing, splitting, merging). This scope is defined by how the memory will be used in future interactions. For example, entries about a stable user preference or an ongoing project may form a topic because they provide context for the same class of future questions and updates. This design avoids two less desirable extremes: (1) if memory is stored as isolated records, related evidence may be separated and later retrieval may return fragments without enough context; (2) if all memory is stored in a single chronological log, later operations may need to scan or rewrite unrelated history. We discuss the alternatives we rejected (pure vector store, pure knowledge graph, no buffer) in Appendix [A.2](https://arxiv.org/html/2606.10677#A1.SS2). Topic documents provide a bounded unit that can preserve related entries and their metadata under local headings, while keeping each document focused enough for rewriting, splitting, and merging.

Based on topic documents, Infini Memory organizes its memory pipeline around representation, writing, consolidation, and retrieval. Each topic document stores a summary, a hierarchical body, and entry-level metadata to organize related evidence within a bounded document scope (Section [3.2](https://arxiv.org/html/2606.10677#S3.SS2)). New memories first enter a short-term buffer named `CURRENT`, so frequent writes do not repeatedly rewrite the topic library; consolidation is triggered after enough related evidence accumulates (Section [3.3](https://arxiv.org/html/2606.10677#S3.SS3)). At inference time, the LLM iteratively searches, inspects, and expands context from the maintained topic library to recover evidence beyond a single retrieval result (Section [3.4](https://arxiv.org/html/2606.10677#S3.SS4)). The structured text backend keeps the default system self-contained and leaves room for optional retrieval or maintenance extensions (Section [3.5](https://arxiv.org/html/2606.10677#S3.SS5)).

![Refer to caption](https://arxiv.org/html/2606.10677v1/x2.png)

Figure 2: Topic document format used by Infini Memory.

### 3.2 Topic Document Representation

Infini Memory stores persistent memory as topic documents (Figure [2](https://arxiv.org/html/2606.10677#S3.F2)), where each document groups related facts, preferences, and event cues under a shared topic. A document contains a metadata header, `{id, summary, token_count, created_time, update_log, aux}`, and a hierarchical body. The body uses topic and subtopic headings to organize unordered-list memory entries, each prefixed with a parsable signature `<seq=…, time=…, source=…>`. This representation preserves local context while keeping temporal order, provenance, and later revision operations explicit.

##### Entry\-level metadata\.

Each memory entry carries at least a sequence number `seq`, which increases monotonically with each write call. When temporal information is available from the interaction content, a `time` field is recorded. When information is extracted from the model response, a `source=AI` tag can be attached. The metadata signature can be extended with domain-specific fields such as entity type, namespace, machine identifier, IP address, or sensitivity level.

This storage format has three practical benefits. First, the document summary and body can be refreshed together when a topic document is maintained, keeping retrieval metadata aligned with the underlying evidence. Second, temporal and source cues move with each entry when content is modified, e.g., rewritten, split, or merged. Third, each memory entry carries metadata that makes later revision explicit. When new evidence updates earlier content, `seq` and `time` provide ordering cues for superseding outdated entries or applying explicit deletion rules.
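The entry signature described above can be illustrated with a small parser. This is a minimal sketch, not the authors' implementation: the concrete line syntax (`- <key=value, ...> text`), the `MemoryEntry` class, and the `parse_entry` and `latest` helpers are all assumptions introduced here for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed concrete syntax: "- <seq=12, time=2025-03-01, source=AI> entry text"
SIGNATURE_RE = re.compile(r"^\s*-\s*<(?P<meta>[^>]*)>\s*(?P<text>.*)$")


@dataclass
class MemoryEntry:
    seq: int                      # monotonically increasing write counter
    text: str
    time: Optional[str] = None    # recorded only when the interaction states a time
    source: Optional[str] = None  # e.g. "AI" for content taken from a model response
    extra: Dict[str, str] = field(default_factory=dict)  # domain-specific fields


def parse_entry(line: str) -> MemoryEntry:
    """Parse one unordered-list entry with a leading metadata signature."""
    match = SIGNATURE_RE.match(line)
    if match is None:
        raise ValueError(f"not a signed memory entry: {line!r}")
    fields: Dict[str, str] = {}
    for part in match.group("meta").split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    if "seq" not in fields:
        raise ValueError("every entry needs a seq field")
    return MemoryEntry(
        seq=int(fields.pop("seq")),
        text=match.group("text"),
        time=fields.pop("time", None),
        source=fields.pop("source", None),
        extra=fields,  # whatever remains is treated as an extension field
    )


def latest(entries: List[MemoryEntry]) -> MemoryEntry:
    """Return the entry with the highest seq, i.e. the one that supersedes the rest."""
    return max(entries, key=lambda entry: entry.seq)
```

Ordering by `seq` alone is the simplest possible superseding rule; a fuller version would presumably also consult `time` and explicit deletion markers.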

### 3.3 Buffered Writing and Consolidation

The writing and consolidation pipeline (Figure [3](https://arxiv.org/html/2606.10677#S3.F3); full procedure in Algorithm [1](https://arxiv.org/html/2606.10677#alg1)) is designed around the mismatch between the frequency of memory extraction and the scope of memory maintenance. Memory extraction may happen after every interaction, but consolidation should not. Updating the topic library after each extracted entry would require repeated topic routing, contradiction checking, and document rewriting. Such eager maintenance may also introduce unstable edits before enough related evidence is available. Infini Memory therefore introduces `CURRENT` as a buffer for recent entries.

The `CURRENT` buffer collects extracted memories in append form. Appending to this buffer does not require scanning or rewriting existing topic documents. More importantly, the buffer preserves the short-range coherence of recent interactions. Several adjacent turns often describe the same task, correct the same fact, or refine the same preference. Keeping them together before consolidation allows the system to resolve local redundancy and contradictions before they enter the topic library.

The buffer is flushed when it reaches a token threshold or remains active for a predefined time window:

$$
\mathrm{flush}(C) = \bigl(|C| \geq \tau_{\mathrm{tok}}\bigr) \lor \bigl(\Delta t(C) \geq \tau_{\mathrm{time}}\bigr) \tag{1}
$$

where $C$ denotes the current buffer, $\tau_{\mathrm{tok}}$ is a token threshold, and $\tau_{\mathrm{time}}$ is a time threshold.
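The flush rule in Equation (1) can be sketched as a small buffer class. This is an illustrative sketch under stated assumptions: the `CurrentBuffer` class and its method names are hypothetical, whitespace splitting stands in for a real tokenizer, and the choice that an empty buffer never flushes is ours rather than the paper's.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CurrentBuffer:
    tau_tok: int                      # token threshold
    tau_time: float                   # time threshold in seconds
    entries: List[str] = field(default_factory=list)
    token_count: int = 0              # plays the role of |C|
    opened_at: Optional[float] = None  # time of the first append since the last flush

    def append(self, entry: str, now: Optional[float] = None) -> None:
        """Append-only write; no topic document is scanned or rewritten."""
        now = time.time() if now is None else now
        if self.opened_at is None:
            self.opened_at = now
        self.entries.append(entry)
        # Assumption: whitespace tokens approximate the real token count.
        self.token_count += len(entry.split())

    def should_flush(self, now: Optional[float] = None) -> bool:
        """flush(C) = (|C| >= tau_tok) or (delta_t(C) >= tau_time)."""
        if self.opened_at is None:  # assumption: an empty buffer is never flushed
            return False
        now = time.time() if now is None else now
        return (self.token_count >= self.tau_tok
                or (now - self.opened_at) >= self.tau_time)

    def clear(self) -> None:
        """Reset the buffer for the next writing interval."""
        self.entries.clear()
        self.token_count = 0
        self.opened_at = None
```

Passing `now` explicitly keeps the time condition deterministic in tests; in normal use both methods fall back to the wall clock.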

When the buffer is flushed, the system rewrites `CURRENT` into `REWRITE_CURRENT`. This intermediate draft is not a persistent memory store. It is a normalized view of the recent buffer, created to make library update easier. The rewrite step groups locally related entries, removes redundant statements, preserves useful metadata, and marks possible updates to earlier facts. For example, several adjacent entries may be merged into one statement with a sequence range, while a correction may be marked as superseding an earlier entry. The full prompt invariants enforced at this stage are listed in Appendix [A.1](https://arxiv.org/html/2606.10677#A1.SS1).

The normalized draft is then routed into the topic library. For each block in `REWRITE_CURRENT`, the planner decides whether it should update an existing topic document or create a new one. If the block extends an existing topic, it is inserted into the relevant document region. If it changes an earlier fact, the planner records the update relation and rewrites the affected local context. If it does not fit any existing maintenance scope, a new topic document is created. This step combines topic assignment and revision, because the correct target document is the one in which the block can be maintained together with related evidence. This procedure is given in Algorithm [2](https://arxiv.org/html/2606.10677#alg2).

After the update is applied, `CURRENT` is cleared for the next writing interval. Recent buffer content remains available to the retrieval module before consolidation, so newly written information can still be used in answers. This avoids a gap between extraction and retrievability.

The topic library is periodically updated through split and merge operations\. Overgrown documents are split to reduce overly broad local context, while fragmented documents are merged when they describe the same maintenance scope\. After each structural update, summaries and metadata are refreshed to support future routing and retrieval\.
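One way to picture the split and merge pass is as a planner that emits operations from document sizes and summaries. The sketch below is an assumption-laden illustration: the 5000-token split and 1000-token merge thresholds are the values reported in the experimental setup (Section 4.1), but the `TopicDoc` class, the Jaccard word overlap used as "summary similarity", the `MIN_SIMILARITY` cut-off, and the requirement that both merge partners be small are hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

SPLIT_TOKENS = 5000   # split threshold reported in the experimental setup
MERGE_TOKENS = 1000   # merge-candidate threshold reported in the experimental setup
MIN_SIMILARITY = 0.5  # hypothetical cut-off; the paper does not state one


@dataclass
class TopicDoc:
    id: str
    summary: str
    token_count: int


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as a stand-in for summary similarity."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def plan_maintenance(docs: List[TopicDoc]) -> List[Tuple[str, ...]]:
    """Return ("split", id) and ("merge", id_a, id_b) operations for one pass."""
    plan: List[Tuple[str, ...]] = []
    for doc in docs:
        if doc.token_count > SPLIT_TOKENS:
            plan.append(("split", doc.id))

    small = [doc for doc in docs if doc.token_count < MERGE_TOKENS]
    used: Set[str] = set()
    for i, first in enumerate(small):
        if first.id in used:
            continue
        for second in small[i + 1:]:
            if second.id in used:
                continue
            if jaccard(first.summary, second.summary) >= MIN_SIMILARITY:
                plan.append(("merge", first.id, second.id))
                used.update({first.id, second.id})
                break  # each document takes part in at most one merge per pass
    return plan
```

After applying such a plan, summaries and metadata would be refreshed as described above; that step is omitted here because it depends on an LLM call.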

![Refer to caption](https://arxiv.org/html/2606.10677v1/x3.png)

Figure 3: Memory writing and consolidation pipeline.

### 3.4 Agentic Retrieval over Topic Documents

The retrieval module is responsible for turning the topic library into evidence for answer generation. Infini Memory supports two retrieval variants: a hybrid reader (Infini Memory-H) that combines LLM-based summary selection with BM25 partition retrieval (Figure [4](https://arxiv.org/html/2606.10677#S3.F4)), and an agentic reader (Infini Memory-A) in which the LLM controls a multi-step search process over memory tools (Figure [5](https://arxiv.org/html/2606.10677#S3.F5)). In the agentic variant, the model selects which tool to call, inspects intermediate results, expands local context when needed, and decides when the collected evidence is sufficient. It is a tool-guided retrieval workflow built on top of topic documents.

Long\-term memory questions often require more than isolated snippets\. They may depend on related events, updated facts, or surrounding context\. A single\-shot retrieval can miss these connections\. With topic documents, a matched entry can be expanded into its local block, where temporal and source metadata help the system identify the relevant evidence\.

At the start of retrieval, the system exposes a document catalog and a set of memory tools\. The catalog contains document identifiers and summaries\. For small libraries, the catalog can be provided directly\. For larger libraries, the agent can inspect the catalog through paging or search\. The default tools include global lexical search, document\-local pattern search, catalog inspection, and line\-range reading\. These tools correspond to different retrieval behaviors: broad search finds candidate regions, local search verifies precise matches, and line\-range reading recovers the context around evidence\.
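The default tools can be pictured as thin wrappers over plain-text documents. The following is a minimal sketch assuming an in-memory `dict` of documents: the tool names `grep`, `grep_doc`, `list_docs`, and `read_lines` follow the labels in Figure 5, but their signatures and return shapes are hypothetical, and the ranked `search` tool is left out.

```python
import re
from typing import Dict, List, Tuple


class MemoryTools:
    """Read-only tools over a library of plain-text topic documents."""

    def __init__(self, docs: Dict[str, str], summaries: Dict[str, str]):
        self.docs = {doc_id: text.splitlines() for doc_id, text in docs.items()}
        self.summaries = summaries

    def list_docs(self, offset: int = 0, limit: int = 20) -> List[Tuple[str, str]]:
        """Catalog inspection with paging: (doc_id, summary) pairs."""
        ids = sorted(self.docs)[offset:offset + limit]
        return [(doc_id, self.summaries.get(doc_id, "")) for doc_id in ids]

    def grep(self, pattern: str) -> List[Tuple[str, int, str]]:
        """Global pattern search: (doc_id, 1-based line number, line)."""
        regex = re.compile(pattern, re.IGNORECASE)
        return [
            (doc_id, number, line)
            for doc_id in sorted(self.docs)
            for number, line in enumerate(self.docs[doc_id], start=1)
            if regex.search(line)
        ]

    def grep_doc(self, doc_id: str, pattern: str) -> List[Tuple[int, str]]:
        """Document-local pattern search: (1-based line number, line)."""
        regex = re.compile(pattern, re.IGNORECASE)
        return [
            (number, line)
            for number, line in enumerate(self.docs[doc_id], start=1)
            if regex.search(line)
        ]

    def read_lines(self, doc_id: str, start: int, end: int) -> List[str]:
        """Line-range reading with 1-based, inclusive bounds."""
        lines = self.docs[doc_id]
        return lines[max(1, start) - 1:end]
```

A typical sequence would be a broad `grep` to find candidate regions, a `grep_doc` to verify a precise match, and a `read_lines` call to recover the surrounding context.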

During retrieval, the agent alternates between tool calls and evidence inspection. Early steps usually identify candidate documents or headings. Later steps read local regions and check whether the evidence supports the query. The loop stops when the agent returns a stop decision, reaches a maximum number of iterations, reaches an evidence budget, or fails to obtain new useful evidence. These limits are important because agentic retrieval increases test-time computation compared with single-shot retrieval. The full retrieval loop, including the BM25 fallback path, is given in Algorithm [3](https://arxiv.org/html/2606.10677#alg3); the prompt that drives the per-step search behavior appears in Appendix [A.1](https://arxiv.org/html/2606.10677#A1.SS1).
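The four stop conditions can be made concrete with a small control loop. This is a sketch rather than the paper's Algorithm 3: `choose_action` stands in for the LLM policy, tools are plain callables returning lists of strings, the default of seven rounds is the value reported in Section 4.1, and `evidence_budget` and `patience` are hypothetical parameters. The BM25 fallback is not shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Optional[Tuple[str, tuple]]  # (tool name, positional args), or None to stop


def agentic_retrieve(
    choose_action: Callable[[List[str]], Action],
    tools: Dict[str, Callable[..., List[str]]],
    max_rounds: int = 7,        # seven rounds, as in the reported configuration
    evidence_budget: int = 20,  # hypothetical cap on collected evidence items
    patience: int = 2,          # hypothetical number of empty rounds tolerated
) -> Tuple[List[str], str]:
    """Alternate tool calls and evidence inspection; return (evidence, stop reason)."""
    evidence: List[str] = []
    stale_rounds = 0
    for _ in range(max_rounds):
        action = choose_action(evidence)  # the policy sees what was found so far
        if action is None:
            return evidence, "agent_stop"
        name, args = action
        fresh: List[str] = []
        for item in tools[name](*args):
            if item not in evidence and item not in fresh:
                fresh.append(item)
        if fresh:
            stale_rounds = 0
            evidence.extend(fresh[:evidence_budget - len(evidence)])
        else:
            stale_rounds += 1
        if len(evidence) >= evidence_budget:
            return evidence, "evidence_budget"
        if stale_rounds >= patience:
            return evidence, "no_new_evidence"
    return evidence, "max_rounds"
```

Returning the stop reason alongside the evidence makes it easy for a caller to decide whether to trigger a lexical fallback, for example when the loop ends with `no_new_evidence` and an empty list.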

The final evidence set may contain document-level selections, heading-level blocks, or expanded line ranges. Snippet-level results are expanded to the nearest coherent heading block when possible. The final context also includes recent entries from `CURRENT`, so unconsolidated memories remain accessible. If the agent returns no evidence or too little evidence, the system runs a conservative lexical fallback over the topic library. This fallback is used as a recall guard and does not replace the agentic retrieval policy.
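Expanding a snippet to its nearest heading block can be sketched as a scan in both directions from the matching line. The sketch assumes Markdown-style `#` headings inside topic documents, which the paper does not confirm; `heading_level` and `expand_to_heading_block` are hypothetical helper names.

```python
from typing import List, Tuple


def heading_level(line: str) -> int:
    """Return the heading depth of a '#'-style heading line, or 0 for other lines."""
    stripped = line.lstrip()
    hashes = len(stripped) - len(stripped.lstrip("#"))
    return hashes if hashes and stripped[hashes:hashes + 1] == " " else 0


def expand_to_heading_block(lines: List[str], hit: int) -> Tuple[int, int]:
    """Expand a 0-based hit index to the half-open range of its heading block."""
    start = hit
    while start > 0 and heading_level(lines[start]) == 0:
        start -= 1  # walk up to the closest heading above the hit
    level = heading_level(lines[start])
    if level == 0:
        return 0, len(lines)  # no heading above the hit: fall back to the whole document
    end = start + 1
    # The block ends at the next heading of the same or a shallower depth.
    while end < len(lines) and not (0 < heading_level(lines[end]) <= level):
        end += 1
    return start, end
```

For a hit on an entry under `## Drinks`, the returned range covers that subtopic heading and its entries but stops before the sibling `## Food` block, which keeps the expanded context local.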

![Refer to caption](https://arxiv.org/html/2606.10677v1/x4.png)

Figure 4: Hybrid retrieval variant (LLM Summary + BM25 Partitions). The LLM selects candidate documents by summary, and BM25 supplements with lexically matched partitions.

![Refer to caption](https://arxiv.org/html/2606.10677v1/x5.png)

Figure 5: Agentic retrieval variant. The LLM agent iteratively calls memory tools (`grep`, `grep_doc`, `search`, `list_docs`, `read_lines`) to search, verify, and expand evidence across topic documents and the `CURRENT` buffer before generating the final answer.

### 3.5 Deployment and Extensibility

Infini Memory uses structured text as the default memory carrier\. This choice keeps the default backend simple because the system can operate with ordinary document storage, lexical indexing, and deterministic file inspection tools\. It does not require configuring a vector database, graph database, or external memory service before the system can run\.

This design should be interpreted as backend\-light rather than computation\-free\. Agentic retrieval may use more tool calls or LLM tokens than a single retrieval step\. Periodic consolidation also introduces maintenance cost\. The intended benefit is that the default memory state remains readable, editable, and portable, while more specialized retrieval backends can be added when needed\.

The same abstraction can support domain\-specific extensions\. The metadata area of each memory entry can store namespaces, entity schemas, access rules, or retention policies\. Additional tools can expose vector search, graph traversal, database lookup, or permission checks\. These extensions can be integrated as retrieval tools or maintenance rules while preserving topic documents as the shared memory state\.

Note. AR = Accurate Retrieval; TTL = Test-Time Learning; LRU = Long-Range Understanding; SF = Selective Forgetting. Sub-datasets (scores are accuracy unless otherwise noted): SH-QA / MH-QA = single-/multi-hop document QA; LME = LongMemEval (S⋆), a reconstructed multi-session dialogue variant; Event = EventQA, temporal-event reasoning over long narratives; MCC = multi-class classification; Rec. = movie recommendation (Recall@5); Summ. = novel summarization (Fluency × F1); DetQA = detective reasoning QA; FC-SH / FC-MH = FactConsolidation single-/multi-hop selective forgetting. Bold marks the best value in each column.

Table 1: Full MemoryAgentBench results. All methods use `gpt-5-mini` as the backbone model with an input chunk size of 4096 tokens.

## 4 Experiments

### 4.1 Experimental Setup

##### Benchmark and Models.

We evaluate our method on MemoryAgentBench Hu et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib5)), a benchmark for external memory mechanisms across four capabilities: Accurate Retrieval (AR) for factual recall from long histories, Test-Time Learning (TTL) for in-context rule acquisition, Long-Range Understanding (LRU) for extended narrative comprehension, and Selective Forgetting (SF) for updating outdated information.

We use `gpt-5-mini` as the base model. System outputs are evaluated with an LLM-as-Judge protocol using `gpt-5`, where the judge assigns a binary correctness judgment based on the question, reference answer, and model output.

##### Baselines\.

We compare Infini Memory against seven memory baselines. We use the official integrations supplied with MemoryAgentBench for RAPTOR Sarthi et al. ([2024](https://arxiv.org/html/2606.10677#bib.bib24)), MemoRAG Qian et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib23)), HippoRAG-v2 Gutiérrez et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib4)), Mem0 Chhikara et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib1)), MemGPT Packer et al. ([2023](https://arxiv.org/html/2606.10677#bib.bib13)), LightMem Fang et al. ([2025](https://arxiv.org/html/2606.10677#bib.bib2)), and REMem Shu et al. ([2026](https://arxiv.org/html/2606.10677#bib.bib16)). All baselines follow the same 4096-token chunking strategy as the benchmark’s standard configuration. Each baseline retains its own retrieval and indexing logic under its official integration.

##### Memory and retrieval configuration\.

For memory maintenance, Infini Memory triggers a `CURRENT` buffer rewrite upon reaching either 5000 tokens or a time threshold; it splits topic documents exceeding 5000 tokens and merges documents under 1000 tokens based on summary similarity. During retrieval, the LLM receives the query alongside an `id + summary` catalog. The agent can invoke memory tools for up to seven rounds, covering corpus search, regex matching, catalog browsing, and line reading. If agentic retrieval yields insufficient evidence, a BM25-based retriever supplements non-duplicate candidates.

### 4.2 Main Benchmark Results

As shown in Table [1](https://arxiv.org/html/2606.10677#S3.T1), Infini Memory-A achieves the highest overall score of 64.7%, improving over the strongest baseline by 19.2 percentage points and leading on all four MemoryAgentBench capabilities, with gains of +12.5% on AR, +4.4% on TTL, +25.4% on LRU, and +26.5% on SF. Compared with the hybrid variant Infini Memory-H (61.3%), agentic retrieval adds 3.4 percentage points on average; the largest gain falls on Selective Forgetting (+7.0%), where explicit temporal and source cues help the reader track revised facts.

The pattern within Long\-Range Understanding is more nuanced: the agentic reader leads on summary\-oriented questions while the hybrid reader leads on detailed QA, reflecting a trade\-off between targeted inspection and broader partition\-level coverage\.

##### Discussion: multi\-hop selective forgetting \(FC\-MH\)\.

The 81\.0%/35\.0% gap between FC\-SH and FC\-MH reveals a structural limit of write\-time consolidation\. Single\-hop forgetting is settled at write time: the rewrite stage applies recency overrides within a topic document, so the latest version supersedes earlier ones before retrieval\. Multi\-hop forgetting cannot be settled this way, because a reasoning chain often spans several topic documents updated at different times\. No single rewrite pass enforces cross\-document consistency, and missing any hop causes cascade failure\. This difficulty is intrinsic to the task: the MemoryAgentBench authors report that even o4\-mini drops from 80\.0% to 14\.0% on FC\-MH as context grows from 6K to 32K tokens \(Hu et al\., [2025](https://arxiv.org/html/2606.10677#bib.bib5)\)\.

### 4\.3 Ablation Experiments

| Variant | Maint\. | Retrieval | Acc\. | Δ |
| --- | --- | --- | --- | --- |
| Infini Memory\-A \(full\) | ✓ | Agentic | 79\.3 | — |
| Retrieval ablation \(maintenance fixed\) | | | | |
| Infini Memory\-H \(hybrid\) | ✓ | Summary\+BM25 | 76\.0 | −3\.3 |
| Summary\-only | ✓ | Summary | 41\.7 | −37\.6 |
| Maintenance ablation \(retrieval fixed at hybrid\) | | | | |
| w/o Split & Merge | | Summary\+BM25 | 69\.3 | −10\.0 |

Table 2: Component ablation on LongMemEval \(S⋆\)\. Maint\.: plan\-driven split/update and small\-document merging\. Δ is the absolute accuracy drop relative to Infini Memory\-A\.

Table 3: Split\-threshold sensitivity \(in tokens\) on LongMemEval \(S⋆\) with the read path fixed at Infini Memory\-H\. † denotes our default; Δ is the accuracy gap to the default\.

#### 4\.3\.1 Structural Maintenance Ablation

We hold the read path fixed at the hybrid reader \(Infini Memory\-H, Section [3\.4](https://arxiv.org/html/2606.10677#S3.SS4)\) and disable structural maintenance: no plan\-driven split/update and no small\-document merging, while retaining append\-only writes and the CURRENT rewrite\. Accuracy on LongMemEval \(S⋆\) falls from 76\.0% to 69\.3%, a drop of 6\.7 points \(Table [2](https://arxiv.org/html/2606.10677#S4.T2)\)\.

![Refer to caption](https://arxiv.org/html/2606.10677v1/x6.png)

![Refer to caption](https://arxiv.org/html/2606.10677v1/x7.png)

Figure 6: Ablation results after removing structural maintenance\. Top: overall accuracy on LongMemEval \(S⋆\)\. Bottom: accuracy by question type\.

Figure [6](https://arxiv.org/html/2606.10677#S4.F6) breaks this gap down by question type\. The shortfall concentrates on knowledge\-update and multi\-session questions, both of which depend on evidence drawn from distant turns and reconciled at read time\. Without split and merge, related facts remain in whichever document they were first appended to, and superseded entries continue to coexist with their replacements, so the reader can no longer assemble a coherent and up\-to\-date answer across the relevant turns\. Question types that resolve within a single session degrade much less, since the CURRENT rewrite alone already removes local duplicates and contradictions when the supporting evidence is nearby\.

#### 4\.3\.2 Retrieval Strategy Ablation

We compare three read paths over the same maintained memory\. Single\-shot summary selection reaches only 41\.7%: document summaries alone miss fine\-grained facts such as exact values, timestamps, and entity mentions\. Adding BM25 partition retrieval \(Infini Memory\-H\) lifts accuracy to 76\.0%, recovering most of the gap\. The agentic reader \(Infini Memory\-A\) reaches 79\.3% \(Figure [7](https://arxiv.org/html/2606.10677#S4.F7)\) by issuing follow\-up searches, inspecting local context, and combining complementary evidence spans before answering\.
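To illustrate why lexical partition retrieval recovers facts that summaries miss, here is a minimal, self-contained sketch of Okapi BM25 scoring over heading-level partitions. It is not the paper's retriever: the tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokenizer (an illustrative simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, partitions: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (partition_index, score) pairs sorted by descending BM25 score."""
    docs = [tokenize(p) for p in partitions]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of partitions containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

An exact entity such as a city name or a date matches the partition that contains it even when the document summary never mentions it, which is the gap the hybrid reader closes.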

![Refer to caption](https://arxiv.org/html/2606.10677v1/x8.png)

Figure 7: Retrieval\-strategy ablation on LongMemEval \(S⋆\)\. Hybrid retrieval substantially improves over single\-shot summary selection; the agentic reader provides an additional gain \(Table [2](https://arxiv.org/html/2606.10677#S4.T2)\)\.

#### 4\.3\.3 Split\-Threshold Sensitivity

We sweep the document split threshold, defined as the minimum token count above which a topic document becomes a split candidate, on LongMemEval \(S⋆\) with the read path fixed at Infini Memory\-H\.

Table [3](https://arxiv.org/html/2606.10677#S4.T3) shows that the accuracy curve is asymmetric\. Lowering the threshold to ≥3000 or ≥1000 costs only 1\.7 and 2\.0 points, even though the library expands to 762 documents at the most aggressive setting\. The partitions stay topically clustered at this granularity, so over\-fragmentation is recoverable: the hybrid reader still reaches most of the relevant evidence through summary plus BM25 matching\. Raising the threshold to ≥7000 or ≥9000, by contrast, costs 5\.3 and 6\.3 points while the document count moves only modestly from 322 to 280 and then 255\. The sharp accuracy drop between ≥5000 and ≥7000 therefore reflects the cost of letting documents grow past a single coherent topic rather than a count\-based artefact: a small number of oversized documents accumulate content from multiple chunks and begin to mix unrelated subtopics, which degrades both topic routing during consolidation and partition\-level retrieval at read time\.

The default ≥5000 setting sits just above the 4096\-token chunking budget \(Section [4\.1](https://arxiv.org/html/2606.10677#S4.SS1)\)\. With this margin, only documents that have absorbed content from more than one chunk become split candidates, so splitting acts as a safety valve for that minority rather than reshaping the bulk of the library\. Figure [8](https://arxiv.org/html/2606.10677#S4.F8) shows the resulting token\-count distributions: at ≥1000, nearly all documents collapse below 1000 tokens; at ≥9000, a long tail extends beyond 7000 tokens; at ≥5000, the bulk stays below 4000 tokens with only a thin tail approaching the threshold\.

![Refer to caption](https://arxiv.org/html/2606.10677v1/x9.png)

Figure 8: Distribution of document token counts under different split thresholds\.

Overall, the ablations attribute the gains to two complementary sources\. Holding the hybrid reader fixed, removing structural maintenance costs 6\.7 points \(76\.0 → 69\.3\), while holding the maintained memory fixed, upgrading from Infini Memory\-H to Infini Memory\-A adds 3\.3 points \(76\.0 → 79\.3\)\. Structural maintenance therefore contributes more than the retrieval upgrade in this setting, and neither component is sufficient on its own\.
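The two comparisons use different reference points: the Δ column of Table 2 is measured against Infini Memory-A, whereas the 6.7-point maintenance cost is measured against Infini Memory-H. The short sketch below recomputes both from the reported accuracies; the dictionary keys and the `gap` helper are illustrative names, not identifiers from the paper.

```python
# Accuracies (%) on LongMemEval (S*) as reported in Table 2.
ACC = {
    "agentic_full": 79.3,
    "hybrid": 76.0,
    "summary_only": 41.7,
    "hybrid_no_split_merge": 69.3,
}


def gap(reference: str, variant: str) -> float:
    """Absolute accuracy drop of `variant` relative to `reference`, in points."""
    return round(ACC[reference] - ACC[variant], 1)


# Maintenance ablation: reader fixed at hybrid.
maintenance_cost = gap("hybrid", "hybrid_no_split_merge")
# Retrieval upgrade: maintained memory fixed.
retrieval_gain = gap("agentic_full", "hybrid")
# Table 2 reports the same row against the full system instead.
table_delta_no_maint = gap("agentic_full", "hybrid_no_split_merge")
```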

## 5 Conclusion

We presented Infini Memory, a persistent memory architecture that represents agent memory as topic\-structured text documents and maintains them through buffered writing, periodic consolidation, and structural maintenance\. At inference, the LLM iteratively queries memory through tool calls, keeping the memory state inspectable and editable across long\-term interaction\. On MemoryAgentBench, its agentic retrieval variant achieves 64\.7% overall and 81\.2% on Accurate Retrieval under our evaluation protocol, with notable gains on Factual Recall, Test\-Time Learning, and Selective Forgetting; the hybrid summary\-plus\-BM25 reader remains useful for long\-range detailed QA\. These results suggest that persistent agent memory quality depends on both how memory is maintained and how evidence is retrieved\.

## References

- Chhikara et al\. \(2025\) Mem0: building production\-ready AI agents with scalable long\-term memory\. arXiv:2504\.19413\.
- J\. Fang, X\. Deng, H\. Xu, Z\. Jiang, Y\. Tang, Z\. Xu, S\. Deng, Y\. Yao, M\. Wang, S\. Qiao, H\. Chen, and N\. Zhang \(2025\) LightMem: lightweight and efficient memory\-augmented generation\. To appear in ICLR 2026\. arXiv:2510\.18866\.
- B\. J\. Gutiérrez, Y\. Shu, W\. Qi, S\. Zhou, and Y\. Su \(2025\) From RAG to memory: non\-parametric continual learning for large language models\. In Proceedings of the 42nd International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol\. 267, pp\. 21497–21515\. arXiv:2502\.14802, [Link](https://proceedings.mlr.press/v267/gutierrez25a.html)\.
- Y\. Hu, Y\. Wang, and J\. McAuley \(2025\) Evaluating memory in LLM agents via incremental multi\-turn interactions\. arXiv:2507\.05257\.
- J\. Kang, M\. Ji, Z\. Zhao, and T\. Bai \(2025\) Memory OS of AI agent\. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp\. 25961–25970\. [Document](https://dx.doi.org/10.18653/v1/2025.emnlp-main.1318)\.
- A\. Maharana, D\. Lee, S\. Tulyakov, M\. Bansal, F\. Barbieri, and Y\. Fang \(2024\) Evaluating very long\-term conversational memory of LLM agents\. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 13851–13870\. arXiv:2402\.17753, [Document](https://dx.doi.org/10.18653/v1/2024.acl-long.747)\.
- C\. Packer, S\. Wooders, K\. Lin, V\. Fang, S\. G\. Patil, I\. Stoica, and J\. E\. Gonzalez \(2023\) MemGPT: towards LLMs as operating systems\. arXiv:2310\.08560\.
- J\. S\. Park, J\. C\. O’Brien, C\. J\. Cai, M\. R\. Morris, P\. Liang, and M\. S\. Bernstein \(2023\) Generative agents: interactive simulacra of human behavior\. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology\. [Document](https://dx.doi.org/10.1145/3586183.3606763)\.
- H\. Qian, Z\. Liu, P\. Zhang, K\. Mao, D\. Lian, Z\. Dou, and T\. Huang \(2025\) MemoRAG: boosting long context processing with global memory\-enhanced retrieval augmentation\. In Proceedings of the ACM on Web Conference 2025, pp\. 2366–2377\. [Document](https://dx.doi.org/10.1145/3696410.3714805)\.
- P\. Rasmussen, P\. Paliychuk, T\. Beauvais, J\. Ryan, and D\. Chalef \(2025\) Zep: a temporal knowledge graph architecture for agent memory\. arXiv:2501\.13956\.
- P\. Sarthi, S\. Abdullah, A\. Tuli, S\. Khanna, A\. Goldie, and C\. Manning \(2024\) RAPTOR: recursive abstractive processing for tree\-organized retrieval\. In International Conference on Learning Representations\. arXiv:2401\.18059, [Link](https://openreview.net/forum?id=GN921JHCRw)\.
- N\. Shinn, F\. Cassano, A\. Gopinath, K\. Narasimhan, and S\. Yao \(2023\) Reflexion: language agents with verbal reinforcement learning\. In Advances in Neural Information Processing Systems, Vol\. 36\. arXiv:2303\.11366, [Link](https://proceedings.neurips.cc/paper_files/paper/2023/hash/1b44b878bb782e6954cd888628510e90-Abstract-Conference.html)\.
- Y\. Shu, S\. P\. Jonnalagedda, X\. Gao, B\. J\. Gutiérrez, W\. Qi, K\. Das, H\. Sun, and Y\. Su \(2026\) REMem: reasoning with episodic memory in language agent\. To appear in ICLR 2026\. arXiv:2602\.13530\.
- T\. R\. Sumers, S\. Yao, K\. Narasimhan, and T\. L\. Griffiths \(2024\) Cognitive architectures for language agents\. Transactions on Machine Learning Research\. arXiv:2309\.02427, [Link](https://openreview.net/forum?id=1i6ZCvflQJ)\.
- Z\. Tan, J\. Yan, I\. Hsu, R\. Han, Z\. Wang, L\. T\. Le, Y\. Song, Y\. Chen, H\. Palangi, G\. Lee, A\. R\. Iyer, T\. Chen, H\. Liu, C\. Lee, and T\. Pfister \(2025\) In prospect and retrospect: reflective memory management for long\-term personalized dialogue agents\. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 8416–8439\. [Document](https://dx.doi.org/10.18653/v1/2025.acl-long.413)\.
- D\. Wu, H\. Wang, W\. Yu, Y\. Zhang, K\. Chang, and D\. Yu \(2025\) LongMemEval: benchmarking chat assistants on long\-term interactive memory\. In International Conference on Learning Representations\. arXiv:2410\.10813, [Link](https://openreview.net/forum?id=pZiyCaVuti)\.
- W\. Xu, Z\. Liang, K\. Mei, H\. Gao, J\. Tan, and Y\. Zhang \(2025\) A\-MEM: agentic memory for LLM agents\. In Advances in Neural Information Processing Systems\. arXiv:2502\.12110, [Link](https://proceedings.neurips.cc/paper_files/paper/2025/hash/19909c36f51abc4856b4560aff3d36d6-Abstract-Conference.html)\.
- S\. Yao, J\. Zhao, D\. Yu, N\. Du, I\. Shafran, K\. Narasimhan, and Y\. Cao \(2023\) ReAct: synergizing reasoning and acting in language models\. In International Conference on Learning Representations\. arXiv:2210\.03629, [Link](https://openreview.net/forum?id=WE_vluYUL-X)\.
- W\. Zhong, L\. Guo, Q\. Gao, H\. Ye, and Y\. Wang \(2024\) MemoryBank: enhancing large language models with long\-term memory\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 38, pp\. 19724–19731\. [Document](https://dx.doi.org/10.1609/aaai.v38i17.29946)\.

## Appendix A Prompt Templates and Algorithms

The released reference implementation contains functionally equivalent prompts in Chinese\. For reproducibility, this appendix provides normalized English templates that preserve the same control logic, metadata constraints, and output schemas used by the system\.

### A\.1 Normalized English Prompt Templates

##### Shared prompt invariants\.

All write\-side prompts follow four invariants\. First, they must remain fully faithful to the source text and may reorganize content but must not invent new facts\. Second, they must preserve entry\-level metadata such as seq, time, source, and any additional fields such as label\. Third, they must resolve exact duplicates and contradictions through explicit recency rules rather than free\-form summarization\. Fourth, they must return machine\-parseable output only, with no explanatory prose outside the requested format\.

Prompt A: Memory Extraction

You are a memory curator for a long\-term interactive agent\. Extract only durable, reusable memory candidates from the interaction\.

Retain information such as:

- user facts, preferences, plans, constraints, commitments, and corrections
- important tool or environment observations that may matter later
- assistant restatements only when they explicitly confirm user information; mark such items with source=AI
- explicit metadata fields already present in the text, such as label=…

Do not retain:

- greetings, small talk, generic acknowledgments, or filler politeness
- transient reasoning, unsupported guesses, or speculative interpretation
- duplicate paraphrases that do not change the underlying state

Hard constraints:

- stay fully faithful to the source text
- do not add explanations, comments, or fact checking
- preserve metadata fields exactly when they already appear in the input
- keep the literal placeholder @@SEQ@@ unchanged

Output format:

1. Return Markdown only\.
2. Begin with: \-\-\- summary: <topic keywords; concise factual summary, <= \{summary\_length\} tokens\> \-\-\-
3. Organize the body with first\-level headings\.
4. Encode each memory item as: \- <seq=@@SEQ@@,time=TIMESTAMP\[,label=…\]\[,source=AI\]\> fact
5. If no durable memory is present, return a minimal empty\-memory document\.
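The entry encoding requested by Prompt A is regular enough to parse mechanically. The sketch below shows one way to do so; the regular expression, the `MemoryEntry` dataclass, and `parse_entry` are hypothetical helpers, and the pattern assumes that metadata values contain no commas or closing angle brackets.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches: - <seq=...,time=...[,label=...][,source=...]> fact
ENTRY_RE = re.compile(
    r"^-\s*<seq=(?P<seq>[^,>]+),time=(?P<time>[^,>]+)"
    r"(?:,label=(?P<label>[^,>]+))?"
    r"(?:,source=(?P<source>[^,>]+))?>\s*(?P<fact>.*)$"
)


@dataclass
class MemoryEntry:
    seq: str
    time: str
    fact: str
    label: Optional[str] = None
    source: Optional[str] = None


def parse_entry(line: str) -> Optional[MemoryEntry]:
    """Parse one memory item line; return None for headings or other text."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return MemoryEntry(seq=m["seq"], time=m["time"], fact=m["fact"],
                       label=m["label"], source=m["source"])
```

Keeping `seq` as a string lets the literal placeholder `@@SEQ@@` pass through unchanged until the system substitutes a real sequence number.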

Prompt B: CURRENT Rewrite

You will receive the append\-only CURRENT document\. Rewrite it into a clean, topic\-structured Markdown draft\.

Requirements:

- group semantically related items under first\-level headings
- preserve every seq / time / source / label field exactly
- remove exact duplicates; if duplicates differ only by seq, keep the newer one
- resolve contradictory facts by keeping the more recent item
- if the same fact appears once as user input and once as source=AI, prefer the user\-sourced version
- do not introduce new facts, commentary, or interpretation

Return Markdown only, including YAML frontmatter with an updated summary\.
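In the system these rules are applied by the LLM, which decides which items describe the same fact. To make the intended outcome concrete, the sketch below applies the same three rules deterministically, under the strong assumption that each entry already carries a hypothetical `key` naming the fact it describes and an integer `seq`. The precedence between recency and source preference is also an assumption: here the user-sourced copy wins only when the values are identical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    seq: int
    key: str                      # hypothetical fact slot, e.g. "user.city"
    value: str
    source: Optional[str] = None  # None = user-sourced, "AI" = assistant restatement


def resolve(entries: list[Entry]) -> list[Entry]:
    """Keep one entry per key using recency and user-source preference."""
    best: dict[str, Entry] = {}
    for e in sorted(entries, key=lambda x: x.seq):
        cur = best.get(e.key)
        if cur is None:
            best[e.key] = e
        elif e.value == cur.value:
            # Same fact restated: keep the user-sourced copy over an AI copy,
            # otherwise keep the newer seq.
            if cur.source is None and e.source == "AI":
                continue
            best[e.key] = e
        else:
            # Contradiction: the more recent item wins.
            best[e.key] = e
    return sorted(best.values(), key=lambda x: x.seq)
```

For example, a move from Paris (seq 1) to Berlin (seq 5), later restated by the assistant (seq 6, source=AI), resolves to the single user-sourced Berlin entry.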

Prompt C: Update Planning

You will receive:

1. NEW\_CONTENT: a rewritten Markdown draft
2. DOCS: the current document library as \(id, summary\) pairs

Task:

- decide which existing documents should be updated
- decide which content should become new topic documents

Planning rules:

- preserve seq / time / source / label fields exactly
- never invent document ids; updates must target ids already listed in DOCS
- split content by topic when necessary
- keep each produced document within the token budget \{markdown\_length\}
- ensure no content overlap across outputs

Return JSON only: \{ "updates": \[\{"id": string, "new\_content": string\}\], "new\_docs": \[\{"title": string, "content": string\}\] \}
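Because the planner must never invent document ids, its JSON output is worth validating before any document is rewritten. The following is a minimal sketch of such a check against the schema in Prompt C; `validate_plan` and its error messages are hypothetical and not taken from the released code.

```python
import json


def validate_plan(raw: str, known_ids: set[str]) -> dict:
    """Parse a planner response and enforce the Prompt C schema.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    plan = json.loads(raw)
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    updates = plan.get("updates", [])
    new_docs = plan.get("new_docs", [])
    for u in updates:
        if (not isinstance(u, dict)
                or not isinstance(u.get("id"), str)
                or not isinstance(u.get("new_content"), str)):
            raise ValueError("malformed entry in 'updates'")
        if u["id"] not in known_ids:
            # Updates may only target ids already listed in DOCS.
            raise ValueError(f"unknown document id: {u['id']}")
    for n in new_docs:
        if (not isinstance(n, dict)
                or not isinstance(n.get("title"), str)
                or not isinstance(n.get("content"), str)):
            raise ValueError("malformed entry in 'new_docs'")
    return {"updates": updates, "new_docs": new_docs}
```

A rejected plan can then be retried on smaller input, which is the role the fallback split plays in Algorithm 2.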

Prompt D: Document Rewrite

You will receive:

1. OLD\_DOC: an existing Markdown document
2. DELTA: new content that should be merged into it

Task:

- produce the updated full document
- reorganize headings when needed
- preserve all entry metadata exactly
- deduplicate repeated facts
- resolve contradictions by recency, again preferring user\-sourced facts when two otherwise identical items differ only by source

Return Markdown only, including YAML frontmatter with a refreshed summary\.

Prompt E: Agentic Retrieval

You will receive:

1. a user query
2. the document list as \(id, summary\) pairs

You are a memory\-search agent\. Iteratively inspect the topic document library before stopping\. You may either:

- call tools to search the corpus, inspect a single document, browse more document ids, or read bounded line ranges
- or finish once you have enough evidence

Prefer using broad lexical search and exact\-pattern search early, then use document\-local inspection to verify hits and expand context\. For aggregation queries, continue until the main matching evidence is covered\. For update or temporal questions, pay attention to seq, time, and source fields\.

Return JSON only: \{ "done": boolean, "tool\_calls": \[\{"name": string, "arguments": object\}\], "relevant\_snippets": \[\{"doc\_id": string, "start\_line": int, "end\_line": int\}\], "relevant\_docs": \[string, …\] \}
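Two of the tools this prompt refers to, exact-pattern search and bounded line reading, are simple to sketch over an in-memory library. The tool names, argument names, and limits below are assumptions made for illustration; the actual tool interface is defined by the released implementation.

```python
import re

Library = dict[str, str]  # doc_id -> Markdown text of the topic document


def regex_search(library: Library, pattern: str, max_hits: int = 20) -> list[dict]:
    """Return up to max_hits lines matching the pattern, with 1-based line numbers."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits: list[dict] = []
    for doc_id, text in library.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append({"doc_id": doc_id, "line": line_no, "text": line})
                if len(hits) >= max_hits:
                    return hits
    return hits


def read_lines(library: Library, doc_id: str, start_line: int, end_line: int,
               max_span: int = 50) -> list[str]:
    """Read an inclusive, 1-based line range, capped at max_span lines."""
    lines = library[doc_id].splitlines()
    end_line = min(end_line, start_line + max_span - 1, len(lines))
    return lines[start_line - 1:end_line]
```

Returning line numbers from the search lets the agent follow up with a bounded read around a hit, which matches the `start_line` and `end_line` fields in the `relevant_snippets` output.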

##### Merge\-related prompts\.

The implementation also uses two auxiliary prompts during scheduled consolidation: one selects non\-overlapping groups of small, topically similar documents for merging, and the other rewrites the selected group into a single normalized document\. Both reuse the same invariants as above: exact metadata preservation, no invented facts, deduplication by recency, and Markdown\-only output\.
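The group-selection step can be approximated without an LLM to show what "non-overlapping groups of small, topically similar documents" means in practice. The sketch below swaps in Jaccard overlap of summary words for the prompt-based selection; the 0.3 similarity cutoff, the greedy strategy, and the function names are all assumptions, and only the 1000-token size limit comes from the text.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_merge_groups(docs: dict[str, tuple[int, str]],
                        max_tokens: int = 1000,
                        min_sim: float = 0.3) -> list[list[str]]:
    """Greedily group small documents whose summaries overlap.

    docs maps doc_id -> (token_count, summary). Each document appears
    in at most one returned group.
    """
    small = {doc_id: set(summary.lower().split())
             for doc_id, (tokens, summary) in docs.items()
             if tokens < max_tokens}
    used: set[str] = set()
    groups: list[list[str]] = []
    for doc_id in sorted(small):
        if doc_id in used:
            continue
        group = [doc_id]
        for other in sorted(small):
            if (other != doc_id and other not in used
                    and jaccard(small[doc_id], small[other]) >= min_sim):
                group.append(other)
        if len(group) > 1:
            used.update(group)
            groups.append(group)
    return groups
```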

### A\.2 Strategy Selection and Design Rationale

The architecture deliberately prioritizes note\-like structured text over a purely discrete vector store or a graph\-only memory layer\.

##### Why not rely only on discrete vector memory?

Discrete entries are easy to append and index, but they couple retrieval quality tightly to fragment granularity\. As the topic document library grows, topic\-related evidence becomes scattered and top\-k retrieval struggles to recover complete evidence sets\.

##### Why not rely only on a knowledge graph?

Knowledge graphs are powerful for representing explicit relational structure, but many interaction memories do not naturally map to graph edges\. They also increase modeling and maintenance cost, especially when temporal revision and evidence provenance matter\.

##### Why use a buffer plus consolidation?

The buffer allows append\-only high\-frequency writes during interaction, while consolidation amortizes the cost of organizing, revising, and redistributing memory\. This separation mirrors human note\-taking and makes the write path predictable\.

### A\.3 Algorithmic Pseudocode

Algorithm 1: Memory Writing and Consolidation

- procedure WriteAndConsolidate\(z\_t, C, D, cfg\)
  - M ← Extract\(z\_t\)
  - if M = ∅ then
    - return C, D
  - end if
  - if C does not exist then
    - C ← CreateCurrent\(M\)
  - else
    - C ← AppendToCurrent\(C, M\)
  - end if
  - if Tokens\(C\) ≤ cfg\.current\_threshold then
    - return C, D
  - end if
  - R ← RewriteCurrent\(C\)
  - S\_D ← Summaries\(D ∖ \{current\}\)
  - P ← SafePlanUpdate\(R, S\_D, cfg\)
  - for all update u ∈ P\.updates do
    - D\[u\.id\] ← RewriteDoc\(D\[u\.id\], u\.new\_content\)
  - end for
  - for all new document n ∈ P\.new\_docs do
    - D ← D ∪ \{RewriteDoc\(∅, n\.content\)\}
  - end for
  - Clear\(current\)
  - IncrementCurrentEpoch
  - if MergeMaintenanceEnabled\(cfg\) then
    - G\_small ← SmallSimilarDocs\(D\)
    - G ← SelectMergeGroups\(G\_small\)
    - for all group g ∈ G do
      - D ← ReplaceGroupWithMergedDoc\(D, g\)
    - end for
  - end if
  - RefreshLibraryMetadata\(D\)
  - return C, D
- end procedure
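The control flow of Algorithm 1 can be sketched in Python with the LLM-backed steps passed in as callables. This is an illustrative skeleton under several assumptions: tokens are counted by whitespace, the first line of each document stands in for its summary, new document ids are generated as `doc-N`, and the merge-maintenance step is omitted. None of these names come from the released implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryState:
    current: list[str] = field(default_factory=list)    # staged entries (CURRENT buffer)
    docs: dict[str, str] = field(default_factory=dict)  # doc_id -> topic document text
    epoch: int = 0


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real system would use the model tokenizer.
    return len(text.split())


def write_and_consolidate(
    state: MemoryState,
    observation: str,
    extract: Callable[[str], list[str]],
    rewrite_current: Callable[[str], str],
    plan_update: Callable[[str, dict[str, str]], dict],
    rewrite_doc: Callable[[str, str], str],
    current_threshold: int = 5000,
) -> MemoryState:
    entries = extract(observation)
    if not entries:
        return state                       # nothing durable to store
    state.current.extend(entries)          # append-only write to the buffer
    buffer_text = "\n".join(state.current)
    if count_tokens(buffer_text) <= current_threshold:
        return state                       # below threshold: no consolidation yet
    draft = rewrite_current(buffer_text)
    # Assumption: the first line of a document serves as its summary.
    summaries = {doc_id: text.split("\n", 1)[0] for doc_id, text in state.docs.items()}
    plan = plan_update(draft, summaries)
    for upd in plan["updates"]:
        state.docs[upd["id"]] = rewrite_doc(state.docs[upd["id"]], upd["new_content"])
    for new in plan["new_docs"]:
        doc_id = f"doc-{len(state.docs) + 1}"   # hypothetical id scheme
        state.docs[doc_id] = rewrite_doc("", new["content"])
    state.current.clear()
    state.epoch += 1
    return state
```

Passing the LLM steps as callables keeps the write path testable with deterministic stubs, since only the threshold check and bookkeeping are fixed logic.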

Algorithm 2: Split\-Aware Update Planning

- procedure SafePlanUpdate\(content, summaries, cfg\)
  - if Tokens\(content\) > cfg\.plan\_split\_threshold then
    - chunks ← SplitByHeading\(content, cfg\.markdown\_length\)
    - if \|chunks\| = 1 then
      - chunks ← RecursiveFallbackSplit\(content, cfg\.plan\_split\_threshold\)
    - end if
  - else
    - chunks ← \{content\}
  - end if
  - plans ← \[ \]
  - for all chunk ∈ chunks do
    - plan ← PlanUpdate\(chunk, summaries\)
    - if plan is not valid JSON then
      - sub ← RecursiveFallbackSplit\(chunk, cfg\.plan\_split\_threshold\)
      - subplans ← SafePlanUpdate\(sub, summaries, cfg\)
      - plan ← MergePlans\(subplans\)
    - end if
    - plans ← plans ∪ \{plan\}
  - end for
  - return MergePlans\(plans\)
- end procedure
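A compact Python rendering of Algorithm 2 is given below as a sketch. It makes simplifying assumptions that are not in the paper: the heading splitter ignores the `markdown_length` budget, the fallback split simply halves the text by lines, tokens are counted by whitespace, and the planner is a callable `plan_llm` that returns a JSON string.

```python
import json
from typing import Callable


def count_tokens(text: str) -> int:
    return len(text.split())  # whitespace proxy for a real tokenizer


def split_by_heading(content: str) -> list[str]:
    """Split a Markdown draft at first-level headings."""
    chunks: list[str] = []
    current: list[str] = []
    for line in content.splitlines():
        if line.startswith("# ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def halve(content: str) -> list[str]:
    """Fallback split: cut the text in half by lines, dropping empty parts."""
    lines = content.splitlines()
    mid = len(lines) // 2
    parts = ["\n".join(lines[:mid]), "\n".join(lines[mid:])]
    return [p for p in parts if p.strip()]


def merge_plans(plans: list[dict]) -> dict:
    merged: dict = {"updates": [], "new_docs": []}
    for p in plans:
        merged["updates"].extend(p.get("updates", []))
        merged["new_docs"].extend(p.get("new_docs", []))
    return merged


def safe_plan_update(content: str, summaries: dict,
                     plan_llm: Callable[[str, dict], str],
                     split_threshold: int) -> dict:
    if count_tokens(content) > split_threshold:
        chunks = split_by_heading(content)
        if len(chunks) == 1:
            chunks = halve(content)
    else:
        chunks = [content]
    plans = []
    for chunk in chunks:
        try:
            plan = json.loads(plan_llm(chunk, summaries))
        except json.JSONDecodeError:
            if len(chunk.splitlines()) < 2:
                raise  # cannot split further; surface the failure
            plan = merge_plans([
                safe_plan_update(sub, summaries, plan_llm, split_threshold)
                for sub in halve(chunk)
            ])
        plans.append(plan)
    return merge_plans(plans)
```

The recursion terminates because every retry operates on strictly fewer lines, and a single-line chunk that still fails is reported rather than retried.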

Algorithm 3: Agentic Memory Retrieval

- procedure RetrieveAndAnswer\(q\_t, C, D, cfg\)
  - S ← LibrarySummaries\(D\)
  - H ← \{\(query = q\_t, catalog = S\)\}
  - for i ← 1 to cfg\.agentic\_max\_iterations do
    - o ← AgentStep\(H\)
    - if o\.done then
      - Z ← MaterializeSelections\(o\.snippets, o\.docs, D\)
      - break
    - end if
    - T ← ExecuteTools\(o\.tool\_calls, D, cfg\)
    - H ← H ∪ \{T\}
  - end for
  - if Z = ∅ or Tokens\(Z\) < cfg\.min\_retrieval\_tokens then
    - P\_D ← HeadingPartitions\(D\)
    - Z ← Z ∪ BM25Fallback\(q\_t, P\_D, cfg\.fallback\_topk\)
  - end if
  - E ← Z ∪ Recent\(C\)
  - a\_t ← LLMAnswer\(q\_t, E\)
  - return a\_t
- end procedure
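The retrieval loop of Algorithm 3 can be sketched as follows, with the agent, the tool executor, and the lexical fallback injected as callables. This is a simplified illustration: it stops at evidence collection and omits the Recent(C) and LLMAnswer steps, it initialises the evidence set to empty before the loop so that the fallback check is well defined when the agent never finishes, and all parameter names and default values other than the seven-round limit are assumptions.

```python
from typing import Callable


def retrieve_evidence(
    query: str,
    docs: dict[str, str],
    agent_step: Callable[[list[dict]], dict],
    execute_tools: Callable[[list[dict], dict[str, str]], list[dict]],
    fallback: Callable[[str, dict[str, str], int], list[str]],
    max_iterations: int = 7,
    min_evidence_tokens: int = 50,
    fallback_topk: int = 3,
) -> list[str]:
    # Assumption: the first line of a document serves as its summary.
    catalog = {doc_id: text.split("\n", 1)[0] for doc_id, text in docs.items()}
    history: list[dict] = [{"query": query, "catalog": catalog}]
    evidence: list[str] = []
    for _ in range(max_iterations):
        out = agent_step(history)
        if out.get("done"):
            # Materialize selected line ranges and whole documents.
            for snip in out.get("relevant_snippets", []):
                lines = docs[snip["doc_id"]].splitlines()
                evidence.append(
                    "\n".join(lines[snip["start_line"] - 1:snip["end_line"]]))
            evidence.extend(docs[d] for d in out.get("relevant_docs", []))
            break
        results = execute_tools(out.get("tool_calls", []), docs)
        history.append({"tool_results": results})
    if sum(len(e.split()) for e in evidence) < min_evidence_tokens:
        for extra in fallback(query, docs, fallback_topk):
            if extra not in evidence:   # supplement non-duplicate candidates only
                evidence.append(extra)
    return evidence
```

The fallback runs both when the agent exhausts its rounds without finishing and when it finishes with too little evidence, mirroring the two conditions in the pseudocode.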
