AdMem: Advanced Memory for Task-solving Agents

arXiv cs.AI 06/08/26, 04:00 AM Papers
memory llm-agents multi-agent task-solving framework automatic-memory
Summary
This paper introduces AdMem, a unified memory framework for LLM-based agents that integrates semantic, episodic, and procedural memory with a bi-level short-term and long-term store, using a multi-agent architecture for automatic memory generation and adaptive retrieval. Experiments show improved robustness and success on long multi-turn tasks.
arXiv:2606.06787v1 Announce Type: new Abstract: Large Language Models (LLMs) show promise as tool-using agents but remain limited in long-horizon tasks that require remembering, organizing, and reusing knowledge. Prior memory approaches aim to resolve the situation, but mainly focus on storing factual information. Recent work on procedural memory improves task reuse, yet often reduces to replaying past successes without addressing failure cases or online scalability. We introduce a unified and automatic memory framework that integrates semantic, episodic, and procedural memory in a bi-level design combining short-term and long-term stores. A multi-agent architecture with actor, memory, and critic agents enables automatic memory generation, reward annotation, and adaptive retrieval. Long-term memory is managed through reward-based evaluation, merging, and pruning, ensuring scalability and continual improvement. Experiments across various environments show that our approach improves robustness and success on long multi-turn tasks compared to existing baselines. This work highlights the importance of comprehensive, adaptive memory for advancing LLM-based agents.
Cached at: 06/08/26, 09:14 AM
# AdMem: Advanced Memory for Task-solving Agents
Source: [https://arxiv.org/html/2606.06787](https://arxiv.org/html/2606.06787)
Runzhe Wang (Princeton University, runzhew@princeton.edu), Huilin Lu (Amazon, huilinlu@amazon.com), Shengjie Liu (Amazon, zycjlsj@amazon.com)

###### Abstract

Large Language Models \(LLMs\) show promise as tool\-using agents but remain limited in long\-horizon tasks that require remembering, organizing, and reusing knowledge\. Prior memory approaches aim to resolve the situation, but mainly focus on storing factual information\. Recent work on procedural memory improves task reuse, yet often reduces to replaying past successes without addressing failure cases or online scalability\. We introduce a unified and automatic memory framework that integrates semantic, episodic, and procedural memory in a bi\-level design combining short\-term and long\-term stores\. A multi\-agent architecture with actor, memory, and critic agents enables automatic memory generation, reward annotation, and adaptive retrieval\. Long\-term memory is managed through reward\-based evaluation, merging, and pruning, ensuring scalability and continual improvement\. Experiments across various environments show that our approach improves robustness and success on long multi\-turn tasks compared to existing baselines\. This work highlights the importance of comprehensive, adaptive memory for advancing LLM\-based agents\.

## 1 Introduction

Large Language Models (LLMs) (Brown et al. ([2020](https://arxiv.org/html/2606.06787#bib.bib104)); Chowdhery et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib105)); DeepSeek-AI et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib108)); Touvron et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib106)); Zeng et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib107))) have driven major progress in artificial intelligence, achieving breakthroughs across many areas. While they have improved in reasoning and tool use within agentic settings, they still struggle with long multi-turn tasks that require remembering, organizing, and applying knowledge across sessions or large inputs Zhang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib150)). These challenges highlight the central role of memory, which is widely regarded as an essential component of intelligence and crucial for agent adaptivity. Two main research directions have emerged to tackle the memory challenge in agentic LLMs Wu et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib151)): (1) architectural memory, where additional capacity is built directly into the network (e.g., through layers or modules, Xu et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib152))), and (2) symbolic / textual memory systems that the model reads and writes explicitly (often via provided APIs or policies). Both directions aim to improve the LLM’s ability to handle information over a long task-solving process.

One line of work integrates memory directly into the model parameters to enhance the effective context length of the model. End-to-End Memory Networks Sukhbaatar et al. ([2015](https://arxiv.org/html/2606.06787#bib.bib161)) introduced differentiable attention over external memory for question answering, and later work [Berges et al.](https://arxiv.org/html/2606.06787#bib.bib5); Behrouz et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib162)) extends the idea with learned long-term memory modules that retain historical context while keeping training parallelizable and inference efficient. Parametric approaches provide speed and differentiability but sacrifice readability and controllable persistence, since the remembered information is bound to model parameters and is neither interpretable nor controllable by designers.

In contrast, textual stores are auditable and tool-friendly, but require well-designed policies to be effective and efficient. Early work by Modarressi et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib153); [2024](https://arxiv.org/html/2606.06787#bib.bib154)) fine-tunes models to write knowledge as triples and to read via retrieval over that store, improving language modeling and knowledge-heavy tasks with interpretable memory traces; Memory Sandbox Huang et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib155)) foregrounds user control, letting human users add / merge / delete memories directly; MemGPT Packer et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib156)) introduces OS-style virtual context management, with a focus on memory paging for the LLM to perform CRUD operations; Mem0 Chhikara et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib157)) extracts memory units from running conversations, consolidates them (often into graphs), and reports major latency and token-cost wins in production-style settings; Mem1 Zhou et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib158)) goes further, training a compact internal textual state that’s recurrently updated across turns; Zhong et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib159)) layers episodic histories, long-term summaries, and evolving user “portraits,” with decay/boost rules inspired by forgetting curves; merging systems like MemAgent Yu et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib160)) train a reinforcement-learned memory policy that reads long texts in segments and overwrites / consolidates memory to achieve linear-time long-context processing. HiAgent (Hu et al., [2025](https://arxiv.org/html/2606.06787#bib.bib176)) decomposes task-solving into multiple sub-task trunks for better context condensation.

While the above works show diverse designs for memory generation and management policies, they primarily build memory with two considerations: (1) learning factual information about the world and the user; (2) condensing the long history context into short and portable text segments useful for subsequent inferences. From the perspective of cognitive architectures (Cohen et al., [1997](https://arxiv.org/html/2606.06787#bib.bib178)), they revisit the classical division of semantic and episodic memory Wu et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib151)). What is missing from this division is procedural memory, which matters when LLMs function as decision makers in a probabilistic production system with prompts serving as control flow. Therefore several recent works Fang et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib167)); Tang et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib172)); Wang and Chen ([2025](https://arxiv.org/html/2606.06787#bib.bib171)); Wang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib164)); Yang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib170)) incorporate procedural memory as documentation of successful decision-making processes for future reference. Agent workflow memory (AWM) Wang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib164)) induces reusable workflows from past successes in web navigation. By storing and applying these workflows, AWM achieves notable improvements in success rates and efficiency on challenging benchmarks such as Mind2Web Deng et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib165)) and WebArena Zhou et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib166)); building on this, Memp Fang et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib167)) develops a lifelong procedural memory that distills experiences into granular instructions and higher-level templates, dynamically updating and transferring them across tasks and models, with strong results on TravelPlanner Xie et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib168)) and ALFWorld Shridhar et al. ([2021](https://arxiv.org/html/2606.06787#bib.bib169)); beyond action trajectories, Buffer of Thoughts (BoT) Yang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib170)) focuses on reasoning patterns by storing distilled ‘thought templates’ to guide problem-solving, improving accuracy and robustness across reasoning tasks with minimal overhead; extending the scope further, MIRIX Wang and Chen ([2025](https://arxiv.org/html/2606.06787#bib.bib171)) introduces a multi-agent, multi-modal memory architecture with six structured types, enabling long-term, personalized, and efficient memory management across diverse benchmarks; Agent KB Tang et al. ([2025](https://arxiv.org/html/2606.06787#bib.bib172)) advances cross-agent procedural memory, combining high-level strategies and execution logs in a hierarchical store to enable transfer across domains, achieving large gains on GAIA Mialon et al. ([2023](https://arxiv.org/html/2606.06787#bib.bib173)) and SWE-bench Jimenez et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib174)).

However, these designs of procedural memory have three main limitations. First, they view procedural memory as a recipe of instructions extracted from successful task completions, so the memory is not effective in guiding the critical task steps that the model usually fails at. Second, the binary task-level feedback considered in their setting is sparse and is not sufficient to guide a long task-solving process that involves many steps, a common challenge known as credit assignment in reinforcement learning. Third, they mostly consider memory generation in an offline training-inference setting, with evaluations built on question-answering-type datasets. Scalable memory management for online deployment is very limited: automatic evaluation of memory entries, active erasing and reranking of memory, and adaptive memory recall are all lacking.

On the theoretical side, elegant and general-purpose agent frameworks have been proposed (Sumers et al., [2023](https://arxiv.org/html/2606.06787#bib.bib175); Gao et al., [2025](https://arxiv.org/html/2606.06787#bib.bib177)). Two lines of thought shape an ideal agent with strong intelligence: a memory-architecture perspective with short-term vs. long-term memory and episodic, semantic, and procedural memory; and a learning and cognitive perspective with a feedback loop and memory encoding/decoding as the motor that propels an agent to self-evolve through experience. However, a gap persists between these ideas and practical memory implementations in the literature. Conceptual frameworks lack implementations in practical LLM applications. In particular, there are substantial mismatches between rule-based, deterministic systems, statistics-based numerical algorithms, and stochastic, token-level reasoning processes in LLMs. Practical systems often simplify or omit critical components required for learning in interactive environments. Systems implementing learning memory typically lack mechanisms for reward-driven updates, belief modeling, transition learning, or credit assignment, limiting their ability to improve over time.

Given the above, we aim to design a better memory system for a life-long task-solving agentic environment, bridging the aforementioned theory-practice gap. Our motivation for adding memory to LLM-based agents is (1) to enable user and task customization, and (2) to support self-improvement in the agent’s decision-making process in a long-running environment. In our work, we introduce AdMem, a unified framework consisting of:

- A comprehensive memory system that supports the generation, storage, management, and retrieval of procedural, semantic, and episodic memories. A bi-level memory architecture is established, from short-term memory designed for context compaction, to scalable long-term memory with automatic memory evaluation, consolidation and pruning, and adaptive memory retrieval.
- An agent planning paradigm designed for effective memory generation, incorporating adaptive task planning, expectation commenting, and automatic reflections.

## 2 Methodology

### 2.1 Setup

We consider the task-solving agentic setting where an agent aims to solve a series of tasks by interacting with an external environment in turns. The environment may comprise different components including human users, other agents, or tool-calling infrastructure, and the interactions can be realized by natural language communication or LLM tool use. In each round $t$, the agent takes action $a_t$, which is passed to the environment. Then the environment evolves and responds with an observation $o_t$. The agent’s goal, defined by each task, is to bring the environment into a certain state through its actions. Notice that in real applications like human assistants, the environment usually does not reset after each task, which gives the problem a life-long horizon. This differs from typical task-solving benchmarks, where different tasks use independent environment states.

We formalize the agent decision-making process as a Partially-Observable Markov Decision Process (POMDP) $(S, A, T, R, \Omega)$. The agent preserves an agent state $s \in S$, which we also refer to as memory. In each round $t$, the agent chooses action $a_t \sim \pi(s_t)$ with an LLM-based policy $\pi(c_t) \in \Delta(A)$ that maps an LLM context ($c_t$, generated from memory $s_t$) to a probability distribution over the action space. Then it receives observation $o_t \in \Omega$ from the environment and performs a memory update as the state transition $s_{t+1} = T(s_t, a_t, o_t)$.
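The round structure above can be sketched as a small loop. This is a minimal illustration rather than the authors' implementation: the names `AgentState`, `build_context`, `transition`, and `run_rounds` are hypothetical, the state is assumed to be a plain list of action/observation pairs, and the policy and environment are passed in as ordinary callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AgentState:
    """Agent state s_t (the memory); assumed here to be the list of past turns."""
    history: List[Tuple[str, str]] = field(default_factory=list)


def build_context(state: AgentState) -> str:
    """Render the LLM context c_t from the memory s_t (hypothetical text format)."""
    return "\n".join(f"action: {a} | observation: {o}" for a, o in state.history)


def transition(state: AgentState, action: str, observation: str) -> AgentState:
    """Memory update s_{t+1} = T(s_t, a_t, o_t): here, append the newest turn."""
    return AgentState(history=state.history + [(action, observation)])


def run_rounds(
    policy: Callable[[str], str],
    env_step: Callable[[str], str],
    rounds: int,
) -> AgentState:
    """Run a fixed number of rounds of the act / observe / update cycle."""
    state = AgentState()
    for _ in range(rounds):
        context = build_context(state)                   # c_t generated from s_t
        action = policy(context)                         # a_t drawn from pi(c_t)
        observation = env_step(action)                   # o_t from the environment
        state = transition(state, action, observation)   # s_{t+1}
    return state
```

Swapping `transition` for a summarizing or retrieval-backed update is where a memory design would plug into this loop.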

Standard vanilla baselines for multi-turn task-solving benchmarks (e.g. Ma et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib180))) use a naive implementation of the POMDP to test the native decision-making ability of LLMs, namely setting $c_t = s_t = [a_1, o_1, a_2, o_2, a_3, o_3, \cdots, a_t, o_t]$ to be the trajectory so far. Then an LLM is used as a policy to generate the next action $a_{t+1} \sim \pi_{LLM}(c_t)$. As the trajectory can be very long for complex tasks, these benchmarks challenge modern LLMs to support and effectively reason over long contexts. As an agentic approach to this challenge, prior work has proposed two main directions: (1) Context Compaction, maintaining the state $s_t$ at a moderate length that fits within a single LLM context window, typically through truncation or LLM summarization, and (2) Retrieval Augmented Generation, providing only a partial state to the LLM when generating actions. These two handling patterns naturally correspond to the short-term and long-term memory in human cognition.

### 2\.2Framework

Figure 1: Interaction diagram between the external environment, actor, critic, and memory agents. The actor agent maintains short-term memories (STM) to get all the necessary information about the task, and is provided with long-term memories (LTM) to utilize past experiences. (Labels recovered from the figure: Environment, Actor Agent, Critic Agent, Long-Term Memory Agent, Task instructions, Task tools, Memory tools, Task STM, Retrieved LTM, Semantic Heap, Episodic Stack, Observation, Action, Expectation, Procedurals, Semantics & episodics.)

In our memory framework, we incorporate both short-term memory and long-term memory as our agentic memory. Specifically, our agent state $S = S_{\text{short}} \times S_{\text{long}}$ comprises the short-term memory part $S_{\text{short}}$ and the long-term memory part $S_{\text{long}}$. To enable efficient memory management, we build our system as a multi-agent framework with three parts: an actor agent, a memory agent, and a critic agent.

- Actor agent: an LLM-based agent that utilizes memory to interact with the environment to solve tasks. The agent maintains a short-term memory state $s_t \in S_{\text{short}}$ that is updated per turn and cleared at the end of each task, at which point its information is encoded into the long-term memory. We refer to this short-term memory as the state in this paper.
- Long-term memory agent: a retrieval-based agent that maintains the long-term memory store $M \in S_{\text{long}}$. It stores three types of memory (semantic, episodic, and procedural) that are used across turns and tasks.
- Critic agent: an LLM-based agent that helps the memory generation process. It compresses the raw memory material and annotates the procedural memories with guidance and reward signals, so that the memories can help the actor agent improve its later decision-making.

We consider three types of information $M = M_S \oplus M_E \oplus M_P$ (the semantic, episodic, and procedural memory) stored in memory, following the classic taxonomy in cognitive science.

- $\mathbf{M_S}$: semantic memory stores the facts and general knowledge about the environment and the tasks.
- $\mathbf{M_E}$: episodic memory stores the summary of past events in the agent’s interactions.
- $\mathbf{M_P}$: procedural memory stores the decision-making guidance extracted from past actions. With each action made and outcome observed, the agent considers whether the outcome meets its expectation and contributes to the task, and whether the action can be improved. It then encodes everything as a procedural memory entry that guides subsequent decision-making under similar circumstances.
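One way to picture the three memory types is as separate record types in a single long-term store. The sketch below is illustrative only: the class and field names are assumptions, and the `value` field anticipates the per-entry effectiveness estimate that the paper introduces later for procedural memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticEntry:
    """A fact or piece of general knowledge about the environment or tasks."""
    fact: str


@dataclass
class EpisodicEntry:
    """A summary of a past event in the agent's interactions."""
    summary: str


@dataclass
class ProceduralEntry:
    """Decision-making guidance extracted from one past action."""
    context: str      # situation in which the action was taken
    action: str       # what the agent did
    reflection: str   # the critic's guidance on the action
    value: float = 1.0  # assumed effectiveness estimate in [0, 1]


@dataclass
class LongTermStore:
    """Container holding the semantic, episodic, and procedural parts of memory."""
    semantic: List[SemanticEntry] = field(default_factory=list)
    episodic: List[EpisodicEntry] = field(default_factory=list)
    procedural: List[ProceduralEntry] = field(default_factory=list)

    def size(self) -> int:
        """Total number of stored entries across all three types."""
        return len(self.semantic) + len(self.episodic) + len(self.procedural)
```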

The semantic memories and episodic memories are relatively easy to form, as we use a native LLM to summarize facts and events. The procedural memories are more complicated because (1) we can evaluate an action only after all the related outcomes are observed, and (2) the environment may not provide an explicit reward. In our setting, the success of a solution attempt may depend on multiple steps, so we introduce a critic agent to temporarily cache action scenarios and conclude the action at a proper time later. In addition, when the agent generates an action, we ask it to explicitly formulate the expected purpose or result of the action, and the critic agent concludes the action by comparing the actual outcome with the expected outcome. The complete memory generation and usage pipeline is illustrated in the next section.

### 2.3 Agent Pipeline

Algorithm 1: Actor Agent Pipeline

Initial agent state $s_0$. Round number $t \leftarrow 0$.

while active do

- $t \leftarrow t+1$. Receive observation $o_t$ from the user / environment.
- Report $o_t$ to the critic agent.
- Report the context $c_t = (s_{t-1}, o_t)$ to the memory agent to generate semantic and episodic memory.
- Call the memory agent to retrieve procedural memory $M_t = [m_t^{(1)}, m_t^{(2)}, \dots, m_t^{(n)}]$, as well as semantic and episodic memories $M'_t$, given the context $c_t$.
- Choose action $a_t$ based on $(c_t, M_t, M'_t)$, and set up an expected outcome $\hat{o}_{t+1}$ for the action.
- Transit to state $s_t \leftarrow p(s_{t-1}, o_t, a_t)$ (namely, update the short-term memory).
- Report the half-baked memory $k = (c_t, a_t, \hat{o}_{t+1}, M_t)$ to the critic.

end while

Algorithm 2: Critic Agent Pipeline

Memory queue $Q \leftarrow []$.

while active do

- if received half-baked $k$ from the actor then
  - Put $k$ into $Q$.
- end if
- if received observation $o$ from the actor then
  - for every half-baked memory $k = (c_k, a_k, \hat{o}_k, M_k)$ in $Q$ do
    - if $o$ is relevant to $k$ then
      - Append $o$ to the entry $k$.
      - if $k$ has received all the observations for criticism then
        - Generate reward $r_k \in [0,1]$ and reflection $f_k$ for $k$.
        - Report a new procedural memory entry $(c_k, a_k, f_k)$ to the memory agent (to be inserted into the memory store).
        - Report reward for the retrieved memories $(M_k, r_k)$ to the memory agent (to be used to update the memory evaluation scores).
      - end if
    - end if
  - end for
- end if

end while
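The critic loop can be sketched as a queue of pending entries that is scanned on every observation. This is a hedged illustration, not the paper's code: the `HalfBaked` and `Critic` names are hypothetical, the relevance test, completeness test, and judge are injected callables standing in for LLM calls, and removing a concluded entry from the queue is an assumption that the algorithm does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HalfBaked:
    """Pending entry k = (context, action, expected outcome, retrieved memory ids)."""
    context: str
    action: str
    expected: str
    retrieved_ids: List[int]
    observations: List[str] = field(default_factory=list)


class Critic:
    def __init__(
        self,
        is_relevant: Callable[[str, HalfBaked], bool],
        is_complete: Callable[[HalfBaked], bool],
        judge: Callable[[HalfBaked], Tuple[float, str]],
    ) -> None:
        self.queue: List[HalfBaked] = []
        self.is_relevant = is_relevant
        self.is_complete = is_complete
        self.judge = judge  # returns (reward, reflection)
        # Outgoing reports that would be sent to the memory agent.
        self.new_entries: List[Tuple[str, str, str]] = []
        self.reward_reports: List[Tuple[List[int], float]] = []

    def receive_half_baked(self, k: HalfBaked) -> None:
        self.queue.append(k)

    def receive_observation(self, obs: str) -> None:
        still_open: List[HalfBaked] = []
        for k in self.queue:
            if self.is_relevant(obs, k):
                k.observations.append(obs)
                if self.is_complete(k):
                    reward, reflection = self.judge(k)
                    self.new_entries.append((k.context, k.action, reflection))
                    self.reward_reports.append((k.retrieved_ids, reward))
                    continue  # assumption: concluded entries leave the queue
            still_open.append(k)
        self.queue = still_open
```

Keeping the three judgments as injected callables makes the queue logic testable without any model in the loop.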

[Figure 1](https://arxiv.org/html/2606.06787#S2.F1) provides an overview of the information-passing pipeline in the operation of the agent. The critic and memory agents back up the actor agent by maintaining a long-term memory for it to refer to, and they work in parallel to solve tasks. [Algorithm 1](https://arxiv.org/html/2606.06787#alg1) and [2](https://arxiv.org/html/2606.06787#alg2) provide a detailed description of the agent execution pipelines.

In the execution pipelines, the actor agent is responsible for handling communication with the user and the environment while receiving memory provided by the long-term memory agent. Besides talking to the user and the environment, it can also use a set of memory planning tools to generate plans and thoughts that are helpful for task solving and memory generation. The critic agent is responsible for the generation of procedural memories. It receives the agent action context (the state and observation before the action), the current agent action, the agent’s expected purpose or outcome of the action, and the actual outcome as the raw material for generating the procedural memory. Then it condenses the raw material to summarize the context and action, proposes better alternatives, and labels the procedure with reward $r_t \in \{0,1\}$ by evaluating whether the outcome meets the expectation and is helpful for solving the task. To help the critic agent obtain enough observations for criticism, we designed several planning tools for the actor agent that relate to memory generation:

- Plan for the new task: the actor agent generates a plan first when solving a new task, and a procedural memory for the planning is generated at the end of the task, once the critic agent has received all the observations related to the plan.
- Add sub-goal: generate a sub-plan for a sub-goal of the task.
- Conclude sub-goal: a procedural memory is generated for the sub-goal planning action, with everything between adding the sub-goal and concluding it as its observations.
- Think: generate some thoughts in the short-term memory.
- Act: act in the environment. A procedural memory for the action is generated after the immediate observation.

Adding sub\-goals and concluding sub\-goals provide the agent with the ability to explore the environment like a search tree\. Meanwhile, we can manage the context in a way similar to the stack in the execution of a computer program, and save context space with better scope management, as elaborated in the next section\.

Moreover, notice that each time the critic agent generates a reward, it also passes the reward to the long-term memory store together with the indices of the retrieved procedural memory entries that guided the action that induced the reward. The memory agent uses this reward to evaluate how effectively each procedural memory guides downstream tasks, as we illustrate in the next section.

### 2.4 Short-term memory for context compaction

The actor agent needs efficient context management to focus on the current goal and carry out the current plan; the critic agent likewise needs an efficient summary of each plan execution. For context compaction, we adopt the idea from traditional computer science that a program maintains a running stack in its memory space. In our agent, the current stack includes the current sub-goal to be attained as well as a plan made earlier by the agent. Turns outside the sub-goal are compressed and summarized by the actor agent LLM. This stack-based structure allows the agent to maintain a clear execution context, attend only to information relevant to the current step, and resume higher-level objectives once lower-level sub-goals are completed.
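A minimal sketch of such a stack-based short-term memory follows, under stated assumptions: the `Frame` and `ShortTermMemory` names are hypothetical, the summarizer is an injected callable standing in for the actor LLM, and showing outer frames only as one-line goals is one possible reading of "turns outside the sub-goal are compressed".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    """One stack frame: a goal, the plan made for it, and the turns taken so far."""
    goal: str
    plan: str
    turns: List[str] = field(default_factory=list)


class ShortTermMemory:
    def __init__(self, task: str, plan: str, summarize: Callable[[List[str]], str]) -> None:
        self.stack: List[Frame] = [Frame(task, plan)]
        self.summarize = summarize  # stand-in for an LLM summarization call

    def add_subgoal(self, goal: str, plan: str) -> None:
        """Push a new frame when the agent opens a sub-goal."""
        self.stack.append(Frame(goal, plan))

    def record(self, turn: str) -> None:
        """Log a turn in the innermost (current) frame."""
        self.stack[-1].turns.append(turn)

    def conclude_subgoal(self) -> str:
        """Pop the current frame and leave only its summary in the parent frame."""
        if len(self.stack) == 1:
            raise ValueError("no open sub-goal to conclude")
        done = self.stack.pop()
        summary = self.summarize(done.turns)
        self.stack[-1].turns.append(f"[{done.goal}] {summary}")
        return summary

    def context(self) -> str:
        """Outer goals as single lines, then the full detail of the current frame."""
        lines = [f"goal: {frame.goal}" for frame in self.stack[:-1]]
        top = self.stack[-1]
        lines += [f"current goal: {top.goal}", f"plan: {top.plan}", *top.turns]
        return "\n".join(lines)
```

For example, after a sub-goal is concluded, the parent frame sees one summary line instead of every turn taken inside the sub-goal.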

### 2.5 Reward-based long-term memory management

The memory agent is the store that manages long-term memories in our framework. It is responsible for long-term memory retrieval and management.

- Memory retrieval: for episodic and semantic memory, we use dense retrieval to enable retrieval by similarity match. For procedural memory, retrieval takes into account both the estimated effectiveness of a memory entry and the context similarity. The details are explained later.
- Memory management: the agent prunes and evicts redundant memories to avoid excess occupancy. We use an LLM to manage all the semantic memories for information updates. For the procedural memories, we set a threshold $\epsilon > 0$ for the disuse of memories that have stayed long enough in the store. Any memory retrieved at a frequency below $\epsilon$ is deleted from the store, and if any two memory entries are consistently retrieved together or left out together, with frequency above $1 - \epsilon$, the agent merges the two entries into one, enabling a more effective retrieval process.
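The pruning and merging rule for procedural memories can be sketched from a log of retrieval rounds. The function name, the `min_rounds` parameter (standing in for "stayed long enough in the store"), and the simplification that every entry has been present for the whole log are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple


def prune_and_merge_candidates(
    retrieval_log: List[Set[str]],
    memory_ids: List[str],
    epsilon: float,
    min_rounds: int,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (ids to delete, pairs to merge) from per-round retrieval sets.

    Assumes every id in memory_ids was in the store for all logged rounds.
    """
    rounds = len(retrieval_log)
    if rounds < min_rounds:
        return [], []  # not enough history to judge disuse yet

    # Fraction of rounds in which each entry was retrieved.
    freq = {m: sum(m in hits for hits in retrieval_log) / rounds for m in memory_ids}
    to_delete = [m for m in memory_ids if freq[m] < epsilon]
    survivors = [m for m in memory_ids if freq[m] >= epsilon]

    # Two entries are merge candidates when they are retrieved together or
    # left out together in more than a (1 - epsilon) fraction of rounds.
    to_merge = []
    for a, b in combinations(survivors, 2):
        agree = sum((a in hits) == (b in hits) for hits in retrieval_log) / rounds
        if agree > 1 - epsilon:
            to_merge.append((a, b))
    return to_delete, to_merge
```

How two entries are actually merged (for instance by an LLM rewriting them into one) is outside this sketch.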

As we can see from the above, the retrieval process is central to memory management, since the management decisions are built upon retrieval statistics. To make retrieval more effective, we incorporate a bandit-type evaluation of the memory entries, especially for procedural memories that link directly to reward information. To this end, we build a simple model to estimate the effectiveness of each procedural memory recipe in guiding downstream tasks.

For each procedural memory entry $m \in M_P$ in the store, we set up an adaptive parameter $v_m \in [0,1]$ indicating whether the suggestion in the entry is effective. In an actor step, the actor agent makes the decision in the face of context $c_t$, as well as retrieved procedural memories $M_t = [m_t^{(1)}, m_t^{(2)}, \dots, m_t^{(n)}]$, as illustrated in [Algorithm 1](https://arxiv.org/html/2606.06787#alg1). Then after the actor receives reward $r_t \in \{0,1\}$, we build a lightweight model by assuming that the reward is produced through a binary stochastic process:

$$r_t|_{c_t} = l_t^{(1)} \vee l_t^{(2)} \vee \cdots \vee l_t^{(n)} \vee l_t.$$

Here $l_t^{(i)} \in \{0,1\}$ is a random variable indicating whether memory entry $i$ helps the action given the context, and $l_t \in \{0,1\}$ indicates whether the action is successful without the help of any memory. For a memory entry to be helpful, we assume that its content should be effective in its own context, and also that the context in the memory entry and the current task context should be similar. Therefore we have the model:

$$l_t^{(i)} \sim \mathcal{B}\left(v_{m_t^{(i)}} \, \psi(c_t)^{\top} \psi(c^{(i)})\right).$$

Here $\mathcal{B}(p)$ is the Bernoulli distribution with mean $p$, $\psi$ is the normalized dense retrieval embedding map used in our memory retriever, and $c^{(i)}$ is the context of the action in memory entry $m_t^{(i)}$. Therefore, each time we observe a reward $r_t$, we can infer an update to the estimates of $v_{m_t^{(1)}}, \cdots, v_{m_t^{(n)}}$ in our model, and given the estimates of $v_m$, we can retrieve the most helpful memory entries greedily by selecting those with the top-$n$ scores $v_m \psi(c_t)^{\top} \psi(c_m)$.
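The greedy retrieval rule can be sketched in a few lines: score each entry by its effectiveness estimate times the inner product of normalized embeddings, then keep the top $n$. The function names and the tuple layout of `entries` are assumptions, and plain Python lists stand in for the dense embeddings. The sketch does not clip negative similarities, which the Bernoulli model would need to rule out.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length, mirroring the normalized embedding map."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def score(value: float, query_emb: Sequence[float], entry_emb: Sequence[float]) -> float:
    """Effectiveness estimate times the similarity of the two contexts."""
    q, e = normalize(query_emb), normalize(entry_emb)
    return value * sum(a * b for a, b in zip(q, e))


def retrieve_top_n(
    query_emb: Sequence[float],
    entries: List[Tuple[str, float, Sequence[float]]],
    n: int,
) -> List[str]:
    """entries holds (id, value estimate, context embedding); return the n best ids."""
    ranked = sorted(entries, key=lambda ent: score(ent[1], query_emb, ent[2]), reverse=True)
    return [ent[0] for ent in ranked[:n]]
```

A highly rated entry from a dissimilar context and a closely matching entry with a poor track record both rank low, which is the point of multiplying the two factors.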

In practice we use an Expectation-Maximization (EM) algorithm to update our estimate of the $v_m$ parameters. In the spirit of optimism in the face of uncertainty, we set the initial $v_m$ values to 1.
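The paper does not spell out its EM update, so the following is only one plausible incremental variant for the noisy-OR model above, not the authors' procedure. The E-step uses the fact that a helpful entry implies a positive reward, so the posterior that entry $i$ helped is $p_i / P(r = 1)$ when $r = 1$ and $0$ when $r = 0$, with $p_i = v_i s_i$ and $s_i$ the context similarity. The M-step moves each $v_i$ toward the single-observation maximizer $\min(1, q_i / s_i)$. The `base_rate` (success probability without memory) and the learning rate `lr` are assumed parameters.

```python
from typing import List


def e_step(values: List[float], sims: List[float], base_rate: float, reward: int) -> List[float]:
    """Posterior probability that each retrieved entry helped, given the reward."""
    if reward == 0:
        return [0.0] * len(values)  # a failed action means no entry helped
    p = [v * s for v, s in zip(values, sims)]
    miss = 1.0 - base_rate  # probability that nothing at all produced success
    for p_i in p:
        miss *= 1.0 - p_i
    p_success = 1.0 - miss
    if p_success <= 0.0:
        return [0.0] * len(values)
    return [p_i / p_success for p_i in p]


def m_step(values: List[float], sims: List[float], posteriors: List[float], lr: float) -> List[float]:
    """Move each estimate toward the value that best explains its posterior."""
    updated = []
    for v, s, q in zip(values, sims, posteriors):
        if s <= 0.0:
            updated.append(v)  # no information about this entry at zero similarity
            continue
        target = min(1.0, q / s)  # maximizer of q*log(v*s) + (1-q)*log(1-v*s)
        updated.append((1.0 - lr) * v + lr * target)
    return updated
```

With the optimistic initialization at 1, an entry's estimate can only fall after failures in similar contexts and recover after successes.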

## 3 Experiments

We evaluate our memory mechanisms under streaming multi-task conditions that simulate real task-solving agent environments. AdMem is tested on a variety of task domains bundled in the AgentBoard environment (Ma et al., [2024](https://arxiv.org/html/2606.06787#bib.bib180)). The benchmark provides a variety of tasks over several domains, including embodied AI, text games, web research, and tool calling. In each domain, we set up the agent with the same system prompts ([Appendix A](https://arxiv.org/html/2606.06787#A1)) and the vanilla task instructions in AgentBoard, and launch a single life-long run over the set of all tasks, accumulating and using memory online. We measure task completion over the run.

In building our agent, we adopted the Claude Haiku 4.5 model as our LLM backbone. We compare our methodology against existing frameworks, including agent pipelines without cross-task memory (ReAct, Yao et al. ([2022](https://arxiv.org/html/2606.06787#bib.bib179))) and memory-based agents for procedural knowledge (AWM, Wang et al. ([2024](https://arxiv.org/html/2606.06787#bib.bib164))). Our current baseline selection (AWM) is intended to represent agents with lifelong memory accumulation in multi-turn task settings, while many existing memory systems (e.g., Mem0, MemGPT, MIRIX) primarily focus on retrieval optimization over static knowledge bases, which are well-suited for question-answering tasks but not directly designed for interactive tool-use or long-horizon decision-making environments. The results are shown in LABEL:tab:results2, with both task completeness (the percentage of tasks completed) and the average progress ($p_i \in [0,1]$ for each task $i$, with $p_i = 0$ for no progress and $p_i = 1$ for completion). AdMem boosts model performance in most domains while matching the best performance in the others.
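The two reported metrics can be computed from per-task progress scores as sketched below. The function names are hypothetical, and treating a task as completed exactly when its progress reaches 1 is an assumption based on the definition of $p_i$ above.

```python
from typing import Sequence


def task_completeness(progress: Sequence[float]) -> float:
    """Fraction of tasks whose progress score reached 1 (assumed to mean completed)."""
    if not progress:
        raise ValueError("no tasks to score")
    return sum(1 for p in progress if p >= 1.0) / len(progress)


def average_progress(progress: Sequence[float]) -> float:
    """Mean of the per-task progress scores p_i in [0, 1]."""
    if not progress:
        raise ValueError("no tasks to score")
    return sum(progress) / len(progress)
```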

The benefit of memory usually depends on the domain and the task set, because the amount of knowledge and experience that transfers across tasks varies. For previous procedural memory implementations (AWM in LABEL:tab:results2) or our naive implementation of procedural memory (LLM + LTP in LABEL:tab:results), we often observe that adding memory harms performance when the transferred information causes more confusion than assistance across tasks. This reflects the challenge of distinguishing task-specific from transferable knowledge, and of balancing reliance on memory against improvisation. With the use of planning tools and careful memory encoding, LABEL:tab:results2 shows that on the domains where memory is useful, our implementation can boost performance, and on the domains where memory may not be useful, the additional structure does not greatly hurt performance.

To further understand the effect of the different components in our design, we decompose the system into \(i\) the acting and planning component \(including short\-term memory management\), and \(ii\) the long\-term memory component\. Specifically, we evaluate performance when removing \(a\) procedural memory together with the reward model, and \(b\) planning and semantic\-episodic memory encoding\. We also test AdMem when the task set is streamed for multiple epochs, to see whether memory yields benefits over longer horizons\. The results are reported in LABEL:tab:results\. Short\-term planning, supported by memory planning tools and context management, contributes to task progress but does not scale with the horizon, since no information is shared across tasks\. Long\-term memory, on the other hand, improves task performance over long horizons when similar situations and tasks are revisited, and is thus central to the self\-evolution of the agent\. However, when added alone, it goes through a burn\-in period with early performance degradation\.

Our design intentionally prioritizes task performance, learning capability, and robustness in long\-horizon settings over minimal per\-step cost\. In the implementation of AdMem, the actor, critic, and memory agents run in parallel through multi\-threading, so memory encoding and management add little delay to the overall pipeline\. The pipeline nevertheless incurs extra prompts, planning steps, and a number of additional background LLM calls, which remains a limitation\. In the worst case across the above environments, the per\-step time cost is 2 times that of the vanilla LLM, with 3 times as many LLM calls; this happens when the task is simple and the vanilla LLM is therefore extremely fast\.
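The parallel arrangement described above can be illustrated with a small sketch: the actor proceeds turn by turn on the main thread while critic and memory work for each turn is handed to background threads. This is not the authors' implementation. The functions `actor_step`, `critic_annotate`, and `memory_encode` are hypothetical stand-ins for LLM calls, and the use of `concurrent.futures.ThreadPoolExecutor` is an assumption about how such multi-threading could be organized.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


def actor_step(turn: int) -> str:
    """Hypothetical stand-in for the actor's LLM call; returns an action."""
    return f"action-{turn}"


def critic_annotate(action: str) -> str:
    """Hypothetical stand-in for the critic's background LLM call."""
    return f"reflection on {action}"


def memory_encode(action: str) -> str:
    """Hypothetical stand-in for the memory agent's background LLM call."""
    return f"memory of {action}"


def run_episode(num_turns: int) -> Tuple[List[str], List[str]]:
    """Run the actor in the foreground and queue critic/memory work in threads."""
    actions: List[str] = []
    pending: List[Future] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for turn in range(num_turns):
            action = actor_step(turn)
            actions.append(action)
            # The actor does not wait for these results before its next turn.
            pending.append(pool.submit(critic_annotate, action))
            pending.append(pool.submit(memory_encode, action))
        # Collect background results in submission order once the episode ends.
        background = [future.result() for future in pending]
    return actions, background


actions, background = run_episode(2)
print(actions)
print(background)
```

In this arrangement the per-turn latency is dominated by the actor call, while the total number of model calls per turn still grows, which is consistent with the trade-off the paragraph above describes.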

## 4 Conclusion

In this work, we present AdMem, a systematic memory design for task\-solving agents that combines a short\-term to long\-term hierarchy with a split into semantic, episodic, and procedural components\. This design not only supports task customization but also enhances agent performance by enabling learning from past experiences\. Our framework demonstrates strong effectiveness in the simulated environments in AgentBoard, particularly when agents face repetitive tasks\. Furthermore, it demonstrates the potential for agents to autonomously evolve via memory\-based self\-reflection in online environments, paving the way for scalable and sustained long\-term development\.

## References

- A\. Behrouz, P\. Zhong, and V\. Mirrokni \(2024\)Titans: learning to memorize at test time\.External Links:2501\.00663,[Link](https://arxiv.org/abs/2501.00663)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p2.1)\.
- V\. Berges, B\. Oguz, D\. Haziza, W\. Yih, L\. Zettlemoyer, and G\. Ghosh \(2025\)Memory layers at scale\.InForty\-second International Conference on Machine Learning,Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p2.1)\.
- T\. B\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell, S\. Agarwal, A\. Herbert\-Voss, G\. Krueger, T\. Henighan, R\. Child, A\. Ramesh, D\. M\. Ziegler, J\. Wu, C\. Winter, C\. Hesse, M\. Chen, E\. Sigler, M\. Litwin, S\. Gray, B\. Chess, J\. Clark, C\. Berner, S\. McCandlish, A\. Radford, I\. Sutskever, and D\. Amodei \(2020\)Language models are few\-shot learners\.External Links:2005\.14165,[Link](https://arxiv.org/abs/2005.14165)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- P\. Chhikara, D\. Khant, S\. Aryan, T\. Singh, and D\. Yadav \(2025\)Mem0: building production\-ready ai agents with scalable long\-term memory\.External Links:2504\.19413,[Link](https://arxiv.org/abs/2504.19413)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- A\. Chowdhery, S\. Narang, J\. Devlin, M\. Bosma, G\. Mishra, A\. Roberts, P\. Barham, H\. W\. Chung, C\. Sutton, S\. Gehrmann, P\. Schuh, K\. Shi, S\. Tsvyashchenko, J\. Maynez, A\. Rao, P\. Barnes, Y\. Tay, N\. Shazeer, V\. Prabhakaran, E\. Reif, N\. Du, B\. Hutchinson, R\. Pope, J\. Bradbury, J\. Austin, M\. Isard, G\. Gur\-Ari, P\. Yin, T\. Duke, A\. Levskaya, S\. Ghemawat, S\. Dev, H\. Michalewski, X\. Garcia, V\. Misra, K\. Robinson, L\. Fedus, D\. Zhou, D\. Ippolito, D\. Luan, H\. Lim, B\. Zoph, A\. Spiridonov, R\. Sepassi, D\. Dohan, S\. Agrawal, M\. Omernick, A\. M\. Dai, T\. S\. Pillai, M\. Pellat, A\. Lewkowycz, E\. Moreira, R\. Child, O\. Polozov, K\. Lee, Z\. Zhou, X\. Wang, B\. Saeta, M\. Diaz, O\. Firat, M\. Catasta, J\. Wei, K\. Meier\-Hellstern, D\. Eck, J\. Dean, S\. Petrov, and N\. Fiedel \(2023\)PaLM: scaling language modeling with pathways\.Journal of Machine Learning Research24\(240\),pp\. 1–113\.External Links:[Link](http://jmlr.org/papers/v24/22-1144.html)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- N\. J\. Cohen, R\. A\. Poldrack, and H\. Eichenbaum \(1997\)Memory for items and memory for relations in the procedural/declarative memory framework\.Memory5\(1\-2\),pp\. 131–178\.Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- DeepSeek\-AI, D\. Guo, D\. Yang, H\. Zhang, J\. Song, R\. Zhang, R\. Xu, Q\. Zhu, S\. Ma, P\. Wang, X\. Bi, X\. Zhang, X\. Yu, Y\. Wu, Z\. F\. Wu, Z\. Gou, Z\. Shao, Z\. Li, Z\. Gao, A\. Liu, B\. Xue, B\. Wang, B\. Wu, B\. Feng, C\. Lu, C\. Zhao, C\. Deng, C\. Zhang, C\. Ruan, D\. Dai, D\. Chen, D\. Ji, E\. Li, F\. Lin, F\. Dai, F\. Luo, G\. Hao, G\. Chen, G\. Li, H\. Zhang, H\. Bao, H\. Xu, H\. Wang, H\. Ding, H\. Xin, H\. Gao, H\. Qu, H\. Li, J\. Guo, J\. Li, J\. Wang, J\. Chen, J\. Yuan, J\. Qiu, J\. Li, J\. L\. Cai, J\. Ni, J\. Liang, J\. Chen, K\. Dong, K\. Hu, K\. Gao, K\. Guan, K\. Huang, K\. Yu, L\. Wang, L\. Zhang, L\. Zhao, L\. Wang, L\. Zhang, L\. Xu, L\. Xia, M\. Zhang, M\. Zhang, M\. Tang, M\. Li, M\. Wang, M\. Li, N\. Tian, P\. Huang, P\. Zhang, Q\. Wang, Q\. Chen, Q\. Du, R\. Ge, R\. Zhang, R\. Pan, R\. Wang, R\. J\. Chen, R\. L\. Jin, R\. Chen, S\. Lu, S\. Zhou, S\. Chen, S\. Ye, S\. Wang, S\. Yu, S\. Zhou, S\. Pan, S\. S\. Li, S\. Zhou, S\. Wu, S\. Ye, T\. Yun, T\. Pei, T\. Sun, T\. Wang, W\. Zeng, W\. Zhao, W\. Liu, W\. Liang, W\. Gao, W\. Yu, W\. Zhang, W\. L\. Xiao, W\. An, X\. Liu, X\. Wang, X\. Chen, X\. Nie, X\. Cheng, X\. Liu, X\. Xie, X\. Liu, X\. Yang, X\. Li, X\. Su, X\. Lin, X\. Q\. Li, X\. Jin, X\. Shen, X\. Chen, X\. Sun, X\. Wang, X\. Song, X\. Zhou, X\. Wang, X\. Shan, Y\. K\. Li, Y\. Q\. Wang, Y\. X\. Wei, Y\. Zhang, Y\. Xu, Y\. Li, Y\. Zhao, Y\. Sun, Y\. Wang, Y\. Yu, Y\. Zhang, Y\. Shi, Y\. Xiong, Y\. He, Y\. Piao, Y\. Wang, Y\. Tan, Y\. Ma, Y\. Liu, Y\. Guo, Y\. Ou, Y\. Wang, Y\. Gong, Y\. Zou, Y\. He, Y\. Xiong, Y\. Luo, Y\. You, Y\. Liu, Y\. Zhou, Y\. X\. Zhu, Y\. Xu, Y\. Huang, Y\. Li, Y\. Zheng, Y\. Zhu, Y\. Ma, Y\. Tang, Y\. Zha, Y\. Yan, Z\. Z\. Ren, Z\. Ren, Z\. Sha, Z\. Fu, Z\. Xu, Z\. Xie, Z\. Zhang, Z\. Hao, Z\. Ma, Z\. Yan, Z\. Wu, Z\. Gu, Z\. Zhu, Z\. Liu, Z\. Li, Z\. Xie, Z\. Song, Z\. Pan, Z\. Huang, Z\. Xu, Z\. Zhang, and Z\. 
Zhang \(2025\)DeepSeek\-r1: incentivizing reasoning capability in llms via reinforcement learning\.External Links:2501\.12948,[Link](https://arxiv.org/abs/2501.12948)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- X\. Deng, Y\. Gu, B\. Zheng, S\. Chen, S\. Stevens, B\. Wang, H\. Sun, and Y\. Su \(2023\)Mind2Web: towards a generalist agent for the web\.External Links:2306\.06070,[Link](https://arxiv.org/abs/2306.06070)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- R\. Fang, Y\. Liang, X\. Wang, J\. Wu, S\. Qiao, P\. Xie, F\. Huang, H\. Chen, and N\. Zhang \(2025\)Memp: exploring agent procedural memory\.External Links:2508\.06433,[Link](https://arxiv.org/abs/2508.06433)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- H\. Gao, J\. Geng, W\. Hua, M\. Hu, X\. Juan, H\. Liu, S\. Liu, J\. Qiu, X\. Qi, Y\. Wu,et al\.\(2025\)A survey of self\-evolving agents: on path to artificial super intelligence\.arXiv preprint arXiv:2507\.21046\.Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p6.1)\.
- M\. Hu, T\. Chen, Q\. Chen, Y\. Mu, W\. Shao, and P\. Luo \(2025\)Hiagent: hierarchical working memory management for solving long\-horizon agent tasks with large language model\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 32779–32798\.Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- Z\. Huang, S\. Gutierrez, H\. Kamana, and S\. MacNeil \(2023\)Memory sandbox: transparent and interactive memory management for conversational agents\.External Links:2308\.01542,[Link](https://arxiv.org/abs/2308.01542)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- C\. E\. Jimenez, J\. Yang, A\. Wettig, S\. Yao, K\. Pei, O\. Press, and K\. Narasimhan \(2024\)SWE\-bench: can language models resolve real\-world github issues?\.External Links:2310\.06770,[Link](https://arxiv.org/abs/2310.06770)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- C\. Ma, J\. Zhang, Z\. Zhu, C\. Yang, Y\. Yang, Y\. Jin, Z\. Lan, L\. Kong, and J\. He \(2024\)Agentboard: an analytical evaluation board of multi\-turn llm agents\.Advances in neural information processing systems37,pp\. 74325–74362\.Cited by:[§2\.1](https://arxiv.org/html/2606.06787#S2.SS1.p3.3),[§3](https://arxiv.org/html/2606.06787#S3.p1.1)\.
- G\. Mialon, C\. Fourrier, C\. Swift, T\. Wolf, Y\. LeCun, and T\. Scialom \(2023\)GAIA: a benchmark for general ai assistants\.External Links:2311\.12983,[Link](https://arxiv.org/abs/2311.12983)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- A\. Modarressi, A\. Imani, M\. Fayyaz, and H\. Schütze \(2024\)RET\-llm: towards a general read\-write memory for large language models\.External Links:2305\.14322,[Link](https://arxiv.org/abs/2305.14322)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- A\. Modarressi, A\. Köksal, A\. Imani, M\. Fayyaz, and H\. Schütze \(2025\)MemLLM: finetuning llms to use an explicit read\-write memory\.External Links:2404\.11672,[Link](https://arxiv.org/abs/2404.11672)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- C\. Packer, S\. Wooders, K\. Lin, V\. Fang, S\. G\. Patil, I\. Stoica, and J\. E\. Gonzalez \(2024\)MemGPT: towards llms as operating systems\.External Links:2310\.08560,[Link](https://arxiv.org/abs/2310.08560)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- M\. Shridhar, X\. Yuan, M\. Côté, Y\. Bisk, A\. Trischler, and M\. Hausknecht \(2021\)ALFWorld: aligning text and embodied environments for interactive learning\.External Links:2010\.03768,[Link](https://arxiv.org/abs/2010.03768)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- S\. Sukhbaatar, A\. Szlam, J\. Weston, and R\. Fergus \(2015\)End\-to\-end memory networks\.External Links:1503\.08895,[Link](https://arxiv.org/abs/1503.08895)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p2.1)\.
- T\. Sumers, S\. Yao, K\. R\. Narasimhan, and T\. L\. Griffiths \(2023\)Cognitive architectures for language agents\.Transactions on Machine Learning Research\.Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p6.1)\.
- X\. Tang, T\. Qin, T\. Peng, Z\. Zhou, D\. Shao, T\. Du, X\. Wei, P\. Xia, F\. Wu, H\. Zhu,et al\.\(2025\)Agent kb: leveraging cross\-domain experience for agentic problem solving\.arXiv preprint arXiv:2507\.06229\.Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- H\. Touvron, L\. Martin, K\. Stone, P\. Albert, A\. Almahairi, Y\. Babaei, N\. Bashlykov, S\. Batra, P\. Bhargava, S\. Bhosale, D\. Bikel, L\. Blecher, C\. C\. Ferrer, M\. Chen, G\. Cucurull, D\. Esiobu, J\. Fernandes, J\. Fu, W\. Fu, B\. Fuller, C\. Gao, V\. Goswami, N\. Goyal, A\. Hartshorn, S\. Hosseini, R\. Hou, H\. Inan, M\. Kardas, V\. Kerkez, M\. Khabsa, I\. Kloumann, A\. Korenev, P\. S\. Koura, M\. Lachaux, T\. Lavril, J\. Lee, D\. Liskovich, Y\. Lu, Y\. Mao, X\. Martinet, T\. Mihaylov, P\. Mishra, I\. Molybog, Y\. Nie, A\. Poulton, J\. Reizenstein, R\. Rungta, K\. Saladi, A\. Schelten, R\. Silva, E\. M\. Smith, R\. Subramanian, X\. E\. Tan, B\. Tang, R\. Taylor, A\. Williams, J\. X\. Kuan, P\. Xu, Z\. Yan, I\. Zarov, Y\. Zhang, A\. Fan, M\. Kambadur, S\. Narang, A\. Rodriguez, R\. Stojnic, S\. Edunov, and T\. Scialom \(2023\)Llama 2: open foundation and fine\-tuned chat models\.External Links:2307\.09288,[Link](https://arxiv.org/abs/2307.09288)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- Y\. Wang and X\. Chen \(2025\)MIRIX: multi\-agent memory system for llm\-based agents\.External Links:2507\.07957,[Link](https://arxiv.org/abs/2507.07957)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- Z\. Z\. Wang, J\. Mao, D\. Fried, and G\. Neubig \(2024\)Agent workflow memory\.External Links:2409\.07429,[Link](https://arxiv.org/abs/2409.07429)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1),[§3](https://arxiv.org/html/2606.06787#S3.p2.4)\.
- Y\. Wu, S\. Liang, C\. Zhang, Y\. Wang, Y\. Zhang, H\. Guo, R\. Tang, and Y\. Liu \(2025\)From human memory to ai memory: a survey on memory mechanisms in the era of llms\.External Links:2504\.15965,[Link](https://arxiv.org/abs/2504.15965)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1),[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- J\. Xie, K\. Zhang, J\. Chen, T\. Zhu, R\. Lou, Y\. Tian, Y\. Xiao, and Y\. Su \(2024\)TravelPlanner: a benchmark for real\-world planning with language agents\.External Links:2402\.01622,[Link](https://arxiv.org/abs/2402.01622)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- L\. Xu, H\. Xie, S\. J\. Qin, X\. Tao, and F\. L\. Wang \(2023\)Parameter\-efficient fine\-tuning methods for pretrained language models: a critical review and assessment\.External Links:2312\.12148,[Link](https://arxiv.org/abs/2312.12148)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- L\. Yang, Z\. Yu, T\. Zhang, S\. Cao, M\. Xu, W\. Zhang, J\. E\. Gonzalez, and B\. Cui \(2024\)Buffer of thoughts: thought\-augmented reasoning with large language models\.External Links:2406\.04271,[Link](https://arxiv.org/abs/2406.04271)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- S\. Yao, J\. Zhao, D\. Yu, N\. Du, I\. Shafran, K\. R\. Narasimhan, and Y\. Cao \(2022\)React: synergizing reasoning and acting in language models\.InThe eleventh international conference on learning representations,Cited by:[§3](https://arxiv.org/html/2606.06787#S3.p2.4)\.
- H\. Yu, T\. Chen, J\. Feng, J\. Chen, W\. Dai, Q\. Yu, Y\. Zhang, W\. Ma, J\. Liu, M\. Wang, and H\. Zhou \(2025\)MemAgent: reshaping long\-context llm with multi\-conv rl\-based memory agent\.External Links:2507\.02259,[Link](https://arxiv.org/abs/2507.02259)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- A\. Zeng, X\. Liu, Z\. Du, Z\. Wang, H\. Lai, M\. Ding, Z\. Yang, Y\. Xu, W\. Zheng, X\. Xia, W\. L\. Tam, Z\. Ma, Y\. Xue, J\. Zhai, W\. Chen, P\. Zhang, Y\. Dong, and J\. Tang \(2023\)GLM\-130b: an open bilingual pre\-trained model\.External Links:2210\.02414,[Link](https://arxiv.org/abs/2210.02414)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- Z\. Zhang, X\. Bo, C\. Ma, R\. Li, X\. Chen, Q\. Dai, J\. Zhu, Z\. Dong, and J\. Wen \(2024\)A survey on the memory mechanism of large language model based agents\.External Links:2404\.13501,[Link](https://arxiv.org/abs/2404.13501)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p1.1)\.
- W\. Zhong, L\. Guo, Q\. Gao, H\. Ye, and Y\. Wang \(2023\)MemoryBank: enhancing large language models with long\-term memory\.External Links:2305\.10250,[Link](https://arxiv.org/abs/2305.10250)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.
- S\. Zhou, F\. F\. Xu, H\. Zhu, X\. Zhou, R\. Lo, A\. Sridhar, X\. Cheng, T\. Ou, Y\. Bisk, D\. Fried, U\. Alon, and G\. Neubig \(2024\)WebArena: a realistic web environment for building autonomous agents\.External Links:2307\.13854,[Link](https://arxiv.org/abs/2307.13854)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p4.1)\.
- Z\. Zhou, A\. Qu, Z\. Wu, S\. Kim, A\. Prakash, D\. Rus, J\. Zhao, B\. K\. H\. Low, and P\. P\. Liang \(2025\)MEM1: learning to synergize memory and reasoning for efficient long\-horizon agents\.External Links:2506\.15841,[Link](https://arxiv.org/abs/2506.15841)Cited by:[§1](https://arxiv.org/html/2606.06787#S1.p3.1)\.

## Appendix A LLM Prompts

Here we include the prompts used by the actor and the critic agents\. The bold chunks \(shown here in square brackets\) are filled with the corresponding information during the task\-solving process\.

Actor Agent:

You are a helpful agent that solves tasks in an given environment\. You take turns to interact with the environment and learn about the environment from the observations, until you achieve the goal defined in the task\. A log for your previous turns in the task is given below, where planner messages \(if exists\) marks the plan you made earlier for solving the task and agent\_summary messages provide summarization of one of more turns in the process\.

In the next turn you MUST use one of the tools in the <tools\> tag\. Do not output plain text\. Some tools are provided for you to plan for next steps in the task and some tools are provided to interact with the environment\. You need to decide which tool to use, but to solve hard task efficiently, careful planning is important\. Always make sure you generate valid JSON for using the tool\.

Before action, the system will also generate some messages by recalling the system memory\. These messages include information from your past interactions with the environment, as well as reflections on your past actions under similar circumstances \(for other tasks in the same environment\)\. Please examine these messages and think about their implications for the possible outcomes or your action, before deciding your action\.

<tools\>\[Memory Tool Descriptions\]</tools\>

Below is the description of the task and the environment\.

\[Task Instructions\]

Below is the task execution log:

\[Short Term Memory Output\]

Below is some memory recall for past experience under similar circumstances:

\[Long Term Memory Output\]

Critic Agent:

You are a helpful assistant that generates a procedural memory entry from an event\. The event is an action \(or a plan\) taken by an agent when solving a task\. The purpose of the procedural memory entry is to provide a description of the context of the action \(plan\), the action \(plan\) taken by the agent, the expected outcome of the action \(plan\), a summary of the actual outcome after the action \(plan\), and some evaluations and reflections aiming to optimize the action based on the result\. The procedural memory should not be too long, but should be helpful in improving the agent’s behavior when provided to the agent in a similar situation in the future\. Hide private information like user name, account number etc\. to protect privacy, for example use \[user name\] in the place of actual user name\.

The context of the action \(or plan\): \[context\]

The action \(or plan\) taken by the agent: \[action\]

The expected outcome/purpose: \[expectation\]

The actual outcome: \[observation\]

Below is some domain information that helps you understand the task setting\.

Now it’s your turn to generate the procedural memory entry\. To generate, call the tool ’generate\_procedural\_memory’ and fill in the arguments with corresponding information\.

<tool\_description\>

Name: generate\_procedural\_memory

Description: Generate procedural memory entry\.

Args:

- context \(str\): the action context, a brief and high\-level background description summarizing the context where the agent action occurs\.
- result \(str\): a summary of the result of the action or plan, including 1\. the agent’s expected outcome / achievement; 2\. the actual outcome; 3\. whether they are consistent\.
- success \(bool\): True or False that the action was successful, considering 1\. whether the outcome of the action meets the expectation, 2\. whether the outcome offers progress towards the task solving and plan execution\.
- reflection \(str\): Think about what can be learned from the result of the action, specifically what to change in the agent’s expectation, and what should the agent do under the same circumstances to better carry out the plan and solve the task\. Specifically, think about whether the agent should set up a subgoal first instead of directly interacting with the environment, to better organize the task solving process\.

</tool\_description\>
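The four arguments of `generate_procedural_memory` suggest a simple record type for a procedural memory entry. The sketch below is an illustration only: the class `ProceduralMemoryEntry`, the helper `parse_entry`, and the example arguments are hypothetical, and the paper does not state how entries are stored or validated. Only the field names and types come from the tool description above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProceduralMemoryEntry:
    """Fields mirror the arguments of generate_procedural_memory."""
    context: str
    result: str
    success: bool
    reflection: str


# Expected argument names and types, taken from the tool description.
_EXPECTED = {"context": str, "result": str, "success": bool, "reflection": str}


def parse_entry(arguments_json: str) -> ProceduralMemoryEntry:
    """Parse the critic's tool-call arguments (hypothetical helper)."""
    args = json.loads(arguments_json)
    missing = _EXPECTED.keys() - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    for name, expected_type in _EXPECTED.items():
        if not isinstance(args[name], expected_type):
            raise TypeError(f"{name} must be of type {expected_type.__name__}")
    return ProceduralMemoryEntry(**{name: args[name] for name in _EXPECTED})


# Made-up arguments, with private details already masked as the prompt requests.
example = json.dumps({
    "context": "Agent tried to open a locked drawer for [user name].",
    "result": "Expected the drawer to open; it stayed locked. Inconsistent.",
    "success": False,
    "reflection": "Set a sub-goal to find the key before opening the drawer.",
})
entry = parse_entry(example)
print(entry.success)     # False
print(entry.reflection)
```

Keeping `success` as a strict boolean makes it straightforward to separate failure cases from successes when entries are later retrieved or evaluated.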

We also provide the descriptions of the memory tools we built\. Note that not all memory tools are available to the agent at the same time\. At the start of a task, the agent can only use plan\_for\_the\_task\. After that, it can use add\_subgoal if the sub\-goal stack depth is within the limit, conclude\_subgoal if there is an active sub\-goal, and think if the previous turn was an action in the environment; take\_action is always available once the task has been planned\.
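The availability rules above can be written as a small gating function. This is a sketch under stated assumptions: the `AgentState` fields and the depth limit of 3 are hypothetical, "within limit" is read as "current depth strictly below the maximum", and `plan_for_the_task` is assumed to be unavailable after the initial plan, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List

MAX_SUBGOAL_DEPTH = 3  # assumed limit; the paper does not give a value


@dataclass
class AgentState:
    """Hypothetical snapshot of what the tool gate needs to know."""
    task_planned: bool = False
    subgoal_depth: int = 0
    last_turn_was_action: bool = False


def available_tools(state: AgentState,
                    max_depth: int = MAX_SUBGOAL_DEPTH) -> List[str]:
    """Return the memory tools the agent may call in the next turn."""
    if not state.task_planned:
        # At the start of a task only planning is allowed.
        return ["plan_for_the_task"]
    tools = ["take_action"]  # always available after task planning
    if state.subgoal_depth < max_depth:
        tools.append("add_subgoal")
    if state.subgoal_depth > 0:
        tools.append("conclude_subgoal")
    if state.last_turn_was_action:
        tools.append("think")
    return tools


print(available_tools(AgentState()))
# ['plan_for_the_task']
print(available_tools(AgentState(task_planned=True, subgoal_depth=1,
                                 last_turn_was_action=True)))
# ['take_action', 'add_subgoal', 'conclude_subgoal', 'think']
```

Filtering the tool list before each turn keeps invalid calls, such as concluding a sub-goal that does not exist, out of the actor's prompt.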

Memory Tools:

Name: plan\_for\_the\_task

Description: Plan for a solution\. Set the goal and highlight the key points in solving the task, and plan for the task by writing a list of step\-by\-step instructions\. Each instruction can either be high\-level subgoal \(e\.g\. learn something about the environment or achieve certain status\), or some actions to interact with the environment \(e\.g\. use certain tools in the environment or communicate with the environment\)\. If the task is hard, decompose the task into multiple subgoals that are needed to be attained in line to solve the task\.

Args:

- goal \(str\): The goal and key points of the current task\.
- plan \(str\): the step\-by\-step plan for solving the task\.

Name: add\_subgoal

Description: Add the a sub\-goal to the current task and make plans to achieve the sub\-goal\. The sub\-goal should admit an observation\-based success criterion and should be helpful to solving the final task\. Use this method when the task is hard to complete or when you believe the current task plan is not effective to solve the task any more\.

Args:

- subgoal \(str\): The current sub\-goal to be set\.
- plan \(str\): A list of step\-by\-step instructions that can be followed to achieve the sub\-goal\.

Name: conclude\_subgoal

Description: Conclude the current sub\-goal judge whether it is successfully achieved\. You should call the method if the current sub\-goal is already achieved or it cannot be achieved anymore\.

Args:

- evaluation \(str\): A short natural language justification of whether the sub\-goal is successful
- success \(bool\): Whether the sub\-goal is successfully achieved\.

Name: take\_action

Description: Take an action in the environment\. In this method, you can take an action in the environment by pass the action information in the action argument\. Write what you expect the action to achieve \(or the purpose of the action\) in the expectation argument\.

Args:

- action \(str\): The action to be taken\.
- expectation \(str\): The expected outcome / purpose of the action\.

Name: think

Description: Generate some thoughts\. This method will not change the environment\.

Args:

- thoughts \(str\): Some thoughts\.
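The `add_subgoal` and `conclude_subgoal` tools operate on a stack of sub-goals. A minimal sketch of such a stack follows. The classes `Subgoal` and `SubgoalStack`, the default depth limit, and the choice to keep concluded sub-goals in a separate list are all assumptions for illustration; only the argument names come from the tool descriptions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subgoal:
    """One sub-goal; evaluation and success are filled in when concluded."""
    subgoal: str
    plan: str
    evaluation: Optional[str] = None
    success: Optional[bool] = None


@dataclass
class SubgoalStack:
    """Hypothetical last-in, first-out store of active sub-goals."""
    max_depth: int = 3  # assumed limit; the paper does not give a value
    active: List[Subgoal] = field(default_factory=list)
    concluded: List[Subgoal] = field(default_factory=list)

    def add_subgoal(self, subgoal: str, plan: str) -> None:
        """Push a new sub-goal, refusing to exceed the depth limit."""
        if len(self.active) >= self.max_depth:
            raise RuntimeError("sub-goal stack depth limit reached")
        self.active.append(Subgoal(subgoal, plan))

    def conclude_subgoal(self, evaluation: str, success: bool) -> Subgoal:
        """Pop the current (most recent) sub-goal and record its outcome."""
        if not self.active:
            raise RuntimeError("no active sub-goal to conclude")
        done = self.active.pop()
        done.evaluation = evaluation
        done.success = success
        self.concluded.append(done)
        return done


stack = SubgoalStack()
stack.add_subgoal("find the key", "1. search the desk 2. search the shelf")
stack.add_subgoal("open the desk", "1. go to desk 2. open drawer")
finished = stack.conclude_subgoal("drawer opened, key visible", True)
print(finished.subgoal)   # open the desk
print(len(stack.active))  # 1
```

Concluding always affects the most recently added sub-goal, so nested sub-goals are resolved from the innermost outward.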