MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory
Summary
This paper introduces MemoRepair, a barrier-first cascade repair contract for agentic memory that addresses the problem of stale derived artifacts when source data changes. Experiments demonstrate that MemoRepair significantly reduces invalidated memory exposure and repair costs compared to exhaustive repair methods.
# MemoRepair: Barrier-First Cascade Repair in Agentic Memory
Source: [https://arxiv.org/html/2605.07242](https://arxiv.org/html/2605.07242)
###### Abstract
Agentic memory evolves across tasks into durable derived artifacts: summaries, cached outputs, embeddings, learned skills, and executable tool procedures. When a source artifact is deleted, corrected, or invalidated by tool or API migration, descendants derived from that source can remain visible and steer future actions with stale support. We formalize this failure mode as the cascade update problem, where repair targets the visible derived state of the memory store. We present MemoRepair, a barrier-first cascade-repair contract for agentic memory. A repair event induces a controlled transition from invalidated descendant state to validated successor state: affected descendants are withdrawn before repair, successors are constructed from retained support and staged repaired predecessors under the current interface, and republication is restricted to validated predecessor-closed successors. This contract induces a scalarized repair-selection problem for a fixed repair–cost tradeoff. We show that the induced publication problem reduces to maximum-weight predecessor closure and can be solved exactly by a single $s$–$t$ min-cut. Experiments on ToolBench and MemoryArena show that, with complete influence provenance, MemoRepair reduces invalidated-memory exposure from 69.8–94.3% under systems without cascade repair to 0%. Compared with exhaustive *Repair all*, it recovers 91.1–94.3% of validated successors while reducing normalized repair-operator cost from 1.00 to 0.57–0.76.
## 1 Introduction
Persistent memory has become a core substrate for agentic AI systems that operate across tasks, tool calls, and sessions. In these systems, raw interaction records and retrieved documents are transformed over time into durable derived artifacts: summaries of prior interactions, cached tool outputs, reusable skills, and executable tool procedures that compress past experience into state the agent can reuse (Park et al., [2023](https://arxiv.org/html/2605.07242#bib.bib17); Shinn et al., [2023](https://arxiv.org/html/2605.07242#bib.bib23); Packer et al., [2024](https://arxiv.org/html/2605.07242#bib.bib16); Xu et al., [2025](https://arxiv.org/html/2605.07242#bib.bib29)). These artifacts support long-horizon behavior by letting the agent act from accumulated context rather than reconstructing each decision from scratch.
As derived state accumulates, a source update is no longer a local edit. When a source artifact is deleted, corrected, or invalidated by tool or API migration, descendants derived from that source may remain visible and continue to steer future actions with stale support. A deleted preference may survive in a summary, a corrected tool response may remain in a cached output, and an obsolete API contract may persist in a stored call-chain procedure. We call this the *cascade update problem*: a root update induces repair obligations over the visible descendant state of the memory store. The repair target is therefore the visible derived state of agentic memory, not only the invalidated root artifact.
Prior work studies how agent memories are constructed, organized, retrieved, and reused (Park et al., [2023](https://arxiv.org/html/2605.07242#bib.bib17); Packer et al., [2024](https://arxiv.org/html/2605.07242#bib.bib16); Xu et al., [2025](https://arxiv.org/html/2605.07242#bib.bib29)), while tool-use systems make persistent traces, cached outputs, API arguments, and executable procedures concrete forms of agent state (Qin et al., [2023](https://arxiv.org/html/2605.07242#bib.bib20); Guo et al., [2024](https://arxiv.org/html/2605.07242#bib.bib11); Patil et al., [2024](https://arxiv.org/html/2605.07242#bib.bib18)). Provenance and dependency-tracking methods explain how derived state depends on source inputs (Buneman et al., [2001](https://arxiv.org/html/2605.07242#bib.bib1); Green et al., [2007](https://arxiv.org/html/2605.07242#bib.bib9); Cheney et al., [2009](https://arxiv.org/html/2605.07242#bib.bib4)). What remains missing is a post-update transition for agentic memory: after an invalidation event, the system must decide which descendants to withdraw, which successors can be constructed from retained support under the current interface, and which validated successors may re-enter service.
We introduce MemoRepair, a barrier-first repair contract for agentic memory. After a deletion, correction, or migration, it first removes the affected cascade from service, then constructs successor artifacts from post-event valid support, repaired predecessors, and the current interface. Only validated successors whose required predecessors are also repaired may be republished. This yields a cost-aware selection problem: recover useful successors without exhaustively repairing every candidate. For a fixed repair–cost tradeoff parameter $\lambda$, the valid selections form a predecessor-closed optimization problem, which we reduce exactly to a single $s$–$t$ min-cut.
Experiments on ToolBench and MemoryArena cover deletion, correction, and migration events (Qin et al., [2023](https://arxiv.org/html/2605.07242#bib.bib20); Guo et al., [2024](https://arxiv.org/html/2605.07242#bib.bib11); He et al., [2026](https://arxiv.org/html/2605.07242#bib.bib12)). Across cascade-unaware memory systems, 92.4–99.7% of post-event actions still depend on invalidated information, showing that ordinary update and retrieval do not provide cascade-level withdrawal. Under complete influence provenance, MemoRepair instead withdraws affected memory before repair and republishes only validated successors. It preserves nearly the same validated repairs as exhaustive *Repair all* while executing substantially less repair work. Ablations show that the min-cut selector improves over greedy selection and that store-level repair complements parameter-level neural repair.
#### Contributions.
We make the following contributions:
- We formalize the cascade update problem in agentic memory. A root deletion, correction, or migration induces repair obligations over the visible descendant state of summaries, cached outputs, prompt skills, chain procedures, and neural skills.
- We define MemoRepair, a barrier-first cascade-repair contract. The contract specifies the visibility, support, dependency, and validation conditions under which an affected descendant may leave the withdrawn state and re-enter service as a successor.
- We derive the repair-selection rule induced by this contract. For a fixed repair–cost tradeoff, valid republication forms a predecessor-closed selection problem, yielding an exact $s$–$t$ min-cut solver for the scalarized objective.
## 2 Method
A repair event in MemoRepair is a barrier-first transition over the affected cascade induced by an invalidated root set. As shown in Figure [1](https://arxiv.org/html/2605.07242#S2.F1), the system first withdraws the cascade from service, constructs repair candidates for affected descendants, selects a predecessor-closed subset under a repair–cost tradeoff, and republishes only validated successors.

Figure 1: Repair transition enforced by MemoRepair: affected artifacts are withdrawn, successor versions are constructed, and validated predecessor-closed successors are republished.

### 2.1 Problem Setup
Persistent agentic memory is represented by a directed provenance graph

$$\mathcal{G}=(V,E^{\mathrm{inf}},E^{\mathrm{sem}}),$$

where nodes are durable artifacts. An influence edge $(u,v)\in E^{\mathrm{inf}}$ means that $v$ was produced using $u$ as causal support; these edges determine repair scope. Semantic edges $E^{\mathrm{sem}}$ support retrieval only. Each artifact has a kind tag

$$\mathrm{kind}(x)\in\{\mathsf{record},\mathsf{cache},\mathsf{summary},\mathsf{skill}\},$$

and each skill has an architecture tag

$$\mathrm{arch}(x)\in\{\mathsf{neural},\mathsf{prompt},\mathsf{chain}\}.$$
A repair event is
$$e=(F,\tau,\Delta),\qquad F\subseteq V,\quad \tau\in\{\mathsf{delete},\mathsf{correct},\mathsf{migrate}\},$$

where $F$ is the invalidated root set and $\Delta$ contains correction or migration information. The affected cascade is computed only through influence edges:

$$C(F)=\mathrm{Reach}(F;E^{\mathrm{inf}}),\qquad D(F)=C(F)\setminus F,$$

where $\mathrm{Reach}$ includes the zero-hop roots in $F$. Thus withdrawal targets $C(F)$, while repair candidates are generated for descendants $D(F)$.
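As a concrete sketch, the cascade computation is a forward reachability pass over influence edges. The snippet below is illustrative only; the adjacency-map encoding of $E^{\mathrm{inf}}$ and the function name are our assumptions, not the paper's implementation:

```python
from collections import deque

def affected_cascade(F, E_inf):
    """C(F): all artifacts reachable from the root set F along influence
    edges, zero-hop roots included; D(F) = C(F) \\ F.  E_inf is an
    adjacency map from an artifact to the artifacts it causally supports."""
    C, frontier = set(F), deque(F)
    while frontier:
        u = frontier.popleft()
        for child in E_inf.get(u, ()):
            if child not in C:
                C.add(child)
                frontier.append(child)
    return C, C - set(F)
```

Semantic edges are deliberately absent from this traversal: retrieval proximity does not create repair obligations.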
Let $\kappa_{e}$ denote the post-event interface. The retained valid support is

$$\mathsf{Ret}(e)=\begin{cases}V\setminus C(F),&\tau=\mathsf{delete},\\ (V\setminus C(F))\cup X_{\Delta},&\tau=\mathsf{correct},\\ V\setminus C(F),&\tau=\mathsf{migrate},\end{cases}$$

where $X_{\Delta}$ is the set of fresh replacement artifacts materialized for correction events. Thus $\mathsf{Ret}(e)$ is interpreted in the post-event artifact universe.
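A minimal set-level sketch of $\mathsf{Ret}(e)$ (the function name and plain-`set` encoding are our illustration, not the paper's code):

```python
def retained_support(V, C_F, tau, X_delta=frozenset()):
    """Ret(e): valid support after event e.  Only correction events add
    the fresh replacement artifacts X_delta; delete and migrate retain
    exactly the artifacts outside the affected cascade C(F)."""
    base = set(V) - set(C_F)
    return base | set(X_delta) if tau == "correct" else base
```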
### 2.2 Withdrawal Barrier
For the repair plan $\rho$ associated with event $e$, MemoRepair initializes

$$B_{\rho}\leftarrow C(F).$$

All artifacts in $B_{\rho}$ are non-servable during repair. A fresh successor $z_{i}$ for artifact $i$ may be republished only if

$$\mathrm{Validate}_{i}(z_{i})=1.$$

Validation checks replay consistency for recomputation, schema and task-regression behavior for regeneration, sandbox behavior for chain skills, and forget/reference behavior for parametric repair.
### 2.3 Artifact-Aware Candidate Construction
Roots in $F$ are handled by the event semantics: deletion removes them, correction supplies replacements, and migration updates $\kappa_{e}$. For each descendant $i\in D(F)$, a deterministic mode map assigns

$$\mu_{i}=\Phi_{\mu}^{e}(i;\mathcal{G},\mathsf{Ret}(e),\kappa_{e})\in\{\mathsf{remove},\mathsf{recompute},\mathsf{regen},\mathsf{param}\}.$$

Replayable deterministic artifacts are assigned $\mathsf{recompute}$; summaries, prompt skills, and chain procedures are assigned $\mathsf{regen}$ when retained support suffices; neural skills are assigned $\mathsf{param}$ when provenance yields forget/reference partitions. Otherwise, the artifact is assigned $\mathsf{remove}$.
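The mode map can be pictured as a short decision ladder. The following is purely schematic: predicates such as `replayable` and the dict-based artifact encoding stand in for the paper's conditions and are our assumptions:

```python
def assign_mode(artifact, replayable, support_sufficient, has_partitions):
    """Schematic mode map Phi_mu for one affected descendant.
    replayable: deterministic and replayable under the current interface;
    support_sufficient: retained support suffices for regeneration;
    has_partitions: provenance yields forget/reference partitions."""
    if replayable:
        return "recompute"
    if artifact.get("kind") == "summary" or artifact.get("arch") in ("prompt", "chain"):
        if support_sufficient:
            return "regen"
    if artifact.get("arch") == "neural" and has_partitions:
        return "param"
    return "remove"   # no repair operator applies
```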
For each non-removed descendant,

$$q_{i}=(\mu_{i},\pi_{i},P_{i},v_{i},w_{i},c_{i}),$$

where $\pi_{i}$ is the repair operator, $P_{i}\subseteq D(F)$ is the set of affected descendants whose repaired successors are consumed by $\pi_{i}$, $v_{i}\in\{0,1\}$ records executability, and $w_{i},c_{i}\geq 0$ are fixed value–cost terms. Root replacements and interface updates are supplied through $\mathsf{Ret}(e)$ and $\kappa_{e}$, not through $P_{i}$.
Define the non-removed candidate index set

$$D^{+}(F)=\{i\in D(F):\mu_{i}\neq\mathsf{remove}\},\qquad Q_{\rho}=\{q_{i}:i\in D^{+}(F)\}.$$

For each $i\in D^{+}(F)$, let

$$\bar{P}_{i}=P_{i}\cap D^{+}(F).$$

If $P_{i}\setminus D^{+}(F)\neq\varnothing$, then $i$ requires a non-repairable affected descendant and is marked non-executable ($v_{i}=0$). The repair-time dependency graph is

$$E^{\mathrm{req}}=\{(j,i):i\in D^{+}(F),\ j\in\bar{P}_{i}\}.$$

An edge $(j,i)$ means that the successor of $j$ must be staged before repairing $i$. Thus $E^{\mathrm{req}}$ captures repair-time prerequisites, not raw provenance.
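The restriction of $P_{i}$ to $\bar{P}_{i}$ and the non-executability rule can be sketched as follows (illustrative Python; the dict-based encoding is an assumption):

```python
def restrict_dependencies(modes, P, v):
    """Build D+(F), the restricted sets P_bar, and E_req from per-descendant
    modes and raw dependency sets.  A candidate that requires a removed
    (non-repairable) descendant is marked non-executable, i.e. v_i = 0."""
    D_plus = {i for i, m in modes.items() if m != "remove"}
    P_bar, E_req = {}, set()
    for i in D_plus:
        P_bar[i] = P[i] & D_plus
        if P[i] - D_plus:                     # needs a removed descendant
            v[i] = 0
        E_req |= {(j, i) for j in P_bar[i]}   # prerequisite j before dependent i
    return D_plus, P_bar, E_req, v
```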
### 2.4 Cost-Aware Publication Selection
For each $i\in D^{+}(F)$, let $x_{i}\in\{0,1\}$ indicate whether $q_{i}$ is selected. Define

$$\mathrm{Repair}(x)=\sum_{i\in D^{+}(F)}w_{i}x_{i},\qquad \mathrm{Cost}(x)=\sum_{i\in D^{+}(F)}c_{i}x_{i}.$$

For a fixed tradeoff parameter $\lambda\geq 0$, MemoRepair solves

$$\begin{aligned}
\max_{x}\quad & \mathrm{Repair}(x)-\lambda\,\mathrm{Cost}(x) && (1)\\
\text{s.t.}\quad & x_{i}\leq v_{i},\quad\forall i\in D^{+}(F), && (2)\\
& x_{i}\leq x_{j},\quad\forall i\in D^{+}(F),\ \forall j\in\bar{P}_{i}, && (3)\\
& x_{i}\in\{0,1\},\quad\forall i\in D^{+}(F). && (4)
\end{aligned}$$

Constraint ([2](https://arxiv.org/html/2605.07242#S2.E2)) excludes non-executable candidates, and ([3](https://arxiv.org/html/2605.07242#S2.E3)) enforces predecessor closure. A hard-budget variant with $\sum_{i\in D^{+}(F)}c_{i}x_{i}\leq\beta$ is a precedence-constrained knapsack problem in general, so we use the fixed-$\lambda$ scalarized selector as the default publication rule. We reserve *budget* for this optional hard cap $\beta$; all main experiments use the fixed-$\lambda$ selector and report the realized normalized repair-operator *Cost* of the executed plan.
###### Theorem 1 (Scalarized repair via min-cut).
For a fixed event $e$, candidate family $Q_{\rho}=\{q_{i}:i\in D^{+}(F)\}$ with fixed $(P_{i},v_{i},w_{i},c_{i})$, and $\lambda\geq 0$, the scalarized repair-selection problem ([1](https://arxiv.org/html/2605.07242#S2.E1))–([4](https://arxiv.org/html/2605.07242#S2.E4)) reduces to maximum-weight predecessor closure and can be solved exactly by a single $s$–$t$ min-cut.

Briefly, setting $p_{i}=w_{i}-\lambda c_{i}$ yields a maximum-weight closure problem: positive nodes connect to the source, negative nodes connect to the sink, and large-capacity edges enforce executability and predecessor closure. Although $E^{\mathrm{req}}$ is oriented from prerequisite to dependent, the min-cut network uses edges $i\to j$ for each $j\in\bar{P}_{i}$, so that selecting $i$ forces selection of every required predecessor.
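This reduction fits in a few lines. The snippet below is an illustrative implementation (not the paper's code) using `networkx`; the candidate encoding `i -> (preds, v, w, c)` is an assumed toy representation of $(\bar{P}_{i}, v_{i}, w_{i}, c_{i})$. Edges added without a `capacity` attribute are treated by `networkx` max-flow routines as having unbounded capacity, which plays the role of the large-capacity closure edges:

```python
import networkx as nx

def select_repairs(candidates, lam):
    """Solve the scalarized selection (1)-(4) as max-weight predecessor
    closure via a single s-t min-cut.  candidates: i -> (preds, v, w, c).
    Returns the selected predecessor-closed candidate set."""
    G = nx.DiGraph()
    G.add_nodes_from(["s", "t"])
    for i, (preds, v, w, c) in candidates.items():
        p = w - lam * c                       # scalarized node weight p_i
        if p > 0:
            G.add_edge("s", i, capacity=p)    # positive node: attach to source
        elif p < 0:
            G.add_edge(i, "t", capacity=-p)   # negative node: attach to sink
        if not v:
            G.add_edge(i, "t")                # unbounded edge pins x_i <= v_i = 0
        for j in preds:
            G.add_edge(i, j)                  # unbounded edge: selecting i forces j
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return source_side - {"s"}                # source-side partition = closure
```

At $\lambda\to\infty$ every weight turns negative and the empty closure wins, matching the *Remove all* endpoint of the frontier; at $\lambda=0$ all executable positive-value candidates and their closures are taken.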
**Algorithm 1: MemoRepair**
Input: repair event $e=(F,\tau,\Delta)$, graph $\mathcal{G}$, tradeoff $\lambda$
Output: republished validated successors $R_{\rho}$
1: Compute $\kappa_{e}$, $C(F)$, $D(F)$, and $\mathsf{Ret}(e)$
2: Set $B_{\rho}\leftarrow C(F)$ and withdraw $B_{\rho}$
3: Assign modes $\mu_{i}=\Phi_{\mu}^{e}(i;\mathcal{G},\mathsf{Ret}(e),\kappa_{e})$ for $i\in D(F)$
4: Construct $D^{+}(F)$, $Q_{\rho}$, $\{\bar{P}_{i}\}$, and $E^{\mathrm{req}}$
5: Mark unavailable, non-repairable, or cyclic candidates non-executable
6: Solve ([1](https://arxiv.org/html/2605.07242#S2.E1))–([4](https://arxiv.org/html/2605.07242#S2.E4)) and set $S_{\rho}=\{i:x_{i}=1\}$
7: Execute $S_{\rho}$ in topological order over $E^{\mathrm{req}}[S_{\rho}]$
8: Republish validated successors; keep failures and their dependents withdrawn
9: return $R_{\rho}$
## 3 Experiments
#### Benchmarks.
We evaluate MemoRepair on ToolBench and MemoryArena. ToolBench (Qin et al., [2023](https://arxiv.org/html/2605.07242#bib.bib20)), executed via StableToolBench (Guo et al., [2024](https://arxiv.org/html/2605.07242#bib.bib11)), provides tool-use trajectories with API calls, cached outputs, and executable procedures. For parametric repair, we use the released ToolLLaMA-7B LoRA checkpoint trained on ToolBench data as a ToolBench-derived neural artifact (ToolBench, [2023](https://arxiv.org/html/2605.07242#bib.bib24)). MemoryArena (He et al., [2026](https://arxiv.org/html/2605.07242#bib.bib12)) provides long-horizon memory trajectories with preferences, episodic records, and retrieval traces.
#### Event and cascade statistics.
Cascades range from single-tool deletions with a few dependents to collection-level migrations affecting tens of artifacts, yielding nondegenerate selection instances.
#### Metrics.
We report exposure, successor publication, and downstream task effects. Leak% (↓) is the fraction of affected artifacts in $C(F)$ that remain servable after repair. Stale-use% (↓) is the fraction of post-event action traces that use invalidated information. Rep.% (↑) is the validated republication rate over non-remove repair contracts, micro-averaged over events as $100\sum_{e}|R_{\rho,e}|/\sum_{e}|D^{+}_{e}(F)|$. Cost (↓) is normalized executed repair-operator cost, excluding the withdrawal barrier. $\Delta$Task (↑) is a diagnostic task score, reported in percentage points. For task $t$, let $A_{t}$ be the affected artifacts used by the task, $U_{t}\subseteq A_{t}$ those without a validated successor, and $L_{t}\subseteq A_{t}$ those whose stale version remains visible. We set $s_{t}=1$ when $A_{t}=\emptyset$ and otherwise

$$s_{t}=1-\frac{|U_{t}|+\frac{1}{2}|L_{t}|}{|A_{t}|}.$$

We report $100(\mathbb{E}_{t}[s_{t}]-1)$. The sets $U_{t}$ and $L_{t}$ may overlap: *No action* can be penalized for both missing repair and stale serving, whereas *Remove all* is penalized only for missing repair. For neural skills, FSP%, RU%, and Val. pass% report forget-set suppression, retained utility, and threshold-passing parametric successors.
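As an illustrative reading of the $\Delta$Task definition (a minimal sketch; the function names and set representation are ours, not the paper's):

```python
def task_score(affected, unrepaired, stale_visible):
    """s_t for one task: A_t = affected, U_t = unrepaired (no validated
    successor), L_t = stale version still visible.  U_t and L_t may
    overlap, so one artifact can be penalized for both failure modes."""
    if not affected:
        return 1.0
    return 1.0 - (len(unrepaired) + 0.5 * len(stale_visible)) / len(affected)

def delta_task(tasks):
    """Reported metric: 100 * (mean_t s_t - 1), in percentage points."""
    scores = [task_score(A, U, L) for A, U, L in tasks]
    return 100.0 * (sum(scores) / len(scores) - 1.0)
```

An unrepaired-and-stale artifact thus costs $1.5/|A_{t}|$, which is why *No action* scores below *Remove all* even though neither publishes successors.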
#### Baselines.
*Internal repair policy baselines.* (i) No action: the store is left unchanged. (ii) Remove all: the affected cascade $C(F)$ is withdrawn, no successors are republished, and no repair operators are executed. (iii) Barrier+Greedy: a selector-ablation control that shares the withdrawal barrier, candidates, repair operators, and validation oracle with MemoRepair, replacing only the exact min-cut selector with a greedy value-to-cost heuristic under the same $\lambda$. Differences against Barrier+Greedy are therefore attributable to the selector, not to the contract. (iv) Repair all: executes every initially executable repair candidate and republishes only successors that pass $\mathrm{Validate}_{i}$. It therefore defines the publication ceiling rather than a 100% publication baseline; its Cost is 1.00 whenever the initially executable set $E_{0}$ is nonempty, and 0.00 otherwise.
*External memory system baselines.* We evaluate Mem0 (Chhikara et al., [2025](https://arxiv.org/html/2605.07242#bib.bib5)), Zep/Graphiti (Rasmussen et al., [2025](https://arxiv.org/html/2605.07242#bib.bib21)), MemR3 (Du et al., [2025](https://arxiv.org/html/2605.07242#bib.bib6)), TierMem (Zhu et al., [2026](https://arxiv.org/html/2605.07242#bib.bib31)), A-MEM (Xu et al., [2025](https://arxiv.org/html/2605.07242#bib.bib29)), and O-Mem (Wang et al., [2025b](https://arxiv.org/html/2605.07242#bib.bib27)) under their native update and retrieval semantics. These systems do not expose cascade withdrawal, successor validation, or predecessor-closed republication. We use them as stale-resistance reference points, measuring how much invalidated content their native pipelines serve or retrieve after an event on the same traces.
*Neural-skill operators.* For the $\mathsf{param}$ repair mode, we compare adapted SBU (Wang et al., [2026](https://arxiv.org/html/2605.07242#bib.bib25)) and six parameter-level operators (NPO (Zhang et al., [2024](https://arxiv.org/html/2605.07242#bib.bib30)), SimNPO (Fan et al., [2024](https://arxiv.org/html/2605.07242#bib.bib7)), BLUR (Reisizadeh et al., [2026](https://arxiv.org/html/2605.07242#bib.bib22)), RMU (Li et al., [2024](https://arxiv.org/html/2605.07242#bib.bib13)), LUNE (Liu et al., [2025](https://arxiv.org/html/2605.07242#bib.bib14)), and OOO (Gao et al., [2024](https://arxiv.org/html/2605.07242#bib.bib8))) inside the same MemoRepair pipeline, using the same provenance-derived $(\mathsf{For}_{i}(e),\mathsf{Ref}_{i}(e))$ partitions and validation oracle.
### 3.1 Repair Policy Comparison
Table [1](https://arxiv.org/html/2605.07242#S3.T1) reports repair-policy comparisons on ToolBench and MemoryArena. Under complete influence provenance, every artifact in $C(F)$ becomes non-servable once the withdrawal barrier is acknowledged, so zero Leak and Stale-use are contract guarantees for all withdrawal-based methods. The empirical question reduces to whether validated successor publication recovers task utility at lower repair-operator cost than exhaustive repair. *Remove all* isolates the barrier by withdrawing the cascade without invoking any repair operator, yielding Rep. $=$ Cost $=0$. *Repair all* establishes the publication ceiling: it attempts every initially executable candidate and republishes only successors that pass $\mathrm{Validate}_{i}$. MemoRepair and Barrier+Greedy share the barrier, candidate construction, operators, and validation oracle, and differ only in the publication selector.
On ToolBench, MemoRepair republishes 22.4%, 45.1%, and 36.5% of $D^{+}(F)$ under deletion, correction, and migration respectively, recovering 91.1–94.3% of the *Repair all* Rep. ceiling at Cost between 0.57 and 0.66. Its $\Delta$Task lies within 0.08 task points of *Repair all* on all three event types. On MemoryArena, deletion leaves fewer repairable descendants. Deterministic record and final-answer descendants usually require the deleted parent as input, so they become structurally non-executable once that parent is withdrawn. Skill descendants can still be repaired by the parametric repair operator, which removes the deleted support from the skill's retained reference set rather than recomputing from the deleted parent.

The resulting Rep. ceiling is 5.5%, of which MemoRepair attains 5.1% at Cost 0.62, with $\Delta$Task within 0.05 task points of *Repair all*. Correction and migration admit larger repair surfaces: MemoRepair republishes 33.1% and 17.0% of $D^{+}(F)$ at Cost 0.76 and 0.64 respectively, again within 0.07 task points of the *Repair all* ceiling.
**ToolBench** (column groups, left to right: Deletion, Correction, Migration; each group reports Leak ↓, Stale ↓, Rep. ↑, Cost ↓, $\Delta$Task ↑)

| Method | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No action | 100 | 100 | – | – | -2.78 | 100 | 100 | – | – | -2.78 | 100 | 100 | – | – | -5.61 |
| Remove all | 0 | 0 | 0.0 | 0.00 | -1.96 | 0 | 0 | 0.0 | 0.00 | -1.96 | 0 | 0 | 0.0 | 0.00 | -4.47 |
| Repair all | 0 | 0 | 24.6 | 1.00 | **-1.54** | 0 | 0 | 48.8 | 1.00 | **-1.05** | 0 | 0 | 38.7 | 1.00 | **-2.98** |
| Barrier+Greedy | 0 | 0 | 20.3 | 0.72 | -1.68 | 0 | 0 | 41.2 | 0.62 | -1.20 | 0 | 0 | 31.4 | 0.76 | -3.28 |
| MemoRepair | 0 | 0 | 22.4 | 0.61 | -1.60 | 0 | 0 | 45.1 | 0.57 | -1.11 | 0 | 0 | 36.5 | 0.66 | -3.06 |

**MemoryArena** (same column groups)

| Method | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No action | 100 | 100 | – | – | -4.44 | 100 | 100 | – | – | -4.44 | 100 | 100 | – | – | -3.86 |
| Remove all | 0 | 0 | 0.0 | 0.00 | -4.00 | 0 | 0 | 0.0 | 0.00 | -4.00 | 0 | 0 | 0.0 | 0.00 | -3.58 |
| Repair all | 0 | 0 | 5.5 | 1.00 | **-3.74** | 0 | 0 | 35.6 | 1.00 | **-2.91** | 0 | 0 | 18.4 | 1.00 | **-3.18** |
| Barrier+Greedy | 0 | 0 | 4.2 | 0.70 | -3.90 | 0 | 0 | 28.6 | 0.83 | -3.17 | 0 | 0 | 14.2 | 0.71 | -3.34 |
| MemoRepair | 0 | 0 | 5.1 | 0.62 | -3.79 | 0 | 0 | 33.1 | 0.76 | -2.98 | 0 | 0 | 17.0 | 0.64 | -3.23 |

Table 1: Internal repair policy comparison on ToolBench and MemoryArena. Bold marks the column best among methods that produce validated repairs.

**Selector frontier.** Figure [3](https://arxiv.org/html/2605.07242#S3.F3) traces the repair–cost frontier obtained by sweeping $\lambda$ in the scalarized min-cut selector on ToolBench across deletion, correction, and migration. Each curve reports the mean over three seeds, with shaded bands and horizontal caps showing ±1 std on Rep. and Cost respectively.
The endpoints recover *Remove all* at $\lambda\to\infty$ and *Repair all* at $\lambda=0$, and increasing $\lambda$ shifts the operating point toward lower cost and lower validated publication. The configuration used in Table [1](https://arxiv.org/html/2605.07242#S3.T1) fixes $\lambda=0.3$, which sits at the knee of all three curves: the marginal slope $\Delta\mathrm{Rep}/\Delta\mathrm{Cost}$ falls by a factor of four to seven past this point while still recovering 91.1–94.3% of the *Repair all* ceiling at Cost between 0.57 and 0.66.
### 3.2 Cascade-Unaware Memory System Comparison
Table [2](https://arxiv.org/html/2605.07242#S3.T2) evaluates six external memory systems under their own update and retrieval semantics. These systems are cascade-unaware: none of them implements withdrawal barriers, validated successor publication, or predecessor-closed repair selection. They therefore serve as stale-resistance reference points rather than repair-policy baselines. A value of 0 in Rep. indicates that the system performs no validated successor-publication operation rather than a failed repair attempt; Cost is undefined because these systems do not execute the repair operators associated with the $c_{i}$ terms in our selector.

Across ToolBench and MemoryArena the six systems reduce exposure relative to *No action*, yet a large fraction of stale descendants remains servable. On ToolBench they leave 69.8% to 93.1% of the affected cascade leakable and 92.4% to 99.7% of post-event actions stale; on MemoryArena the corresponding ranges are 73.9% to 94.3% and 93.2% to 99.6%. The associated $\Delta$Task remains within 0.24 points of *No action* on ToolBench and within 0.37 points on MemoryArena, indicating that retrieval freshness alone does not provide withdrawal or validated successor semantics. The MemoRepair row is included as an explicit cascade-repair reference under complete influence provenance.
**ToolBench** (column groups, left to right: Deletion, Correction, Migration; each group reports Leak ↓, Stale ↓, Rep. ↑, Cost ↓, $\Delta$Task ↑)

| System | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No action | 100 | 100 | – | – | -2.78 | 100 | 100 | – | – | -2.78 | 100 | 100 | – | – | -5.61 |
| Mem0 | 74.6 | 93.8 | 0 | – | -2.61 | 80.2 | 95.6 | 0 | – | -2.66 | 91.3 | 98.9 | 0 | – | -5.46 |
| Zep/Graphiti | 69.8 | 92.4 | 0 | – | -2.54 | 76.5 | 94.1 | 0 | – | -2.59 | 89.1 | 98.4 | 0 | – | -5.39 |
| MemR3 | 85.2 | 98.6 | 0 | – | -2.72 | 87.4 | 98.9 | 0 | – | -2.74 | 92.6 | 99.6 | 0 | – | -5.48 |
| A-MEM | 72.1 | 92.7 | 0 | – | -2.56 | 79.4 | 95.0 | 0 | – | -2.63 | 90.6 | 98.7 | 0 | – | -5.42 |
| O-Mem | 88.7 | 99.4 | 0 | – | -2.73 | 88.9 | 99.6 | 0 | – | -2.74 | 93.1 | 99.7 | 0 | – | -5.51 |
| TierMem | 88.4 | 99.2 | 0 | – | -2.72 | 88.6 | 99.5 | 0 | – | -2.74 | 93.0 | 99.7 | 0 | – | -5.50 |
| MemoRepair | 0 | 0 | 22.4 | 0.61 | **-1.60** | 0 | 0 | 45.1 | 0.57 | **-1.11** | 0 | 0 | 36.5 | 0.66 | **-3.06** |

**MemoryArena** (same column groups)

| System | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask | Leak | Stale | Rep. | Cost | ΔTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No action | 100 | 100 | – | – | -4.44 | 100 | 100 | – | – | -4.44 | 100 | 100 | – | – | -3.86 |
| Mem0 | 78.6 | 94.8 | 0 | – | -4.31 | 82.4 | 96.1 | 0 | – | -4.18 | 91.7 | 98.0 | 0 | – | -3.72 |
| Zep/Graphiti | 73.9 | 93.2 | 0 | – | -4.25 | 78.1 | 94.6 | 0 | – | -4.07 | 88.4 | 97.3 | 0 | – | -3.65 |
| MemR3 | 86.7 | 98.3 | 0 | – | -4.39 | 88.2 | 98.7 | 0 | – | -4.31 | 93.5 | 99.1 | 0 | – | -3.79 |
| A-MEM | 75.4 | 93.6 | 0 | – | -4.27 | 80.3 | 95.2 | 0 | – | -4.12 | 90.5 | 97.6 | 0 | – | -3.68 |
| O-Mem | 89.6 | 99.5 | 0 | – | -4.42 | 90.1 | 99.6 | 0 | – | -4.36 | 94.3 | 99.5 | 0 | – | -3.83 |
| TierMem | 89.2 | 99.3 | 0 | – | -4.41 | 89.8 | 99.5 | 0 | – | -4.35 | 93.9 | 99.4 | 0 | – | -3.82 |
| MemoRepair | 0 | 0 | 5.1 | 0.62 | **-3.79** | 0 | 0 | 33.1 | 0.76 | **-2.98** | 0 | 0 | 17.0 | 0.64 | **-3.23** |

Table 2: Stale-resistance comparison with memory systems without cascade repair. For external systems, Rep. = 0 means no validated successor publication and Cost is undefined; MemoRepair is included only as a cascade-repair reference.
### 3.3 Parametric Repair
Figure [2](https://arxiv.org/html/2605.07242#S3.F2) ablates the two repair levels. Without repair, the LoRA checkpoint has Stale = 100, FSP = 8.7, and $\Delta$Task = -7.42, indicating both visible stale serving and parametric persistence of invalidated support. Param-only LUNE raises FSP to 85.7 and recovers $\Delta$Task to -3.00, but leaves Stale at 83.4 because materialized descendants remain visible. Cascade-only MemoRepair drives Stale to 0 at $\Delta$Task = -5.22, yet leaves FSP near baseline at 12.5 because model weights are not updated. The two repair levels fix different failure modes: parameter-side repair reduces stale influence inside the neural skill, while cascade-side repair prevents stale materialized artifacts from remaining visible. Composing them reaches Stale = 0, FSP = 86.4, and $\Delta$Task = -1.86, improving $\Delta$Task by 5.56 points over no repair and by 1.14 and 3.36 points over LUNE only and cascade only respectively. The composed run is costed as a single pipeline, so shared replay or sandbox work is charged once; its Cost is 0.74, below the independent-run sum 0.42 + 0.38 = 0.80.

Figure 2: Parametric-repair pipeline ablation on the ToolBench neural-skill subset.

Table [3](https://arxiv.org/html/2605.07242#S3.T3) compares parameter-level operators inside the same MemoRepair pipeline. All operators use the same $(\mathsf{For}_{i}(e),\mathsf{Ref}_{i}(e))$ partitions and validation oracle. Full retraining reaches FSP = 91.8 and Val. pass = 92.5 at 8.4× relative compute, serving as a high-compute quality ceiling. Among practical operators, LUNE gives the best observed compute–quality tradeoff: it has the highest RU (93.4) and Val. pass (84.9), while BLUR attains the highest FSP (88.2) at 3.7× relative compute. NPO, SimNPO, and RMU trail in this setting (FSP < 82, Val. pass < 76). We use LUNE as the default $\mathsf{param}$ operator in MemoRepair; BLUR is a higher-compute alternative when maximizing FSP is the priority.
| Operator | FSP% ↑ | RU% ↑ | Val. pass% ↑ | Rel. comp. ↓ |
| --- | --- | --- | --- | --- |
| Full retrain | 91.8 | 96.2 | 92.5 | 8.4× |
| SBU | 86.9 | 92.8 | 83.6 | 1.8× |
| NPO | 81.7 | 88.9 | 72.4 | 1.3× |
| SimNPO | 80.4 | 90.7 | 75.8 | 1.2× |
| RMU | 78.9 | 91.1 | 71.6 | 1.5× |
| BLUR | 88.2 | 91.9 | 82.1 | 3.7× |
| LUNE | 86.4 | 93.4 | 84.9 | **1.0×** |
| OOO | 87.4 | 92.5 | 83.2 | 1.4× |

Table 3: Parameter-level repair operators inside MemoRepair. Full retrain uses retained support $\mathsf{Ref}_i(e)$. Rel. comp. is wall-clock compute normalized to LUNE, not the Cost metric.

Figure 3: Min-cut repair–cost frontier on ToolBench, traced by sweeping λ.
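The frontier in Figure 3 comes from sweeping λ in the publication selector. The reduction itself is the classical Picard max-closure/min-cut construction: each candidate successor carries a net weight (repair gain minus λ-scaled cost), and publishing a successor forces its predecessors via infinite-capacity edges, so one s–t min-cut yields the optimal predecessor-closed set. The sketch below is a minimal stand-alone illustration of that construction, not the paper's implementation; artifact names, weights, and the simple Edmonds–Karp flow routine are our own assumptions.

```python
from collections import deque

def _max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity map (mutated in place)."""
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # recover the path and its bottleneck capacity
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap.setdefault(v, {})[u] = cap.get(v, {}).get(u, 0) + aug
        flow += aug

def max_weight_predecessor_closure(weights, preds):
    """Picard-style reduction: maximum-weight predecessor-closed subset
    via a single s-t min-cut.

    weights: artifact -> net value (e.g. repair gain minus lambda * cost)
    preds:   artifact -> predecessors that must be published with it
    """
    INF = float("inf")
    s, t = "_s", "_t"
    cap = {}
    for v, w in weights.items():
        if w > 0:
            cap.setdefault(s, {})[v] = w   # positive nodes hang off the source
        elif w < 0:
            cap.setdefault(v, {})[t] = -w  # negative nodes hang off the sink
    for v, us in preds.items():
        for u in us:
            cap.setdefault(v, {})[u] = INF  # publishing v forces u
    _max_flow(cap, s, t)
    # source side of the min cut = nodes reachable in the final residual graph
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in cap.get(u, {}).items():
            if c > 0 and v not in seen:
                seen.add(v)
                q.append(v)
    closure = seen - {s}
    return closure, sum(weights[v] for v in closure)
```

With hypothetical artifacts A (net +5), B (net −2, requires A), and C (net +3, requires B), the selector publishes all three for a total of 6: paying B's net cost is worth it because it unlocks C. Sweeping λ re-weights the nodes and traces the frontier.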
### 3.4 Ablations and Robustness
| Type | Setting | Leak% ↓ | Stale-use% ↓ | Repair time rel. |
| --- | --- | --- | --- | --- |
| Withdrawal | No barrier | 100.0 | 100.0 | 0.52× |
| | MemoRepair | 0.0 | 0.0 | 1.00× |
| Validation | Schema-only | 0.0 | 88.6 | – |
| | Task-regression-only | 0.0 | 29.8 | – |
| | Full $\mathrm{Validate}_i$ | 0.0 | 0.0 | – |
| Provenance | $p_{\mathrm{drop}} = 0.005$ | 8.6 | 9.4 | – |
| | $p_{\mathrm{drop}} = 0.010$ | 17.7 | 19.7 | – |
| | $p_{\mathrm{drop}} = 0.020$ | 34.2 | 38.0 | – |
| | $p_{\mathrm{drop}} = 0.050$ | 61.8 | 68.5 | – |
| Concurrency | Overlapping events | 0.0 | 0.0 | 1.86× |
| Replica | Partition + rejoin | 0.0 | 0.0 | 1.12× |

Table 4: Component ablations and robustness checks. Withdrawal rows report repair-interval exposure (not directly comparable to the post-repair Leak/Stale in Tables [1](https://arxiv.org/html/2605.07242#S3.T1)–[2](https://arxiv.org/html/2605.07242#S3.T2)); Validation rows report the oracle's false-pass rate in the Stale-use column. Provenance, Concurrency, and Replica share the MemoRepair baseline.

Table [4](https://arxiv.org/html/2605.07242#S3.T4) reports component ablations and robustness checks. The withdrawal barrier trades a 0.52× time saving for zero stale exposure during repair: removing it leaves the cascade servable throughout (Leak = Stale = 100), an availability/correctness choice that no barrier-free design can avoid. The validation ablation isolates two failure modes: schema-only and task-regression-only validation each miss what the other catches (88.6% and 29.8% stale-use), and only the composed oracle reaches zero. Provenance fidelity is highly leveraged: dropping 1% of influence edges yields 17.7% Leak (≈18× amplification), and the degradation scales roughly linearly in $p_{\mathrm{drop}}$, identifying complete influence provenance as the main system-level invariant behind the contract guarantee.
Overlapping events and partition + rejoin add 1.86× and 1.12× repair time while preserving Leak = Stale = 0.
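The provenance ablation has a simple structural reading: withdrawal is a descendant-closure computation over the influence graph, so a single missing edge silently excludes an entire derived subtree from the cascade. A toy sketch (hypothetical artifact names; not the paper's implementation):

```python
from collections import deque

def withdraw_cascade(influence, roots):
    """Compute the set of artifacts to withdraw before repair: the
    descendant closure of the repaired sources under the influence
    provenance graph (source -> artifacts derived from it)."""
    withdrawn = set(roots)
    frontier = deque(roots)
    while frontier:
        u = frontier.popleft()
        for v in influence.get(u, ()):
            if v not in withdrawn:
                withdrawn.add(v)
                frontier.append(v)
    return withdrawn

# Complete provenance: everything downstream of api_v1 is withdrawn.
influence = {"api_v1": ["summary", "cache"], "summary": ["skill"]}
full = withdraw_cascade(influence, ["api_v1"])

# Drop the summary -> skill edge: "skill" is never withdrawn and
# stays servable with stale support -- the Leak rows of Table 4.
lossy = {"api_v1": ["summary", "cache"]}
leaked = full - withdraw_cascade(lossy, ["api_v1"])
```

Because each dropped edge can hide many transitive descendants, a small edge-drop rate amplifies into a much larger leaked-artifact fraction, matching the observed ≈18× amplification.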
## 4 Related Work
#### Agent memory and tool\-derived state\.
LLM agents increasingly maintain state beyond the immediate context window. Generative Agents and Reflexion store memories or reflections for later decisions, while MemGPT and A-MEM study mechanisms for managing long-term agent memory across interactions (Park et al., [2023](https://arxiv.org/html/2605.07242#bib.bib17); Shinn et al., [2023](https://arxiv.org/html/2605.07242#bib.bib23); Packer et al., [2024](https://arxiv.org/html/2605.07242#bib.bib16); Xu et al., [2025](https://arxiv.org/html/2605.07242#bib.bib29)). Skill-oriented agents further show that past experience can be distilled into reusable procedures or skills (Mi et al., [2026](https://arxiv.org/html/2605.07242#bib.bib15)). A related source of persistent state appears in tool-use systems, where API calls, arguments, outputs, cached responses, and traces become reusable workflow artifacts (Qin et al., [2023](https://arxiv.org/html/2605.07242#bib.bib20); Guo et al., [2024](https://arxiv.org/html/2605.07242#bib.bib11)). These works motivate the memory substrate considered here. MemoRepair studies a later stage in the lifecycle: after an artifact has already been materialized, a deletion, correction, or migration can make its descendants stale, requiring explicit withdrawal and repair.
#### Unlearning and memory privacy\.
Machine unlearning removes or suppresses the influence of deleted data in learned models (Cao and Yang, [2015](https://arxiv.org/html/2605.07242#bib.bib2); Guo et al., [2020](https://arxiv.org/html/2605.07242#bib.bib10); Xu et al., [2023](https://arxiv.org/html/2605.07242#bib.bib28)). Recent work extends forgetting to agent settings, where information may persist in behavior, retrieval, or memory access (Wang et al., [2026](https://arxiv.org/html/2605.07242#bib.bib25)). Privacy studies of LLM-agent memory similarly show that stored interactions can remain exposed through memory retrieval (Wang et al., [2025a](https://arxiv.org/html/2605.07242#bib.bib26)). These lines address model-side, retriever-side, or access-side influence. MemoRepair addresses a separate store-level failure mode: summaries, cached outputs, prompt skills, chain procedures, and neural-skill artifacts may remain visible even after the source artifact has been updated or removed.
#### Provenance, view maintenance, and selection\.
Database provenance and view maintenance provide tools for relating source data to derived state and propagating updates through materialized views (Buneman et al., [2001](https://arxiv.org/html/2605.07242#bib.bib1); Cheney et al., [2009](https://arxiv.org/html/2605.07242#bib.bib4); Ceri and Widom, [1991](https://arxiv.org/html/2605.07242#bib.bib3)). Agent memory requires additional machinery because descendants are heterogeneous artifacts, repair operators may fail validation, and publishing one successor may require other repaired successors to be available first. Our selector uses the classical maximum-closure/min-cut connection (Picard, [1976](https://arxiv.org/html/2605.07242#bib.bib19)). The contribution is not a new graph-cut algorithm, but the cascade-repair contract that turns successor publication in agent memory into a predecessor-closed selection problem.
## 5 Limitations and Conclusion
#### Limitations\.
MemoRepair relies on complete influence provenance: each durable artifact must record the artifacts used to derive it. Missing edges can leave affected descendants outside the withdrawn cascade. This assumption is visible in Table [4](https://arxiv.org/html/2605.07242#S3.T4): dropping 1% of influence edges yields 17.7% Leak, with similar scaling for small $p_{\mathrm{drop}}$. The method also depends on the coverage of the validation suite: a successor that passes the implemented checks is not guaranteed to be semantically correct in all future contexts. For neural skills, MemoRepair inherits the guarantees and failure modes of the chosen parameter-level repair operator; the store-level contract should therefore not be read as a claim of exact parameter erasure.
#### Conclusion\.
We studied cascade repair for the visible derived state of agentic memory. When a source artifact is deleted, corrected, or migrated, downstream summaries, cached outputs, procedures, and skills may remain servable with stale support. MemoRepair addresses this post-update state by withdrawing the affected cascade, constructing successors from retained support and staged repaired predecessors, and republishing only successors that pass validation and satisfy predecessor closure. For a fixed repair–cost tradeoff, the publication problem reduces to maximum-weight predecessor closure and is solved by a single min-cut. Under complete influence provenance, the withdrawal barrier gives store-level stale-serving isolation, while the selector recovers most of the repair-all publication ceiling at lower repair-operator cost.
## References
- Buneman et al. [2001] Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. Why and Where: A Characterization of Data Provenance. In *Proceedings of the 8th International Conference on Database Theory*, ICDT '01, pages 316–330, Berlin, Heidelberg, 2001. Springer-Verlag. ISBN 3540414568.
- Cao and Yang [2015] Yinzhi Cao and Junfeng Yang. Towards Making Systems Forget with Machine Unlearning. In *Proceedings of the 2015 IEEE Symposium on Security and Privacy*, SP '15, pages 463–480, USA, 2015. IEEE Computer Society. ISBN 9781467369497. doi:10.1109/SP.2015.35. URL [https://doi.org/10.1109/SP.2015.35](https://doi.org/10.1109/SP.2015.35).
- Ceri and Widom [1991] Stefano Ceri and Jennifer Widom. Deriving Production Rules for Incremental View Maintenance. In *Proceedings of the 17th International Conference on Very Large Data Bases*, VLDB '91, pages 577–589, San Francisco, CA, USA, 1991. Morgan Kaufmann Publishers Inc. ISBN 1558601503.
- Cheney et al. [2009] James Cheney, Laura Chiticariu, and Wang-Chiew Tan. Provenance in Databases: Why, How, and Where. *Found. Trends Databases*, 1(4):379–474, April 2009. ISSN 1931-7883. doi:10.1561/1900000006. URL [https://doi.org/10.1561/1900000006](https://doi.org/10.1561/1900000006).
- Chhikara et al. [2025] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. *arXiv preprint arXiv:2504.19413*, 2025.
- Du et al. [2025] Xingbo Du, Loka Li, Duzhen Zhang, and Le Song. MemR3: Memory Retrieval via Reflective Reasoning for LLM Agents. *arXiv preprint arXiv:2512.20237*, 2025.
- Fan et al. [2024] Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning. In *NeurIPS Safe Generative AI Workshop 2024*, 2024. URL [https://openreview.net/forum?id=pVACX02m0p](https://openreview.net/forum?id=pVACX02m0p).
- Gao et al. [2024] Chongyang Gao, Lixu Wang, Kaize Ding, Chenkai Weng, Xiao Wang, and Qi Zhu. On Large Language Model Continual Unlearning. *arXiv preprint arXiv:2407.10223*, 2024.
- Green et al. [2007] Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance Semirings. In *Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems*, PODS '07, pages 31–40, New York, NY, USA, 2007. Association for Computing Machinery. ISBN 9781595936851. doi:10.1145/1265530.1265535. URL [https://doi.org/10.1145/1265530.1265535](https://doi.org/10.1145/1265530.1265535).
- Guo et al. [2020] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. Certified Data Removal From Machine Learning Models. In *Proceedings of the 37th International Conference on Machine Learning*, ICML '20. JMLR.org, 2020.
- Guo et al. [2024] Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, and Yang Liu. StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, *Findings of the Association for Computational Linguistics: ACL 2024*, pages 11143–11156, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.findings-acl.664. URL [https://aclanthology.org/2024.findings-acl.664/](https://aclanthology.org/2024.findings-acl.664/).
- He et al. [2026] Zexue He, Yu Wang, Churan Zhi, Yuanzhe Hu, Tzu-Ping Chen, Lang Yin, Ze Chen, Tong Arthur Wu, Siru Ouyang, Zihan Wang, et al. MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks. *arXiv preprint arXiv:2602.16313*, 2026.
- Li et al. [2024] Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Ian Steneker, David Campbell, Brad Jokubaitis, Steven Basart, Stephen Fitz, Ponnurangam Kumaraguru, Kallol Krishna Karmakar, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, and Dan Hendrycks. The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning. In *Proceedings of the 41st International Conference on Machine Learning*, ICML '24. JMLR.org, 2024.
- Liu et al. [2025] Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, and Mohsen Imani. LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples. *arXiv preprint arXiv:2512.07375*, 2025.
- Mi et al. [2026] Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, and Jun Wang. Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents, 2026. URL [https://arxiv.org/abs/2602.01869](https://arxiv.org/abs/2602.01869).
- Packer et al. [2024] Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as Operating Systems, 2024. URL [https://arxiv.org/abs/2310.08560](https://arxiv.org/abs/2310.08560).
- Park et al. [2023] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative Agents: Interactive Simulacra of Human Behavior. In *Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology*, UIST '23, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701320. doi:10.1145/3586183.3606763. URL [https://doi.org/10.1145/3586183.3606763](https://doi.org/10.1145/3586183.3606763).
- Patil et al. [2024] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large Language Model Connected with Massive APIs. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, *Advances in Neural Information Processing Systems*, volume 37, pages 126544–126565. Curran Associates, Inc., 2024. doi:10.52202/079017-4020. URL [https://proceedings.neurips.cc/paper_files/paper/2024/file/e4c61f578ff07830f5c37378dd3ecb0d-Paper-Conference.pdf](https://proceedings.neurips.cc/paper_files/paper/2024/file/e4c61f578ff07830f5c37378dd3ecb0d-Paper-Conference.pdf).
- Picard [1976] Jean-Claude Picard. Maximal Closure of a Graph and Applications to Combinatorial Problems. *Manage. Sci.*, 22(11):1268–1272, July 1976. ISSN 0025-1909. doi:10.1287/mnsc.22.11.1268. URL [https://doi.org/10.1287/mnsc.22.11.1268](https://doi.org/10.1287/mnsc.22.11.1268).
- Qin et al. [2023] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs. In *The Twelfth International Conference on Learning Representations*, 2023.
- Rasmussen et al. [2025] Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: A Temporal Knowledge Graph Architecture for Agent Memory. *arXiv preprint arXiv:2501.13956*, 2025.
- Reisizadeh et al. [2026] Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, and Mingyi Hong. BLUR: A Bi-Level Optimization Approach for LLM Unlearning. In Vera Demberg, Kentaro Inui, and Lluís Marquez, editors, *Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)*, pages 7043–7058, Rabat, Morocco, March 2026. Association for Computational Linguistics. ISBN 979-8-89176-380-7. doi:10.18653/v1/2026.eacl-long.331. URL [https://aclanthology.org/2026.eacl-long.331/](https://aclanthology.org/2026.eacl-long.331/).
- Shinn et al. [2023] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language Agents with Verbal Reinforcement Learning. In *Proceedings of the 37th International Conference on Neural Information Processing Systems*, NIPS '23, Red Hook, NY, USA, 2023. Curran Associates Inc.
- ToolBench [2023] ToolBench. ToolLLaMA-7b-LoRA-v1. [https://huggingface.co/ToolBench/ToolLLaMA-7b-LoRA-v1](https://huggingface.co/ToolBench/ToolLLaMA-7b-LoRA-v1), 2023. Released LoRA checkpoint for ToolLLaMA trained with ToolBench data.
- Wang et al. [2026] Bin Wang, Fan Wang, Pingping Wang, Jinyu Cong, Yang Yu, Yilong Yin, Zhongyi Han, and Benzheng Wei. Agentic Unlearning: When LLM Agent Meets Machine Unlearning, 2026. URL [https://arxiv.org/abs/2602.17692](https://arxiv.org/abs/2602.17692).
- Wang et al. [2025a] Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, and Pengfei He. Unveiling Privacy Risks in LLM Agent Memory. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, *Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, pages 25241–25260, Vienna, Austria, July 2025a. Association for Computational Linguistics. ISBN 979-8-89176-251-0. doi:10.18653/v1/2025.acl-long.1227. URL [https://aclanthology.org/2025.acl-long.1227/](https://aclanthology.org/2025.acl-long.1227/).
- Wang et al. [2025b] Piaohong Wang, Motong Tian, Jiaxian Li, Yuan Liang, Yuqing Wang, Qianben Chen, Tiannan Wang, Zhicong Lu, Jiawei Ma, Yuchen Eleanor Jiang, et al. O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents. *arXiv preprint arXiv:2511.13593*, 2025b.
- Xu et al. [2023] Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S. Yu. Machine Unlearning: A Survey. *ACM Comput. Surv.*, 56(1), August 2023. ISSN 0360-0300. doi:10.1145/3603620. URL [https://doi.org/10.1145/3603620](https://doi.org/10.1145/3603620).
- Xu et al. [2025] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-Mem: Agentic Memory for LLM Agents. In *Advances in Neural Information Processing Systems*, 2025.
- Zhang et al. [2024] Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning. *arXiv preprint arXiv:2404.05868*, 2024.
- Zhu et al. [2026] Qiming Zhu, Shunian Chen, Rui Yu, Zhehao Wu, and Benyou Wang. From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents. *arXiv preprint arXiv:2602.17913*, 2026.