Useful Memories Become Faulty When Continuously Updated by LLMs

Hugging Face Daily Papers 05/13/26, 12:00 AM Papers

llm memory agent consolidation episodic-memory arc-agi

Summary

A study finds that continuously updating consolidated memories in LLM-based agentic systems degrades performance, and that retaining raw episodic trajectories is more reliable. On ARC-AGI, even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of problems it had previously solved without memory.

Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated abstractions distilled across many episodes into reusable, schema-like lessons. Recent agentic-memory systems pursue the consolidated form: an LLM rewrites past trajectories into a textual memory bank that it continuously updates with new interactions, promising self-improving agents without parameter updates. Yet we find that such consolidated memories produced by today's LLMs are often faulty even when derived from useful experiences. As consolidation proceeds, memory utility first rises, then degrades, and can fall below the no-memory baseline. More surprisingly, even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of ARC-AGI problems it had previously solved without memory. We trace the regression to the consolidation step rather than the underlying experience: the same trajectories yield qualitatively different memories under different update schedules, and an episodic-only control that simply retains those trajectories remains competitive with the consolidators we test. In a controlled ARC-AGI Stream environment that exposes Retain, Delete, and Consolidate actions, agents preserve raw episodes by default and double the accuracy of their forced-consolidation counterparts; disabling consolidation entirely (episodic management only) matches this auto regime. Practically, robust agent memory should treat raw episodes as first-class evidence and gate consolidation explicitly rather than firing it after every interaction. Looking forward, reliable agentic memory will require LLMs that can consolidate without overwriting the evidence they depend on.
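The recommendation to keep raw episodes as first-class evidence and to gate consolidation explicitly can be illustrated with a minimal sketch. Everything named here (`EpisodicMemory`, `consolidate_fn`, `min_new_episodes`) is a hypothetical stand-in rather than the paper's implementation, and the gate (a simple count of new episodes) is only one assumed policy among many.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EpisodicMemory:
    """Hypothetical store: raw episodes are kept verbatim; lessons are derived."""
    consolidate_fn: Callable[[List[str]], str]  # stand-in for an LLM call
    min_new_episodes: int = 3                   # assumed gate threshold
    episodes: Dict[int, str] = field(default_factory=dict)
    lessons: List[str] = field(default_factory=list)
    _next_id: int = 0
    _unconsolidated: List[int] = field(default_factory=list)

    def retain(self, trajectory: str) -> int:
        """Retain: store the raw trajectory unchanged and return its id."""
        episode_id = self._next_id
        self._next_id += 1
        self.episodes[episode_id] = trajectory
        self._unconsolidated.append(episode_id)
        return episode_id

    def delete(self, episode_id: int) -> None:
        """Delete: drop one raw episode; existing lessons are left alone."""
        self.episodes.pop(episode_id, None)
        if episode_id in self._unconsolidated:
            self._unconsolidated.remove(episode_id)

    def consolidate(self) -> bool:
        """Consolidate: runs only when the gate is met, never overwrites episodes."""
        pending = [self.episodes[i] for i in self._unconsolidated]
        if len(pending) < self.min_new_episodes:
            return False  # gate not met; nothing is rewritten
        self.lessons.append(self.consolidate_fn(pending))
        self._unconsolidated.clear()
        return True


# Usage with a trivial placeholder in place of an LLM consolidator.
mem = EpisodicMemory(consolidate_fn=lambda eps: f"lesson from {len(eps)} episodes")
mem.retain("episode A")
mem.retain("episode B")
fired_early = mem.consolidate()  # only two new episodes, so the gate holds
mem.retain("episode C")
fired = mem.consolidate()        # three new episodes, so consolidation runs
```

The design choice worth noting is that `consolidate` only appends to `lessons`; the raw trajectories stay available as evidence even after a lesson has been written.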

Original Article

Cached at: 05/14/26, 04:17 AM

Source: https://huggingface.co/papers/2605.12978 Published on May 13

Submitted by Dylan (https://huggingface.co/shizhuo2) on May 13

Abstract

Recent agentic-memory systems that rely on consolidated memory from LLMs fail to improve performance and often degrade due to faulty consolidation, while preserving raw episodic trajectories maintains better accuracy.

Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated abstractions distilled across many episodes into reusable, schema-like lessons. Recent agentic-memory systems pursue the consolidated form: an LLM rewrites past trajectories into a textual memory bank that it continuously updates with new interactions, promising self-improving agents without parameter updates. Yet we find that such consolidated memories produced by today’s LLMs are often faulty even when derived from useful experiences. As consolidation proceeds, memory utility first rises, then degrades, and can fall below the no-memory baseline. More surprisingly, even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of ARC-AGI problems it had previously solved without memory. We trace the regression to the consolidation step rather than the underlying experience: the same trajectories yield qualitatively different memories under different update schedules, and an episodic-only control that simply retains those trajectories remains competitive with the consolidators we test. In a controlled ARC-AGI Stream environment that exposes Retain, Delete, and Consolidate actions, agents preserve raw episodes by default and double the accuracy of their forced-consolidation counterparts; disabling consolidation entirely (episodic management only) matches this auto regime. Practically, robust agent memory should treat raw episodes as first-class evidence and gate consolidation explicitly rather than firing it after every interaction. Looking forward, reliable agentic memory will require LLMs that can consolidate without overwriting the evidence they depend on.
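A figure such as "fails on 54% of problems it had previously solved without memory" is a regression rate measured over the baseline-solved set. The sketch below shows one plausible way to compute such a rate; the function name and the task ids are hypothetical, the data is made up for illustration, and the paper's exact evaluation protocol may differ.

```python
from typing import Set


def regression_rate(solved_without_memory: Set[str],
                    solved_with_memory: Set[str]) -> float:
    """Fraction of baseline-solved problems that fail once memory is used."""
    if not solved_without_memory:
        raise ValueError("no baseline-solved problems to compare against")
    # Problems solved in the no-memory run but not in the with-memory run.
    regressed = solved_without_memory - solved_with_memory
    return len(regressed) / len(solved_without_memory)


# Made-up task ids for illustration only; not data from the paper.
baseline = {"t1", "t2", "t3", "t4"}
with_memory = {"t1", "t4", "t5"}
rate = regression_rate(baseline, with_memory)  # t2 and t3 regressed
```

Problems that are newly solved with memory (here `t5`) do not enter this rate, which is why it can be high even when overall accuracy looks similar.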

Similar Articles

Useful Memories Become Faulty When Continuously Updated by LLMs

Useful memories become faulty when continuously updated by LLMs (30 minute read)

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

@rohanpaul_ai: New Illinois+ Tsinghua University and other labs study finds that LLM agents still have unreliable memory and that it c…

@dylan_works_: Wrote up something fun I’ve been poking at: when LLM agents repeatedly rewrite their own experiences into textual “less…
