EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

Hugging Face Daily Papers

Summary

EvolveMem introduces a self-evolving memory architecture for LLM agents that optimizes retrieval configurations through LLM-powered diagnosis and iterative research cycles, achieving significant performance improvements on benchmarks like LoCoMo and MemBench.

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
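The closed loop described above (diagnose failures, propose a configuration change, keep it only if it helps, explore when progress stalls) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `evolve`, `toy_evaluate`, `toy_diagnose`, and `toy_explore`, the `top_k` parameter, the `patience` threshold, and the synthetic scoring function are all hypothetical. In the paper the diagnosis step is performed by an LLM reading per-question failure logs; here a hand-written rule stands in for it.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
EvalResult = Tuple[float, List[str]]  # (score, failure log labels)


def evolve(
    initial: Config,
    evaluate: Callable[[Config], EvalResult],
    diagnose: Callable[[Config, List[str]], Config],
    explore: Callable[[Config], Config],
    rounds: int = 8,
    patience: int = 2,
) -> Tuple[Config, float, List[float]]:
    """Guarded evolution loop: revert on regression, explore on stagnation."""
    best_config = dict(initial)
    best_score, failures = evaluate(best_config)
    history = [best_score]
    stagnant = 0
    for _ in range(rounds):
        if stagnant >= patience:
            # Explore-on-stagnation: try a perturbation instead of a diagnosis.
            candidate = explore(best_config)
            stagnant = 0
        else:
            candidate = diagnose(best_config, failures)
        score, cand_failures = evaluate(candidate)
        if score > best_score:
            # Accept the proposal and diagnose from its failures next round.
            best_config, best_score, failures = candidate, score, cand_failures
            stagnant = 0
        else:
            # Revert-on-regression: discard the candidate, keep the best config.
            stagnant += 1
        history.append(best_score)
    return best_config, best_score, history


def toy_evaluate(cfg: Config) -> EvalResult:
    # Synthetic objective that peaks at top_k == 8; NOT a real benchmark.
    score = 1.0 - abs(cfg["top_k"] - 8) / 8
    if cfg["top_k"] < 8:
        return score, ["missing_evidence"]
    if cfg["top_k"] > 8:
        return score, ["distracting_context"]
    return score, []


def toy_diagnose(cfg: Config, failures: List[str]) -> Config:
    # Rule-based stand-in for the LLM-powered diagnosis module.
    new = dict(cfg)
    if "missing_evidence" in failures:
        new["top_k"] += 3
    elif "distracting_context" in failures:
        new["top_k"] -= 1
    return new


def toy_explore(cfg: Config) -> Config:
    # Deterministic perturbation used when diagnosis stops making progress.
    new = dict(cfg)
    new["top_k"] += 1
    return new


best, score, history = evolve({"top_k": 2}, toy_evaluate, toy_diagnose, toy_explore)
print(best, score, history)
```

Because rejected candidates never replace the best configuration, the recorded best score cannot decrease from one round to the next, which is the property the revert safeguard is meant to provide.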

Cached at: 05/15/26, 04:23 AM

Source: https://huggingface.co/papers/2605.13941

Abstract

EvolveMem enables adaptive memory systems for LLM agents through self-evolving retrieval mechanisms that autonomously optimize configuration parameters via diagnostic modules and iterative research cycles.

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
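The two LoCoMo figures are relative improvements, so they can be combined to see roughly where the strongest baseline sits compared with the minimal one. The short calculation below is only a sketch: it assumes both percentages refer to the same LoCoMo metric, and it normalizes the minimal baseline to 1.0 because the abstract does not give absolute scores. The helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, e.g. 0.25 means 25% higher."""
    return (new - old) / old


# Normalized placeholder: the abstract reports no absolute scores.
minimal_baseline = 1.0

# 78.0% relative improvement over the minimal baseline (from the abstract).
evolvemem = minimal_baseline * (1 + 0.780)

# EvolveMem is 25.7% above the strongest baseline (from the abstract),
# so the strongest baseline is EvolveMem's score divided by 1.257.
strongest_baseline = evolvemem / (1 + 0.257)

implied_gap = relative_improvement(strongest_baseline, minimal_baseline)
print(f"strongest baseline vs. minimal baseline: {implied_gap:.1%}")
```

Under that assumption, the strongest baseline would be roughly 41.6% above the minimal baseline, with EvolveMem a further 25.7% above it.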

Similar Articles

MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents

arXiv cs.CL

MemEvoBench introduces the first benchmark for evaluating memory safety in LLM agents, measuring behavioral degradation from adversarial memory injection, noisy outputs, and biased feedback across QA and workflow tasks. The work reveals that memory evolution significantly contributes to safety failures and that static defenses are insufficient.

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Hugging Face Daily Papers

This survey paper proposes an evolutionary framework for LLM agent memory mechanisms, categorizing their development into three stages: storage, reflection, and experience. It analyzes core drivers such as long-range consistency and continual learning to provide design principles for next-generation agents.

HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents

arXiv cs.CL

HeLa-Mem is a bio-inspired memory architecture for LLM agents that models memory as a dynamic graph using Hebbian learning dynamics, featuring episodic and semantic memory stores to improve long-term coherence. Experiments on LoCoMo show superior performance across question categories while using fewer context tokens.