EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
Summary
EvolveMem introduces a self-evolving memory architecture for LLM agents that optimizes its retrieval configuration through LLM-powered diagnosis and iterative research cycles, outperforming the strongest baseline by 25.7% relative on LoCoMo and by 18.9% relative on MemBench.
Cached at: 05/15/26, 04:23 AM
Source: https://huggingface.co/papers/2605.13941
Abstract
EvolveMem enables adaptive memory systems for LLM agents through self-evolving retrieval mechanisms that autonomously optimize configuration parameters via diagnostic modules and iterative research cycles.
Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
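The guarded evolution round described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `evolve`, the `propose` and `evaluate` callables, the `patience` threshold, and the single `top_k` parameter are all hypothetical names chosen for illustration, and the LLM-powered diagnosis module is replaced by a trivial stub. It only shows how revert-on-regression and explore-on-stagnation safeguards might wrap a proposal step.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]  # hypothetical flat retrieval configuration


def evolve(
    baseline: Config,
    evaluate: Callable[[Config], float],
    propose: Callable[[Config, float], Config],
    rounds: int = 10,
    patience: int = 2,
    seed: int = 0,
) -> Tuple[Config, float, List[str]]:
    """Toy guarded loop: keep a candidate only if it strictly improves the score."""
    rng = random.Random(seed)
    best, best_score = dict(baseline), evaluate(baseline)
    stagnant = 0          # consecutive rounds without improvement
    log: List[str] = []
    for _ in range(rounds):
        if stagnant >= patience:
            # Explore-on-stagnation (assumed form): randomly perturb one parameter.
            candidate = dict(best)
            key = rng.choice(sorted(candidate))
            candidate[key] += rng.uniform(-1.0, 1.0)
            action = "explore"
            stagnant = 0
        else:
            # Stand-in for the diagnosis module's targeted adjustment.
            candidate = propose(dict(best), best_score)
            action = "propose"
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
            stagnant = 0
            log.append(f"{action}:accept")
        else:
            # Revert-on-regression: discard the candidate, keep the previous best.
            stagnant += 1
            log.append(f"{action}:revert")
    return best, best_score, log


# Toy objective (assumption): the score peaks when top_k equals 6.
def toy_evaluate(cfg: Config) -> float:
    return -abs(cfg["top_k"] - 6.0)


# Stub diagnosis (assumption): always suggests retrieving two more items.
def toy_propose(cfg: Config, score: float) -> Config:
    cfg["top_k"] += 2.0
    return cfg


best_cfg, best_score, history = evolve({"top_k": 2.0}, toy_evaluate, toy_propose, rounds=6)
print(best_cfg, best_score, history)
```

In this toy run the stub overshoots after reaching the optimum, so later proposals are reverted and an exploration step is triggered once the stagnation counter reaches `patience`. A real system would score candidates on held-out questions and derive proposals from failure logs, as the abstract describes.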
Similar Articles
Scaling Self-Evolving Agents via Parametric Memory
Researchers from Alibaba/Qwen and Peking University introduce TMEM, a self-evolving parametric memory framework that uses online LoRA weight updates to let LLM agents genuinely learn from experience within a single episode, rather than relying solely on prompt-space memory. TMEM outperforms summary-based and retrieval-based baselines across multiple benchmarks including LoCoMo, LongMemEval-S, and CL-Bench.
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
EvoArena introduces a benchmark for evaluating LLM agents in dynamic environments with progressive updates across terminal, software, and social domains, while EvoMem proposes a patch-based memory paradigm that records structured evolution; experiments show current agents achieve only 39.6% accuracy on EvoArena, and EvoMem yields average gains of 1.5% on the benchmark and improvements on GAIA and LoCoMo.
MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents
MemEvoBench introduces the first benchmark for evaluating memory safety in LLM agents, measuring behavioral degradation from adversarial memory injection, noisy outputs, and biased feedback across QA and workflow tasks. The work reveals that memory evolution significantly contributes to safety failures and that static defenses are insufficient.
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms
This survey paper proposes an evolutionary framework for LLM agent memory mechanisms, categorizing their development into three stages: storage, reflection, and experience. It analyzes core drivers such as long-range consistency and continual learning to provide design principles for next-generation agents.
AutoMem: Automated Learning of Memory as a Cognitive Skill
AutoMem introduces a framework that automates learning of memory management as a trainable skill for LLMs, improving performance on long-horizon tasks by 2x-4x through optimizing memory structure and proficiency.