Tag
The author argues that AI agent memory should focus on pruning data rather than hoarding it, drawing parallels to human memory types (sensory, short-term, long-term) and suggesting that modeling agent memory on human memory can reduce token usage while maintaining high-quality context.
This paper proposes MERIT, a dynamic multi-horizon memory retrieval framework for interactive text-to-SQL agents that uses episode-level and turn-level memory with learned retrieval policies optimized via reinforcement learning and a process reward model for dense rewards. Experiments on BIRD-Interact and Spider2-Snow show that MERIT outperforms static and single-horizon dynamic baselines in success rate while requiring fewer interaction turns.
The author tested Persistent Sage's long-term memory feature, finding it accurately recalled personal facts like colorblindness and a spouse's name from a week earlier without explicit prompting, demonstrating effective persistent memory for AI agents.
Eywa is a provenance-grounded long-term memory architecture for AI agents that stores immutable source evidence, validates extracted memories, and achieves strong benchmark results on LoCoMo, LongMemEval-S, and BEAM.
Introduces TaskMem, a reinforcement-learning-based framework for dynamic memorization in multimodal agents, achieving accuracy improvements of 6.3%, 7.0%, and 5.3% on streaming video benchmarks.
This paper rethinks the data foundations for long-term AI agent memory, arguing that current database paradigms fall short. It introduces Governed Evolving Memory (GEM), a formalization with state-level operators and correctness conditions, and presents a prototype called MemState built on a property graph backend.
The article highlights the problem of AI memory becoming unreliable after six months, with contradictions and drifted summaries, and questions whether the industry is focusing on adding more storage rather than improving maintainability.
The author shares challenges in building a personal health agent for ongoing use, focusing on long-term memory management and reliability issues including hallucination when synthesizing data from multiple sources over time.
The author reflects on the limitations of using flat markdown files for long-term agent memory, which leads to prompt debt as the memory grows, and advocates for graph-based memory representations that retrieve relevant context dynamically.
Explains how to configure Obsidian as a cross-project long-term memory repository for Codex, so that Codex can persistently save important information and avoid forgetting it.
DeferMem introduces a long-term memory framework for LLM agents that decouples memory into high-recall candidate retrieval and query-conditioned evidence distillation using reinforcement learning, achieving state-of-the-art QA accuracy with faster runtime.
Mercury's Second Brain introduces a dual-layer memory architecture (conscious and subconscious) for AI agents, enabling better continuity, memory lifecycle management, and retrieval over long sessions.
This article introduces the open-source EverOS project, which provides long-term memory capabilities for AI coding assistants like Claude Code. EverOS automatically saves conversation history and retrieves memories in new conversations, and the project includes multiple application examples.
The author discusses using Bluedot's AI meeting data as long-term memory for agents via Claude MCP integration, enabling querying of historical meeting transcripts and action items.
Cognition introduces Devin Auto-Triage, a new feature for Devin that adds long-term memory and autonomous monitoring of bugs, alerts, and incidents, with the ability to investigate and propose fixes or pull requests.
DimMem introduces a dimensional memory framework for LLM agents that represents memories as atomic, typed units with explicit fields, achieving state-of-the-art accuracy on LoCoMo-10 and LongMemEval-S while reducing token costs by 24%.
The article discusses the common failures of current AI memory solutions in production, such as stale facts, summary drift, and vendor lock-in, suggesting that the real bottleneck is memory governance rather than retrieval.
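Tencent has open-sourced TencentDB Agent Memory, an agent memory project that uses symbolic short-term memory and layered long-term memory, which can significantly reduce token usage and improve task success rates.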
MemEye is a visual-centric evaluation framework that assesses multimodal agent memory by measuring visual evidence granularity and retrieval complexity across 8 life-scenario tasks, revealing that current architectures struggle to preserve fine-grained visual details and reason about state changes over time.
MemLens is a new benchmark for evaluating memory capabilities in large vision-language models through multi-session conversations. It compares long-context and memory-augmented approaches, revealing limitations in both and motivating hybrid architectures.