Tag
The article critiques the current framing of agent memory as merely a storage problem, arguing that memories should have typed roles, freshness, and authority levels to prevent stale or incorrect information from being treated as gospel.
Explores the need for correction mechanisms in agent memory systems, going beyond storage to include source tracking, confidence levels, expiry, and audit trails.
The article critiques current AI memory systems as mere write-only logs that lack the ability to be corrected, updated, or traced to their source, arguing that true memory requires a governance layer.
An exploration of the components and design decisions behind agent memory libraries, clarifying the gap between cognitive science terminology and engineering implementation.
Introduces a four-condition diagnostic protocol to identify whether failures in long-context memory systems stem from write-side compression discarding evidence or from retrieval failing to surface stored information. The analysis reveals that write-side gaps dominate for most baselines, motivating the proposed Expected Predictive Compression (EPC) method, which improves preservation of relevant evidence.
This paper introduces PerMemBench, the first benchmark for evaluating personalized memory systems in LLM-based agents, and proposes a session-level storage gating framework that adapts memory policies to individual user contexts.
A comprehensive guide to memory systems for Hermes Agent, explaining the three-layer memory architecture and comparing various memory tools and providers.
The article warns that AI agents' memory systems prioritize recall over accuracy, leading to outdated or incorrect assumptions that are hard to trace or fix without resetting everything.
The article presents a practical critique of long-running LLM sessions, life-companion agents, and persistent memory systems, raising issues of privacy, cost, intent loss, and maintenance. It proposes alternative solutions like issue-bound ephemeral session chains and prompt templates.
A comprehensive guide explaining the concept of AI operating systems as intelligent orchestration layers that coordinate workflows, memory, tools, and agents. It breaks down the architecture and how companies can build autonomous systems.
A tweet promoting a free breakdown of an AI Agent OS that connects over 300 skills, 500 agents, and 4 memory systems using just Claude Code and a simple file tree with 5 config files, with a claimed setup time of 30 minutes.
MINTEval is a new benchmark for evaluating LLM agents and memory systems in continuously updated environments with frequent context changes. It shows that current systems perform poorly, with an average accuracy of 27.9% across representative systems.
Introduces Contract-Bounded Evidence Activation (CBEA) with Lexicographic Commitment Validation (LCV) to prevent runtime control failures in personalized language systems, where a system makes incorrect commitments despite having relevant context. Achieves zero failures within validator scope at 0.49–0.60 availability, significantly outperforming baselines.
RecMem is a recurrence-based memory consolidation method for long-running LLM agents that reduces token consumption by up to 87% while improving accuracy, by only invoking LLMs when semantically similar interactions recur.
Explains two memory patterns for AI agents: GBrain (queryable company wiki) and Lossless (full conversation recording), helping agents retain and retrieve facts across and within conversations.
The article highlights three common failure modes in production AI memory systems: outdated preferences persisting, sarcasm stored as literal statements, and summaries outliving their source facts. It argues that the AI memory industry lacks provenance, confidence scores, and versioning, creating a black-box problem that hinders debugging.
BOOKMARKS is a search-based memory framework for role-playing agents that actively maintains task-relevant story details through structured bookmarks, outperforming existing recurrent summarization methods.
This preprint introduces a method to inject emotion vectors into language models to simulate somatic markers, aiming to bridge the gap between semantic and episodic memory. The authors demonstrate that combining emotional echoes with semantic knowledge improves decision-making capabilities, replicating findings from human cognitive science.
Applied Compute introduces ACL-Wiki, a continual learning memory system built on their Context Engine that logs coding agent interactions from Cursor, Claude Code, and Codex to build an improving Contextbase, roughly doubling the Critical Memory Rate over two weeks. The system uses a Remember-Refine-Retrieve pipeline exposed via MCP server to give coding agents institutional memory that improves with use.
This paper introduces LifeDialBench, a novel benchmark for evaluating memory capabilities in continuous lifelog scenarios using wearable devices, and proposes an online evaluation protocol that enforces temporal causality. Key finding: sophisticated memory systems underperform simple RAG baselines, highlighting the importance of high-fidelity context preservation over lossy compression.