An article on how AI agents often repeat mistakes because their memory retrieval mechanisms rank memories by semantic similarity rather than by effectiveness, which leads to flawed decision-making.
ConvMemory v3 introduces a validity context layer that detects outdated or superseded conversational memories using a target-conditioned dual-evidence gate, achieving high accuracy on synthetic benchmarks and zero-shot transfer to role binding tasks.
PrecisionMemBench is an open-source benchmark that treats retrieval precision as a strict unit test, revealing that popular memory frameworks like Mem0, Zep, and Hindsight have very low precision (0.05-0.09) and rely on LLMs to compensate. The article argues that production memory infrastructure should treat any precision miss as a hard failure, with zero tolerance.
This paper proposes a unified framework for memory access and selection in long-context dialogue systems, using Bayes factors to quantify the utility of historical turns for modeling changing user preferences. Experiments show it outperforms embedding-based retrieval on preference-intensive tasks.
A preprint on SSRN presents PHI // DRIFT, a cognitive middleware architecture for AI companions with persistent internal state and salience-weighted memory retrieval, claiming 14.8% more context per prompt versus cosine-only RAG on consumer hardware.
H-Mem is a novel memory mechanism for LLM-based agents that uses a hybrid structure combining a temporal and semantic tree with a knowledge graph to model memory evolution and improve retrieval, achieving state-of-the-art performance on QA benchmarks.
A novel memory retrieval system inspired by episodic memory theory achieves state-of-the-art 96.4% top-50 accuracy on the LongMemEval benchmark using Gemini Flash, outperforming larger Pro-based baselines by isolating retrieval quality from model capability.
This paper introduces PYTHALAB-MERA, an external controller for frozen local LLMs that uses validation-grounded memory and retrieval to improve coding agent performance. It demonstrates superior success rates in strict validation tasks compared to self-refinement baselines by leveraging execution feedback and temporal difference learning.
HAGE introduces a weighted multi-relational memory framework that enables query-conditioned traversal over unified relational memory graphs, improving long-horizon reasoning accuracy through adaptive memory retrieval and reinforcement learning-based optimization.
The author introduces Tiro, an open-source agentic memory and retrieval framework designed to solve long-term context drift in LLM agents by providing modular, inspectable memory lanes for sessions, documents, and operational state.
UnionPay researchers propose SCG-MEM, a schema-constrained generative memory architecture that eliminates structural hallucinations by forcing LLMs to decode only valid memory keys within a dynamic cognitive schema, outperforming dense-retrieval baselines on the LoCoMo benchmark.
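The PrecisionMemBench entry above describes retrieval precision checked as a strict unit test. As a rough illustration of that idea only, and not the benchmark's actual code, the sketch below computes precision as the fraction of retrieved memories that are relevant and fails unless every retrieved item is relevant. The function names, the memory IDs, and the choice to score an empty retrieval as 0.0 are all assumptions made for this example.

```python
from typing import Iterable, Sequence


def retrieval_precision(retrieved: Sequence[str], relevant: Iterable[str]) -> float:
    """Fraction of retrieved memory IDs that appear in the relevant set.

    Assumption: an empty retrieval scores 0.0 (other conventions exist).
    """
    if not retrieved:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for memory_id in retrieved if memory_id in relevant_set)
    return hits / len(retrieved)


def passes_strict_precision(retrieved: Sequence[str], relevant: Iterable[str]) -> bool:
    """Hypothetical zero-tolerance gate: pass only if every retrieved item is relevant."""
    return retrieval_precision(retrieved, relevant) == 1.0


# Hypothetical example: 20 memories retrieved, only one of them relevant.
retrieved_ids = [f"mem-{i}" for i in range(20)]
relevant_ids = {"mem-3"}

precision = retrieval_precision(retrieved_ids, relevant_ids)
strict_ok = passes_strict_precision(retrieved_ids, relevant_ids)
```

With one relevant item out of twenty retrieved, the precision works out to 1/20, which sits at the low end of the 0.05-0.09 range quoted above, and the strict gate fails. A downstream LLM may still answer correctly from such a result set, which is the compensation effect the entry describes.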