Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Papers with Code Trending 04/28/25, 01:46 AM Papers

Summary

Mem0 introduces a scalable memory-centric architecture using graph-based representations to improve long-term conversational coherence in LLMs, significantly reducing latency and token costs while outperforming existing memory systems.

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations. Building on this foundation, we further propose an enhanced variant that leverages graph-based memory representations to capture complex relational structures among conversational elements. Through comprehensive evaluations on the LOCOMO benchmark, we systematically compare our approaches against six baseline categories: (i) established memory-augmented systems, (ii) retrieval-augmented generation (RAG) with varying chunk sizes and k-values, (iii) a full-context approach that processes the entire conversation history, (iv) an open-source memory solution, (v) a proprietary model system, and (vi) a dedicated memory management platform. Empirical results show that our methods consistently outperform all existing memory systems across four question categories: single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26% relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with graph memory achieves an overall score around 2% higher than the base configuration. Beyond accuracy gains, we also markedly reduce computational overhead compared to the full-context method. In particular, Mem0 attains a 91% lower p95 latency and saves more than 90% of token cost, offering a compelling balance between advanced reasoning capabilities and practical deployment constraints. Our findings highlight the critical role of structured, persistent memory mechanisms for long-term conversational coherence, paving the way for more reliable and efficient LLM-driven AI agents.
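The extract, consolidate, and retrieve loop described in the abstract can be illustrated with a toy sketch. This is not the Mem0 API or the authors' method: the class `ToyMemoryStore`, its method names, the sentence-splitting extractor, and the token-overlap similarity with a 0.5 threshold are all assumptions chosen to keep the example self-contained. A real system would presumably use an LLM for extraction and embeddings for retrieval.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyMemoryStore:
    """Hypothetical in-memory store, for illustration only."""

    memories: list[str] = field(default_factory=list)
    merge_threshold: float = 0.5  # assumed Jaccard cutoff for "same fact"

    def extract(self, message: str) -> list[str]:
        # Assumption: every sentence is one candidate fact.
        parts = re.split(r"(?<=[.!?])\s+", message.strip())
        return [p for p in parts if p]

    def consolidate(self, fact: str) -> None:
        # Overwrite a sufficiently similar memory, otherwise append.
        new_tokens = tokenize(fact)
        for i, old in enumerate(self.memories):
            old_tokens = tokenize(old)
            union = new_tokens | old_tokens
            overlap = len(new_tokens & old_tokens)
            if union and overlap / len(union) >= self.merge_threshold:
                self.memories[i] = fact
                return
        self.memories.append(fact)

    def add(self, message: str) -> None:
        for fact in self.extract(message):
            self.consolidate(fact)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank memories by the number of tokens shared with the query.
        q = tokenize(query)
        ranked = sorted(
            self.memories,
            key=lambda m: len(q & tokenize(m)),
            reverse=True,
        )
        return ranked[:k]


store = ToyMemoryStore()
store.add("I live in Berlin. I have a dog named Rex.")
store.add("I live in Munich.")  # updates the earlier location fact
print(store.retrieve("Where do I live?", k=1))
```

Only the retrieved memories, rather than the whole conversation history, would then be placed in the model's prompt, which is the intuition behind the token and latency savings the abstract reports.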

Cached at: 05/08/26, 08:40 AM

Source: https://huggingface.co/papers/2504.19413

Abstract

Mem0, a memory-centric architecture with graph-based memory, enhances long-term conversational coherence in LLMs by efficiently extracting, consolidating, and retrieving information, outperforming existing memory systems in terms of accuracy and computational efficiency.
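The accuracy and efficiency claims are expressed as relative changes and as a p95 latency. The sketch below shows how such figures are typically computed. The helper names and every input value are hypothetical and chosen only to make the arithmetic visible; they are not measurements from the paper. The percentile uses the nearest-rank definition, which is one of several common conventions.

```python
import math


def relative_improvement(new: float, baseline: float) -> float:
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


def relative_reduction(new: float, baseline: float) -> float:
    """Fractional drop of `new` below `baseline`."""
    return 1.0 - new / baseline


def percentile_nearest_rank(samples: list[float], p: int) -> float:
    """p-th percentile (nearest-rank): smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical judge scores: a 13-point absolute gain is a 26% relative gain.
print(round(relative_improvement(0.63, 0.50), 2))

# Hypothetical per-request latencies in seconds for two systems.
full_context = [float(x) for x in range(1, 21)]
memory_based = [x / 10 for x in full_context]
p95_full = percentile_nearest_rank(full_context, 95)
p95_mem = percentile_nearest_rank(memory_based, 95)
print(p95_full, p95_mem, round(relative_reduction(p95_mem, p95_full), 2))
```

A relative improvement depends on the baseline value, so the same percentage can correspond to very different absolute gains. The p95 figure describes tail latency rather than the average request.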

Get this paper in your agent:

hf papers read 2504.19413

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

SimpleMem: Efficient Lifelong Memory for LLM Agents

Cognis: Context-Aware Memory for Conversational AI Agents

DMF: A Deterministic Memory Framework for Conversational AI Agents

G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

RecMem: Recurrence-based Memory Consolidation for Efficient and Effective Long-Running LLM Agents
