@servasyy_ai: https://x.com/servasyy_ai/status/2057463627255570937
Summary
The Tencent Cloud database team has open-sourced TencentDB Agent Memory, a runtime system that addresses context degradation in long-running AI agent tasks. It folds short-term context compression into the memory system through a three-level rollback structure and dynamic compression, and pairs this with a long-term memory pipeline. This is a landmark attempt for AI agent memory systems moving from 'database' to 'runtime'.
Cached at: 05/21/26, 03:45 PM
What Did Tencent Open Source? A Runtime System Solving the AI Agent Memory Problem
While everyone else was doing “Vector Database + RAG”, the Tencent Cloud Database team delivered a different answer.
Why Does Another Memory System Deserve Attention?
There are already many AI Agent memory solutions on the market: LangChain Memory, MemGPT, Zep, Mem0… Most of them address the same question: How to make an Agent remember conversation history?
But TencentDB Agent Memory asks a different question: How to prevent the current task from exploding the context?
This difference may seem subtle, but it is critical.
The Real Pain Point: It’s Not “Forgetting”, It’s “Blowing Up”
Imagine asking Claude to help with a complex task: analyze 10 PDFs, generate a report, and create a presentation.
Each tool call returns results:
- Read PDF → returns 5000 tokens
- Search materials → returns 3000 tokens
- Generate chart → returns 2000 tokens
After 10 steps, the context window is filled with tool outputs. The model starts “forgetting”, losing sight of the original task goal, or simply hitting token limits and throwing an error.
This is the “context degradation” problem for long tasks.
Most memory systems approach this by storing conversation history in a vector database and retrieving it when needed. But this doesn’t solve context explosion during an ongoing task.
Tencent’s solution: Formally incorporate short-term context compression into the boundaries of a memory system.
Core Innovation: Three-Level Rollback + Dynamic Compression
The Offload mechanism in TencentDB Agent Memory is built as a three-level structure:
Level 1: Mermaid Task Diagram
Compress a long task into a flowchart, annotating the status of each node (done/doing/todo). The model sees not a pile of tool outputs but “where the task is at”.
This diagram dynamically updates as the task progresses, always reflecting the latest state.
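A minimal sketch of how such a status-annotated diagram could be rendered. The `TaskNode` and `TaskGraph` types and both function names are hypothetical and chosen for illustration; the project's real schema is not shown in this article.

```typescript
// Hypothetical types: the project's real schema may differ.
type Status = "done" | "doing" | "todo";

interface TaskNode {
  id: string;
  label: string;
  status: Status;
}

interface TaskGraph {
  nodes: TaskNode[];
  edges: Array<[string, string]>; // [fromId, toId]
}

// Render the graph as Mermaid flowchart source, with the status in each label.
function renderMermaid(graph: TaskGraph): string {
  const lines: string[] = ["flowchart TD"];
  for (const n of graph.nodes) {
    lines.push(`  ${n.id}["${n.label} (${n.status})"]`);
  }
  for (const [from, to] of graph.edges) {
    lines.push(`  ${from} --> ${to}`);
  }
  return lines.join("\n");
}

// Return a new graph with one node's status changed (the "dynamic update").
function setStatus(graph: TaskGraph, id: string, status: Status): TaskGraph {
  return {
    ...graph,
    nodes: graph.nodes.map((n) => (n.id === id ? { ...n, status } : n)),
  };
}

const graph: TaskGraph = {
  nodes: [
    { id: "read", label: "Read PDFs", status: "done" },
    { id: "report", label: "Write report", status: "doing" },
    { id: "slides", label: "Build slides", status: "todo" },
  ],
  edges: [["read", "report"], ["report", "slides"]],
};

const diagram = renderMermaid(graph);
```

Re-rendering after each `setStatus` call gives the model a few short lines describing task position instead of thousands of tokens of tool output.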
Level 2: JSONL Index
Each tool call is recorded as an index entry: tool name, parameters, summary, citation location. The index works like a table of contents for the task, showing “what content is where” without taking up much space.
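The index can be pictured as one JSON object per line. The sketch below assumes a hypothetical `IndexEntry` shape derived from the four fields named above; the field names and the sample paths are illustrative, not taken from the project.

```typescript
// Hypothetical entry shape based on the four fields named in the text.
interface IndexEntry {
  tool: string;
  params: Record<string, unknown>;
  summary: string;
  ref: string; // where the full output lives, e.g. a file path
}

// JSONL: one JSON object per line.
function toJsonl(entries: IndexEntry[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n");
}

// Parse JSONL, skipping blank lines and lines that are not valid entries.
function fromJsonl(jsonl: string): IndexEntry[] {
  const entries: IndexEntry[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const obj = JSON.parse(line);
      if (
        obj !== null &&
        typeof obj === "object" &&
        typeof obj.tool === "string" &&
        typeof obj.summary === "string" &&
        typeof obj.ref === "string"
      ) {
        const params =
          typeof obj.params === "object" && obj.params !== null ? obj.params : {};
        entries.push({ tool: obj.tool, params, summary: obj.summary, ref: obj.ref });
      }
    } catch {
      // Corrupt line: skip it instead of failing the whole index.
    }
  }
  return entries;
}

const sampleIndex: IndexEntry[] = [
  { tool: "read_pdf", params: { path: "a.pdf" }, summary: "Q3 revenue overview", ref: "refs/0001.txt" },
  { tool: "search", params: { q: "market size" }, summary: "3 sources found", ref: "refs/0002.txt" },
];
```

Because each line is independent, appending a new entry never requires rewriting earlier ones, and one damaged line does not invalidate the rest.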
Level 3: Original Citations
Full tool outputs are saved in separate files. When needed, the system can “drill down” to see the original text, but by default it is not loaded into the context.
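To illustrate the drill-down idea, here is a sketch that keeps full outputs out of the context and returns only a short stub. It uses an in-memory `Map` as a stand-in for the separate files, and the `OffloadStore` class with its method names is hypothetical.

```typescript
// In-memory stand-in for the "separate files" described above (an assumption
// made to keep the sketch self-contained).
class OffloadStore {
  private raw = new Map<string, string>();
  private counter = 0;

  // Save the full output and return only a short stub for the context.
  offload(tool: string, output: string, summary: string): { ref: string; stub: string } {
    this.counter += 1;
    const ref = `ref-${this.counter}`;
    this.raw.set(ref, output);
    return { ref, stub: `[${tool}] ${summary} (full output: ${ref})` };
  }

  // Drill down: fetch the original text only when it is asked for.
  drillDown(ref: string): string | undefined {
    return this.raw.get(ref);
  }
}
```

The stub costs a few dozen tokens regardless of how large the original output was, while the reference keeps the evidence reachable.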
Three-Level Compression Strategy: From Mild to Emergency
The system automatically selects compression strength based on token usage:
Mild: Replace tool results with summaries, highest information retention
Aggressive: Delete old message segments, inject Mermaid diagram to compensate for orientation
Emergency: Hard fallback, quickly trim tokens below the safety line
This is not simply “deleting messages”, but a complete chain from compressed view to original evidence.
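The selection logic can be sketched as a threshold check on token usage. The threshold values, the `Msg` shape, and the function names below are assumptions for illustration; the article does not state the project's actual cut-off points.

```typescript
type Level = "none" | "mild" | "aggressive" | "emergency";

// Hypothetical thresholds, expressed as a fraction of the context budget.
const THRESHOLDS = { mild: 0.6, aggressive: 0.8, emergency: 0.95 };

// Pick the weakest compression level that the current usage calls for.
function pickLevel(usedTokens: number, budget: number): Level {
  const ratio = usedTokens / budget;
  if (ratio >= THRESHOLDS.emergency) return "emergency";
  if (ratio >= THRESHOLDS.aggressive) return "aggressive";
  if (ratio >= THRESHOLDS.mild) return "mild";
  return "none";
}

interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
  summary?: string; // short summary of a tool result, when one exists
}

// Mild level: swap each tool result for its summary, leaving other messages alone.
function applyMild(messages: Msg[]): Msg[] {
  return messages.map((m) =>
    m.role === "tool" && m.summary !== undefined ? { ...m, content: m.summary } : m
  );
}
```

Checking the strongest level first means a sudden jump in usage goes straight to the hard fallback rather than stepping through the milder levels.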
L1.5: The Overlooked Critical Layer
Between short-term compression and long-term memory, Tencent adds a “task lifecycle judgment” layer:
- Has the current task ended?
- Is this a long task?
- Is the new request a continuation of an old task?
- Does the historical task diagram need to be mounted?
Without this layer, Mermaid diagrams and offload results would mix across tasks.
This is the missing piece in most memory systems — they don’t know where the “task boundary” is.
Long-term Memory: Four-Level Pipeline
In addition to short-term compression, TencentDB Agent Memory has a complete long-term memory system:
L0: Raw Conversation Capture
No judgment, just preservation. Atomic checkpoints prevent concurrent dirty writes.
L1: Structured Memory Extraction
Extract three types of atomic memories from conversations:
- persona: user preferences and habits
- episodic: important events and decisions
- instruction: behavioral rules
Each memory has type, priority, source citation, and time metadata.
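As a rough picture of what an atomic memory record might look like, the sketch below models the metadata listed above. The `AtomicMemory` field names, the numeric priority scale, and the `groupByType` helper are all hypothetical.

```typescript
type MemoryType = "persona" | "episodic" | "instruction";

// Hypothetical record shape covering type, priority, source citation, and time.
interface AtomicMemory {
  type: MemoryType;
  priority: number; // assumed scale: higher means more important
  content: string;
  sourceRef: string; // pointer back to the L0 conversation record
  createdAt: string; // ISO 8601 timestamp
}

// Group memories by type and sort each group by priority, highest first.
function groupByType(memories: AtomicMemory[]): Record<MemoryType, AtomicMemory[]> {
  const groups: Record<MemoryType, AtomicMemory[]> = {
    persona: [],
    episodic: [],
    instruction: [],
  };
  for (const m of memories) groups[m.type].push(m);
  for (const key of Object.keys(groups) as MemoryType[]) {
    groups[key].sort((a, b) => b.priority - a.priority);
  }
  return groups;
}

const sampleMemories: AtomicMemory[] = [
  { type: "persona", priority: 1, content: "Prefers concise reports", sourceRef: "l0:12", createdAt: "2026-05-01T10:00:00Z" },
  { type: "instruction", priority: 3, content: "Always cite page numbers", sourceRef: "l0:15", createdAt: "2026-05-01T10:05:00Z" },
  { type: "persona", priority: 2, content: "Works in finance", sourceRef: "l0:3", createdAt: "2026-05-01T09:40:00Z" },
];
```

Keeping `sourceRef` on every record is what lets a later layer trace an abstraction back to the raw conversation it came from.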
L2: Scene Narrative Integration
Integrate atomic memories into coherent narrative documents — not a checklist, not a log. Like organizing scattered photos into an album.
L3: Persona Profile
User profiles evolve incrementally rather than being fully recomputed each round. The process is a “four-level deep scan”: base anchor → interest graph → interaction protocol → cognitive core.
Key Design: Responsibilities do not mix across layers.
L0 preservation → L1 abstraction → L2 organization → L3 rollup. This is a more governed design than “chunk everything + vector search”.
Recall Injection: Stable Zone vs Dynamic Zone
When the model needs to recall memory, the system injects it into two zones:
Stable Zone (end of system prompt):
- Persona profile
- Scene navigation
- Tool guide
This part changes infrequently, benefiting from prompt cache acceleration.
Dynamic Zone (prefix of user prompt):
- Structured memory relevant to the current round
May differ each round, but does not affect cache benefits.
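The two-zone split can be sketched as two separate prompt builders. The `StableZone` type, the section headings, and the function names are assumptions for illustration; the article does not show the project's actual prompt layout.

```typescript
// Hypothetical shape for the rarely-changing content.
interface StableZone {
  persona: string;
  sceneNavigation: string;
  toolGuide: string;
}

// Stable zone: appended to the end of the system prompt. Because it changes
// rarely, the system prompt stays byte-identical across most rounds.
function buildSystemPrompt(base: string, stable: StableZone): string {
  return [
    base,
    "## Persona",
    stable.persona,
    "## Scenes",
    stable.sceneNavigation,
    "## Tools",
    stable.toolGuide,
  ].join("\n");
}

// Dynamic zone: prepended to the user message, so per-round memories never
// touch the system prompt.
function buildUserPrompt(memories: string[], userText: string): string {
  if (memories.length === 0) return userText;
  const block = memories.map((m) => `- ${m}`).join("\n");
  return `Relevant memory:\n${block}\n\n${userText}`;
}
```

Prompt caches generally match on an unchanged prefix, which is why keeping volatile content out of the system prompt preserves the cache benefit.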
Two tools are provided simultaneously:
- tdai_memory_search: search structured memory (abstraction layer)
- tdai_conversation_search: search raw conversations (evidence layer)
Separating abstraction layer and evidence layer — this is key to improving memory reliability.
Engineering Awareness: The Devil is in the Details
TencentDB Agent Memory is not an academic demo; it’s a production-grade system. The code reveals many engineering details:
- Atomic checkpoint: prevents concurrent dirty writes
- tiktoken metering + WeakMap caching: no arbitrary compression
- MMD dynamic refresh: not a static summary
- orphan ref cleanup: prevents garbage file accumulation
- scene file limit: prevents entropy increase in mid-level knowledge structures
- warm-up threshold: more aggressive processing in early new sessions
- global mutex persona: prevents concurrent overwrites
- JSONL sanitize/validate: prevents real data corruption
These details determine whether the system can run stably in real scenarios.
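One of these details, token metering with a `WeakMap` cache, can be sketched as follows. The `roughTokenCount` function is a crude stand-in (about four characters per token) used so the example runs without a tokenizer dependency; it is not tiktoken and its counts are not accurate. The `ChatMessage` type and function names are also hypothetical.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

let countCalls = 0;

// Stand-in for a real tokenizer: roughly 4 characters per token is a crude
// assumption used only to keep this sketch dependency-free.
function roughTokenCount(content: string): number {
  countCalls += 1;
  return Math.ceil(content.length / 4);
}

// Cache keyed by message object identity. Entries disappear automatically
// once a message object is garbage-collected.
const tokenCache = new WeakMap<ChatMessage, number>();

function tokensOf(msg: ChatMessage): number {
  const cached = tokenCache.get(msg);
  if (cached !== undefined) return cached;
  const n = roughTokenCount(msg.content);
  tokenCache.set(msg, n);
  return n;
}

// Total usage for a message list; repeated calls reuse cached counts.
function totalTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + tokensOf(m), 0);
}
```

Since the cache is keyed by object identity, this sketch assumes messages are treated as immutable: mutating `content` in place would leave a stale count behind.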
Host Agnostic: Not Tied to Any Framework
Core logic is in TdaiCore (host-neutral facade), connected to different frameworks via adapters:
- OpenClaw uses OpenClawHostAdapter
- Hermes/Gateway uses StandaloneHostAdapter
This means you can plug it into any Agent framework.
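The adapter pattern described here can be sketched as follows. The `HostAdapter` interface, the `MemoryCore` class, and the in-memory adapter are hypothetical stand-ins; the real `TdaiCore`, `OpenClawHostAdapter`, and `StandaloneHostAdapter` APIs are not shown in this article.

```typescript
// Hypothetical adapter contract: the only surface the core depends on.
interface HostAdapter {
  hostName: string;
  getMessages(): string[];
  replaceMessages(messages: string[]): void;
}

// Hypothetical core: it talks only to the adapter interface, never to a
// specific framework.
class MemoryCore {
  private host: HostAdapter;

  constructor(host: HostAdapter) {
    this.host = host;
  }

  // Keep only the newest `maxMessages` messages; return how many were removed.
  trimTo(maxMessages: number): number {
    const msgs = this.host.getMessages();
    if (msgs.length <= maxMessages) return 0;
    this.host.replaceMessages(msgs.slice(msgs.length - maxMessages));
    return msgs.length - maxMessages;
  }
}

// A trivial adapter for tests; a real one would wrap a framework's session.
class InMemoryHostAdapter implements HostAdapter {
  hostName = "in-memory";
  private messages: string[];

  constructor(messages: string[]) {
    this.messages = [...messages];
  }

  getMessages(): string[] {
    return [...this.messages];
  }

  replaceMessages(messages: string[]): void {
    this.messages = [...messages];
  }
}
```

Supporting a new framework then means writing one more class that implements the interface, with no change to the core.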
Risks and Limitations
No system is perfect. TencentDB Agent Memory also has clear trade-offs:
Risk 1: Long system chain, high debugging complexity
There are multiple stages from capture to recall. An anomaly in any layer surfaces as “memory is wrong”, but it is hard to pinpoint which layer caused it.
Risk 2: Highly dependent on prompt quality
L1 scene segmentation, L2 scene integration, L3 profile updates, and L1.5 task judgment are all prompt-driven. Swap the model and behavior may drift.
Risk 3: L1 is the key to success or failure
L1 extraction errors → L2 builds narratives on wrong atoms → L3 builds profiles on wrong narratives. Garbage in → hierarchy out.
Risk 4: Accumulated abstraction bias
Scene and Persona are prone to “locally distorted but long-term self-consistent” issues — harder for users to detect.
Risk 5: Sensitive to host message protocol
Assistant tool_use messages and their tool_result messages must stay paired; deleting one side without the other easily causes a 400 error.
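A sketch of pairing-aware trimming, under a simplified message shape. The flat `AgentMessage` type is an assumption: real host protocols typically nest tool_use and tool_result blocks inside message content, so an actual implementation would need to walk those blocks.

```typescript
// Simplified message shape (an assumption made for this sketch).
interface AgentMessage {
  role: "user" | "assistant";
  kind: "text" | "tool_use" | "tool_result";
  toolUseId?: string;
  content: string;
}

// Drop the oldest `count` messages, then also drop any tool_result whose
// matching tool_use was cut off by the trim.
function dropOldest(messages: AgentMessage[], count: number): AgentMessage[] {
  const kept = messages.slice(count);
  const liveUseIds = new Set<string>();
  for (const m of kept) {
    if (m.kind === "tool_use" && m.toolUseId !== undefined) {
      liveUseIds.add(m.toolUseId);
    }
  }
  return kept.filter(
    (m) =>
      m.kind !== "tool_result" ||
      (m.toolUseId !== undefined && liveUseIds.has(m.toolUseId))
  );
}

const convo: AgentMessage[] = [
  { role: "user", kind: "text", content: "Analyze a.pdf" },
  { role: "assistant", kind: "tool_use", toolUseId: "t1", content: "read_pdf(a.pdf)" },
  { role: "user", kind: "tool_result", toolUseId: "t1", content: "...5000 tokens..." },
  { role: "assistant", kind: "text", content: "The PDF covers Q3 revenue." },
];
```

Trimming from the oldest end can only orphan results, since a tool_use always precedes its result, so checking that direction is enough for this particular strategy.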
5 Most Worthwhile Design References
If you are building your own Agent memory system, these 5 points are most worth referencing (sorted by ROI):
- Tool result offload — directly solves long task context degradation
- L1.5 task continuation judgment — improves cross-round task continuity
- L0 evidence layer / L1 abstraction layer separation — improves memory reliability
- L2 scene blocks — improves long-term semantic organization ability
- Recall stable/dynamic split — improves prompt quality and cache benefits
What Not to Blindly Copy
- Don’t copy the entire system 1:1 (too heavy, complex state machine)
- Don’t let the memory system take over current prompt compression (responsibility confusion)
- Don’t over-rely on scene/persona as the sole source of truth (prone to accumulated bias)
Conclusion: The Next Phase of Memory Systems
The value of TencentDB Agent Memory is not “yet another memory store”, but that it proposes a new problem framework:
Memory systems are not just about “storing history”, but also about “managing current context”.
While everyone else was doing vector retrieval, Tencent chose a more difficult but more complete path: integrating short-term compression, medium-term organization, and long-term profiling into a runtime system.
This is a landmark attempt for AI Agent memory systems evolving from “database” to “runtime”.
Source Code: Tencent/TencentDB-Agent-Memory (Open Source)
Recommended Reading Path:
- Offload section: src/offload/index.ts → hooks/after-tool-call.ts → mmd-injector.ts
- Memory pipeline: src/core/tdai-core.ts → record/l1-extractor.ts → scene/scene-extractor.ts
If you are building long-task Agents, this project is worth in-depth study.