@servasyy_ai: https://x.com/servasyy_ai/status/2057463627255570937


Summary

The Tencent Cloud database team has open-sourced TencentDB Agent Memory, a runtime system that addresses context degradation in long-running AI agent tasks. It brings short-term context compression into the memory system through three-level backtracking and dynamic compression, and integrates a long-term memory pipeline. This is a landmark attempt at moving AI agent memory systems from 'database' to 'runtime'.

Original article: https://t.co/CKIAnRdJtp

Cached at: 05/21/26, 03:45 PM

What Did Tencent Open Source? A Runtime System Solving the AI Agent Memory Problem

While everyone else was doing “Vector Database + RAG”, the Tencent Cloud Database team delivered a different answer.

Why Does Another Memory System Deserve Attention?

There are already many AI Agent memory solutions on the market: LangChain Memory, MemGPT, Zep, Mem0… Most of them address the same question: How to make an Agent remember conversation history?

But TencentDB Agent Memory asks a different question: How to prevent the current task from exploding the context?

This difference may seem subtle, but it is critical.

The Real Pain Point: It’s Not “Forgetting”, It’s “Blowing Up”

Imagine asking Claude to help with a complex task: analyze 10 PDFs, generate a report, and create a presentation.

Each tool call returns results:

  • Read PDF → returns 5000 tokens

  • Search materials → returns 3000 tokens

  • Generate chart → returns 2000 tokens

After 10 steps, the context window is filled with tool outputs. The model starts “forgetting”, losing sight of the original task goal, or simply hitting token limits and throwing an error.

This is the “context degradation” problem for long tasks.

Most memory systems approach this by storing conversation history in a vector database and retrieving it when needed. But this doesn’t solve context explosion during an ongoing task.

Tencent’s solution: formally bring short-term context compression inside the boundary of the memory system.

Core Innovation: Three-Level Backtracking + Dynamic Compression

TencentDB Agent Memory’s Offload mechanism designs a three-level structure:

Level 1: Mermaid Task Diagram

Compress a long task into a flowchart, annotating the status of each node (done/doing/todo). The model sees not a pile of tool outputs but “where the task is at”.

This diagram dynamically updates as the task progresses, always reflecting the latest state.
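To make the idea concrete, here is a minimal sketch of rendering a task list into a Mermaid flowchart with per-node status. The `TaskNode` type and `renderTaskDiagram` function are hypothetical names for illustration; the article does not show the project's actual data model, and a linear task is assumed.

```typescript
// Hypothetical types: the article names the done/doing/todo statuses,
// but not the data model behind the diagram.
type TaskStatus = "done" | "doing" | "todo";

interface TaskNode {
  id: string;     // Mermaid node id, e.g. "A"
  label: string;  // short step name (assumed to contain no double quotes)
  status: TaskStatus;
}

// Render a linear task as a Mermaid flowchart, with the status in each label.
function renderTaskDiagram(nodes: TaskNode[]): string {
  const lines: string[] = ["flowchart TD"];
  for (const node of nodes) {
    lines.push(`  ${node.id}["${node.label} (${node.status})"]`);
  }
  // Connect each step to the one that follows it.
  let prev: TaskNode | undefined;
  for (const node of nodes) {
    if (prev) lines.push(`  ${prev.id} --> ${node.id}`);
    prev = node;
  }
  return lines.join("\n");
}

const exampleDiagram = renderTaskDiagram([
  { id: "A", label: "Read PDFs", status: "done" },
  { id: "B", label: "Write report", status: "doing" },
  { id: "C", label: "Build slides", status: "todo" },
]);
console.log(exampleDiagram);
```

Re-rendering the diagram after every status change is what keeps it a live view of the task rather than a one-off summary.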

Level 2: JSONL Index

Each tool call is recorded as an index entry: tool name, parameters, summary, citation location. It works like a table of contents for the task, telling the model “what content is where” without taking up much space.
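A JSONL index of this kind could look roughly like the sketch below. The four fields follow the list above, but the field names (`tool`, `params`, `summary`, `ref`) and the helper functions are assumptions, not the project's actual schema.

```typescript
// Hypothetical entry shape: one record per tool call.
interface IndexEntry {
  tool: string;                    // tool name
  params: Record<string, unknown>; // call parameters
  summary: string;                 // short summary of the result
  ref: string;                     // where the full output is stored
}

// Append one entry as a single JSON line. JSON.stringify escapes newlines
// inside strings, so each entry always occupies exactly one line.
function appendEntry(jsonl: string, entry: IndexEntry): string {
  return jsonl + JSON.stringify(entry) + "\n";
}

// Parse the index back into entries, skipping blank lines.
function parseIndex(jsonl: string): IndexEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as IndexEntry);
}
```

The append-only, line-per-record shape is what keeps the index cheap to update after each tool call.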

Level 3: Original Citations

Full tool outputs are saved in separate files. When needed, the system can “drill down” to see the original text, but by default it is not loaded into the context.
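The offload-then-drill-down flow can be sketched as follows. This is an illustration under stated assumptions: an in-memory `Map` stands in for the separate files, simple truncation stands in for a real summary, and `OffloadStore` and `offload` are hypothetical names.

```typescript
// In-memory stand-in for the files that hold full tool outputs.
class OffloadStore {
  private readonly files = new Map<string, string>();
  private nextId = 1;

  // Save the full output and return a short reference to it.
  save(output: string): string {
    const ref = `ref-${this.nextId++}`;
    this.files.set(ref, output);
    return ref;
  }

  // Drill down: fetch the original text only when it is actually needed.
  load(ref: string): string | undefined {
    return this.files.get(ref);
  }
}

// Replace a tool output with a compact stub that goes into the context.
// Truncation is a placeholder for a real summarization step.
function offload(
  store: OffloadStore,
  toolOutput: string,
  maxSummaryChars = 80,
): { summary: string; ref: string } {
  const ref = store.save(toolOutput);
  const summary =
    toolOutput.length > maxSummaryChars
      ? toolOutput.slice(0, maxSummaryChars) + "..."
      : toolOutput;
  return { summary, ref };
}
```

Only the stub enters the context by default; the full text stays one `load` call away.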

Three-Level Compression Strategy: From Mild to Emergency

The system automatically selects compression strength based on token usage:

Mild: Replace tool results with summaries, highest information retention

Aggressive: Delete old message segments, inject Mermaid diagram to compensate for orientation

Emergency: Hard fallback, quickly trim tokens below the safety line

This is not simply “deleting messages”, but a complete chain from compressed view to original evidence.
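As a rough illustration, choosing a compression level from token usage might look like the sketch below. The threshold values are invented for the example; the article does not state the project's actual cut-offs.

```typescript
type CompressionLevel = "none" | "mild" | "aggressive" | "emergency";

// Assumed thresholds, expressed as fractions of the context window.
interface Thresholds {
  mild: number;
  aggressive: number;
  emergency: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { mild: 0.6, aggressive: 0.8, emergency: 0.95 };

// Pick the strongest level whose threshold the current usage has reached.
function selectCompression(
  usedTokens: number,
  windowTokens: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): CompressionLevel {
  const usage = usedTokens / windowTokens;
  if (usage >= thresholds.emergency) return "emergency";
  if (usage >= thresholds.aggressive) return "aggressive";
  if (usage >= thresholds.mild) return "mild";
  return "none";
}
```

Checking the strongest level first means a sudden jump in usage, for example one very large tool result, goes straight to the hard fallback instead of stepping through the milder levels.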

L1.5: The Overlooked Critical Layer

Between short-term compression and long-term memory, Tencent adds a “task lifecycle judgment” layer:

  • Has the current task ended?
  • Is this a long task?
  • Is the new request a continuation of an old task?
  • Does the historical task diagram need to be mounted?

Without this layer, Mermaid diagrams and offload results would mix across tasks.

This is the missing piece in most memory systems — they don’t know where the “task boundary” is.
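One way to picture this layer is as a small decision step that consumes the answers to the questions above. The sketch below is hypothetical: the `TaskJudgment` shape, the `decideMount` function, and the decision order are all assumptions, and in the real system the judgment itself is described as prompt-driven.

```typescript
// Hypothetical result of the task lifecycle judgment.
interface TaskJudgment {
  currentTaskEnded: boolean;
  isLongTask: boolean;
  continuesTaskId: string | null; // earlier task the new request continues, if any
}

type MountDecision =
  | { action: "keep-current" }
  | { action: "mount-history"; taskId: string }
  | { action: "start-fresh" };

// Decide which task diagram, if any, should be visible for the next request.
function decideMount(judgment: TaskJudgment): MountDecision {
  // A continuation of an earlier long task gets that task's diagram back.
  if (judgment.continuesTaskId !== null && judgment.isLongTask) {
    return { action: "mount-history", taskId: judgment.continuesTaskId };
  }
  // An unfinished task keeps its current diagram.
  if (!judgment.currentTaskEnded) return { action: "keep-current" };
  // Otherwise start clean so diagrams do not mix across tasks.
  return { action: "start-fresh" };
}
```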

Long-term Memory: Four-Level Pipeline

In addition to short-term compression, TencentDB Agent Memory has a complete long-term memory system:

L0: Raw Conversation Capture

No judgment, just preservation. Atomic checkpoints prevent concurrent dirty writes.

L1: Structured Memory Extraction

Extract three types of atomic memories from conversations:

  • persona: user preferences and habits
  • episodic: important events and decisions
  • instruction: behavioral rules

Each memory has type, priority, source citation, and time metadata.
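A record with this metadata could be modeled as in the sketch below. The three memory types come from the list above; the field names, the numeric priority convention, and the `topMemories` helper are assumptions for illustration.

```typescript
type MemoryType = "persona" | "episodic" | "instruction";

// Hypothetical shape of one atomic memory.
interface AtomicMemory {
  type: MemoryType;
  content: string;
  priority: number;  // assumed convention: higher means more important
  sourceRef: string; // citation back to the raw conversation it came from
  createdAt: string; // ISO 8601 timestamp
}

// Return the highest-priority memories of one type without mutating the input.
function topMemories(memories: AtomicMemory[], type: MemoryType, limit: number): AtomicMemory[] {
  return memories
    .filter((m) => m.type === type) // filter returns a new array, so sort is safe
    .sort((a, b) => b.priority - a.priority)
    .slice(0, limit);
}
```

Keeping `sourceRef` on every record is what lets a later layer, or a person debugging it, trace an abstraction back to the raw evidence.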

L2: Scene Narrative Integration

Integrate atomic memories into coherent narrative documents — not a checklist, not a log. Like organizing scattered photos into an album.

L3: Persona Profile

Evolve user profiles incrementally rather than recomputing them in full each round. The profile follows a “four-level deep scan”: base anchor → interest graph → interaction protocol → cognitive core.

Key Design: Responsibilities do not mix across layers.

L0 preservation → L1 abstraction → L2 organization → L3 rollup. This is a more governed design than the usual “chunk everything + vector” approach.

Recall Injection: Stable Zone vs Dynamic Zone

When the model needs to call memory, the system injects into two zones:

Stable Zone (end of system prompt):

  • Persona profile
  • Scene navigation
  • Tool guide

This part changes infrequently, benefiting from prompt cache acceleration.

Dynamic Zone (prefix of user prompt):

  • Structured memory relevant to the current round

This part may differ each round, but because it sits in the user prompt rather than the system prompt, it does not affect cache benefits.

Two tools are provided simultaneously:

  • tdai_memory_search: search structured memory (abstraction layer)
  • tdai_conversation_search: search raw conversations (evidence layer)

Separating the abstraction layer from the evidence layer is key to improving memory reliability.
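The stable/dynamic split described in this section can be sketched as a prompt builder. Everything here is illustrative: `StableZone`, `buildPrompts`, and the "Relevant memory:" header are assumed names and formatting, not the project's actual output.

```typescript
// Content that rarely changes and is appended to the end of the system prompt.
interface StableZone {
  personaProfile: string;
  sceneNavigation: string;
  toolGuide: string;
}

// Build the two prompts for one round. The system prompt depends only on
// stable content; per-round memories are prefixed to the user prompt.
function buildPrompts(
  baseSystemPrompt: string,
  stable: StableZone,
  dynamicMemories: string[],
  userMessage: string,
): { system: string; user: string } {
  const system = [
    baseSystemPrompt,
    stable.personaProfile,
    stable.sceneNavigation,
    stable.toolGuide,
  ].join("\n\n");

  const memoryBlock =
    dynamicMemories.length > 0
      ? "Relevant memory:\n" + dynamicMemories.map((m) => `- ${m}`).join("\n") + "\n\n"
      : "";

  return { system, user: memoryBlock + userMessage };
}
```

Because the system prompt is byte-identical from round to round while the stable content is unchanged, a prefix-based prompt cache can keep reusing it even as the injected memories vary.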

Engineering Awareness: The Devil is in the Details

TencentDB Agent Memory is not an academic demo; it’s a production-grade system. The code reveals many engineering details:

  • Atomic checkpoint: prevents concurrent dirty writes
  • tiktoken metering + WeakMap caching: compression is triggered by measured token counts, not guesswork
  • MMD dynamic refresh: not a static summary
  • orphan ref cleanup: prevents garbage file accumulation
  • scene file limit: prevents entropy increase in mid-level knowledge structures
  • warm-up threshold: more aggressive processing in early new sessions
  • global mutex persona: prevents concurrent overwrites
  • JSONL sanitize/validate: prevents real data corruption

These details determine whether the system can run stably in real scenarios.
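As one example from the list above, token metering with a WeakMap cache might be sketched like this. The estimator is a deliberately crude stand-in (about four characters per token), not tiktoken, and all names are hypothetical.

```typescript
interface Message {
  role: string;
  content: string;
}

// Stand-in estimator, NOT tiktoken: assumes roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cache keyed by message object identity. Entries disappear automatically
// once a message object is garbage-collected, so the cache cannot leak.
const tokenCache = new WeakMap<Message, number>();
let estimatorCalls = 0; // exposed only to show that caching takes effect

function countTokens(message: Message): number {
  const cached = tokenCache.get(message);
  if (cached !== undefined) return cached;
  estimatorCalls++;
  const count = estimateTokens(message.content);
  tokenCache.set(message, count);
  return count;
}

// Total for a conversation; repeated calls only pay for new messages.
function totalTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + countTokens(m), 0);
}
```

The sketch assumes messages are treated as immutable: mutating `content` in place would leave a stale count in the cache.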

Host Agnostic: Not Tied to Any Framework

Core logic is in TdaiCore (host-neutral facade), connected to different frameworks via adapters:

  • OpenClaw uses OpenClawHostAdapter
  • Hermes/Gateway uses StandaloneHostAdapter

This means you can plug it into any Agent framework.
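The facade-plus-adapter arrangement can be illustrated with a small sketch. Only the names TdaiCore, OpenClawHostAdapter, and StandaloneHostAdapter come from the article; the `HostAdapter` interface, its methods, and the classes below are hypothetical and do not reflect the project's real API.

```typescript
interface HostMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical contract a host framework would implement.
interface HostAdapter {
  readonly hostName: string;
  getMessages(): HostMessage[];
  replaceMessages(messages: HostMessage[]): void;
}

// Host-neutral core: it only talks to the adapter, never to a framework.
class CoreSketch {
  private readonly host: HostAdapter;

  constructor(host: HostAdapter) {
    this.host = host;
  }

  // Keep only the most recent messages; returns how many were dropped.
  trimTo(maxMessages: number): number {
    const messages = this.host.getMessages();
    const kept = maxMessages <= 0 ? [] : messages.slice(-maxMessages);
    this.host.replaceMessages(kept);
    return messages.length - kept.length;
  }
}

// Minimal adapter for illustration, backed by an in-memory array.
class InMemoryHostAdapter implements HostAdapter {
  readonly hostName = "in-memory";
  private messages: HostMessage[];

  constructor(initial: HostMessage[]) {
    this.messages = [...initial];
  }

  getMessages(): HostMessage[] {
    return [...this.messages];
  }

  replaceMessages(messages: HostMessage[]): void {
    this.messages = [...messages];
  }
}
```

Supporting a new framework then means writing one more adapter class, with no change to the core logic.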

Risks and Limitations

No system is perfect. TencentDB Agent Memory also has clear trade-offs:

Risk 1: Long system chain, high debugging complexity

There are multiple stages from capture to recall. An anomaly in any layer manifests as “memory is wrong”, but it’s hard to pinpoint where the issue lies.

Risk 2: Highly dependent on prompt quality

L1 scene segmentation, L2 scene integration, L3 profile updates, and L1.5 task judgment are all prompt-driven. Change the model, and behavior may drift.

Risk 3: L1 is the key to success or failure

L1 extraction errors → L2 builds narratives on wrong atoms → L3 builds profiles on wrong narratives. Garbage in, a whole hierarchy of garbage out.

Risk 4: Accumulated abstraction bias

Scene and Persona are prone to “locally distorted but long-term self-consistent” issues — harder for users to detect.

Risk 5: Sensitive to host message protocol

Assistant tool_use and tool_result messages must stay paired; deleting one without the other easily triggers a 400 error from the model API.
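A pair-aware trim could look like the sketch below. The flattened `Msg` type is a simplification invented for this example and is not the host's real message format. It only handles trimming from the front of the history, where the possible orphan is a tool_result whose tool_use was cut.

```typescript
// Simplified, hypothetical message model with one block per message.
type Msg =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool_use"; id: string; name: string }
  | { kind: "tool_result"; toolUseId: string; output: string };

// Keep at most `keep` of the newest messages, then drop any tool_result
// whose matching tool_use fell outside the kept window.
function trimKeepingPairs(messages: Msg[], keep: number): Msg[] {
  const start = Math.max(0, messages.length - keep);
  const kept = messages.slice(start);

  // Collect the ids of tool calls that survived the cut.
  const survivingUseIds = new Set<string>();
  for (const m of kept) {
    if (m.kind === "tool_use") survivingUseIds.add(m.id);
  }

  // Orphaned results are removed; everything else passes through.
  return kept.filter(
    (m) => m.kind !== "tool_result" || survivingUseIds.has(m.toolUseId),
  );
}
```

The result may be shorter than `keep`, which is the safe direction: a slightly smaller history is preferable to a request the API rejects.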

5 Most Worthwhile Design References

If you are building your own Agent memory system, these 5 points are most worth referencing (sorted by ROI):

  • Tool result offload — directly solves long task context degradation
  • L1.5 task continuation judgment — improves cross-round task continuity
  • L0 evidence layer / L1 abstraction layer separation — improves memory reliability
  • L2 scene blocks — improves long-term semantic organization ability
  • Recall stable/dynamic split — improves prompt quality and cache benefits

What Not to Blindly Copy

  • Don’t copy the entire system 1:1 (too heavy, complex state machine)
  • Don’t let the memory system take over current prompt compression (responsibility confusion)
  • Don’t over-rely on scene/persona as the sole source of truth (prone to accumulated bias)

Conclusion: The Next Phase of Memory Systems

The value of TencentDB Agent Memory is not “yet another memory store”, but that it proposes a new problem framework:

Memory systems are not just about “storing history”, but also about “managing current context”.

While everyone else was doing vector retrieval, Tencent chose a more difficult but more complete path: integrating short-term compression, medium-term organization, and long-term profiling into a runtime system.

This is a landmark attempt for AI Agent memory systems evolving from “database” to “runtime”.

Source Code: Tencent/TencentDB-Agent-Memory (Open Source)

Recommended Reading Path:

  • Offload section: src/offload/index.ts → hooks/after-tool-call.ts → mmd-injector.ts
  • Memory pipeline: src/core/tdai-core.ts → record/l1-extractor.ts → scene/scene-extractor.ts

If you are building long-task Agents, this project is worth in-depth study.
