How I wired a Graph DB on top of my vector store to scale 1K agents for 2 months, because vector search alone fails when user preferences change over time.
A detailed architectural guide for building long-running AI agents that handle changing user preferences over time by combining a vector store, graph DB, and temporal edges instead of overwriting data.
Most agentic memory patterns are naturally designed around short-lived chat sessions. The focus there is straightforward: track the active thread, keep a basic user profile, and reset the context once the conversation closes.

But when you operate long-running AI agents in production over extended periods, the architectural needs completely change. These agents don't get reset. They work for weeks on end, hand off tasks between execution loops, and face a massive real-world hurdle: **facts change over time.**

If a user uses Gmail today and switches to Outlook next month, the agent needs to track both. It has to know which one is current, exactly when the switch happened, and it cannot act like the old truth is still valid. Standard vector database similarity scores do not understand chronological decay or truth overrides.

Memory in a long-running agent isn't a single database. It requires distinct layers running in parallel across multiple DB types. After dealing with this problem for a while, here is the 7-layer architecture I landed on to handle it:

**1. Working Memory**
The active per-turn scratchpad. I enforce a strict execution wall here so temporary reasoning or transient tokens never leak into long-term storage.

**2. Conversation Memory**
Immediate thread history, managed by a dynamic summarizer middleware before it crosses token context thresholds.

**3. Episodic Memory**
A time-indexed log of past runs, especially the failed ones. This gives the agent continuity of its own execution history so it doesn't repeat past mistakes.

**4. Semantic Memory**
Slow-changing, deterministic facts. I split this into a human-editable markdown file (for explicit user configurations) and an LLM-extracted graph. If they disagree, the human notebook explicitly wins.

**5. Knowledge Graph**
The relational structure. While semantic memory holds the raw facts, this layer maps the structural edges between entities. A vector store treats data like isolated islands; the graph connects them contextually.

**6. Procedural Memory**
Behavior and execution mechanics, not facts. This stores the specific habits, tool-use skills, and workflow patterns the agent reproduces across its automation loops.

**7. Checkpoints**
State snapshots. This is the difference between a pod crash starting a 40-minute multi-step task over from scratch, or resuming smoothly at minute 33.

# The Core Breakthrough: Temporal Edges

The biggest win was to **stop deleting or overwriting data** when preferences or environments change. Instead, every extracted fact in the semantic and graph layers needs a `valid_at` and `invalid_at` timestamp.

When today’s session contradicts yesterday’s state, the pipeline invalidates the old edge instead of erasing it. This preserves a clean, immutable audit trail and allows the LLM to logically reason about *when* a preference or infrastructure shifted.
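To make the invalidate-instead-of-overwrite idea concrete, here is a minimal in-memory sketch in Python. It is not the actual pipeline from this post: the `TemporalEdge` and `TemporalGraph` names, the `assert_fact` and `as_of` methods, the `uses_email_client` predicate, and the dates are all hypothetical, and it assumes each (subject, predicate) pair holds only one current value at a time. A real setup would keep these edges in a graph DB rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TemporalEdge:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means "still current"


class TemporalGraph:
    """Hypothetical append-only edge list standing in for a graph DB."""

    def __init__(self) -> None:
        self.edges: List[TemporalEdge] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        # Assumption: one current value per (subject, predicate).
        for edge in self.edges:
            same_slot = edge.subject == subject and edge.predicate == predicate
            if same_slot and edge.invalid_at is None:
                if edge.obj == obj:
                    return  # fact is already current, nothing to record
                edge.invalid_at = at  # close the old edge, never delete it
        self.edges.append(TemporalEdge(subject, predicate, obj, valid_at=at))

    def as_of(self, subject: str, predicate: str, at: datetime) -> Optional[str]:
        # Return the value whose validity window contains `at`, if any.
        for edge in self.edges:
            if (edge.subject == subject and edge.predicate == predicate
                    and edge.valid_at <= at
                    and (edge.invalid_at is None or at < edge.invalid_at)):
                return edge.obj
        return None


# Illustrative timeline (dates are made up for the example).
jan = datetime(2025, 1, 10, tzinfo=timezone.utc)
feb = datetime(2025, 2, 10, tzinfo=timezone.utc)

graph = TemporalGraph()
graph.assert_fact("user:42", "uses_email_client", "Gmail", jan)
graph.assert_fact("user:42", "uses_email_client", "Outlook", feb)

before_switch = graph.as_of("user:42", "uses_email_client",
                            datetime(2025, 1, 20, tzinfo=timezone.utc))
after_switch = graph.as_of("user:42", "uses_email_client",
                           datetime(2025, 3, 1, tzinfo=timezone.utc))
```

Both edges stay in the store after the switch: the Gmail edge just gets an `invalid_at`, so the agent can answer "what is current?" and "what was true on a given date?" from the same data.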