@wquguru: https://x.com/wquguru/status/2069641926752780384

X AI KOLs Timeline 06/24/26, 04:41 AM News

Summary

This article comprehensively reviews the complete architectural layering of AI Agent Memory as of mid-2026, including rule files, persistent profiles, historical recall, and evidence chains. It explains the storage methods, loading timings, and governance principles of different memory layers, emphasizing the key role of memory in helping agents achieve cross-session compounding work.

https://t.co/1nh9OAj9w0

Original Article

Cached at: 06/24/26, 12:23 PM

Agent Memory Architecture Overview: From Rule Files, Session Retrieval to Reflection and Skill Accumulation

A few months ago I started using OpenClaw and discovered that the thing that excited me most about what it brings to the Agent industry, aside from scheduled tasks, was a seemingly ordinary file: MEMORY.md.

This curiosity drove me to poke around my local ~/.claude/ directory. I found that each project where I frequently use Claude Code with /loop has its own memory file. These files aren’t logs, READMEs, or session history.

Looking at the memory files from several projects together, a question naturally emerges: what exactly are they recording? Some say “must do this from now on”, some say “this is the current state of the project”, some say “stepped in this trap last time, don’t go there again”, some say “why we believe this conclusion”, and others say “the next loop picks up from here”.

Agent Memory has already grown beyond “saving chat history” into a complete architecture. Rules, profiles, history, evidence, reflection, and skill accumulation each have their own storage methods, loading timing, and governance challenges. This article aims to explain clearly what this architecture looks like as of mid-2026.

Who is this article for:

Developers using or building Coding Agents, Research Agents, Personal AI Agents, or any LLM application that requires cross-session persistent work;
Technical decision-makers who are interested in agent architecture but haven’t figured out how to layer memory;
Heavy users of tools like Claude Code / Codex / OpenClaw / Hermes who want to understand what those memory files in their projects are actually doing.

1. Long Context Solves Current Tasks, Memory Solves Cross-Task Compounding

First, let’s address a point many people get wrong: Agent Memory is not a replacement for long context.

250K, 1M, or even longer context windows are now standard. Long context is certainly important — it allows the model to see more files, more logs, and more evidence within a single task, avoiding the information loss from frequent summarization.

But long context only ever answers the question of “how much can fit in this single round”.

Memory solves another problem: When the agent wakes up for the next round, does it still remember why it did things that way last time?

The context window is the workbench — all materials needed for the current task are laid out on it. RAG and search are external knowledge bases called on demand. Memory is the persistent state layer that spans sessions, projects, and agents. The division of labor is clear.

A coding agent with only long context and no memory will hit the same test environment bug next week when it opens a new session. A research agent with only RAG can find past materials but won’t know which ones have been disproven or which sources are unreliable on a topic. A trading agent with only transcripts can review all logs but can’t distinguish which ones have been elevated to invariants versus which were just one-off occurrences.

The core value of Memory isn’t in “storing a lot”, but in layering the past: what should be resident, what should be searchable, what should be archived, and what should become reusable skills for the future.

2. Layer 1: Rule Memory — The Agent’s Working Constitution

The earliest widely used form of agent memory predates any automatic memory system. It’s the rule file.

Claude Code calls it CLAUDE.md, Codex calls it AGENTS.md. Essentially, it’s a “working constitution”: how this project is built and tested, which directories are absolutely off-limits, which commands must be run in specific subdirectories, which code styles and commit rules are inviolable, and which business red lines are more important than completing the current task itself.

The advantages are obvious: readable, editable, auditable, and storable in Git. The team can review it, CI can check it, and the agent sees it every time it starts.

But it has clear boundaries. Rule files are suitable for things that are “stable long-term and should be followed every time”. They are not suitable for cramming in all historical details. Piling every bug, every experiment, and every log into CLAUDE.md will eventually turn the context into a pile of low-density noise.

The Claude Code official docs also clearly separate CLAUDE.md and auto memory: the former is human-written instructions and rules, the latter is learning that Claude accumulates based on corrections and preferences. Codex similarly emphasizes that rules the team must follow should be placed in AGENTS.md or repo documentation, while memories are just a local recall layer.

This establishes the first design principle:

Rules that must be followed should not exist only in automatic memory. They should be in versioned rule files.

Rule memory is the first layer of agent memory. It solves the problem of “do it this way from now on”.

3. Layer 2: Resident Profile — Things That Pay the Token Tax Every Round

After rules comes a more subtle category: profiles.

Hermes Agent’s design is distinctive in this regard. It has two built-in files: MEMORY.md stores the agent’s own high-density notes — environment facts, project conventions, learned experiences; USER.md stores the user profile — preferences, communication style, long-term expectations. These two files are injected directly into the system prompt at session start, with strict length limits: exceeding the limit doesn’t get silently compressed, it throws an error, forcing the agent to merge, replace, or delete content itself.

This design is restrained, and that restraint is exactly what makes it effective.

Resident memory isn’t about having more. It pays a token tax every round, and the earlier its position in the prompt, the more it influences model judgment. If you cram a large amount of unedited history into resident memory, the agent becomes both expensive and confused — it’s like repeating every conversation from the last decade to someone, hoping they will extract something useful.

Resident memory should only hold three types of things: Identity — who this agent is, what its long-term responsibilities are; Preferences — what this user consistently likes and dislikes; Invariants — facts in the environment that hold true repeatedly and will certainly be useful next time.

My own garden memory is quite similar to this layer. It doesn’t record the entire process of writing each article, only facts like “public handle is wquguru, don’t use private account”, “AGENTS.md and .agents/skills are symlinks”, “garden deploy goes through Cloudflare Pages + Access” — facts that will certainly be useful next time.

Second design principle:

Resident memory should be short, hard, and high-density. History should not be resident. Only compressed identity, preferences, and invariants deserve residency.

Profile memory solves the problem of “with what identity and stable preferences should the agent continue working”.
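The "throw an error instead of silently compressing" behavior described above can be sketched in a few lines. This is a minimal illustration, not Hermes' actual implementation: the class name, the exception, and the character-based budget (`max_chars`) are all assumptions made for the example, and a real system may well count tokens instead.

```python
from dataclasses import dataclass, field


class ResidentMemoryOverflow(Exception):
    """Raised when a write would push resident memory past its budget."""


@dataclass
class ResidentMemory:
    # ASSUMPTION: a character budget stands in for whatever limit a real system uses.
    max_chars: int = 2000
    entries: list[str] = field(default_factory=list)

    def size(self) -> int:
        # Each entry costs its own length plus one newline when rendered.
        return sum(len(entry) + 1 for entry in self.entries)

    def add(self, entry: str) -> None:
        # Refuse rather than truncate: the caller must merge, replace, or delete.
        if self.size() + len(entry) + 1 > self.max_chars:
            raise ResidentMemoryOverflow(
                f"adding {len(entry) + 1} chars would exceed {self.max_chars}"
            )
        self.entries.append(entry)

    def replace(self, index: int, entry: str) -> None:
        # Swapping an entry is allowed only if the result still fits.
        new_size = self.size() - len(self.entries[index]) + len(entry)
        if new_size > self.max_chars:
            raise ResidentMemoryOverflow("replacement would exceed the budget")
        self.entries[index] = entry

    def render(self) -> str:
        # The text that would be injected into the system prompt.
        return "".join(entry + "\n" for entry in self.entries)
```

The point of raising is that the overflow becomes an editing decision the agent has to make explicitly, rather than a lossy compression that happens out of sight.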

4. Layer 3: Historical Recall — Most Memories Should Not Be Resident, But Must Be Findable

So if most history shouldn’t be resident, where does it go?

The answer is on-demand recall. Search it out when needed, let it sit quietly on disk when not.

Hermes uses SQLite FTS5 to save all CLI and messaging sessions, providing session_search. It retains the full original messages without the information loss that comes from summarization, and you can scroll back and forth within a session to see context.
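To make the idea concrete, here is a minimal sketch of full-text session recall using Python's standard `sqlite3` module and an FTS5 virtual table. It is not Hermes' code: the table layout and the helper names (including this stand-in `session_search`) are assumptions for illustration, and it requires a SQLite build with FTS5 enabled, which most but not all Python distributions ship.

```python
import sqlite3


def open_session_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a store that keeps raw messages, not summaries."""
    conn = sqlite3.connect(path)
    # ASSUMPTION: hypothetical schema. Only `content` is full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
    )
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def session_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """Return the best-matching original messages for a keyword query."""
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Because the hit carries a `session_id`, a caller could then load neighboring messages from the same session to read the surrounding context.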

OpenClaw designs the workspace more like a filesystem memory: MEMORY.md is the refined layer, memory/YYYY-MM-DD.md is daily notes, DREAMS.md stores offline thinking outputs. memory_search and memory_get handle recall, and with an embedding provider configured, it can also do hybrid vector + keyword search.

Codex’s Memories follow a similar path — turning stable preferences, workflows, tech stacks, project conventions, and known pitfalls from old threads into local memory files, brought into future tasks on demand.

A consensus is forming: Separate “resident memory” from “historical recall”. Resident memory is like an index page — very short. Historical recall is like a database — very large. Search is responsible for precisely pulling out local fragments from the database when needed.

Going back to the carrywatch project from the opening image: its MEMORY.md is exactly such an index page, a dozen lines long, with each line pointing to a topic file. binance-funding-interval-bug.md records root cause, impact scope, fix commit, verification method, and unresolved issues. clock-domain-and-health-freshness.md documents the full analysis of a counter-intuitive bug: on macOS, the monotonic clock freezes during sleep while the wall clock keeps advancing, causing “data stale for 6.6 hours but health status still green”.

Injecting such files into every prompt is a waste, but losing them completely means the agent has to start investigating from scratch next time. They are best kept as historical memory, recalled on demand.
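The index-plus-topic-file layout can be sketched as follows. This is an illustration under stated assumptions: the index format (Markdown links such as `- [title](topic-file.md)`) and the function names are hypothetical, since the article does not show what the index lines actually look like.

```python
import re
import tempfile
from pathlib import Path

# ASSUMPTION: index entries are Markdown links like "- [title](topic-file.md)".
INDEX_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+\.md)\)")


def read_index(memory_dir: Path) -> dict[str, Path]:
    """Parse the short, always-loaded index into title -> topic file path."""
    text = (memory_dir / "MEMORY.md").read_text(encoding="utf-8")
    return {title: memory_dir / target for title, target in INDEX_LINK.findall(text)}


def recall(memory_dir: Path, keyword: str) -> list[str]:
    """Load only the topic files whose index title mentions the keyword."""
    needle = keyword.lower()
    return [
        path.read_text(encoding="utf-8")
        for title, path in read_index(memory_dir).items()
        if needle in title.lower() and path.is_file()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "MEMORY.md").write_text(
            "- [funding interval bug](binance-funding-interval-bug.md)\n"
            "- [clock domain and health freshness](clock-domain-and-health-freshness.md)\n",
            encoding="utf-8",
        )
        # Placeholder body; the real topic file contents are not reproduced here.
        (root / "binance-funding-interval-bug.md").write_text("root cause: ...\n", encoding="utf-8")
        print(recall(root, "funding"))
```

Only the index pays the per-round cost. A topic file is read when a task actually touches its subject.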

Third design principle:

Historical memory should be searchable, traceable, and partially readable, not completely resident in the context.

Historical recall solves the problem of “what actually happened in the past, and why were those judgments made then”.

5. Layer 4: Evidence Chain and State Governance — Remembering the Conclusion Isn’t Enough, You Need to Remember the Why

The truly difficult kind of memory is remembering the source of a conclusion.

Agents are too ready to treat a one-time summary as fact, a single guess as experience, and a temporary workaround as a permanent rule. Once memory becomes long-term, errors become long-term too. They also become harder to find the older they get, and harder to correct the more confident the system becomes.

This is why I’ve become increasingly concerned with the structure of memory files.

A good bug memory shouldn’t just say “fixed”. It should read like a miniature postmortem: what the problem is, where the evidence is (the official field actually comes from fundingInfo, not the premiumIndex response), what was affected (4-hour symbol APR underestimated by 2x, 1-hour by 8x), how it was fixed, how it was verified, and what issues remain unresolved (the semantics of last-settled and predicted funding are still inconsistent).

Compared to ordinary notes, this is more like auditable engineering memory.
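A postmortem-style record like this could be represented as structured data and rendered to Markdown. The sketch below is hypothetical: the field names are illustrative, and the 8-hour assumed interval is an inference from the 2x and 8x figures (8 / 4 = 2, 8 / 1 = 8), not something the memory file is quoted as saying. The fix and verification fields are left as placeholders because the article does not give them.

```python
from dataclasses import dataclass, field


def apr_underestimate_factor(assumed_interval_h: float, actual_interval_h: float) -> float:
    """How many times too low an annualized rate is when the interval is assumed too long.

    Annualizing multiplies the per-period rate by the number of periods per
    year, so the error is the ratio of assumed to actual interval length.
    """
    return assumed_interval_h / actual_interval_h


@dataclass
class BugMemory:
    """Hypothetical postmortem-style record; field names are illustrative."""
    title: str
    problem: str
    evidence: str
    impact: str
    fix: str
    verification: str
    open_issues: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [
            f"# {self.title}",
            f"Problem: {self.problem}",
            f"Evidence: {self.evidence}",
            f"Impact: {self.impact}",
            f"Fix: {self.fix}",
            f"Verification: {self.verification}",
            "Open issues:",
        ]
        lines += [f"- {issue}" for issue in self.open_issues] or ["- none"]
        return "\n".join(lines) + "\n"


# ASSUMPTION: an 8-hour default interval, inferred from the 2x / 8x figures.
ASSUMED_INTERVAL_H = 8
memo = BugMemory(
    title="binance-funding-interval-bug",
    problem="(illustrative) APR annualized with the wrong funding interval",
    evidence="official field comes from fundingInfo, not the premiumIndex response",
    impact=(
        f"4-hour symbols underestimated {apr_underestimate_factor(ASSUMED_INTERVAL_H, 4):g}x, "
        f"1-hour symbols {apr_underestimate_factor(ASSUMED_INTERVAL_H, 1):g}x"
    ),
    fix="(commit reference goes here)",
    verification="(verification method goes here)",
    open_issues=["last-settled vs predicted funding semantics still inconsistent"],
)
```

Keeping evidence and open issues as separate fields makes it harder for a later summary to collapse the record into a bare "fixed".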

The trader project’s memory illustrates this even better. It doesn’t simply record “Bitget demo connected”. It writes the boundary red lines between demo and real accounts, fee sign conventions, settlement invariants, deployment topology, Grafana dashboards, and authorization changes allowing only commits, not pushes — all into memory.

Why record these? Because they constitute state governance. A long-running agent that doesn’t know “which changes involving fund settlement require escalation confirmation” will act on the wrong boundaries no matter how capable its code. An autonomous loop that doesn’t know “only local commits allowed, no pushes” might push technically correct but procedurally wrong changes to the remote.

So agent memory must contain a category called governance memory: permission boundaries, risk red lines, environment topology, deployment processes, verification gates, current running state, and why previous decisions were made.

This kind of memory cannot rely solely on vector recall. It needs clear structure, explicit state, and manually auditable sources.
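One way to give governance memory "clear structure and explicit state" is to make it a small, checkable policy rather than free text. The sketch below is an assumption-laden illustration: the action names (`git_commit`, `git_push`) and topic labels (`fund_settlement`) are invented for the example and do not come from any of the tools discussed.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # pause and ask a human to confirm
    DENY = "deny"


@dataclass(frozen=True)
class GovernanceMemory:
    """Hypothetical explicit policy; names are illustrative only."""
    allowed_actions: frozenset[str]
    escalation_topics: frozenset[str]

    def decide(self, action: str, topics: set[str]) -> Decision:
        # Anything not explicitly authorized is refused.
        if action not in self.allowed_actions:
            return Decision.DENY
        # Authorized actions touching a red-line topic still need confirmation.
        if topics & self.escalation_topics:
            return Decision.ESCALATE
        return Decision.ALLOW


# Mirrors the two examples in the text: commits only, and settlement needs sign-off.
policy = GovernanceMemory(
    allowed_actions=frozenset({"git_commit"}),
    escalation_topics=frozenset({"fund_settlement"}),
)
```

With a structure like this, "only local commits allowed, no pushes" becomes an exact lookup instead of something a similarity search might or might not surface.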

Fourth design principle:

Agent memory is also a governance system. It must manage sources, confidence, expiration, permissions, and deletability.

6. Layer 5: From Recall to Reflection and Skill Accumulation — The Compounding of Memory

So far, the discussion has primarily been about recall — remembering rules, profiles, history, and evidence.

But the real watershed for agent memory lies in the next layer: self-evolution.

Recall is just remembering what happened in the past. Reflection is summarizing why something succeeded or failed. Skill extraction is turning repeatedly successful paths into reusable workflows. Dreaming is offline consolidation during idle time, rather than cramming things into the context inline every round.

OpenClaw’s dreaming, Hermes’ post-turn self-improvement review, Claude Code’s auto memory, EverOS’ agent cases and skills — they’re all moving in this direction.

The “backtesting framework and strategy economy conclusions” memory in the trader project is close to experience accumulation. It organizes experimental conclusions across multiple time windows, multiple assets, and multiple parameter grids into strategic judgments: high-turnover weak signals are structurally net negative, reducing turnover is the core lever, narrow samples easily misjudge strategies, a sufficiently wide asset pool and longer windows are needed to see stability, and backtesting data sources must be closed-loop with the real exchange.

If this kind of thing stays only in chat history, next time the agent will start from “let’s run it once and see”. Once consolidated into memory, the next round can start from a higher level: which parameters are worth sweeping, which verifications cannot be skipped, and which early conclusions have been corrected by subsequent samples.

This is the true compounding of memory.

Fifth design principle:

The endgame of Agent Memory is not “remembering more”, but making fewer of the same mistakes and reusing what was done correctly, faster.
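Skill extraction, "turning repeatedly successful paths into reusable workflows", can be sketched as a simple promotion rule over recorded cases. Everything here is an assumption for illustration: the `Case` shape, the `min_successes` threshold of 3, and the rule that any recorded failure blocks promotion. None of the tools named above are documented in this article as working this way.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical record of one completed task."""
    workflow: tuple[str, ...]  # ordered steps the agent took
    succeeded: bool


def extract_skills(cases: list[Case], min_successes: int = 3) -> list[tuple[str, ...]]:
    """Promote workflows that succeeded repeatedly and never failed."""
    successes = Counter(c.workflow for c in cases if c.succeeded)
    failures = Counter(c.workflow for c in cases if not c.succeeded)
    return [
        workflow
        for workflow, count in successes.items()
        if count >= min_successes and failures[workflow] == 0
    ]
```

A rule like this would run offline, in the spirit of the dreaming step described earlier, so that consolidation does not cost tokens during a live task.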

7. Memory Architecture Comparison Across Four Applications

Looking at Claude Code, Codex, OpenClaw, and Hermes together, application-layer memory is clearly differentiating into four layers:

Rule Layer: CLAUDE.md / AGENTS.md, suitable for project conventions that must be followed.
Resident Layer: MEMORY.md / USER.md, suitable for high-density identity, preferences, and invariants.
Historical Layer: session search, daily notes, topic files, suitable for large amounts of facts, evidence, and processes.
Evolution Layer: dreaming, reflection, skills, responsible for converting historical experience into default capabilities for future actions.

A truly mature agent memory is a combination of these four layers. No single file or single vector database can support it alone — just like you can’t rely on only one data structure to manage the entire state of an operating system.
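How the layers combine at prompt time can be sketched as a small assembly function: rules and resident memory are always loaded, while recalled history fills whatever budget remains. This is a simplified illustration. The function name and the character-based `budget_chars` are assumptions, and the evolution layer is left out because it runs offline rather than at prompt assembly.

```python
def assemble_context(
    rules: str,
    resident: str,
    recalled: list[str],
    budget_chars: int,
) -> str:
    """Build the prompt prefix from always-on layers plus on-demand recall.

    The budget applies to content only; separators are not counted, for brevity.
    """
    parts = [rules, resident]
    used = len(rules) + len(resident)
    if used > budget_chars:
        # The always-on layers must fit by themselves; trimming them is an editing job.
        raise ValueError("rules and resident memory exceed the budget")
    for snippet in recalled:  # assumed to be ordered best match first
        if used + len(snippet) > budget_chars:
            continue  # skip snippets that do not fit, keep trying smaller ones
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)
```

The asymmetry is the design choice: overflowing the always-on layers is an error, while overflowing recall just means fewer snippets are included.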

8. Why EverOS Aligns with This Direction

EverOS designed memory as a developer-facing runtime — a structure that developers can directly read, write, debug, and version, rather than guessing what’s inside a black-box recall layer through an API.

Several of its design choices align well with the layers above:

Markdown as source of truth: Memory is persisted as files that are readable, editable, greppable, and Git-versionable.
SQLite + LanceDB: Markdown is the source of truth, SQLite manages state, LanceDB manages vectors, BM25, and scalar filters.
Dual-track memory: User memory and agent memory are separate; episodes/profile are not mixed with cases/skills.
Multimodal ingestion: Text, images, audio, PDFs, HTML, emails can all enter the unified memory layer.
Self-evolution: Cases from real usage can be accumulated as skills.
Orthogonal retrieval: Searchable by user_id, agent_id, app_id, project_id, session_id dimensions.

If memory is only in a remote black box, developers will never know what the agent remembers, why this particular item was recalled, when it should be deleted, or which parts have expired. If memory is Markdown, at least the first step becomes simpler: you can open it, read it, diff it, edit it, put it in Git, and hand it to another agent.

This isn’t the final answer. But it’s the best engineering starting point we have right now.

9. New Problems Memory Will Bring

Memory is not a silver bullet. Precisely because it persists over the long term, it creates more troublesome problems than context hallucination.

Permanent error memory: A single misjudgment written into memory makes the next agent more confident in repeating it. And because “it’s in memory”, it’s harder to self-correct than when reasoning from scratch.

Outdated information continues influencing decisions: Today’s API limits, deployment topology, and account status could be completely different next month, but the old snapshot in memory silently biases the agent’s judgment.

Persistent contamination from prompt injection: If an agent saves web page content as “experience”, all subsequent sessions will be affected — far more severe than a single attack.

Irreversibility of privacy and deletion: Deleting chat history doesn’t mean the profiles, facts, and skills extracted from it also disappear. Extraction refines information in a way that is hard to reverse, so forgetting needs a design as systematic as extraction itself.

Summaries turn evidence into second-hand conclusions: With overuse of second-hand conclusions, the system increasingly struggles to distinguish between “facts that really happened” and “the model’s interpretation at the time”. Once this boundary blurs, memory turns from an asset into a liability.

Therefore, a good memory system must have built-in governance: source, time, expiration, confidence level (fact/inference/preference), scope (user/project/agent/organization), deletability, and traceability (ability to go back to the original session, file, or commit).
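The governance fields listed above map naturally onto a record type plus two small operations: filtering out expired or out-of-scope entries at read time, and forgetting everything derived from a given source. The sketch below is hypothetical throughout. The field names, the `Kind` enum, and the example values are illustrative and do not describe any particular product's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Kind(Enum):
    FACT = "fact"
    INFERENCE = "inference"
    PREFERENCE = "preference"


@dataclass
class MemoryRecord:
    text: str
    source: str  # session, file, or commit the claim traces back to
    kind: Kind
    scope: str  # "user", "project", "agent", or "organization"
    recorded_at: datetime
    expires_at: Optional[datetime] = None  # None means no scheduled expiry
    deletable: bool = True


def usable(records: list[MemoryRecord], now: datetime, scope: str) -> list[MemoryRecord]:
    """Keep records in the requested scope that have not expired."""
    return [
        r for r in records
        if r.scope == scope and (r.expires_at is None or r.expires_at > now)
    ]


def forget_source(records: list[MemoryRecord], source: str) -> list[MemoryRecord]:
    """Drop every deletable record derived from one source, e.g. a deleted session."""
    return [r for r in records if not (r.source == source and r.deletable)]


# Illustrative values only.
example = MemoryRecord(
    text="deployment topology snapshot",
    source="session:example",
    kind=Kind.FACT,
    scope="project",
    recorded_at=datetime(2026, 6, 1, tzinfo=timezone.utc),
    expires_at=datetime(2026, 7, 1, tzinfo=timezone.utc),
)
```

Recording `source` on every entry is what makes deletion tractable: removing a session can cascade to the facts extracted from it instead of leaving them orphaned.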

This is also why I increasingly dislike simplifying agent memory to “storing conversations in a vector database”. Vector databases are just a recall method. The real memory system needs to handle state, sources, permissions, and evolution — things that a single embedding cannot solve.

10. Conclusion: Memory is the State Layer of the Agent

Back to the beginning.

Those local ~/.claude/projects/*/memory/MEMORY.md files convinced me: when an agent truly starts participating in long-term projects, it naturally needs a place outside of conversation to store state.

This state includes how the project runs, which suspected bugs have been ruled out, which verification methods are reliable, which risk red lines cannot be crossed, which experimental conclusions have been updated by subsequent data, and which processes can be directly reused next time. It also includes the part most easily overlooked: what to do next, when to trigger it, and which commitments remain unfulfilled.

Memory doesn’t just record the past. Task queues, scheduled triggers, unfinished loop states — these are all future-oriented memories. In the trader project, “inspection-fix-deployment loop” and “where to continue next round” are essentially the agent’s commitment to itself. Losing this would make the agent repeatedly start planning from scratch for things already planned, like a person who forgets yesterday’s progress every morning.

Long context allows the agent to see more completely within the current task. Memory allows the agent to start from a higher point in the next task.

This is why Agent Memory has evolved from a “minor feature” into an “architectural layer”. It’s not an optional enhancement; it’s infrastructure that allows the agent to evolve from a single-use call to continuous operation. Missing any layer will cause the agent to degrade into a stateless API wrapper in some dimension.

If you are building a coding assistant, research agent, personal AI, browser agent, customer support agent, or any LLM app requiring cross-session persistent work, take a look at EverOS: https://github.com/EverMind-AI/EverOS