@cocoindex_io: Orgs that want a production-grade pipeline without writing ingestion and transformation logic from scratch should take …
Summary
SerroAI open-sources a guide for building live program memory using Claude Code and MCP integrations, enabling AI agents to query organizational decisions and signals without proprietary infrastructure.
Cached at: 06/11/26, 03:42 PM
Orgs that want a production-grade pipeline without writing ingestion and transformation logic from scratch should take a look at @cocoindex_io . @jakesrro just open sourced programmable memory with @cocoindex_io, which sits in the center of the project.
An AI-native engineering org runs multiple programs in parallel - each with its own scope, stakeholders, signals, and decisions. Live program memory means every AI agent in your org can answer questions like:
- “What decisions did the platform team make last quarter and why?”
- “Which engineers have been working on auth - and for how long?”
- “What action items from last week’s design review are still open?”
- “Has the scope of Program X drifted from its original charter?”
Without live program memory, every Claude session starts from zero. Engineers re-explain context. Decisions get repeated. Work happens invisibly.
Link to the repo - https://github.com/SerroAI/program-memory…
SerroAI/program-memory
Source: https://github.com/SerroAI/program-memory
Build Your Own Serro
Definitive guide on building your own program memory layer
A starter kit for building live shared program memory using Claude Code, MCP integrations, and git - no proprietary infrastructure required.
This repo contains step-by-step instructions, not code. There is nothing to npm install or docker run. Each family folder is a guide for how to build the implementation yourself using Claude Code and native MCP integrations.
Why Serro published this
We want our customers to succeed - with or without us.
If you have the engineering bandwidth to build and operate your own live program memory, this repo gives you the full architecture: every decision point, every tradeoff, every dead end we found. You’ll know exactly what you’re signing up for.
If, after reading this, you’d rather not operate it yourself - Serro is the managed version. Same capabilities, no infrastructure, with a proprietary ontology built from three years of org signals.
Either way, you’ll have made an informed choice. That’s the goal.
What is live program memory?
An AI-native engineering org runs multiple programs in parallel - each with its own scope, stakeholders, signals, and decisions. Live program memory means every AI agent in your org can answer questions like:
- “What decisions did the platform team make last quarter and why?”
- “Which engineers have been working on auth - and for how long?”
- “What action items from last week’s design review are still open?”
- “Has the scope of Program X drifted from its original charter?”
Without live program memory, every Claude session starts from zero. Engineers re-explain context. Decisions get repeated. Work happens invisibly.
Serro solves this by maintaining a continuously updated, program-indexed memory across GitHub, Slack, Google Drive, and meetings - and making it queryable by any agent in the org.
This repo documents how to replicate that using only Claude Code and native MCP integrations.
What you’re building
┌─────────────────────────────────────────────────────┐
│ Widget layer Prompt-based live views of │
│ program state (requires layer 2) │ ← out of scope
├─────────────────────────────────────────────────────┤
│ Proactive layer Autonomous loop agent monitors │
│ programs, flags blockers, posts │ ← loop pattern (see below)
│ digests on a schedule │
├─────────────────────────────────────────────────────┤
│ Memory layer Signals from GitHub/Slack/Drive │
│ organized by program, queryable │ ← this repo
│ by any Claude session │
└─────────────────────────────────────────────────────┘
This repo covers the memory layer in full. The proactive layer is documented as a loop pattern — an autonomous Claude agent that wakes up on a schedule, reads program memory, and surfaces what matters without being asked. See content_ideas/serroloop_blog_post.md and the loop pattern concept below. The widget layer remains out of scope.
Four levels of live program memory
There are four levels. Each one is useful on its own. Each one is also the foundation for the next.
| Level | What you do | What you get | Families |
|---|---|---|---|
| 1 — Pull | Nothing. Claude pulls all sources at query time. | Instant setup. Works until context fills up. | Family A |
| 2 — Map | Maintain program_mappings.yaml — programs, people, sources in one file. | Scoped queries, contributor attribution, action item follow-up. | Family B |
| 3 — Loop | A Claude loop automaintains the digests. | Always-current memory. No manual maintenance. | Family C (C-4 start) |
| 4 — Graph | Pipe loop output into a semantic/graph index. | Semantic search. Entity resolution. Temporal reasoning. | Family C + CocoIndex / FalkorDB / Serro |
Start at Level 3. One /loop command, no infrastructure required. Move to Level 4 only after your loop is stable and you’re hitting the ceiling of flat digest queries. The mapping file (program_mappings.yaml) is the same at every level — you write it once and it carries forward.
See verdict.md for the full rationale and when to use each level.
Full decision tree: memory_layer_decision_chart.md
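To make the role of the mapping file concrete, here is a minimal Python sketch of what a parsed program_mappings.yaml might look like and how it could scope a query. The actual schema lives in templates/program_mappings.yaml; every field name, program entry, repo, and channel below is a hypothetical stand-in, not the repo's format.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """One entry of a hypothetical program mapping (field names are assumptions)."""
    name: str
    owners: list[str]
    sources: dict[str, list[str]] = field(default_factory=dict)  # tool -> identifiers


# Stand-in for what a parsed program_mappings.yaml might contain.
MAPPINGS = [
    Program(
        name="Auth Modernization",
        owners=["alice", "bob"],
        sources={"github": ["acme/auth-service"], "slack": ["#auth"]},
    ),
    Program(
        name="Platform Reliability",
        owners=["carol"],
        sources={"github": ["acme/infra"], "slack": ["#platform", "#incidents"]},
    ),
]


def sources_for(program_name: str) -> dict[str, list[str]]:
    """Return only the sources declared for one program (a scoped query)."""
    for program in MAPPINGS:
        if program.name == program_name:
            return program.sources
    raise KeyError(f"unknown program: {program_name}")


def programs_for(tool: str, identifier: str) -> list[str]:
    """Attribute a signal's origin (e.g. a Slack channel) to the programs that declare it."""
    return [p.name for p in MAPPINGS if identifier in p.sources.get(tool, [])]
```

With a structure like this, a Level 2 query only needs to pull the sources returned by `sources_for`, and a Level 3 loop can reuse the same lookup unchanged.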

What Serro does that this can’t (yet)
This is an honest comparison. The open-source version covers the architecture - but Serro has advantages that aren’t replicable with public tooling alone:
| Capability | Open-source (this repo) | Serro |
|---|---|---|
| Live memory ingestion | ✅ Hourly–seconds depending on Family C option | ✅ Continuous, event-driven |
| Program-indexed memory | ✅ Via program_mappings.yaml | ✅ Auto-classified, org-wide |
| Keyword search across sources | ✅ Via MCP (GitHub, Slack, Drive) | ✅ |
| Semantic / embedding search | ⚠️ Requires self-hosted vector store (Family C) | ✅ Built-in |
| Temporal code intelligence | ⚠️ Keyword search only - conceptual drift not detectable | ✅ Symbol-level history |
| Engineer contribution history | ⚠️ Reconstructed from git blame + Slack - incomplete | ✅ Continuously maintained |
| Voice-driven memory updates | ❌ | ✅ |
| Proactive program coordination | ✅ Via loop pattern — scheduled Claude agent reads memory, flags blockers, posts Slack digest | ✅ Event-driven |
| Zero-config setup | ❌ Requires mapping yaml + MCP server setup | ✅ |
The biggest structural gap is the data corpus. Serro has been ingesting and indexing org signals since 2023. The open-source version starts from zero. That gap matters most for temporal reasoning and contribution history.
Current status
| Layer | Status | Notes |
|---|---|---|
| Memory layer | 🟢 Instructions written | Three architectures documented with step-by-step guides. Not validated against a real org. |
| Proactive layer | 🟡 Loop pattern documented | Serroloop pattern covers monitoring, digest, and blocker detection. Implementation guide not yet written. |
| Widget layer | 🔴 Out of scope (checkpoint 1) | Requires memory + proactive layers |
Checkpoint 1 complete: capability analysis, architectural decision tree, and implementation options documented.
Checkpoint 2: build Family B or C2 against a real org and measure classification accuracy, coverage, and latency.
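The repo does not define how these three numbers should be computed, so the following Python sketch is only one plausible reading: coverage as the share of hand-labeled signals that reached memory, accuracy as the share of captured signals filed under the right program, and latency as the median delay from event to memory. The `Sample` record and its fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Sample:
    """One hand-labeled signal (hypothetical evaluation record)."""
    expected_program: str             # label assigned by a human reviewer
    predicted_program: Optional[str]  # None if the signal never reached memory
    latency_seconds: Optional[float]  # time from event to appearing in memory


def evaluate(samples: list[Sample]) -> dict[str, float]:
    """Compute coverage, accuracy, and median latency over labeled samples."""
    captured = [s for s in samples if s.predicted_program is not None]
    correct = [s for s in captured if s.predicted_program == s.expected_program]
    latencies = [s.latency_seconds for s in captured if s.latency_seconds is not None]
    return {
        # Share of labeled signals that made it into memory at all.
        "coverage": len(captured) / len(samples) if samples else 0.0,
        # Share of captured signals filed under the right program.
        "accuracy": len(correct) / len(captured) if captured else 0.0,
        "median_latency_seconds": median(latencies) if latencies else float("nan"),
    }
```

Reporting coverage separately from accuracy keeps a pipeline that silently drops signals from looking better than one that captures everything but misfiles some of it.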
Quick start
Read critical_review.md first. This repo has a conflict of interest - it’s written by the people who built Serro. The review names the biases explicitly.
1. Understand what you’re replicating
research/serro_capabilities.md
Seven capabilities across three layers, with difficulty ratings and measurement rubrics.
2. Pick an architecture
memory_layer_decision_chart.md
A decision tree with 14 decision points. The wrong architecture choice costs weeks.
3. Read what didn’t work
comparative_analysis.md
MCP is pull-only - not a continuous stream. Several architectural assumptions fail because of this. Don’t repeat the mistakes.
4. Follow implementation instructions
family_a/ family_b/ family_c/
Pick your family from the decision chart. Each folder has an instructions.md.
5. Copy the templates
templates/
program_mappings.yaml, charter.md, and CLAUDE_template.md are ready to fill in.
Repo structure
├── README.md ← you are here
├── CLAUDE.md ← agent navigation (read if you're an AI)
├── goal.md ← mission and motivation
├── critical_review.md ← honest critique of this analysis
├── comparative_analysis.md ← what we tried, what broke, the key fork
├── key_decisions.md ← 14 decision points with rationale
├── memory_layer_decision_chart.md ← mermaid decision tree for picking an approach
│
├── research/
│ └── serro_capabilities.md ← 7 capabilities, difficulty ratings, open questions
│
├── family_a/
│ └── instructions.md ← full context pull - micro-orgs, zero config
│
├── family_b/
│ ├── overview.md ← human-maintained source mapping approach
│ └── instructions.md ← step-by-step setup
│
├── family_c/
│ ├── overview.md ← auto-ingestion: C1 / C2 / C3 comparison
│ ├── c1_webhook_server.md ← always-on server (seconds latency)
│ ├── c2_git_cron.md ← git + scheduled cron (hourly, recommended start)
│ ├── c3_github_actions.md ← GitHub Actions + Cloudflare Worker (1–2 min)
│ └── instructions.md ← step-by-step setup
│
├── templates/
│ ├── program_mappings.yaml
│ ├── charter.md
│ └── CLAUDE_template.md
│
├── content/
│ ├── youtube_script_checkpoint_1.md
│ └── blog_post_checkpoint_1.md
│
└── content_ideas/
  ├── youtube_script_v1.md ← first-person YouTube script (Jake's POV)
  └── serroloop_blog_post.md ← loop pattern applied to program engineering
Read this before building
- critical_review.md - conflict of interest, unvalidated assumptions, what would constitute real evidence
- comparative_analysis.md - Approach 1 failed because MCP is pull-only. Don’t design around polling as if it’s continuous ingestion.
- family_b/overview.md - Family B has 6 known limitations. Long-horizon technical reasoning is the hardest gap to close.
Concepts
- What is a program?
- What is program engineering?
- What is an agentic TPM?
- What is live program memory?
- What is a loop?
- Why not just use Jira, Linear, or Notion?
- What is MCP and why does it matter here?
What is a program?
A program is a named, ongoing technical initiative with a defined scope, a set of owning engineers, and signals distributed across multiple tools. Unlike a ticket (which tracks one task) or a project (which has a hard end date), a program is continuous. It has a charter, stakeholders, and a living record of decisions, commitments, and scope changes.
Examples: “Platform Reliability”, “Auth Modernization”, “AI Discoverability”, “Mobile Launch Q3”.
What is program engineering?
Program engineering is what happens when multiple workstreams all point at the same outcome and none of them, fixed in isolation, achieves it. A program gives that work a name, an owner, a sequence, and honest visibility into whether it’s on track. Unlike a project, it doesn’t end — it recurs. And every time it runs without a shared memory of how it ran before, the coordination cost compounds from scratch.
Program engineering used to be a large-company problem. You hit it somewhere around 80 engineers when the org got complex enough that coordination started breaking down. Before that, a good senior engineer or EM could hold the picture in their head.
That threshold is gone.
AI has decoupled team size from execution capacity. A 10-person engineering team today can move at a speed and breadth that would have required 80 people five years ago. The ambition expands to meet the new capacity. And the moment you’re running eight things at once with a team of ten, you have an 80-person company’s coordination problems with none of the organizational infrastructure that large companies built to handle them.
This is the shift from product engineering to program engineering. When a program has no name, no owner, and no shared visibility, it still gets run — by whoever has enough context to hold the picture. That person becomes the accidental program engineer, doing it on top of their actual job, and all the institutional knowledge lives in their head.
And the workstreams themselves are no longer all human. With Anthropic’s release of loops in Claude Code, agents stopped being tools you invoke and became standing participants in programs. A loop wakes up on its own schedule, reads program state, pulls signals, writes memory, posts digests, flags blockers — and decides for itself when to run again. Nobody is prompting it.
This is the primary use case of program engineering: programming the governance of human-agent loops. Which decisions a loop makes autonomously, which it escalates, what context it’s allowed to act on, who reviews what it wrote while everyone was asleep, and who is accountable when it acts on stale memory. These are program-level decisions, not prompt-level ones. The team that writes its loops’ governance explicitly — in the same mapping file that declares its programs’ owners and sources — is doing program engineering. The team that doesn’t has unaccountable workstreams running unattended.
Read the full essay - Welcome to Program Engineering
What is an agentic TPM?
An agentic TPM platform is infrastructure for program engineering - not a replacement for the TPM role. With live program intelligence, agentic actions, and self-driven reports, it surfaces program visibility for everyone driving the work so they spend less time coordinating and more time executing. It scales TPM capacity without scaling headcount.
It does this by:
- Continuously ingesting signals from GitHub, Slack, Drive, and meetings
- Maintaining a live model of each program’s state: who’s working on what, what decisions were made, what commitments are outstanding
- Surfacing blockers before they’re escalated
- Following up on action items
- Routing context to downstream agents so they don’t start from zero
Serro is an agentic TPM. This repo is a guide for building a version of it yourself.
What is live program memory?
Live program memory is an always-current, program-indexed record of everything that matters to a program: decisions, contributors, scope changes, blockers, and action items.
“Live” means it updates automatically as signals arrive - not a static doc someone has to remember to update.
“Program-indexed” means signals are organized by program, not by tool or date. A question like “what changed about the auth program last quarter?” draws from GitHub, Slack, Drive, and meeting transcripts simultaneously.
This is the problem this repo is trying to solve.
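As a rough illustration of what "program-indexed" could mean in practice, the Python sketch below groups signals from several tools under their program and answers a time-window question across all of them at once. The `Signal` shape is a hypothetical normalization, not a format defined by this repo or by MCP, and it assumes each signal has already been assigned to a program.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Signal:
    """A hypothetical normalized signal; real tool output would need mapping to this shape."""
    tool: str          # e.g. "github", "slack", "drive", "meeting"
    program: str       # assumed to be assigned upstream (mapping file or classifier)
    timestamp: datetime
    summary: str


def index_by_program(signals: list[Signal]) -> dict[str, list[Signal]]:
    """Group signals by program, then order each program's timeline chronologically."""
    index: dict[str, list[Signal]] = defaultdict(list)
    for signal in signals:
        index[signal.program].append(signal)
    for timeline in index.values():
        timeline.sort(key=lambda s: s.timestamp)
    return dict(index)


def changes_between(
    index: dict[str, list[Signal]], program: str, start: datetime, end: datetime
) -> list[Signal]:
    """Return one program's signals in [start, end), regardless of which tool they came from."""
    return [s for s in index.get(program, []) if start <= s.timestamp < end]
```

The point of the design is that the tool is just an attribute of a signal; the primary key is the program.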
What is a loop?
A loop is an autonomous Claude agent that runs on a recurring interval. It wakes up, reads state, decides what matters, acts, and goes back to sleep — with no human prompting it.
Applied to program engineering, a loop is what turns passive memory into active oversight. The memory layer answers questions when asked. A loop running on top of it asks the questions itself:
- What PRs have been open longer than expected?
- What Slack decisions haven’t made it into any doc?
- Which programs have gone quiet when they shouldn’t have?
- Has scope drifted outside the declared charter?
The loop reads the serro-diy repo, pulls live signals from declared sources, compares current state against the last digest, and posts a summary to Slack — or flags specific items that need attention.
This is the Serroloop pattern: memory layer + autonomous loop = proactive program oversight that scales without headcount.
See content_ideas/serroloop_blog_post.md for the full pattern and implementation sketch.
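The "compare current state against the last digest" step can be sketched in a few lines of Python. This is not the Serroloop implementation: scheduling, signal pulling, and Slack posting are left out, and the `OpenPR` record, the use of PR titles as identifiers, and the seven-day staleness threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class OpenPR:
    """Hypothetical snapshot of an open pull request."""
    program: str
    title: str
    opened_at: datetime


def loop_pass(
    current: list[OpenPR],
    last_digest_titles: set[str],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed threshold, tune per program
) -> dict[str, list[str]]:
    """One wake-up of the loop: diff against the last digest and flag stale PRs."""
    current_titles = {pr.title for pr in current}
    return {
        # Open now, but absent from the previous digest.
        "new_since_last_digest": sorted(current_titles - last_digest_titles),
        # In the previous digest, but no longer open.
        "closed_since_last_digest": sorted(last_digest_titles - current_titles),
        # Open longer than the allowed age.
        "stale": sorted(pr.title for pr in current if now - pr.opened_at > max_age),
    }
```

A scheduled agent would call something like `loop_pass` on each wake-up, post the non-empty categories, and store the current titles as the next run's baseline.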
Why not just use Jira, Linear, or Notion?
Those tools track work at the ticket or document level. They don’t maintain a cross-source model of program state over time. Connecting signals - a GitHub PR to a Jira ticket to a Slack decision to a meeting transcript - is still manual.
An agentic TPM does that connection automatically and makes the result queryable.
Jira answers: “what’s the status of this ticket?” An agentic TPM answers: “what has changed about the auth program over the last quarter, and why?”
What is MCP and why does it matter here?
MCP (Model Context Protocol) is an open protocol that lets Claude connect to external tools - GitHub, Slack, Google Drive, meetings - and query them in real time. Instead of copy-pasting context into a chat window, Claude reads from live sources directly.
This repo uses MCP integrations as the data layer for all three implementation families.
The key constraint: MCP is pull-only. Claude reads from tools when asked; it does not receive a continuous stream of events. That single constraint shapes every architectural decision in this repo - it’s why Family C (auto-ingestion) exists at all.
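The practical consequence of pull-only access is that freshness depends entirely on how often something asks. The Python sketch below shows the generic cursor-based polling pattern that this implies; `fetch` is a hypothetical stand-in for whatever tool query is used, not an MCP API.

```python
from datetime import datetime
from typing import Callable, Iterable

# A hypothetical "pull" function: given a cursor, return (timestamp, payload)
# events. In practice this would wrap a tool query of some kind.
Fetch = Callable[[datetime], Iterable[tuple[datetime, str]]]


def poll_once(fetch: Fetch, cursor: datetime) -> tuple[list[str], datetime]:
    """Pull events newer than the cursor and advance it.

    Nothing arrives between calls: freshness is bounded by how often
    this function is invoked, which is what pull-only access implies.
    """
    # Filter defensively in case the source returns events at or before the cursor.
    events = sorted(e for e in fetch(cursor) if e[0] > cursor)
    new_cursor = events[-1][0] if events else cursor
    return [payload for _, payload in events], new_cursor
```

Running `poll_once` hourly from cron gives hourly latency and running it every few seconds gives near-real-time latency at higher cost, while a webhook server avoids polling altogether by having the source push events to you.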
Contributing
This is an open experiment. Contributions that advance it are welcome:
- Measurements - ran Family B or C2 against a real org? Classification accuracy, coverage, and latency numbers are the most valuable thing you can add.
- Dead ends - tried something that didn’t work? Document it in comparative_analysis.md.
- Implementation gaps - family_a/instructions.md, family_b/instructions.md, and family_c/instructions.md need step-by-step instructions written.
- Alternative architectures - family_c/overview.md has a placeholder for approaches not yet identified.
Please do not contribute claims without measurements. The value of this repo is honest engineering, not optimistic design.
License
MIT. Use it, fork it, build on it.
Related
- Serro - the managed version of what this repo attempts to build
- Claude Code - the tool this is built with
- Model Context Protocol - the integration layer all approaches depend on
Worth knowing if you’re going deep on Family C
- Apache Iggy - persistent message streaming platform (lightweight Kafka in Rust). Fits as a durable event bus between your webhook sources and ingestion agent — gives you replay, backpressure, and delivery guarantees that raw webhooks don’t.
- CocoIndex - open-source incremental data transformation framework built for AI indexing pipelines. Fits as a replacement for the custom ingestion agent — handles source-to-index transformation, incremental updates, and embedding generation declaratively.
Jake Kim (@jakesrro): Shoutout to @cocoindex_io @laserdatainc @ApacheIggy @KrantiParisa