Do coding agents need an OS-like control plane? I built a prototype and want critique.

Reddit r/AI_Agents Tools

Summary

The author introduces 'KnowledgeOS', a prototype control plane that governs local coding agents by managing task lifecycles, preventing state drift, and requiring command evidence for agent claims. They are seeking architectural critique on whether this OS-like abstraction is necessary or is over-engineering for agent workflows.

I’ve been experimenting with a local control-plane for coding agents, and I’d love serious critique from people building real agent workflows.

The problem I kept running into:

- agents forget the original project intent after long sessions
- “done” is often claimed without eval/test/postflight evidence
- MCP/tool/subagent calls are invisible unless you manually inspect logs
- old projects accumulate stale generated files, broken hooks, and mismatched state
- multi-agent work gets messy because there is no durable task/spec/lifecycle record

So I built a prototype called KnowledgeOS. The idea is not to replace the operating system. It is more like a project-local governance layer for agents.

Current pieces:

- `.agent-os/` control plane per project
- `create-task` for formal task intake
- `create-spec` / `align-spec` so runs bind to durable user intent
- `route-task` and `check-route-write` to prevent uncontrolled file mutation
- `context-pack` and `plan-task` before execution
- mandatory lifecycle phases: route, plan, review, dispatch, execute, report
- visible `CHECKPOINT_OK`, `CAPABILITY_OK`, and `TRACE_OK` markers
- `capability-event` for MCP / skill / subagent / shell / script visibility
- `eval-task`, `verify-context`, `verify-lifecycle`, `complete-task`
- postflight hook that must return `[SYNC_OK]`
- local tool registry for MCPs, skills, orchestrators, and subagents
- recently integrated Maestro Orchestrate as a local specialist-agent catalog via MCP

The design philosophy is:

- small kernel
- pluggable modules
- optional apps/workbench
- each project decides strictness
- every important agent claim needs command evidence

What I’m unsure about:

1. Is “OS-like control plane for agents” the right abstraction, or is this just workflow tooling with a fancy name?
2. Should lifecycle gates be strict by default, or opt-in per project?
3. Is spec-first / checkpoint-first work too much friction for everyday coding?
4. How should subagent registries be represented without turning into prompt soup?
5. Are there existing systems that solve this more cleanly?

I’m not looking for stars as much as architecture feedback. If this is over-engineered, I’d love to hear where. If the abstraction is useful, I’d love suggestions on what should be kernel vs plugin/module.
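To make the "lifecycle gates" and "command evidence" ideas concrete for critique, here is a minimal sketch of how a gate could work. This is not the KnowledgeOS code: the phase order and the `[SYNC_OK]` marker come from the list above, but `Evidence`, `TaskRecord`, and every method name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Phase order taken from the post; everything else here is a hypothetical illustration.
PHASES = ["route", "plan", "review", "dispatch", "execute", "report"]


@dataclass
class Evidence:
    command: str      # the command that was run, e.g. a test invocation
    exit_code: int    # 0 means the command succeeded
    marker: str = ""  # optional marker seen in the output, e.g. "TRACE_OK"


@dataclass
class TaskRecord:
    task_id: str
    strict: bool = True                           # per-project strictness switch
    evidence: dict = field(default_factory=dict)  # phase name -> Evidence

    def next_phase(self):
        """Return the first phase without evidence, or None if all are recorded."""
        for phase in PHASES:
            if phase not in self.evidence:
                return phase
        return None

    def record(self, phase, ev):
        """Attach evidence to a phase; strict mode rejects skips and failed commands."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        if self.strict:
            expected = self.next_phase()
            if phase != expected:
                raise RuntimeError(f"expected phase {expected!r}, got {phase!r}")
            if ev.exit_code != 0:
                raise RuntimeError(f"{phase}: command failed: {ev.command}")
        self.evidence[phase] = ev

    def can_complete(self, postflight_output):
        """Completion needs evidence for every phase plus a synced postflight hook."""
        return self.next_phase() is None and "[SYNC_OK]" in postflight_output
```

In this sketch, "strict by default vs opt-in" is a single boolean on the record: with `strict=False`, phases can be recorded out of order or with failing commands, but `can_complete` still refuses to pass until every phase has some evidence and the postflight output contains `[SYNC_OK]`. Whether a gate this small belongs in the kernel, with everything else as modules, is part of what I am asking.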
