@_avichawla: https://x.com/_avichawla/status/2071897559287955680
Summary
The article discusses that the real challenge in AI agents is not building them but running them in production, and proposes the need for an operating system layer to manage fleets of agents, akin to how an OS manages software processes.
Cached at: 06/30/26, 03:44 PM
How to Build an OS for Your AI Workforce?
Why running a fleet of agents in production is an operations problem, not a framework problem, and what the layer that solves it has to handle.
The last two years have mostly gone into making agents easier to build.
We have frameworks, workflow builders, drag-and-drop canvases, Python libraries, and multi-agent orchestrators. Spinning up an agent that does one job has never been less work.
And yet most teams that put agents into production are still running them like one-off experiments.
Today, the problem isn’t building agents. It’s running them.
Think about how software development has matured in a predictable order.
First came scripts, then applications, and eventually enough processes running at once that you needed something underneath to manage them. That something was the operating system. It scheduled resources, coordinated processes, and gave you one surface to control everything running on the machine.
AI agents are following the exact same arc.
Right now, most teams are still at the scripting stage.
You build an agent for one task, ship it, then build the next one, and the next. A few months in, you have a dozen agents doing a dozen unrelated jobs, none aware of the others, with no single place to manage any of them.
Calling that a workforce is generous. In reality, it’s a pile of disconnected scripts with nothing coordinating them.
What the current tooling actually covers
Three categories cover most of what’s shipping today.
- Workflow builders like n8n, Dify, and Flowise are good for prototyping. You can drag nodes onto a canvas, wire them together, and get something that runs. They hit their limits quickly, though, on multi-agent coordination, dynamic task assignment, access controls, and audit trails.
- Code-first frameworks like LangChain, CrewAI, and AutoGen give you control, and you pay for it in maintenance. You write graph definitions in Python, wire up role-based patterns, and carry state by hand. Anyone who has shipped on these knows what happens once agents.py crosses a few hundred lines. The abstraction starts fighting you, tracing a bad run gets hard, and rewrites become routine.
- Personal assistants like OpenAI’s agents, Claude, and Gemini are strong on individual tasks. If you hand them a research question, a document to draft, or a single workflow, they deliver. The interaction model is one conversation at a time, responding to you. Coordinating a set of specialized agents running in parallel toward a shared goal was never what they were built for.
There’s a pattern that appears across all three of these:
- Each one is built around a single agent, whether you’re constructing it or talking to it
- None of them gives you a unified view of a running fleet
- You can’t hand new work to an already-deployed agent in plain language
- There’s no shared memory, shared state, or shared governance across agents
In other words, they solve the construction problem and leave the operations problem open.
OS for agents
Let’s go back to first principles.
An operating system doesn’t write your programs.
Instead, it runs them and arbitrates resources between them. It gives you one interface to see and control everything happening on the machine, enforces permissions, logs what ran, and contains failures so one process doesn’t take down the rest.
An OS for agents does the same job one level up, across your agents.
It gives you one place to:
- Build, change, and deploy agents without dropping into code
- Direct the whole fleet through natural language
- Route tasks to the right agents and watch their progress
- Wire every agent into shared knowledge, data, and tools
- Scope permissions so teams only touch the agents they should
- Read logs and audit exactly what each agent did and why
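To make the routing and monitoring items above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product works: the `Agent` and `Fleet` classes, the skill-based routing rule, and the agent names are all hypothetical. It shows one registry that routes a task to a healthy agent, contains a failure so one agent doesn’t take down the rest, and exposes a single status view.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    # Hypothetical agent record: a name, the skills it covers, and a callable.
    name: str
    skills: set[str]
    handler: Callable[[str], str]
    status: str = "idle"  # "idle" or "failed"


@dataclass
class Fleet:
    agents: dict[str, Agent] = field(default_factory=dict)
    log: list[dict] = field(default_factory=list)

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def route(self, skill: str, task: str) -> Optional[str]:
        """Send a task to the first healthy agent that has the skill."""
        for agent in self.agents.values():
            if skill not in agent.skills or agent.status == "failed":
                continue
            try:
                result = agent.handler(task)
                outcome = "ok"
            except Exception as exc:
                # Contain the failure: mark this agent, keep the fleet running.
                agent.status = "failed"
                result, outcome = None, f"error: {exc}"
            self.log.append({"agent": agent.name, "task": task, "outcome": outcome})
            if outcome == "ok":
                return result
        return None  # no healthy agent could take the task

    def overview(self) -> dict[str, str]:
        """One view of every agent's current status."""
        return {name: agent.status for name, agent in self.agents.items()}


def flaky(task: str) -> str:
    # Stand-in for an agent whose upstream dependency is down.
    raise RuntimeError("upstream timeout")


fleet = Fleet()
fleet.register(Agent("triage-a", {"support"}, flaky))
fleet.register(Agent("triage-b", {"support"}, lambda t: f"handled: {t}"))

result = fleet.route("support", "refund request")
print(result)            # handled: refund request
print(fleet.overview())  # {'triage-a': 'failed', 'triage-b': 'idle'}
```

The first agent fails, the task falls through to the second, and both attempts land in the same log. A real layer would add retries, queues, and persistence, but the shape is the same: the registry, not the individual agent, owns routing and status.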
To reiterate, this layer isn’t a builder but rather a command center for the agents you’ve already built.
The builder is still important. Agents have to be designed well, and workflows have to be structured.
But once they’re live, you need a layer above them to operate the whole thing as one system instead of a dozen disconnected ones.
Why this limitation exists in the first place
Almost all of the tooling here was designed bottom-up.
You start with an LLM, add tool use, chain a few calls together, add memory, and then try to coordinate several agents. Every layer piles complexity onto a foundation that was already complex.
At no point in that stack does anyone design for the person who has to operate the result.
For instance, the developer wiring up a workflow isn’t thinking about the team lead who has to assign that agent new work next month. The engineer designing a multi-agent pipeline isn’t thinking about the compliance officer who has to audit every action it takes.
This leaves a wide gap between having agents deployed and having a workforce you can actually manage.
Plenty of teams fall into that gap and never climb back out.
The new architecture
If you were to design an OS for your AI workforce from scratch, it would need a few things.
- A natural-language command interface, instead of a visual canvas you drag things around on or a Python SDK you build with. This is a conversational layer where you can say “create a workflow that monitors our support inbox and escalates urgent tickets to Slack” and have it happen. This is how people actually want to interact with their AI workforce.
- Every agent should share access to the same knowledge bases, file stores, databases, and integration credentials. These wouldn’t be siloed per agent, but managed at the workspace level. When you build a new agent, it should be able to see and use everything the workspace already has access to.
- One place to see what every agent is doing, what it did, and why it chose the path it did, rather than reconstructing it from scattered logs across services. A single structured audit trail.
- Different teams should be able to use different agents without stepping on each other. Admins should be able to restrict which models or tools any agent can use. Sensitive data should stay gated.
- For many serious enterprise deployments, sending data to a third-party SaaS isn’t an option. The OS needs to run in your own infrastructure.
None of these are optional extras. Drop any one of them and you don’t have an OS, just a builder with a nicer UI.
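Three of the requirements above (workspace-level resources, gated access, and a single structured audit trail) fit together naturally, so here is a rough sketch of how they might look in Python. Everything in it is an assumption made for illustration: the `Workspace` class, the `grant` and `use` methods, the audit record fields, and the sample data are hypothetical and not taken from any real product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Workspace:
    # Resources live at the workspace level, not inside any one agent.
    resources: dict[str, object] = field(default_factory=dict)
    # Which agent may use which resource names (hypothetical grant model).
    grants: dict[str, set[str]] = field(default_factory=dict)
    # One structured audit trail for the whole fleet.
    audit: list[dict] = field(default_factory=list)

    def grant(self, agent: str, resource: str) -> None:
        self.grants.setdefault(agent, set()).add(resource)

    def use(self, agent: str, resource: str, reason: str) -> object:
        allowed = resource in self.grants.get(agent, set())
        # Record every attempt, allowed or not, along with the stated reason.
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "resource": resource,
            "reason": reason,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{agent} may not use {resource}")
        return self.resources[resource]


# Made-up sample data for the example.
ws = Workspace(resources={"kb": ["refund policy"], "crm": {"acct-1": "ACME"}})
ws.grant("support-agent", "kb")

kb = ws.use("support-agent", "kb", "answer refund question")

denied = False
try:
    ws.use("support-agent", "crm", "look up account")
except PermissionError:
    denied = True

for entry in ws.audit:
    print(entry["agent"], entry["resource"], entry["allowed"], entry["reason"])
```

Because every access goes through one `use` call, the denied attempt is recorded next to the allowed one, with the reason the agent gave. That is the difference between an audit trail and a set of logs you have to stitch together after the fact.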
Next steps for teams building with AI today
This philosophy reframes how you think about agents entirely.
Instead of “how do I build this agent,” you’re asking what the fleet looks like in six months and how you keep it running.
A few things follow from that positioning:
- Agents become workers rather than scripts. They carry roles, responsibilities, and oversight, and when requirements change, you reinstruct them instead of rebuilding from scratch.
- The fleet is composable. A support agent, a research agent, and a data-enrichment agent can share context and hand work to each other because the same layer governs all of them.
- Non-engineers can operate it. With a natural-language interface, spinning up an agent or assigning a task doesn’t route through a developer. The PM, the ops lead, and the analyst can all make changes directly.
- Governance becomes tractable. Audit trails, access controls, and compliance sit in the management layer from day one instead of being retrofitted under pressure later.
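The first two points, reinstructing a worker and handing work across the fleet, can be sketched in a few lines of Python. This is an illustration only: `Worker`, `SharedContext`, `reinstruct`, `hand_off`, and the sample instructions and ticket details are hypothetical names and data invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    role: str
    instructions: str
    history: list[str] = field(default_factory=list)

    def reinstruct(self, new_instructions: str) -> None:
        # Change what the worker does without rebuilding it; keep the old brief.
        self.history.append(self.instructions)
        self.instructions = new_instructions


@dataclass
class SharedContext:
    # Notes any agent can read, keyed by the role that wrote them.
    notes: dict[str, str] = field(default_factory=dict)
    # Pending hand-offs as (target role, task) pairs.
    queue: list[tuple[str, str]] = field(default_factory=list)

    def hand_off(self, from_role: str, to_role: str, task: str, note: str) -> None:
        self.notes[from_role] = note
        self.queue.append((to_role, task))

    def next_task_for(self, role: str) -> Optional[str]:
        for i, (target, task) in enumerate(self.queue):
            if target == role:
                del self.queue[i]
                return task
        return None


support = Worker("support", "Answer refund questions.")
support.reinstruct("Escalate refunds over $500 to a human.")

ctx = SharedContext()
ctx.hand_off("support", "research", "find ACME's contract terms",
             note="customer is ACME, ticket 42")

task = ctx.next_task_for("research")
print(task)                  # find ACME's contract terms
print(ctx.notes["support"])  # customer is ACME, ticket 42
```

The research worker picks up both the task and the note the support worker left, which is what sharing context means in practice. Neither worker needs to know how the other is built, only that the same layer sits between them.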
This is what makes AI agents viable at scale. It doesn’t come from moving to better models or adding more integrations, but from a coherent management layer that treats the entire fleet as a single, operable system.
Where things are heading
The bottleneck in adoption stopped being model capability a while ago, and now it’s infrastructure maturity.
Teams know what they want agents to do. What they don’t have is a clean way to deploy, manage, and govern them at scale. Whoever works out the operations side first ends up with an advantage that compounds.
The teams building toward this aren’t starting from “how do we make better agents” but rather starting from what the command center for the fleet should look like.
That’s the more useful question to be working on.
If you want to see this philosophy in practice, Sim (open-source with 27k+ stars) is already building it. It gives you a workflow agent for AI automations inside a collaborative workspace where you build, deploy, and manage AI agents and workflows.
It started as an open-source visual workflow builder and has grown into a natural-language command layer for creating, managing, and directing a fleet of agents from one interface, rather than a tool for assembling a single workflow.
You can self-host it, read the code, and you’re not locked into someone else’s infrastructure.
Here’s the GitHub Repo →
(don’t forget to star it ⭐️)
That’s a wrap!
If you enjoyed reading this:
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.