@CobusGreylingZA: https://x.com/CobusGreylingZA/status/2066593705906012188
Summary
A detailed thread arguing that true universal AI agents must build their own tools and explore environments dynamically, rather than relying on pre-configured integrations like MCP. It positions the terminal/CLI as the universal integration layer and references supporting research from OSExpert and NVIDIA.
Cached at: 06/16/26, 03:39 PM
Universal Agent Thesis
Full digital autonomy requires agents that build their own tools, discover their own boundaries and operate any system they encounter.
In Short
I have been writing around a single idea for months now… **This post pulls the threads together.** The agent that needs a pre-built connector, a curated schema, a hand-written integration for every system it touches is not a universal agent. It is a specialised agent tethered to what someone already configured.
A **universal agent** lands in any environment, explores what is available, builds the tools it needs, maps what it can and cannot do and executes within those boundaries.
Give your agent its own computer (www.langchain.com): "Running code execution in an AI agent is harder than it looks. Your agent needs a real computer (filesystem, shell…"
No pre-configuration. No human-curated tool registries. No framework scaffolding. This is where the industry is heading…
The integration layer collapsed
I wrote about this in Replace MCP With CLI. The observation was straightforward.
MCP requires building and maintaining a server for every integration.
SDKs, schemas, edge case handling, version management. The ecosystem’s value depends entirely on adoption.
Meanwhile every tool already has a CLI. Git, Docker, curl, ffmpeg, npm.
Decades of tooling, already accessible through a shell command.
The model already knows how to use them. It was trained on billions of shell scripts, man pages, Stack Overflow threads.
The insight was not that CLI is better than MCP for specific tasks.
The insight was that the entire integration layer was collapsing.
Six layers sat between the agent and the service: REST clients, authentication middleware, API gateways, integration platforms and the rest. All of it is replaced by a model that reasons about the intent and generates the command.
Jensen Huang called it the shift from pre-recorded software to real-time processing.
The integration is not defined ahead of time. It emerges from the agent’s reasoning at the moment it is needed.
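To make "generates the command" concrete, here is a minimal sketch of the harness side: it takes a command line as a string and runs it without a shell. The `run_command` helper and the `generated` string are hypothetical stand-ins. In a real agent the string would come from the model, and the harness would add permission checks before running anything.

```python
import shlex
import subprocess
import sys


def run_command(command: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run a command line without invoking a shell; return (exit_code, output)."""
    argv = shlex.split(command)  # split into arguments the way a POSIX shell would
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout + result.stderr


# Hypothetical stand-in for a command the model generated at the moment of need.
generated = f'{shlex.quote(sys.executable)} -c "print(2 + 2)"'
code, output = run_command(generated)
print(code, output.strip())
```

Passing an argument list instead of `shell=True` is a deliberate choice in this sketch: it keeps shell metacharacters in model output from being interpreted.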
The terminal is the bridge
I explored the four-stage autonomy progression in The CLI Is The Path To AI Autonomy.
Stage 1: Chat. The model generates text. You copy-paste it.
Stage 2: Tool use. The model calls predefined functions. MCP, function calling, tool schemas. Curated, bounded, safe.
Stage 3: Terminal. The model operates the computer directly. Shell commands, file systems, scripts. No curated APIs.
Stage 4: Full autonomy. The model reasons, plans and executes end-to-end. Multi-step tasks across applications, sessions and time.
We are between Stage 2 and Stage 3.
Claude Code, Codex CLI, Grok Build CLI are pushing into Stage 3.
The CLI is where the model stops calling curated APIs and starts operating the machine.
NVIDIA’s Nemotron-Terminal study proved that terminal capability is a trainable skill that scales predictably.
Their 32B model matched significantly larger models through data engineering alone.
The industry is converging on one conclusion.
The terminal is the universal integration layer.
Agents must explore before they execute
The OSExpert paper made the skill boundary argument concrete.
Agents that explore an environment before executing achieve approximately 20% higher success rates and approximately 80% efficiency improvements.
The mechanism is simple.
The agent systematically probes the environment.
It records what works. It records what fails. Successful sequences become unit skills. Failed sequences become boundary markers.
At inference time, if a task maps to a known failure, the agent stops immediately.
It does not attempt the task. It does not burn tokens, burn API calls, burn time on something it already knows will not succeed.
Current agents do the opposite.
They attempt every task with equal confidence.
When they fail, they try again with a different approach. When that fails, they try again.
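The explore-then-execute mechanism described above can be sketched as a small data structure. This is an illustration of the idea only, not the OSExpert implementation. The `SkillMap` class, its method names and the example tasks are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillMap:
    # task name -> sequence of steps that succeeded during exploration
    skills: dict[str, list[str]] = field(default_factory=dict)
    # task name -> reason the probe failed (a boundary marker)
    boundaries: dict[str, str] = field(default_factory=dict)

    def probe(self, task: str, steps: list[str],
              attempt: Callable[[list[str]], bool],
              reason: str = "probe failed") -> None:
        """Try a step sequence once and record the outcome."""
        if attempt(steps):
            self.skills[task] = steps
        else:
            self.boundaries[task] = reason

    def should_attempt(self, task: str) -> bool:
        """At inference time, refuse tasks that map to a known failure."""
        return task not in self.boundaries


skill_map = SkillMap()
skill_map.probe("export_csv", ["open", "export"], attempt=lambda steps: True)
skill_map.probe("export_pdf", ["open", "print"], attempt=lambda steps: False,
                reason="no print dialog")

for task in ("export_csv", "export_pdf"):
    print(task, "attempt" if skill_map.should_attempt(task) else "stop")
```

The known failure costs a dictionary lookup instead of another round of retries.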
The token vanishes
Underneath all of this, the fundamental unit of compute is disappearing from view.
I traced this in The Token Is Becoming The New Hidden Compute Primitive.
1990s Clock cycles → hidden by the OS
2000s Server capacity → hidden by the cloud
2010s API calls → hidden by SaaS platforms
2020s Tokens → hidden by AI Agents
For most users, CLI UIs do not surface token counts. They surface outcomes. Fix this bug. Build this feature. Refactor this module.
SaaS was sold per seat. AI is now largely sold per outcome. The token is the unit of cost between those two layers but the user never sees it.
The competitive advantage shifts from who has the cheapest tokens to who manages tokens most intelligently.
Context compaction, reasoning budget allocation, tool orchestration.
The harness is the product. The token is the commodity.
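As one example of managing tokens intelligently, here is a minimal sketch of context compaction: keep the most recent messages that fit a budget and replace the rest with a short marker. The whitespace word count is a crude stand-in for a real tokenizer. The `compact` function and the marker text are assumptions for illustration, and a production harness would more likely summarise the dropped messages than discard them.

```python
def estimate_tokens(text: str) -> int:
    """Crude proxy for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def compact(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit within `budget` estimated tokens."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = len(messages) - len(kept)
    if dropped:
        # The marker itself is not counted against the budget in this sketch.
        kept.insert(0, f"[{dropped} earlier messages omitted]")
    return kept


history = ["a b c", "d e", "f g h i", "j"]
print(compact(history, budget=5))
```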
The universal agent architecture
Pull these threads together and the architecture emerges.
Integration
The agent uses the terminal. Every tool already has a CLI. The model already knows how to use it. No new protocols, no servers to maintain, no schemas to define. When the agent encounters an unfamiliar tool, it reads the help files.
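Reading the help files can be as simple as the sketch below. It assumes the tool accepts a `--help` flag, which is common but not universal. The `read_help` helper is hypothetical. The Python interpreter stands in for an unfamiliar tool so the example runs anywhere.

```python
import shutil
import subprocess
import sys
from typing import Optional


def read_help(tool: str, timeout: float = 10.0) -> Optional[str]:
    """Return the tool's --help text, or None if the tool is not installed."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, "--help"], capture_output=True,
                            text=True, timeout=timeout)
    # Some tools print help to stderr, so return both streams.
    return result.stdout + result.stderr


help_text = read_help(sys.executable)
print(help_text.splitlines()[0] if help_text else "tool not found")
```

The returned text would then go into the model's context before it composes a command for that tool.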
Exploration
Before executing, the agent explores the environment. It discovers what systems are available, what each exposes, what works, what fails. It builds a skill map and a boundary map.
Skill boundaries
The agent knows what it cannot do. If a task maps to a known failure, it stops. If it maps to a known capability, it executes. If it maps to unknown territory, it explores first.
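The three-way decision can be written down directly. This is a sketch under the assumption that skills and boundaries are stored as sets of task names. The `Action` enum and the `route` function are illustrative, not taken from any specific framework.

```python
from enum import Enum


class Action(Enum):
    STOP = "stop"        # task maps to a known failure
    EXECUTE = "execute"  # task maps to a known capability
    EXPLORE = "explore"  # unknown territory: probe before acting


def route(task: str, skills: set[str], boundaries: set[str]) -> Action:
    """Decide what to do with a task given the skill and boundary maps."""
    if task in boundaries:
        return Action.STOP
    if task in skills:
        return Action.EXECUTE
    return Action.EXPLORE


skills = {"export_csv"}
boundaries = {"export_pdf"}
for task in ("export_csv", "export_pdf", "export_xlsx"):
    print(task, route(task, skills, boundaries).value)
```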
Minimal context
The agent reasons from first principles. It does not need detailed schemas, extensive documentation or pre-loaded patterns. A requirement and an endpoint are enough.
Harness
Six components manage the lifecycle: tool integration, memory management, context engineering, planning, verification and extensibility. The harness is the operating system for the agent.
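One way to picture the harness is as a set of switchable components. The sketch below models only the six names listed above as boolean toggles. The `HarnessConfig` class is a hypothetical illustration and does not reflect how any particular harness is configured.

```python
from dataclasses import dataclass, fields


@dataclass
class HarnessConfig:
    tool_integration: bool = True
    memory_management: bool = True
    context_engineering: bool = True
    planning: bool = True
    verification: bool = True
    extensibility: bool = True

    def enabled(self) -> list[str]:
        """Names of the components that are currently switched on."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


config = HarnessConfig(planning=False)
print(config.enabled())
```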
Bounded autonomy
The agent operates freely within discovered constraints. File system permissions, sandbox boundaries, authentication levels, rate limits.
Governance is built into the environment.
The sandbox is the guardrail.
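A discovered constraint such as a sandbox boundary can be checked before the agent touches a file. The sketch below is an application-level path check only. The `is_inside_sandbox` helper and the `/tmp/sandbox` root are assumptions, and real isolation would still rely on OS-level mechanisms such as containers or restricted users.

```python
from pathlib import Path


def is_inside_sandbox(candidate: str, sandbox_root: str) -> bool:
    """True if `candidate`, resolved relative to the root, stays inside it."""
    root = Path(sandbox_root).resolve()
    # Joining with an absolute candidate discards the root, so absolute
    # paths outside the sandbox are rejected by the check below.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


print(is_inside_sandbox("notes/todo.txt", "/tmp/sandbox"))
print(is_inside_sandbox("../etc/passwd", "/tmp/sandbox"))
```

Resolving both paths first means `..` segments cannot be used to step outside the root.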
Software that writes its own tools
The endpoint of this trajectory is an agent that does not just use tools. It creates them.
When the agent explores an environment and discovers an API that has no CLI, it writes a wrapper.
When it needs a data transformation that no existing tool provides, it generates a script.
When it encounters a novel integration pattern, it builds the connector.
This is not speculative. Claude Code already does it.
The agent reads a codebase, understands the architecture, generates the integration code, runs the tests, iterates until it works.
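As an illustration of the wrapper case, here is the kind of small CLI an agent might generate around an API that has none. Everything here is hypothetical: `lookup_user` is a stand-in for the real API call and returns canned data, and the `lookup-user` program name is invented for the example.

```python
import argparse
import json
from typing import Optional


def lookup_user(user_id: int) -> dict:
    """Hypothetical stand-in for an API call that has no CLI of its own."""
    return {"id": user_id, "name": f"user-{user_id}"}


def main(argv: Optional[list[str]] = None) -> str:
    """Parse arguments, call the API stand-in and print the result as JSON."""
    parser = argparse.ArgumentParser(
        prog="lookup-user", description="Look up a user by numeric id.")
    parser.add_argument("user_id", type=int)
    args = parser.parse_args(argv)
    output = json.dumps(lookup_user(args.user_id))
    print(output)
    return output


main(["42"])  # prints {"id": 42, "name": "user-42"}
```

Once the wrapper exists, the agent can call it from the terminal like any other tool, and `--help` works for free through `argparse`.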
The tool in this context is not a pre-defined function. It is whatever software the agent needs to create in order to complete the task.
The distinction between using a tool and building a tool dissolves.
The agent perceives the environment, reasons about what it needs, creates what is missing and operates the result.
Perceive, reason, act, learn. The universal agent loop.
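The loop can be sketched in a few lines. This is a toy under stated assumptions: the `agent_loop` signature is invented for illustration, the "environment" is a counter, and `reason` is a hand-written rule where a real agent would call a model.

```python
from typing import Any, Callable, Optional


def agent_loop(goal: Any,
               perceive: Callable[[], Any],
               reason: Callable[[Any, Any, list], Optional[str]],
               act: Callable[[str], Any],
               max_steps: int = 5) -> list:
    """Run perceive -> reason -> act -> learn until done or out of steps."""
    memory: list = []
    for _ in range(max_steps):
        observation = perceive()
        action = reason(goal, observation, memory)
        if action is None:  # the reasoner decides the goal is met
            break
        result = act(action)
        memory.append((action, result))  # learn: remember what happened
    return memory


# Toy environment: a counter the agent must raise to the goal value.
state = {"count": 0}


def perceive() -> int:
    return state["count"]


def reason(goal: int, observation: int, memory: list) -> Optional[str]:
    return None if observation >= goal else "increment"


def act(action: str) -> int:
    state["count"] += 1
    return state["count"]


history = agent_loop(3, perceive, reason, act)
print(history)
```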
This is why I built the prototype in this repository. A Claude-powered agent with a full harness, running in a Gradio interface.
Where you can configure the system prompt, select the model, enable or disable harness components, give it real tools and watch it operate.
Not a demo. A working harness you can point at any task.
Where this is heading
The universal agent is not one product.
It is a capability that emerges when these pieces converge.
The integration layer already collapsed.
The framework layer already collapsed.
The terminal is the bridge.
The exploration phase gives the agent reach.
The boundary check gives it discipline.
The harness manages the lifecycle.
The token disappears into the infrastructure.
What remains is an agent that can land in any digital environment, discover what is there, build what is missing, know what it cannot do and execute within those boundaries.
Full digital autonomy is not a destination. It is the natural consequence of agents that explore, build and know their own edges.
**Chief Evangelist @ Kore.ai** | I’m passionate about exploring the intersection of AI and language. Language Models, AI Agents, Agentic Apps, Dev Frameworks & Data-Driven Tools shaping tomorrow.