@CobusGreylingZA: https://x.com/CobusGreylingZA/status/2066593705906012188

X AI KOLs Timeline 06/15/26, 06:48 PM News

ai-agents universal-agent-thesis cli autonomy terminal language-models agent-tool-use

Summary

A detailed thread arguing that true universal AI agents must build their own tools and explore environments dynamically, rather than relying on pre-configured integrations like MCP. It positions the terminal/CLI as the universal integration layer and references supporting research from OSExpert and NVIDIA.

https://t.co/eTnpsHVnsF

Cached at: 06/16/26, 03:39 PM

Universal Agent Thesis

Full digital autonomy requires agents that build their own tools, discover their own boundaries and operate any system they encounter.

In Short

I have been writing around a single idea for months now… **This post pulls the threads together.**

The agent that needs a pre-built connector, a curated schema, a hand-written integration for every system it touches is not a universal agent. It is a specialised agent tethered to what someone already configured.

A **universal agent** lands in any environment, explores what is available, builds the tools it needs, maps what it can and cannot do and executes within those boundaries.

No pre-configuration. No human-curated tool registries. No framework scaffolding. This is where the industry is heading…

The integration layer collapsed

I wrote about this in Replace MCP With CLI. The observation was straightforward.

MCP requires building and maintaining a server for every integration.

SDKs, schemas, edge case handling, version management. The ecosystem’s value depends entirely on adoption.

Meanwhile every tool already has a CLI. Git, Docker, curl, ffmpeg, npm.

Decades of tooling, already accessible through a shell command.

The model already knows how to use them. It was trained on billions of shell scripts, man pages, Stack Overflow threads.

The insight was not that CLI is better than MCP for specific tasks.

The insight was that the entire integration layer was collapsing.

Six layers sit between the agent and the service, among them REST clients, authentication middleware, API gateways and integration platforms. All of it is replaced by a model that reasons about the intent and generates the command.

Jensen Huang called it the shift from pre-recorded software to real-time processing.

The integration is not defined ahead of time. It emerges from the agent’s reasoning at the moment it is needed.
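To make this concrete, here is a minimal Python sketch of the execution side, assuming the model has already turned an intent such as "show the last five commits" into an argument list like `["git", "log", "--oneline", "-5"]`. The helper name `run_cli` and the shape of its result dictionary are hypothetical. The point is that no per-service server or schema sits between the agent and the tool, only a process call.

```python
import shutil
import subprocess
import sys


def run_cli(argv: list[str], timeout: float = 60.0) -> dict:
    """Run an existing CLI tool and return what the agent needs to reason about the result."""
    if not argv or shutil.which(argv[0]) is None:
        # Report a missing binary as an observation instead of crashing the agent loop.
        name = argv[0] if argv else "<empty command>"
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": f"{name}: not found"}
    # Pass argv as a list (no shell) so model-generated text is not re-parsed by a shell.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    # The running interpreter is always present, so it makes a safe first command.
    print(run_cli([sys.executable, "--version"]))
```

Returning exit code, stdout and stderr together lets the model treat a failed command as information to reason about on its next step.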

The terminal is the bridge

I explored the four-stage autonomy progression in The CLI Is The Path To AI Autonomy.

Stage 1: Chat. The model generates text. You copy-paste it.

Stage 2: Tool use. The model calls predefined functions. MCP, function calling, tool schemas. Curated, bounded, safe.

Stage 3: Terminal. The model operates the computer directly. Shell commands, file systems, scripts. No curated APIs.

Stage 4: Full autonomy. The model reasons, plans and executes end-to-end. Multi-step tasks across applications, sessions and time.

We are between Stage 2 and Stage 3.

Claude Code, Codex CLI and Grok Build CLI are pushing into Stage 3.

The CLI is where the model stops calling curated APIs and starts operating the machine.
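The Stage 2 and Stage 3 difference shows up in what the model is allowed to call. Below is a minimal Python sketch of a Stage 2 surface, assuming a single hypothetical `get_weather` tool and a hypothetical `call_registered_tool` dispatcher: the model can only invoke names and parameters someone registered in advance, and anything else is rejected. A Stage 3 agent would instead pass an arbitrary argument list to a process, so its reachable tools are whatever is installed on the machine.

```python
from typing import Any, Callable

# Stage 2: a curated registry. Each entry pairs a function with the exact parameter names it accepts.
# `get_weather` is a hypothetical stand-in for a hand-written integration.
TOOL_REGISTRY: dict[str, tuple[Callable[..., Any], frozenset[str]]] = {
    "get_weather": (lambda city: f"sunny in {city}", frozenset({"city"})),
}


def call_registered_tool(name: str, args: dict[str, Any]) -> Any:
    """Dispatch a model tool call, but only to something configured ahead of time."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"no tool named {name!r} was configured")
    fn, params = TOOL_REGISTRY[name]
    if set(args) != params:
        raise ValueError(f"{name} expects exactly these arguments: {sorted(params)}")
    return fn(**args)


if __name__ == "__main__":
    print(call_registered_tool("get_weather", {"city": "Cape Town"}))
    try:
        call_registered_tool("git_log", {})  # a Stage 3 agent would simply run `git log`
    except KeyError as err:
        print(err)
```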

NVIDIA’s Nemotron-Terminal study showed that terminal capability is a trainable skill that scales predictably.

Their 32B model matched significantly larger models through data engineering alone.

The industry is converging on one conclusion.

The terminal is the universal integration layer.

Agents must explore before they execute

The OSExpert paper made the skill boundary argument concrete.

Agents that explore an environment before executing achieve approximately 20% higher success rates and approximately 80% efficiency improvements.

The mechanism is simple.

The agent systematically probes the environment.

It records what works. It records what fails. Successful sequences become unit skills. Failed sequences become boundary markers.

At inference time, if a task maps to a known failure, the agent stops immediately.

It does not attempt the task. It does not burn tokens, burn API calls, burn time on something it already knows will not succeed.

Current agents do the opposite.

They attempt every task with equal confidence.

When they fail, they try again with a different approach. When that fails, they try again.
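The explore-then-execute mechanism can be sketched in a few lines of Python. This is not the OSExpert implementation; it is a simplified illustration under assumed names (`ExplorationLog`, `explore`, `should_attempt`), where tasks are matched by exact string and `try_sequence` stands in for actually executing a probe in the environment. Successful sequences are stored as unit skills, failed ones as boundary markers, and a task that matches a known boundary is refused before any tokens or API calls are spent on it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExplorationLog:
    skills: dict[str, list[str]] = field(default_factory=dict)  # task -> action sequence that worked
    boundaries: set[str] = field(default_factory=set)  # tasks whose probe failed


def explore(
    probes: dict[str, list[str]], try_sequence: Callable[[list[str]], bool]
) -> ExplorationLog:
    """Probe the environment once per task and record what works and what fails."""
    log = ExplorationLog()
    for task, sequence in probes.items():
        if try_sequence(sequence):
            log.skills[task] = sequence  # becomes a reusable unit skill
        else:
            log.boundaries.add(task)  # becomes a boundary marker
    return log


def should_attempt(log: ExplorationLog, task: str) -> bool:
    """At inference time, refuse immediately if the task maps to a known failure."""
    return task not in log.boundaries


if __name__ == "__main__":
    probes = {
        "list files": ["ls"],
        "edit registry": ["regedit"],  # hypothetical: unavailable on this machine
    }
    log = explore(probes, try_sequence=lambda seq: seq != ["regedit"])
    print(sorted(log.skills), sorted(log.boundaries))
    print(should_attempt(log, "edit registry"))  # stop, no retries
```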

The token vanishes

Underneath all of this, the fundamental unit of compute is disappearing from view.

I traced this in The Token Is Becoming The New Hidden Compute Primitive.

1990s Clock cycles → hidden by the OS

2000s Server capacity → hidden by the cloud

2010s API calls → hidden by SaaS platforms

2020s Tokens → hidden by AI Agents

For most users, CLI interfaces do not surface token counts. They surface outcomes. Fix this bug. Build this feature. Refactor this module.

SaaS was sold per seat. AI is now largely sold per outcome. The token is the unit of cost between those two layers but the user never sees it.

The competitive advantage shifts from who has the cheapest tokens to who manages tokens most intelligently.

Context compaction, reasoning budget allocation, tool orchestration.

The harness is the product. The token is the commodity.
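Context compaction is one place where a harness manages tokens the user never sees. Here is a minimal Python sketch, assuming a crude estimate of about four characters per token (a real harness would use the model's own tokenizer) and the hypothetical names `estimate_tokens` and `compact_context`. It keeps the newest messages that fit the budget and folds everything older into a single placeholder, which a real harness would likely replace with a model-written summary.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)


def compact_context(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit in `budget` tokens; fold the rest into one placeholder."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = len(messages) - len(kept)
    if dropped:
        # The placeholder's own tokens are not counted against the budget in this sketch.
        kept.insert(0, f"[summary of {dropped} earlier messages omitted]")
    return kept
```

Favouring recent turns is one design choice among several; a harness could equally pin the system prompt or the task statement so they are never compacted away.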

The universal agent architecture

Pull these threads together and the architecture emerges.

Integration

The agent uses the terminal. Every tool already has a CLI. The model already knows how to use it. No new protocols, no servers to maintain, no schemas to define. When the agent encounters an unfamiliar tool, it reads the help files.
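Reading the help files is itself just another process call. Below is a minimal Python sketch using a hypothetical `read_help` helper. It assumes the tool follows the common `--help` or `-h` convention, which many CLIs do but none is required to, and returns the text so the harness can place it in the model's context.

```python
import shutil
import subprocess
import sys
from typing import Optional


def read_help(tool: str, timeout: float = 10.0) -> Optional[str]:
    """Return a tool's built-in help text, trying the common flags in turn."""
    if shutil.which(tool) is None:
        return None  # not installed: nothing to read
    for flag in ("--help", "-h"):
        try:
            proc = subprocess.run(
                [tool, flag],
                capture_output=True,
                text=True,
                timeout=timeout,
                stdin=subprocess.DEVNULL,  # never let a tool block waiting on input
            )
        except subprocess.TimeoutExpired:
            continue
        # Some tools print help to stderr, so keep both streams.
        text = (proc.stdout + proc.stderr).strip()
        if text:
            return text  # first non-empty output wins in this sketch
    return None


if __name__ == "__main__":
    print(read_help(sys.executable)[:300])
```

A fuller version might fall back to `man` pages when neither flag produces output.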

Exploration

Before executing, the agent explores the environment. It discovers what systems are available, what each exposes, what works, what fails. It builds a skill map and a boundary map.
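A first exploration pass can be as simple as checking which tools exist. The minimal Python sketch below, built around a hypothetical `discover_tools` helper, only checks the `PATH` via `shutil.which`. It does not yet probe what each tool exposes or whether its commands succeed. Those deeper probes would turn the "available" and "missing" lists into a real skill map and boundary map.

```python
import shutil
import sys


def discover_tools(candidates: list[str]) -> dict[str, list[str]]:
    """Probe PATH for each candidate CLI and split the results into available and missing."""
    available: list[str] = []
    missing: list[str] = []
    for name in candidates:
        if shutil.which(name):
            available.append(name)  # seed for the skill map
        else:
            missing.append(name)  # seed for the boundary map
    return {"available": available, "missing": missing}


if __name__ == "__main__":
    # Tools named earlier in the post, plus the running interpreter as a known-good control.
    print(discover_tools(["git", "docker", "curl", "ffmpeg", "npm", sys.executable]))
```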

Skill boundaries

The agent knows what it cannot do. If a task maps to a known failure, it stops. If it maps to a known capability, it executes. If it maps to unknown territory, it explores first.
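The three-way rule above maps directly onto a small routing function. Here is a minimal Python sketch with hypothetical `Decision` and `route` names, assuming exact task-name matching. A real agent would need some notion of similarity to decide that a new request "maps to" a known skill or failure.

```python
from enum import Enum


class Decision(Enum):
    STOP = "stop"  # task maps to a known failure
    EXECUTE = "execute"  # task maps to a known capability
    EXPLORE = "explore"  # unknown territory: probe first


def route(task: str, skills: set[str], boundaries: set[str]) -> Decision:
    """Apply the skill-boundary rule before spending any tokens on execution."""
    # Check boundaries first so a task recorded as both is never executed.
    if task in boundaries:
        return Decision.STOP
    if task in skills:
        return Decision.EXECUTE
    return Decision.EXPLORE


if __name__ == "__main__":
    skills = {"list files", "run tests"}
    boundaries = {"edit registry"}
    for task in ("run tests", "edit registry", "deploy to prod"):
        print(task, "->", route(task, skills, boundaries).value)
```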

Minimal context

The agent reasons from first principles. It does not need detailed schemas, extensive documentation or pre-loaded patterns. A requirement and an endpoint are enough.

Harness

The harness manages the lifecycle through six components: tool integration, memory management, context engineering, planning, verification and extensibility. The harness is the operating system for the agent.
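One way to make the six components switchable is a plain configuration object, in the spirit of the prototype's option to enable or disable harness components. The field names mirror the six components listed above; `HarnessConfig` and its `enabled` method are a hypothetical sketch, not the prototype's actual code.

```python
from dataclasses import dataclass, fields


@dataclass
class HarnessConfig:
    """Hypothetical switchboard for the six harness components."""

    tool_integration: bool = True
    memory_management: bool = True
    context_engineering: bool = True
    planning: bool = True
    verification: bool = True
    extensibility: bool = True

    def enabled(self) -> list[str]:
        """Names of the components currently switched on, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


if __name__ == "__main__":
    # For example, run without a planning step to see how the agent copes.
    print(HarnessConfig(planning=False).enabled())
```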

Bounded autonomy

The agent operates freely within discovered constraints. File system permissions, sandbox boundaries, authentication levels, rate limits.

Governance is built into the environment.

The sandbox is the guardrail.
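The post places governance in the environment itself: file system permissions, sandbox boundaries, authentication levels and rate limits. A harness can also put a thin check in front of command execution. The minimal Python sketch below uses a hypothetical allowlist and a hypothetical `is_permitted` helper. It treats every non-flag argument as a path and requires it to resolve inside the sandbox root. That is a deliberate simplification, meant to sit on top of OS-level isolation rather than replace it.

```python
import tempfile
from pathlib import Path

# Hypothetical allowlist; a real deployment would derive this from its own policy.
ALLOWED_COMMANDS = {"ls", "cat", "git", "python3"}


def is_permitted(argv: list[str], sandbox_root: str) -> bool:
    """Allow only allowlisted commands whose path-like arguments stay inside sandbox_root."""
    if not argv or Path(argv[0]).name not in ALLOWED_COMMANDS:
        return False
    root = Path(sandbox_root).resolve()
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # flags are not treated as paths in this sketch
        # Joining an absolute path replaces root, and resolve() collapses "..",
        # so both escape routes land outside root and fail the check below.
        target = (root / arg).resolve()
        if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(is_permitted(["ls", "-la", "src"], root))  # inside the sandbox
        print(is_permitted(["cat", "../../etc/passwd"], root))  # escapes via ..
        print(is_permitted(["rm", "-rf", "src"], root))  # command not allowlisted
```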

Software that writes its own tools

The endpoint of this trajectory is an agent that does not just use tools. It creates them.

When the agent explores an environment and discovers an API that has no CLI, it writes a wrapper.

When it needs a data transformation that no existing tool provides, it generates a script.

When it encounters a novel integration pattern, it builds the connector.

This is not speculative. Claude Code already does it.

The agent reads a codebase, understands the architecture, generates the integration code, runs the tests, iterates until it works.
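The generate, test, iterate cycle is at heart a bounded loop in which test output feeds the next attempt. Here is a minimal Python sketch in which `propose` and `run_tests` are hypothetical callables standing in for the model call and the project's test runner, and `iterate_until_green` is an illustrative name. Each attempt receives the previous failure output, and the loop gives up after a fixed number of attempts instead of retrying forever.

```python
from typing import Callable, Optional


def iterate_until_green(
    propose: Callable[[Optional[str]], str],
    run_tests: Callable[[str], tuple[bool, str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Generate a candidate, test it, feed the failure output back in, and stop on success."""
    feedback: Optional[str] = None  # nothing to learn from on the first attempt
    for _ in range(max_attempts):
        candidate = propose(feedback)
        passed, feedback = run_tests(candidate)
        if passed:
            return candidate
    return None  # budget exhausted: surface the failure instead of looping forever
```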

The tool in this context is not a pre-defined function. It is whatever software the agent needs to create in order to complete the task.

The distinction between using a tool and building a tool dissolves.

The agent perceives the environment, reasons about what it needs, creates what is missing and operates the result.

Perceive, reason, act, learn. The universal agent loop.
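The "create what is missing, then operate the result" step can be sketched as writing generated source to disk and running it like any other CLI. In this minimal Python sketch, the "generated" source is a hard-coded string standing in for model output, and `build_and_run_tool` is a hypothetical helper. In practice the source would come from the model and the run would happen inside the sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_and_run_tool(name: str, source: str, args: list[str], workdir: str) -> str:
    """Write generated tool source into workdir, then run it as a new command-line tool."""
    path = Path(workdir) / f"{name}.py"
    path.write_text(source)
    result = subprocess.run(
        [sys.executable, str(path), *args],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,  # a failing tool raises CalledProcessError, which the agent can read and fix
    )
    return result.stdout.strip()


# Stand-in for model output: a tiny tool that no installed CLI happened to provide.
GENERATED_SOURCE = """
import sys
print(sum(int(x) for x in sys.argv[1:]))
"""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(build_and_run_tool("sum_ints", GENERATED_SOURCE, ["1", "2", "3"], workdir))
```

Once written, the new tool is invoked exactly like `git` or `curl`, which is the sense in which using a tool and building one stop being different activities.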

This is why I built the prototype in this repository: a Claude-powered agent with a full harness, running in a Gradio interface.

You can configure the system prompt, select the model, enable or disable harness components, give it real tools and watch it operate.

Not a demo. A working harness you can point at any task.

Where this is heading

The universal agent is not one product.

It is a capability that emerges when these pieces converge.

The integration layer already collapsed.

The framework layer already collapsed.

The terminal is the bridge.

The exploration phase gives the agent reach.

The boundary check gives it discipline.

The harness manages the lifecycle.

The token disappears into the infrastructure.

What remains is an agent that can land in any digital environment, discover what is there, build what is missing, know what it cannot do and execute within those boundaries.

Full digital autonomy is not a destination. It is the natural consequence of agents that explore, build and know their own edges.

**Chief Evangelist @ Kore.ai** | I’m passionate about exploring the intersection of AI and language. Language Models, AI Agents, Agentic Apps, Dev Frameworks & Data-Driven Tools shaping tomorrow.