@ClementDelangue: Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents! The popular t…
Summary
Hugging Face's benchmarks show that AI agents using the hf CLI spend far fewer tokens and succeed more often than agents hand-rolling raw API calls: hand-rolling uses up to 6x as many tokens on multi-step tasks, and task success is 94% with the CLI vs 84% without it. The takeaway is that good abstractions are cached intelligence for agents.
Cached at: 06/06/26, 01:22 AM
Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents!
The popular theory goes: agents can write code, so they’ll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!
We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.
Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).
And that’s just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can’t get wrong. In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn’t have to re-reason about at inference time.
Good tools are cached intelligence for agents!
So no, agents won’t rebuild everything from scratch. they’ll gravitate to the most token-efficient tools, because that’s what their owners pay for. The software that survives won’t just be accessible to agents, it will be accurate and cheap for them to drive.
We’re seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!
https://huggingface.co/blog/hf-cli-for-agents…
Designing the hf CLI as an agent-optimized way to work with the Hub
Source: https://huggingface.co/blog/hf-cli-for-agents
- AI agent traffic on the Hub
- Built for humans and agents
  - One command, multiple renderings
  - Next-command hints
  - Non-blocking and safe to retry
  - Discoverable, predictable commands
- Benchmarking the hf CLI for Coding Agents
  - The setup
  - The results
  - Key findings
- The hf-cli skill
- Try it yourself
- Register an agent harness
hf is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints.
The hf CLI has been primarily built for our users over the years. But it’s now increasingly used by coding agents: Claude Code, Codex, Cursor and more. So we rebuilt it to make it work for both audiences at once. This blog post summarizes what we did, and how we benchmarked it. We found that on complex, multi-step tasks the no-CLI baseline (an agent hand-rolling curl or the Python SDK) uses up to 6× as many tokens as the hf CLI.
AI agent traffic on the Hub
We started tracking agent usage of the Hub in April 2026. The hf CLI (and the huggingface_hub Python SDK it’s built on) detects when a coding agent is driving it by reading the environment variables agents set: CLAUDECODE / CLAUDE_CODE for Claude Code, CODEX_SANDBOX for Codex, plus Cursor, Gemini, Pi, and the universal AI_AGENT. That single signal does two jobs: it shapes the CLI’s output (more on that below) and it tags each Hub request with an agent/<name> user-agent, so we can attribute traffic to the agent driving it. The two largest by distinct users are Claude Code and Codex, well ahead of everything else, and they’re the two agents we benchmark later in this article.
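To make the detection idea concrete, here is a minimal Python sketch. It is not the actual huggingface_hub implementation: the function names and the label strings (`claude-code`, `codex`) are hypothetical, and treating the value of AI_AGENT as the agent's name is an assumption. Only the environment variable names come from the text above.

```python
from typing import Mapping, Optional

# Environment variables named in the article, mapped to a label.
# The label strings are illustrative assumptions, not the Hub's real identifiers.
AGENT_ENV_VARS = {
    "CLAUDECODE": "claude-code",
    "CLAUDE_CODE": "claude-code",
    "CODEX_SANDBOX": "codex",
}


def detect_agent(env: Mapping[str, str]) -> Optional[str]:
    """Return a label for the agent driving the process, or None for a human."""
    for var, label in AGENT_ENV_VARS.items():
        if env.get(var):
            return label
    # Assumption: the universal AI_AGENT variable carries the agent's name.
    name = (env.get("AI_AGENT") or "").strip().lower()
    return name or None


def user_agent_tag(env: Mapping[str, str]) -> str:
    """Build the `agent/<name>` user-agent fragment, or an empty string."""
    agent = detect_agent(env)
    return f"agent/{agent}" if agent else ""


def default_format(env: Mapping[str, str]) -> str:
    """The same signal also picks the default output mode."""
    return "agent" if detect_agent(env) else "human"
```

In a real process you would pass `os.environ`; taking the mapping as a parameter keeps the sketch easy to test.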


The bars count distinct users per agent; request volume is the sub-label. Claude Code alone is ~40k users and nearly 49M requests, with Codex close behind. These are early numbers (we only began attributing agent traffic in April 2026), but the scale is already significant, and we expect it to keep growing as coding agents become a standard way to work with the Hub.
Built for humans and agents
Humans and coding agents expect different outputs for the same hf commands. A human wants rich terminal output: ANSI color, padded tables truncated to fit the screen, a green ✅ on success, ✔ for booleans, progress bars, prose hints. An agent wants the inverse: no ANSI, nothing truncated, every value in full since an agent can handle far denser output than a human, kept compact and structured to stay light on tokens. It also can’t answer a CLI prompt and will happily re-run a command after a timeout. The rest of this section is how hf gives each side what it needs. We introduced agent-mode output in hf v1.9.0 and have been migrating the rest of the CLI to it gradually in the following releases.
One command, multiple renderings
When hf auto-detects agent use (via the environment variables mentioned above), it renders the same command differently. It picks the output format for humans or agents without needing a flag:
# human (default in a terminal): aligned table, truncated to fit, with a hint
> hf models ls --author Qwen --sort downloads --limit 3
ID CREATED_AT DOWNLOADS LIBRARY_NAME LIKES PIPELINE_TAG PRIVATE TAGS
------------------------ ---------- --------- ------------ ----- --------------- ------- -------------------------
Qwen/Qwen3-0.6B 2025-04-27 21156913 transformers 1285 text-generation transformers, safetens...
Qwen/Qwen2.5-1.5B-Ins... 2024-09-17 15143953 transformers 725 text-generation transformers, safetens...
Qwen/Qwen3-4B 2025-04-27 14808352 transformers 625 text-generation transformers, safetens...
Hint: Use `--no-truncate` or `--format json` to display full values.
# agent (auto-detected): TSV, full ids + ISO timestamps + every tag, nothing truncated
$ hf models ls --author Qwen --sort downloads --limit 3
id created_at downloads library_name likes pipeline_tag private tags
Qwen/Qwen3-0.6B 2025-04-27T03:40:08+00:00 21156913 transformers 1285 text-generation False ['transformers', 'safetensors', 'qwen3', 'text-generation', 'conversational', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-0.6B-Base', 'base_model:finetune:Qwen/Qwen3-0.6B-Base', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
Qwen/Qwen2.5-1.5B-Instruct 2024-09-17T14:10:29+00:00 15143953 transformers 725 text-generation False ['transformers', 'safetensors', 'qwen2', 'text-generation', 'chat', 'conversational', 'en', 'arxiv:2407.10671', 'base_model:Qwen/Qwen2.5-1.5B', 'base_model:finetune:Qwen/Qwen2.5-1.5B', 'license:apache-2.0', 'text-generation-inference', 'endpoints_compatible', 'deploy:azure', 'region:us']
Qwen/Qwen3-4B 2025-04-27T03:41:29+00:00 14808352 transformers 625 text-generation False ['transformers', 'safetensors', 'text-generation', 'arxiv:2309.00071', 'arxiv:2505.09388', 'base_model:Qwen/Qwen3-4B-Base', 'base_model:finetune:Qwen/Qwen3-4B-Base', 'license:apache-2.0', 'endpoints_compatible', 'deploy:azure', 'region:us']
A human gets an aligned table, truncated to fit the terminal, plus a hint on how to see more, with color cues for status (a green ✓ on success, red on error). An agent gets the complete record as TSV: full repo ids, full ISO timestamps, every tag, no ANSI codes, nothing truncated, clean to parse and light on tokens.
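As an illustration of why the TSV form is "clean to parse", here is a small Python sketch of the consumer side. It assumes the columns are separated by tab characters and that the tags column is a Python-style list literal, as the sample above suggests. The sample row is shortened from the article's output, and `parse_agent_tsv` is a hypothetical helper, not part of hf.

```python
import ast
import csv
import io

# Shortened from the article's sample output; tab separation is an assumption.
SAMPLE = (
    "id\tcreated_at\tdownloads\tlibrary_name\tlikes\tpipeline_tag\tprivate\ttags\n"
    "Qwen/Qwen3-0.6B\t2025-04-27T03:40:08+00:00\t21156913\ttransformers\t1285\t"
    "text-generation\tFalse\t['transformers', 'safetensors', 'qwen3']\n"
)


def parse_agent_tsv(text):
    """Parse agent-mode TSV into a list of dicts with typed fields."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        row["downloads"] = int(row["downloads"])
        row["likes"] = int(row["likes"])
        row["private"] = row["private"] == "True"
        # literal_eval only accepts Python literals, so it is safe on untrusted text.
        row["tags"] = ast.literal_eval(row["tags"])
        rows.append(row)
    return rows


rows = parse_agent_tsv(SAMPLE)
```

No column-width guessing or ANSI stripping is needed, which is the point of the agent rendering.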
In practice, we’ve implemented logging methods like .table(...), .result(...), .json(), etc., which take raw data as input and handle the formatting. In addition to human and agent modes, we’ve introduced --json and --quiet options to make it easier to pipe commands together. The default mode is automatically chosen based on context, but users can always force the format of their choice with --format human | agent | json | quiet.
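The "raw data in, formatting handled for you" pattern can be sketched in a few lines of Python. This is a toy renderer under stated assumptions, not hf's actual logging code: the `render` function, its `width` parameter and the exact truncation rule are hypothetical, chosen only to mirror the four format names above.

```python
import json
from typing import Any, Dict, List


def render(rows: List[Dict[str, Any]], fmt: str, width: int = 24) -> str:
    """Render the same raw rows as human, agent, json or quiet output."""
    if not rows:
        return ""
    columns = list(rows[0])
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "quiet":
        # One id per line (here: the first column), ready to pipe onward.
        return "\n".join(str(r[columns[0]]) for r in rows)
    if fmt == "agent":
        # TSV: every value in full, no padding, no color.
        lines = ["\t".join(columns)]
        lines += ["\t".join(str(r[c]) for c in columns) for r in rows]
        return "\n".join(lines)
    if fmt == "human":
        def clip(value: Any) -> str:
            s = str(value)
            return s if len(s) <= width else s[: width - 3] + "..."

        cells = [[c.upper() for c in columns]]
        cells += [[clip(r[c]) for c in columns] for r in rows]
        widths = [max(len(row[i]) for row in cells) for i in range(len(columns))]
        out = [
            "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in cells
        ]
        out.insert(1, "  ".join("-" * w for w in widths))
        return "\n".join(out)
    raise ValueError(f"unknown format: {fmt}")


rows = [{"id": "Qwen/Qwen2.5-1.5B-Instruct", "downloads": 15143953}]
```

Keeping the data and the rendering separate is what lets one command serve a terminal, an agent and a pipe without the command itself changing.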
Next-command hints
CLI commands rarely run in isolation: one step usually implies the next (git add, then git commit). Many hf commands now end with a hint: the exact next command to run, pre-filled with the IDs you just used, so a user or agent can chain straight to the next step instead of working it out from scratch. Start a Job in the background and it points you to its logs; create a Space and it points you to its boot status:
$ hf jobs run --detach python:3.12 python train.py
✓ Job started
id: 6f3a1c2e9b
url: https://huggingface.co/jobs/celinah/6f3a1c2e9b
Hint: Use `hf jobs logs 6f3a1c2e9b` to fetch the logs.
For a human that’s a convenience. For an agent it’s a rail: the next action is named, parameterized with the right ids, and ready to run, so it takes fewer steps working out what to do. Errors behave the same way, naming the fix instead of just failing:
Error: Not logged in. Run `hf auth login` first.
Hints, warnings and errors all go to stderr while data goes to stdout, so none of this guidance pollutes the output the agent is parsing.
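Because guidance arrives on stderr and data on stdout, a harness can read the suggested next step without touching the data it parses. A minimal Python sketch, assuming the suggested command is always wrapped in backticks on a line starting with `Hint:` or `Error:` as in the two samples above (the real output format may vary, and `next_command` is a hypothetical helper):

```python
import re
from typing import Optional

# Matches the first backtick-quoted command on a "Hint:" or "Error:" line.
HINT_RE = re.compile(r"^(?:Hint|Error):.*?`([^`]+)`", re.MULTILINE)


def next_command(stderr_text: str) -> Optional[str]:
    """Extract the suggested follow-up command from captured stderr, if any."""
    match = HINT_RE.search(stderr_text)
    return match.group(1) if match else None
```

With the samples from this section, the Job hint yields `hf jobs logs 6f3a1c2e9b` and the login error yields `hf auth login`.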
Non-blocking and safe to retry
hf never sits on an interactive prompt waiting for a key an agent can’t press. A destructive command still asks a human to confirm, but in agent mode it fails fast with the fix in the message (Use --yes to skip confirmation.), and -y/--yes skips it. And because agents retry on timeouts and lost context, operations are built to be safe to repeat: hf repos create --exist-ok is a no-op if the repo already exists, and re-running an upload re-commits cleanly. Separately, the commands that move real data take a --dry-run that shows exactly what they’ll transfer before they run, which proves handy for humans and agents alike, since neither has to commit to a long download or blind sync:
# agent mode: a destructive command without --yes refuses, with the fix in the message
$ hf repos delete my-org/old-model
Error: You are about to permanently delete model 'my-org/old-model'. Proceed? Use --yes to skip confirmation.
# commands that move data take --dry-run to preview the transfer first
$ hf download deepseek-ai/DeepSeek-V4-Pro config.json --dry-run
[dry-run] Will download 1 files (out of 1) totalling 1.8K.
file size
config.json 1.8K
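The two behaviors described in this section, failing fast instead of prompting and making repeats harmless, can be sketched in Python as follows. Everything here is hypothetical illustration: the in-memory `store` set stands in for the Hub, and `confirm` and `create_repo_idempotent` are made-up names rather than hf internals.

```python
class ConfirmationRequired(Exception):
    """Raised instead of blocking on a prompt nobody can answer."""


def confirm(action: str, *, yes: bool, interactive: bool, ask=input) -> bool:
    """Gate a destructive action: prompt humans, fail fast for agents."""
    if yes:
        return True
    if not interactive:
        # Name the fix in the message so the caller can retry correctly.
        raise ConfirmationRequired(f"{action} Use --yes to skip confirmation.")
    return ask(f"{action} Proceed? [y/N] ").strip().lower() == "y"


def create_repo_idempotent(store: set, repo_id: str, *, exist_ok: bool = False) -> bool:
    """Create a repo in the toy store; return True if created, False if it existed."""
    if repo_id in store:
        if exist_ok:
            return False  # safe to repeat: a retry changes nothing
        raise FileExistsError(repo_id)
    store.add(repo_id)
    return True
```

The design choice is that a retry after a timeout either succeeds quietly or fails with an actionable message, and never hangs.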
Discoverable, predictable commands
hf is built to be probed: run hf to see the resource groups, run --help on the one you need, and every --help ends with real, copy-pasteable examples (which an agent matches against far faster than it parses a description):
$ hf models ls --help
...
Examples
$ hf models ls --sort downloads --limit 10
$ hf models ls --search "qwen" --author Qwen
$ hf models ls Qwen/Qwen3-4B --tree
The command tree is consistent, resource + verb with the obvious aliases (hf models ls, hf repos create, hf jobs ps, hf collections delete; list/ls, remove/rm), so once an agent learns one command it can guess the rest. And the output composes: -q prints one id per line to pipe into the next command, --json gives you something to hand to jq.
$ hf models ls --author Qwen -q | head -3
Qwen/Qwen3-0.6B
Qwen/Qwen2.5-1.5B-Instruct
Qwen/Qwen3-4B
Benchmarking the hf CLI for Coding Agents
To find out whether the hf CLI is really more efficient for agents, we measured it. We built a small evaluation harness and ran the same set of Hub tasks through each way of driving the Hub, many times over, grading every run against the live Hub. Here’s the headline before the methodology: across both agents the hf CLI comes out ahead, most clearly on complex, multi-step tasks where it uses far fewer tokens.
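| agent | tool | success score | token usage | self-report error |
| --- | --- | --- | --- | --- |
| Claude Code (Sonnet 4.6) | hf CLI | 0.94 | baseline | 2 / 163 |
| Claude Code (Sonnet 4.6) | curl / Python SDK | 0.84 | 1.3-1.6× tokens | 11 / 163 |
| Codex (GPT-5.5) | hf CLI | 0.93 | baseline | 3 / 163 |
| Codex (GPT-5.5) | curl / Python SDK | 0.92 | 1.6-1.8× tokens | 10 / 163 |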
(self-report error = the agent reported success on the 17 solvable tasks but the Hub said otherwise. The hf CLI rows are the CLI with its skill installed; what the skill adds on top of the bare CLI (chiefly fewer tool calls) is broken out in the skill section below. Representative transcripts are published in this bucket.)
The setup
We defined 18 non-trivial Hub tasks. Not “download a file”, but the kind of thing you’d actually ask for: aggregate a trending org’s models, inspect a repo’s files and their sizes, upload a folder with include/exclude rules, delete files, copy files across repos, open a PR that adds a license, create a repo with a branch and a tag, sync and prune a bucket, build a collection. Each task goes to a fresh coding agent with exactly one way to talk to the Hub:
- the hf CLI, or
- curl / the Python SDK: no hf CLI at all, so the agent falls back to curl against the REST API or the huggingface_hub Python library.
We run the hf CLI in two configurations, with and without its skill (a generated command reference we come back to in its own section). But the headline comparison below is simply hf CLI vs curl / the SDK; the skill’s incremental effect is small enough that we break it out on its own rather than crowd it into the main results.
The config is deliberately clean: a fresh instance per run, no custom MCP servers, no CLAUDE.md or AGENTS.md, nothing in context to nudge behavior. The task and the tool go into a single prompt, and the agent finishes with a TASK_COMPLETE or TASK_FAILED marker. We don’t trust that marker (an agent will report success on work that never landed), so we grade every run independently by re-querying the live Hub: did the branch really get created, is the file actually gone, does the bucket exist? Each task/tool combination is run 10 times, since coding agents are non-deterministic. That comes to about 520 runs per agent (18 tasks × 3 tools × 10 reps, minus a cap on one billable Jobs task) and ~1,000 graded runs in total. We ran the whole thing once on each of the two most popular coding agents (Claude Code with Sonnet 4.6 and OpenAI Codex with GPT-5.5).
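The run counts and the grading rule are easy to check with a few lines of Python. This is a back-of-the-envelope sketch, not the evaluation harness itself: reading the "3 tools" as the hf CLI with skill, the hf CLI without skill, and curl / the Python SDK is an interpretation of the setup above, and `self_report_error` is a hypothetical helper. The exact size of the Jobs cap is not stated, so it is only derived as a difference.

```python
TASKS = 18
TOOLS = 3    # assumed: hf CLI with skill, hf CLI without skill, curl / Python SDK
REPS = 10
AGENTS = 2   # Claude Code and Codex

uncapped_per_agent = TASKS * TOOLS * REPS                # 540
reported_per_agent = 520                                 # "about 520" in the text
capped_away = uncapped_per_agent - reported_per_agent    # roughly 20 runs
total_graded = reported_per_agent * AGENTS               # 1040, i.e. "~1,000"


def self_report_error(marker: str, verified_on_hub: bool) -> bool:
    """True when the agent claimed success but the live Hub disagrees."""
    return marker == "TASK_COMPLETE" and not verified_on_hub


# Rates from the results table (Claude Code rows), rounded to three decimals.
cli_error_rate = round(2 / 163, 3)
baseline_error_rate = round(11 / 163, 3)
```

Grading against the Hub rather than the marker is what makes the self-report error column in the results table possible at all.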
The results
The two charts below unpack the table above. First, task success on Sonnet, the agent where curl and the SDK struggle most:


Without the CLI, curl and the SDK trail by ten points, because on Sonnet they simply can’t finish parts of the job (the writes, mostly), while the hf CLI clears them.
The second image shows token impact on GPT-5.5, broken down per task. Each bar is the curl/SDK tokens divided by the CLI’s on the same task, so 2.4× means the non-hf version burned 2.4 times as many tokens to do the same thing:


On a one-shot read (count dataset rows, batch metadata) curl and the SDK are fine, and sometimes lighter. But as tasks get more complex and involve several dependent steps, the agent has to hand-roll the entire chain of REST calls (or dig through the SDK) and the cost blows up: 2.4× to 6× the CLI’s on creating a repo with a branch and tag, deleting files, copying across repos, or syncing a bucket. The hf CLI lets the agent express the task as a few higher-level commands, rather than crafting a complex workflow.
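For readers who want to reproduce the per-task ratio on their own runs, the metric is a single division. The Python sketch below uses made-up token counts purely for illustration (they are not figures from the benchmark), and both function names are hypothetical.

```python
def token_ratio(baseline_tokens: int, cli_tokens: int) -> float:
    """curl/SDK tokens divided by hf CLI tokens for the same task."""
    if cli_tokens <= 0:
        raise ValueError("cli_tokens must be positive")
    return baseline_tokens / cli_tokens


def extra_tokens(baseline_tokens: int, cli_tokens: int, runs: int) -> int:
    """Tokens spent re-deriving the workflow across `runs` repetitions."""
    return (baseline_tokens - cli_tokens) * runs


# Hypothetical numbers, NOT taken from the benchmark.
ratio = token_ratio(48_000, 20_000)            # 2.4, read as "2.4×"
overhead = extra_tokens(48_000, 20_000, 100)   # paid again on every run
```

The second function makes the article's point explicit: the overhead is not a one-time cost but scales with the number of runs.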
Key findings
- The hf CLI is far leaner than curl or the SDK. For the same task, with the CLI matching or beating them on success, curl and the SDK burn roughly 1.3× to 1.8× the tokens. On easy reads they’re fine, but on real multi-step work they pay 2× to 6×: the CLI composes a chain of REST calls into a few high-level commands, while curl or the SDK re-derives the chain by hand every run.
- On a stronger model, curl and the SDK work but stay wasteful. On Sonnet they can’t finish parts of the job (the writes, mostly); on GPT-5.5 they mostly succeed, hand-rolling the REST calls (or using the SDK) correctly, but still pay well over the CLI’s token bill.
The hf-cli skill
hf ships a skill: a compact reference of the whole command surface that an agent loads as context. It’s auto-generated from the live hf command tree, one line per command (its signature, a one-line description, and the flags that matter), grouped by resource, with a short glossary of common options. It deliberately skips the self-explanatory flags so it stays terse and light on context, and it’s regenerated every release. Run hf skills preview to print it, or install it with:
# for Codex, Cursor, OpenCode, Pi and other agents that load skills from `.agents/skills`
hf skills add
# includes the above + Claude Code
hf skills add --claude
What does it buy you? Mostly, the agent stops guessing. The clearest single view is how many commands each run takes, with the skill and without:


On both agents that’s about ten commands per task down to about seven, roughly 30% fewer tool calls. That’s because the agent isn’t probing --help to find the right command and argument. The skill won’t cut your token bill: it prepends a fixed slice of info to the context, so tokens stay about the same or tick up slightly for the same task. The skill won’t make the CLI more reliable either, but it will help the agent spend time running your task rather than finding out how the tool works. This could be particularly helpful when using hf with local models.
We ran each task in a fresh session, so the skill pays its context cost on every task. In a real multi-task session that cost amortizes (the agent learns the command surface once), so the token picture likely improves there; we didn’t measure that case.
Try it yourself
We benchmarked all this because we think it matters. Agents are becoming real users of the Hub: they train models, build and clean datasets, and ship demos as Spaces, almost always on behalf of a person. A Hub that works well for agents is also a Hub that works better for the people using them. The better an agent’s tools are, the more it can do for you.
If your agent interacts with the Hugging Face Hub, we recommend giving it the hf CLI:
# macOS / Linux
curl -LsSf https://hf.co/cli/install.sh | bash
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
Then hand it the skill, so it knows the whole command surface from the first turn:
hf skills add # Codex, Cursor, OpenCode, Pi and other agents that load skills from .agents/skills
hf skills add --claude # the above + Claude Code
Then point your agent at the Hub and let it work. Make sure you’re logged in (hf auth login), then hand it a prompt like:
Use `hf` to list my Hugging Face Hub models, datasets, and Spaces.
Take a look at how I am currently using the Hub and suggest a few ways you could help me.
It’ll work out the commands on its own and come back with something useful.
The full command reference lives in the hf CLI guide.
Register an agent harness
Building an agent harness? Get it registered! That’s how hf learns to detect it, and how the Hub attributes its traffic to your harness. You simply need to open a small PR adding an entry to agent-harnesses.ts. Read the Register your agent harness guide for more details.