@tunguz: Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tool…

X AI KOLs Following 05/23/26, 07:32 PM News

Summary

A post highlights that 42% of time in modern agentic coding is spent on CPU-based tool use, which is inefficient and presents a major opportunity to redesign these tools for AI agents.

Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tools that these AI system use are *very* inefficient and have been built from the ground up for CPU and human use. There is a huge untapped opportunity there to significantly improve those processes with AI agents in mind from the ground up.

Original Article

View Cached Full Text

Cached at: 05/24/26, 12:13 AM

Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tools that these AI system use are very inefficient and have been built from the ground up for CPU and human use. There is a huge untapped opportunity there to significantly improve those processes with AI agents in mind from the ground up.

SemiAnalysis (@SemiAnalysis_): FACT ALERT 🚨 : In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running lints, etc. The economy of traditional cloud computing charges at $ per cpu core. In the economy of agents, the business model is $ per

Similar Articles

LLM Agents Already Know When to Call Tools -- Even Without Reasoning

Hugging Face Daily Papers

This paper introduces When2Tool, a benchmark to study when LLM agents actually need to call tools, and reveals that models already know tool necessity from hidden states but fail to act. The proposed Probe&Prefill method reduces unnecessary tool calls by 48% with minimal accuracy loss.

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Hugging Face Blog

IBM Research explores how agent logic—software primitives like knowledge graphs and program analysis—can guide LLM-based agents to efficiently handle complex enterprise workflows, reducing hallucinations and costs while improving outcomes.

GLM 5.1 Thinks Strategically, Data-Center Revolt Intensifies, When Helpful LLMs Turn Unhelpful, Humanoid Robots Get to Work

The Batch

Andrew Ng discusses how coding agents accelerate different types of software work at varying speeds, with frontend development benefiting most and research least.

Ai agents

Reddit r/AI_Agents

Analysis of Goldman Sachs research comparing costs of AI agents vs humans across coding, support, and data entry, with projections of token consumption growth and falling inference costs. Discusses productivity gains, job displacement, and opportunities in healthcare.

Stateful Inference for Low-Latency Multi-Agent Tool Calling

arXiv cs.LG

This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.