@tunguz: Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tool…
Summary
A post highlights a claim that 42% of time in modern agentic coding is spent on CPU-based tool use. It argues that these tools were built for humans and CPUs, are inefficient for agents, and present a major opportunity for redesign with AI agents in mind.
Cached at: 05/24/26, 12:13 AM
Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, the tools that these AI systems use are very inefficient and have been built from the ground up for CPU and human use. There is a huge untapped opportunity to significantly improve those processes by redesigning them with AI agents in mind from the ground up.
SemiAnalysis (@SemiAnalysis_): FACT ALERT 🚨 : In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running lints, etc. The economy of traditional cloud computing charges at $ per cpu core. In the economy of agents, the business model is $ per
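To put the 42% figure in perspective, here is a rough back-of-the-envelope sketch using Amdahl's law. It assumes the 42% is a share of wall-clock time and that tool use and LLM inference do not overlap; the speedup factors (2x, 5x, 10x) are hypothetical and do not come from either post.

```python
def overall_speedup(tool_fraction: float, tool_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the tool-use share gets faster."""
    if not 0.0 <= tool_fraction <= 1.0:
        raise ValueError("tool_fraction must be in [0, 1]")
    if tool_speedup <= 0:
        raise ValueError("tool_speedup must be positive")
    # Time that is left after the speedup, as a fraction of the original time.
    remaining = (1.0 - tool_fraction) + tool_fraction / tool_speedup
    return 1.0 / remaining


# Share of time spent on CPU tool use, taken from the quoted post.
TOOL_FRACTION = 0.42

# Hypothetical speedups for the tool-use portion (assumptions, not measurements).
for factor in (2, 5, 10):
    gain = overall_speedup(TOOL_FRACTION, factor)
    print(f"tools {factor}x faster -> {gain:.2f}x end to end")

# Upper bound: tool use takes no time at all, so only the other 58% remains.
ceiling = 1.0 / (1.0 - TOOL_FRACTION)
print(f"ceiling if tool use were free -> {ceiling:.2f}x")
```

Under these assumptions, even a large speedup of the tools alone is capped at roughly 1.7x end to end, because the remaining 58% of the time is untouched. If the non-inference share grows, as the post expects, that cap rises with it.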
Similar Articles
LLM Agents Already Know When to Call Tools -- Even Without Reasoning
This paper introduces When2Tool, a benchmark to study when LLM agents actually need to call tools, and reveals that models already know tool necessity from hidden states but fail to act. The proposed Probe&Prefill method reduces unnecessary tool calls by 48% with minimal accuracy loss.
Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic
IBM Research explores how agent logic—software primitives like knowledge graphs and program analysis—can guide LLM-based agents to efficiently handle complex enterprise workflows, reducing hallucinations and costs while improving outcomes.
GLM 5.1 Thinks Strategically, Data-Center Revolt Intensifies, When Helpful LLMs Turn Unhelpful, Humanoid Robots Get to Work
Andrew Ng discusses how coding agents accelerate different types of software work at varying speeds, with frontend development benefiting most and research least.
AI agents
Analysis of Goldman Sachs research comparing costs of AI agents vs humans across coding, support, and data entry, with projections of token consumption growth and falling inference costs. Discusses productivity gains, job displacement, and opportunities in healthcare.
Stateful Inference for Low-Latency Multi-Agent Tool Calling
This paper presents a stateful inference architecture for multi-agent tool calling that reuses KV cache across turns and employs speculative decoding, achieving 2.1x-4.2x speedup over vLLM and SGLang on agentic workflows.