@morganlinton: I asked Teknium, who is probably one of the smartest agent devs in the world, what he did recently to speed up tool cal…
Summary
Teknium shares recent performance improvements for tool calling in AI agents, including deferring imports, cutting 47% of per-conversation function calls, and deferring compression feasibility checks, with links to working code on GitHub.
View Cached Full Text
Cached at: 05/21/26, 05:36 PM
I asked Teknium, who is probably one of the smartest agent devs in the world, what he did recently to speed up tool calling.
This is what he shared. So much better than an article or a deck, real examples of working code.
I mean, he’s kinda famously a 10x engineer, not that many of them on the planet, and very few who share as much about his process, code, and workflow than Tek.
We’re very lucky.
Similar Articles
@akshay_pachaar: https://x.com/akshay_pachaar/status/2053166970166772052
The article discusses a shift in AI agent tool usage from the 'MCP vs CLI' debate to 'Code Mode,' where agents write code to dynamically import tools, significantly reducing context window usage. It highlights Anthropic's approach and Cloudflare's implementation, demonstrating a 98.7% reduction in token consumption for specific tasks.
@tunguz: Here is one big reason why this matters. Time spent on non-LLM inference tasks is only going to increase. However, tool…
A post highlights that 42% of time in modern agentic coding is spent on CPU-based tool use, which is inefficient and presents a major opportunity to redesign these tools for AI agents.
When I finally instrumented my agents' tool calls, the cost breakdown surprised me. A few lessons.
The author shares lessons from instrumenting AI agent tool calls, revealing that tools like web_search can account for ~50% of spend, and highlighting the importance of tracking p95 latency and attributing costs per workflow or customer to avoid surprises.
@0xMovez: Anthropic Head of Product just dropped a 28-minute masterclass on how to put agents into production with real-world use…
Anthropic's Head of Product released a free 28-minute masterclass on putting AI agents into production, covering prompt caching, tool search, programmatic tool calling, compaction, and advisor strategy.
@_avichawla: https://x.com/_avichawla/status/2063548691353629040
Explains how a traditional backend inflates AI agent token usage and demonstrates a context-engineering approach that reduces Claude Code session costs by 2.5x without changing models or prompts.