prompt-caching

#prompt-caching

Why does it feel like big LLM providers are literally hiding prompt caching?

Reddit r/artificial · 2d ago

An article discussing how prompt caching can significantly reduce LLM API costs, pointing out that providers under-explain it and offering a simple rule to structure prompts for maximum cache hits.

@FinanceYF5: Claude is now officially available on Microsoft Foundry, fully open today. Use Azure accounts directly with existing authentication, billing, and compliance. Initial launch includes Claude Opus 4.8 and Haiku 4.5, supporting prompt caching...

X AI KOLs Timeline · 4d ago Cached

Claude is now officially available on Microsoft Foundry, allowing Azure accounts to use it directly with existing authentication, billing, and compliance. The initial rollout includes Claude Opus 4.8 and Haiku 4.5, supporting prompt caching and extended thinking.

@LangChain: Alex recently joined the @LangChain_OSS team, and he published his first article on how Deep Agents uses prompt caching…

X AI KOLs Timeline · 2026-06-26 Cached

Alex, a new LangChain team member, published an article explaining how Deep Agents uses prompt caching to reduce API costs.

Fable 5 just made cost-aware model routing mandatory for agent builders

Reddit r/AI_Agents · 2026-06-09

Anthropic released Fable 5, a powerful new model with high pricing, making cost-aware routing essential for agent builders due to token fan-out and high output costs.

How I easily cut my input token burn ~90% on long agent runs

Reddit r/AI_Agents · 2026-06-01

The author shares a practical tip to reduce input token costs by ~90% on long agent runs using prompt caching: placing unchanged text (system prompt, tool definitions, context) at the start of every prompt to leverage cached prefixes from LLM providers.
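
A minimal sketch of the ordering described above, with unchanged text first and per-call text last. The prompt strings and helper names are hypothetical, and character counts stand in for the token-level prefix matching a provider would actually perform.

```python
from typing import List

# Hypothetical static content: identical on every call, so it can form a cacheable prefix.
SYSTEM_PROMPT = "You are a careful coding agent."
TOOL_DEFINITIONS = "tool: read_file(path)\ntool: run_tests()"
PROJECT_CONTEXT = "Repository layout: src/, tests/, docs/"


def build_prompt(history: List[str], user_turn: str) -> str:
    """Assemble a prompt with unchanged text first and per-call text last."""
    static_prefix = "\n\n".join([SYSTEM_PROMPT, TOOL_DEFINITIONS, PROJECT_CONTEXT])
    dynamic_suffix = "\n".join(history + [user_turn])
    return static_prefix + "\n\n" + dynamic_suffix


def shared_prefix_length(a: str, b: str) -> int:
    """Count the leading characters two prompts have in common."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


first = build_prompt([], "user: list the failing tests")
second = build_prompt(
    ["user: list the failing tests", "agent: 3 tests fail"],
    "user: fix the first one",
)

# Append-only history: the whole first prompt is a prefix of the second.
good_overlap = shared_prefix_length(first, second)

# Counter-example: a changing value at the very start breaks the shared prefix.
bad_overlap = shared_prefix_length("timestamp: 12:00\n" + first, "timestamp: 12:01\n" + second)

print(good_overlap == len(first), bad_overlap < good_overlap)
```

The counter-example shows why volatile values such as timestamps or request IDs belong at the end of the prompt: a single early difference shortens the reusable prefix to almost nothing.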

Measured token consumption across 4 agent runtimes doing the same tasks. Costs ranged from 1x to 4x depending on cache architecture

Reddit r/AI_Agents · 2026-05-27

A comparison of token consumption across four agent runtimes (Claude Code, OpenClaw, Hermes, and OpenClacky) on the same tasks reveals costs ranging from 0.8x to 4x relative to Claude Code, driven by differences in cache architecture and tool schema design.

@freeman1266: Slash AI coding costs by 80% monthly with optimization strategies and model routing. Inefficient context management and blind use of expensive models can cause bills to skyrocket. By implementing prompt caching, trimming context files, and fixing auto-loops in tool calls, developers can significantly reduce ineffective token consumption.…

X AI KOLs Timeline · 2026-05-26

This article introduces practical techniques to cut AI coding costs by 80%, including prompt caching, context trimming, multi-model routing (using Kimi 2.6 for daily coding tasks and advanced models for core architecture), and more.

@pallavishekhar_: How to reduce token usage in AI Agents? Let's understand. AI Agents use LLMs to think, plan, and recommend tools. Every…

X AI KOLs Timeline · 2026-05-22 Cached

This thread shares strategies to reduce token usage in AI agents, including prompt caching, context summarization, using smaller models, trimming tool outputs, subagents, RAG, and tight system prompts.

@nateherk: https://x.com/nateherk/status/2057450555212013627

X AI KOLs Timeline · 2026-05-21 Cached

A practical guide explaining how prompt caching works in Claude Code, how it reduces token costs by 90%, and common habits that break the cache, helping developers extend session length and reduce costs.

10 Ways To Reduce Your LLM API Costs

Reddit r/AI_Agents · 2026-05-20

A practical guide listing 10 strategies to reduce costs when using LLM APIs, including model selection, prompt caching, batch processing, and monitoring expenses.

@akshay_pachaar: RAG vs. CAG, clearly explained! RAG is great, but it has a major problem: Every query hits the vector DB. Even for stat…

X AI KOLs Following · 2026-05-19 Cached

Explains Cache-Augmented Generation (CAG) as a method to cache static knowledge directly in the model's KV memory, reducing latency and cost compared to traditional RAG, and shows how to combine both for optimal performance.

Every AI prompt costs money — and that changes everything

Reddit r/AI_Agents · 2026-05-18

The article argues that the real challenge in AI isn't just building smarter models but making them cost-efficient at scale, highlighting the importance of reducing token usage, improving speed, and optimizing infrastructure.

Tokenomics: the 62.5-minute rule for Claude's cache (8 minute read)

TLDR AI · 2026-05-18 Cached

An analysis of Anthropic's prompt caching costs for Claude derives a 62.5-minute break-even rule: refresh the cache if you expect to need it again within that time, otherwise let it expire to save costs.
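
One simplified way to arrive at a figure of this size is sketched below. It is not necessarily the article's own derivation. The multipliers (a cache write billed at 1.25x the base input price, a cache read at 0.1x) and the 5-minute lifetime restarted by each read are assumptions made for illustration, and the model compares only keep-alive reads against one fresh write.

```python
# Assumed pricing, expressed as multiples of the base input-token price.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: cost of writing the prefix into the cache
CACHE_READ_MULTIPLIER = 0.10   # assumption: cost of a read, which also refreshes the entry
CACHE_TTL_MINUTES = 5.0        # assumption: lifetime restarted by each read


def break_even_minutes(write: float, read: float, ttl_minutes: float) -> float:
    """Idle time at which keep-alive reads cost as much as one fresh cache write."""
    refreshes_per_write = write / read  # how many reads one write would pay for
    return refreshes_per_write * ttl_minutes


def cheaper_to_keep_warm(expected_idle_minutes: float) -> bool:
    """True when refreshing through the idle gap costs less than re-writing afterwards."""
    threshold = break_even_minutes(
        CACHE_WRITE_MULTIPLIER, CACHE_READ_MULTIPLIER, CACHE_TTL_MINUTES
    )
    return expected_idle_minutes < threshold


threshold = break_even_minutes(
    CACHE_WRITE_MULTIPLIER, CACHE_READ_MULTIPLIER, CACHE_TTL_MINUTES
)
print(round(threshold, 1))
print(cheaper_to_keep_warm(30), cheaper_to_keep_warm(120))
```

Under these assumptions, a 30-minute gap favors keeping the cache warm and a 2-hour gap favors letting it expire. Different multipliers or lifetimes would move the threshold.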

@0xMovez: Anthropic Head of Product just dropped a 28-minute masterclass on how to put agents into production with real-world use…

X AI KOLs Timeline · 2026-05-13

Anthropic's Head of Product released a free 28-minute masterclass on putting AI agents into production, covering prompt caching, tool search, programmatic tool calling, compaction, and advisor strategy.

@gneubig: "The Math Behind the Cost of AI Agents" Nice, clear, tutorial by Vasco Schiavo at @OpenHandsDev on why agents can be ex…

X AI KOLs Following · 2026-05-13 Cached

A tutorial by Vasco Schiavo explaining the math behind the cost of AI agents, focusing on why agents can be expensive and the importance of prompt caching.

prompt caching, but for rl training - 7.5x speedup on long-prompt/short-response workloads

Reddit r/LocalLLaMA · 2026-05-11

A new optimization technique for open-source RL training engines introduces prompt caching during training, achieving up to 7.5x speedup on long-prompt, short-response workloads by reducing redundant compute.

Anthropic says OpenClaw-style Claude CLI usage is allowed again

Hacker News Top · 2026-04-21 Cached

OpenClaw is a CLI tool that supports Anthropic Claude models via API key or Claude CLI reuse, with features including adaptive thinking defaults for Claude 4.6, fast mode service tier toggling, and configurable prompt caching. Anthropic has reportedly re-allowed OpenClaw-style Claude CLI usage.

Prompt Caching in the API

OpenAI Blog · 2024-10-01 Cached

OpenAI introduces Prompt Caching, an automatic feature that bills recently cached input tokens at a 50% discount and improves latency on GPT-4o, GPT-4o mini, o1-preview, and o1-mini models. It applies automatically to prompts longer than 1,024 tokens, with no integration changes required from developers.
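
A rough illustration of how a discount on cached input tokens translates into a bill, assuming the 50% figure and the 1,024-token threshold from the summary above. The function name, the per-million price, and the token counts are hypothetical.

```python
def input_cost_usd(
    total_tokens: int,
    cached_tokens: int,
    price_per_million_usd: float,
    cached_discount: float = 0.5,    # from the summary above
    min_prompt_tokens: int = 1024,   # from the summary above
) -> float:
    """Estimate input cost when some prompt tokens are served from the cache."""
    if total_tokens <= min_prompt_tokens:
        cached_tokens = 0  # prompts at or below the threshold are treated as uncached
    cached_tokens = min(cached_tokens, total_tokens)
    uncached_tokens = total_tokens - cached_tokens
    full_price = uncached_tokens * price_per_million_usd
    discounted = cached_tokens * price_per_million_usd * (1.0 - cached_discount)
    return (full_price + discounted) / 1_000_000


# Hypothetical request: 10,000 input tokens, 8,000 of them a cached prefix,
# at a made-up price of $2.50 per million input tokens.
without_cache = input_cost_usd(10_000, 0, 2.50)
with_cache = input_cost_usd(10_000, 8_000, 2.50)
savings = 1.0 - with_cache / without_cache
print(f"{without_cache:.4f} {with_cache:.4f} {savings:.0%}")
```

In this example the input bill falls by 40%, not 50%, because the discount covers only the cached portion of the prompt.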

Explains how prompt caching works in LLMs, using Claude as a case study, detailing the transformer's KV cache mechanism and the cost benefits of caching static prefixes in agentic workflows.

X AI KOLs · 2026-06-23 Cached
