OpenAI introduces Prompt Caching, an automatic feature that gives a 50% discount on recently cached input tokens and reduces latency on GPT-4o, GPT-4o mini, o1-preview, and o1-mini models. It applies automatically to prompts longer than 1,024 tokens and requires no changes to existing API integrations.
Offering automatic discounts on inputs that the model has recently seen
# Prompt Caching in the API
Source: [https://openai.com/index/api-prompt-caching/](https://openai.com/index/api-prompt-caching/)
OpenAI
Many developers use the same context repeatedly across multiple API calls when building AI applications, such as when making edits to a codebase or having long, multi-turn conversations with a chatbot. Today, we're introducing Prompt Caching, which lets developers reduce costs and latency. By reusing recently seen input tokens, developers get a 50% discount on those cached input tokens and faster prompt processing times.
Starting today, Prompt Caching is automatically applied to the latest versions of GPT-4o, GPT-4o mini, o1-preview and o1-mini, as well as fine-tuned versions of those models. Cached prompts are offered at a discount compared to uncached prompts.
Here's an overview of pricing:
API calls to supported models will automatically benefit from Prompt Caching on prompts longer than 1,024 tokens. The API caches the longest prefix of a prompt that has been previously computed, starting at 1,024 tokens and increasing in 128-token increments. If you reuse prompts with common prefixes, we will automatically apply the Prompt Caching discount without requiring you to make any changes to your API integration.
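To make the 1,024-token threshold and the 128-token increments concrete, here is a minimal Python sketch that estimates how much of a repeated prompt could be served from cache. It assumes you already have both prompts as lists of token IDs (tokenization is out of scope), and the helper names `shared_prefix_length` and `cacheable_prefix_length` are hypothetical, not part of any OpenAI SDK. It is only a client-side approximation of the rule described above; the real cache lookup happens on OpenAI's servers.

```python
from typing import Sequence

# Values taken from the description above; treated here as assumptions
# about how the server rounds cache hits.
MIN_CACHEABLE_TOKENS = 1024  # caching starts at 1,024 tokens
CACHE_INCREMENT = 128        # cached prefix grows in 128-token steps


def shared_prefix_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cacheable_prefix_length(shared: int) -> int:
    """Round a shared-prefix length down to a length the cache could cover.

    Below 1,024 shared tokens nothing is cached; at or above it, the cached
    length is 1,024 plus a whole number of 128-token increments.
    """
    if shared < MIN_CACHEABLE_TOKENS:
        return 0
    extra_steps = (shared - MIN_CACHEABLE_TOKENS) // CACHE_INCREMENT
    return MIN_CACHEABLE_TOKENS + extra_steps * CACHE_INCREMENT


if __name__ == "__main__":
    # Hypothetical token IDs: a long static prefix (e.g. system prompt and
    # tool definitions) followed by two different user turns.
    static_prefix = list(range(1500))
    first_call = static_prefix + [7, 8, 9]
    second_call = static_prefix + [42, 43]

    shared = shared_prefix_length(first_call, second_call)
    print("shared prefix tokens:", shared)
    print("potentially cached tokens:", cacheable_prefix_length(shared))
```

Because only a matching *prefix* counts, the sketch also shows why static content (system prompt, tool definitions, shared context) is best placed at the start of a prompt and variable content at the end: any difference early in the prompt cuts the reusable prefix short.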
Requests using Prompt Caching report a `cached_tokens` value within the `usage` field of the API response (under `prompt_tokens_details`).
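As a rough sketch, the Python below reads `cached_tokens` from a response's `usage` object (already parsed into a dict) and estimates the input-token cost with cached tokens billed at the 50% discount. It assumes the Chat Completions response shape, where `cached_tokens` sits under `usage.prompt_tokens_details`. The function names, the `price_per_million` argument, and the token counts in the example are hypothetical placeholders, not official prices or real API output.

```python
from typing import Any, Mapping


def cached_tokens(usage: Mapping[str, Any]) -> int:
    """Return usage["prompt_tokens_details"]["cached_tokens"], or 0 if absent."""
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or 0)


def input_cost(usage: Mapping[str, Any], price_per_million: float,
               cached_discount: float = 0.5) -> float:
    """Estimate the cost of a request's input tokens.

    Assumption: cached tokens are billed at (1 - cached_discount) of the
    normal input price; price_per_million is a placeholder you supply.
    """
    prompt = int(usage["prompt_tokens"])
    cached = cached_tokens(usage)
    uncached = prompt - cached
    rate = price_per_million / 1_000_000  # price per single token
    return uncached * rate + cached * rate * (1 - cached_discount)


if __name__ == "__main__":
    # Illustrative usage object; the numbers are made up for this example.
    usage = {
        "prompt_tokens": 2048,
        "completion_tokens": 100,
        "total_tokens": 2148,
        "prompt_tokens_details": {"cached_tokens": 1536},
    }
    with_cache = input_cost(usage, price_per_million=2.0)
    without_cache = input_cost({"prompt_tokens": 2048}, price_per_million=2.0)
    print(f"cached tokens: {cached_tokens(usage)}")
    print(f"input cost with cache:    ${with_cache:.6f}")
    print(f"input cost without cache: ${without_cache:.6f}")
```

Note that the discount applies only to the cached portion of the prompt, so the overall saving on a request is smaller than 50% unless nearly the whole prompt is a cache hit.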
Caches are typically cleared after 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. As with all API services, Prompt Caching is subject to our [Enterprise privacy](https://openai.com/enterprise-privacy/) commitments. Prompt caches are not shared between organizations.
Prompt Caching is one of a variety of tools for developers to scale their applications in production while balancing performance, cost and latency. For more information, check out the [Prompt Caching docs](https://platform.openai.com/docs/guides/prompt-caching).