@NFTCPS: Running Codex on DeepSeek V4 Pro and watching the tokens burn a hole in your pocket? You need to know these two skills. token-saver: after modifying code, it returns just a path + done, no extra words. Tests show it saves 60-80% of tokens. memory…
Summary
Codex skills optimized for DeepSeek V4 Pro that save 60-80% of tokens by freezing skill files and enforcing minimal output, with persistent memory across conversations.
Cached at: 06/18/26, 10:22 PM
Hey folks, burning through tokens running Codex on DeepSeek V4 Pro? These two skills are must-haves.
token-saver: After modifying code, only returns the file path + “done” — zero fluff. Tests show 60–80% token savings.
memory: Even if context gets compressed, next session automatically recovers your habits and project structure from MEMORY.md — only ~800 tokens per use.
Bottom line: once your skill files are written, freeze them. That’s how you maximize cache hits. Grab them before you miss out.
🔗 https://github.com/lokikill123/codex-token-skills
lokikill123/codex-token-skills
Source: https://github.com/lokikill123/codex-token-skills
⚡ Codex Token Skills
Codex skills optimized for DeepSeek V4 Pro, drastically reducing token consumption and improving prefix cache hit rates.
Why Do You Need This?
DeepSeek V4 Pro’s prefix caching mechanism works best when system prompts + context files are stable. If your SKILL.md or AGENTS.md files change frequently, every conversation reprocesses the entire context, wasting a ton of tokens.
These two skills solve that:
| Problem | Solution |
|---|---|
| Frequent context changes → cache misses | Freeze skill files after writing them |
| Verbose Codex output → token waste | Enforce minimal output: just path + done |
| Compressed context → lost memory | Use a file for persistent memory, recover across conversations |
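To make the first row concrete, here is a minimal sketch of why an edited skill file shrinks the reusable prefix. It compares prompts character by character, which is only an approximation: real prefix caches operate on tokens, with provider-specific granularity. The function names and sample strings are hypothetical and are not taken from the repository.

```python
import os

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([a, b]))

def build_prompt(system: str, skill: str, user: str) -> str:
    # Stable parts go first; the changing user message goes last.
    return system + "\n" + skill + "\n" + user

# Hypothetical sample content, for illustration only.
SYSTEM = "You are a coding agent."
SKILL_V1 = "token-saver: reply with path + done only."
SKILL_V2 = "token-saver (edited): reply with path + done only."

first = build_prompt(SYSTEM, SKILL_V1, "fix the login bug")
frozen = build_prompt(SYSTEM, SKILL_V1, "rename the config key")
edited = build_prompt(SYSTEM, SKILL_V2, "rename the config key")

# With a frozen skill file, everything before the user message is shared.
reusable_frozen = shared_prefix_len(first, frozen)
# After an edit, the shared prefix ends at the first changed character.
reusable_edited = shared_prefix_len(first, edited)

print(reusable_frozen, reusable_edited)
```

In this toy case the frozen prompt shares the whole system prompt and skill text with the earlier request, while the edited one shares only the text up to the point of the edit.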
Skill List
🔧 token-saver — Force Token Savings
Smart dual-mode:
- Simple tasks (fixing bugs, changing config, installing stuff): No preamble, no plan, no explanation, no verification
- Complex tasks (vibecoding, data analysis, architecture design): Allow brief interaction, but keep it tight
Tested to save 60–80% of tokens.
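As a rough illustration of the simple-task mode, the sketch below contrasts a terse "path + done" reply with a wordier one. The exact reply format, the helper names, and the 4-characters-per-token estimate are all assumptions; the numbers it produces are not a reproduction of the 60-80% figure.

```python
def minimal_reply(path: str) -> str:
    """Terse reply in the spirit of token-saver: file path plus 'done'.
    The exact format is an assumption, not the skill's documented output."""
    return f"{path} done"

def rough_tokens(text: str) -> int:
    """Very rough size estimate: about 4 characters per token.
    This is a heuristic, not a real tokenizer."""
    return max(1, len(text) // 4)

# Hypothetical verbose reply, for comparison only.
verbose = (
    "I've updated the configuration loader to handle missing keys. "
    "Here is a summary of what changed and why, followed by the steps "
    "I took to verify the fix."
)
terse = minimal_reply("src/config.py")

print(rough_tokens(verbose), rough_tokens(terse))
```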
🧠 memory — Global Persistent Memory
When the context window gets compressed, the next conversation automatically restores from MEMORY.md:
- User preferences and habits
- Project structure and key paths
- Past decisions and feedback
Only uses ~800 tokens per session for cross-conversation consistency.
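The idea of a file-backed memory with a fixed token budget can be sketched as follows. This is not the skill's actual implementation: the one-line entry format, the `remember` and `load_memory` helpers, and the 4-characters-per-token heuristic are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

CHARS_PER_TOKEN = 4   # rough heuristic, not a real tokenizer
TOKEN_BUDGET = 800    # the per-session figure quoted above

def remember(path: Path, section: str, note: str) -> None:
    """Append a one-line note under a section label (hypothetical format)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- [{section}] {note}\n")

def load_memory(path: Path, budget: int = TOKEN_BUDGET) -> str:
    """Return the memory file's text, trimmed to roughly `budget` tokens."""
    if not path.exists():
        return ""
    text = path.read_text(encoding="utf-8")
    limit = budget * CHARS_PER_TOKEN
    if len(text) <= limit:
        return text
    # Keep whole lines only, so no entry is cut mid-sentence.
    kept, used = [], 0
    for line in text.splitlines(keepends=True):
        if used + len(line) > limit:
            break
        kept.append(line)
        used += len(line)
    return "".join(kept)

# Demo in a temporary directory so nothing in ~/.codex is touched.
with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "MEMORY.md"
    remember(memory_file, "preference", "reply with path + done only")
    remember(memory_file, "project", "entry point is src/main.py")
    restored = load_memory(memory_file)          # fits the default budget
    tiny = load_memory(memory_file, budget=12)   # only the first entry fits

print(restored)
```

Capping the restored text keeps the per-session cost predictable no matter how large the memory file grows.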
Quick Install
# Clone the repo, then copy the skills into the Codex skills directory
git clone https://github.com/lokikill123/codex-token-skills.git
mkdir -p ~/.codex/skills
cp -r codex-token-skills/skills/* ~/.codex/skills/
# Restart Codex to apply
Or via skill-installer:
install-skill --repo lokikill123/codex-token-skills --path skills/token-saver
install-skill --repo lokikill123/codex-token-skills --path skills/memory
Cache Optimization Principle
[system prompt] + [AGENTS.md] + [each skill's SKILL.md] + [user message]
^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
system: fixed     these skill files frozen → prefix cache hit → 50%+ token savings
Key rule: Write skill files once and never modify them. Keep project-level dynamic content in the project’s AGENTS.md, not in the skill files.
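One way to check that the key rule is being followed is to fingerprint the skill files once and compare later. The sketch below is an assumption-laden illustration, not part of the repository: the lock-file name and the helper functions are hypothetical, and it simply hashes every SKILL.md under a directory.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def fingerprint(skills_dir: Path) -> dict[str, str]:
    """Map each SKILL.md (relative path) to the SHA-256 of its bytes."""
    return {
        p.relative_to(skills_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(skills_dir.rglob("SKILL.md"))
    }

def freeze(skills_dir: Path, lock_file: Path) -> None:
    """Record the current fingerprints in a lock file (hypothetical name)."""
    lock_file.write_text(json.dumps(fingerprint(skills_dir), indent=2), encoding="utf-8")

def changed_since_freeze(skills_dir: Path, lock_file: Path) -> list[str]:
    """List skill files that were added, removed, or edited after freezing."""
    frozen = json.loads(lock_file.read_text(encoding="utf-8"))
    current = fingerprint(skills_dir)
    return sorted(
        name for name in frozen.keys() | current.keys()
        if frozen.get(name) != current.get(name)
    )

# Demo in a temporary directory that mimics the skills layout.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp) / "skills"
    (skills / "token-saver").mkdir(parents=True)
    (skills / "memory").mkdir()
    (skills / "token-saver" / "SKILL.md").write_text("v1", encoding="utf-8")
    (skills / "memory" / "SKILL.md").write_text("v1", encoding="utf-8")
    lock = Path(tmp) / "skills.lock.json"

    freeze(skills, lock)
    untouched = changed_since_freeze(skills, lock)
    (skills / "memory" / "SKILL.md").write_text("v2", encoding="utf-8")
    drifted = changed_since_freeze(skills, lock)

print(untouched, drifted)
```

A non-empty result means a skill file changed after it was frozen, so the next conversation will likely miss the cached prefix from that file onward.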
Recommended Pairing
- Activate token-saver + memory together for maximum effect
- Configure auto-activation in ~/.codex/AGENTS.md
- For UI development, pair with the mac-style-ui skill (see the author's other repos)
License
MIT