Tag
Tencent open-sourced Hy-Memory, a memory plugin for AI agents that provides long-term memory with a 6-layer dual-reasoning framework, reducing token usage by 35% and memory bloat by 70%.
ContextSpy is a local proxy tool that profiles how LLM applications use their context window, breaking down token usage by category to help developers optimize and reduce costs.
The tweet criticizes AI apps for overusing large models and introduces Dari Router, a tool designed to route requests to appropriate model sizes for efficiency.
The author reduced token usage for an AI agent by 2.5x by switching from Firebase to InsForge, an open-source backend platform for agentic coding, cutting tokens from 5.5M to 2.3M and eliminating manual interventions.
Discusses strategies and techniques for maximizing token usage in large language models to improve efficiency and output quality.
This article suggests an approach inspired by ant colonies to optimize token usage and create efficient tool chains for AI agents, similar to n8n workflows.
Tencent has open-sourced TencentDB Agent Memory, which solves the AI agent long-context overflow problem through hierarchical memory management (symbolic short-term memory + hierarchical long-term memory). Benchmarks show token consumption reduced by up to 61% and task success rate improved by over 50%.
A developer built a system to reduce token waste in AI agent workflows by reusing information across tasks, and is seeking testers for feedback.
MeshFlow is an open-source framework for production-safe multi-agent orchestration with built-in HIPAA/SOX/GDPR compliance, a SHA-256 audit chain, token cost reduction of 70-85%, and durable execution, treating governance as infrastructure.
The author shares a practical tip to reduce input token costs by ~90% on long agent runs using prompt caching: placing unchanged text (system prompt, tool definitions, context) at the start of every prompt to leverage cached prefixes from LLM providers.
Benchmarks an event-routing approach for email agents that wakes only on relevant triggers, reducing downstream token usage by 91% compared to polling.
Google introduces Gemini 3.5 Flash (Low), a new model variant that uses about 45% fewer tokens than the Medium version while outperforming the older Gemini 3 Flash (High) on SWE tasks. They have also reset quotas for all paid plans.
The author shares their three-year experience of feeding PDFs to AI, pointing out that Markdown is a better input format for AI than PDF, because PDF is essentially a mix of coordinates and characters. AI needs to parse the structure first, which is error-prone and consumes more tokens. The article provides specific cases and recommended tools (markitdown, pandoc, LlamaParse), and teases a new series called 'The Art of Feeding AI'.
CodeGraph reduces the number of times an AI coding assistant scans the entire project by building a local index graph, significantly lowering token consumption and improving speed, compatible with VS Code, Claude Code, and Cursor.
This article presents a comprehensive guide to reduce token costs in Agentic AI systems by 95%, detailing seven core techniques including tree-structured document architecture, AI auto-compression, local model management, and script-to-API calls.
BetaPRM is a process reward model that predicts both a step-level success probability and the reliability of that prediction using a Beta belief from Monte Carlo continuations, enabling adaptive computation allocation that reduces token usage by up to 33.57% while improving accuracy.
A tweet thread by @DataChaz lists 10 open-source tools to drastically reduce token usage in Claude Code and similar AI coding assistants, potentially cutting API bills by 75-98% through various optimizations.
Agentmemory is an open-source tool that provides infinite memory for Claude Code and Codex, reducing token usage through intelligent compression, improving programming efficiency, and has gained 4000+ stars on GitHub.
A tweet from @levelsio asking about tokenmaxing Claude Code, quoting Garry Tan's advice on using OpenClaw/Hermes + GBrain for a competitive AI advantage.
LyteNyte Grid AI Skills is a free open-source tool that leverages a declarative, stateless architecture to help AI agents build data grids efficiently, cutting token usage by 85% and boosting developer speed.