@_avichawla: I cut Fable 5 token usage 2.5x with just one change! - Before: 5.5 M tokens · 7 errors · $8.94 - After: 2.3 M tokens · …
Summary
The author reduced token usage for an AI agent by 2.5x by switching from Firebase to InsForge, an open-source backend platform for agentic coding, cutting tokens from 5.5M to 2.3M and eliminating manual interventions.
Cached at: 06/10/26, 05:55 PM
I cut Fable 5 token usage 2.5x with just one change!
- Before: 5.5 M tokens · 7 errors · $8.94
- After: 2.3 M tokens · 0 errors · $4.17
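The headline ratio can be checked directly from the figures quoted above. A minimal sketch (the variable names are illustrative): the token ratio comes out at roughly 2.4x, which the post rounds to 2.5x, and the cost ratio is a little lower.

```python
# Figures quoted in the post.
before = {"tokens_m": 5.5, "errors": 7, "cost_usd": 8.94}
after = {"tokens_m": 2.3, "errors": 0, "cost_usd": 4.17}

# How many times more tokens and dollars the first run used.
token_ratio = before["tokens_m"] / after["tokens_m"]
cost_ratio = before["cost_usd"] / after["cost_usd"]

# Share of tokens saved, as a percentage of the first run.
token_saving_pct = 100 * (1 - after["tokens_m"] / before["tokens_m"])

print(f"token ratio: {token_ratio:.2f}x")
print(f"cost ratio: {cost_ratio:.2f}x")
print(f"tokens saved: {token_saving_pct:.0f}%")
```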
The final build was the same for both, but the path the agent took wildly differed.
In both runs, the agent started the same way: it worked out the state of the backend before building anything, including:
- Permission policies
- Available storage buckets
- Auth providers configured
- How edge functions are deployed
The first run used Firebase, which was built for a human dev using a dashboard.
While the dev can read the above state by clicking through tabs, an agent has no dashboard. So it gathered the same info through API calls.
And no single Firebase call returned this info. The agent had to query multiple times, and each query over-returned.
For instance, when the agent asked how sign-in is configured, Firebase also returned the entire auth surface and every method it supported.
This was far more context than what it needed. And it repeated across every part of the backend it inspected.
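To make the over-returning problem concrete, here is a rough sketch of what "asking for one thing and getting everything" costs. The payload below is hypothetical (it is not an actual Firebase response), and the 4-characters-per-token estimate is only a crude rule of thumb.

```python
import json

# Hypothetical over-returned auth config; not an actual Firebase response.
full_response = {
    "signIn": {"email": {"enabled": True}, "google": {"enabled": False}},
    "supportedMethods": ["email", "phone", "google", "apple", "saml", "oidc"],
    "mfa": {"state": "DISABLED", "providers": []},
    "quota": {"signUpsPerHour": 100},
}


def select(payload, path):
    """Return only the value at a dotted path, e.g. 'signIn.email'."""
    node = payload
    for key in path.split("."):
        node = node[key]
    return node


def approx_tokens(obj):
    """Very rough size estimate: about 4 characters per token."""
    return len(json.dumps(obj)) // 4


# The agent only asked how sign-in is configured.
needed = select(full_response, "signIn")
print(approx_tokens(full_response), "->", approx_tokens(needed))
```

The difference per call is small, but it is paid again for every part of the backend the agent inspects.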
Some state (like which auth providers are active) wasn’t queryable at all, so I provided it myself. Otherwise, the agent would have guessed.
Errors further compounded the token usage.
When a dev sees “permission denied,” they can look at the console and figure out whether it’s a rule, a path, or an unauthenticated request.
Firebase returned the same string to the agent, but the agent had none of that surrounding context to debug with.
So it guessed again, picked the most likely cause, and rewrote code, utilizing more tokens.
This Firebase setup cost me 5.5M tokens and 7 manual interventions during errors on a full-stack RAG app.
But I brought that down to 2.3M tokens and 0 manual interventions by using InsForge as the backend context engineering layer (open-source and self-hostable via Docker).
It provides the same primitives as Supabase/Firebase, but structures the entire information layer for agents, instead of dashboards.
In one CLI call that consumed ~500 tokens, the agent saw the full backend topology before writing a single line of code.
This included auth, database, storage, edge functions, model gateway, micro VMs, and deployment.
Also, instead of loading the entire product surface into context on every task, four narrowly scoped skills activated only when relevant to keep cognitive load minimal.
And to ensure efficient retries if needed, every CLI operation returned structured JSON with meaningful exit codes, so the agent never guessed what to do next.
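A minimal sketch of why structured output plus exit codes removes the guessing. It does not call the InsForge CLI: a stand-in command emits a JSON error and a non-zero exit code, and the payload fields, exit-code meanings, and step names are all hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for a backend CLI: prints a JSON error and exits with code 3.
# The payload fields and the exit-code meanings are hypothetical.
FAKE_CLI = [
    sys.executable, "-c",
    "import json, sys; "
    "print(json.dumps({'ok': False, 'code': 'PERMISSION_DENIED', "
    "'cause': 'policy', 'hint': 'add a select policy for role anon'})); "
    "sys.exit(3)",
]

# Hypothetical mapping from exit code to the agent's next step.
NEXT_STEP = {0: "continue", 2: "fix_arguments", 3: "fix_policy", 4: "retry_later"}


def run_and_decide(cmd):
    """Run a command, parse its JSON stdout, and pick a next step from the exit code."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    payload = json.loads(proc.stdout)
    step = NEXT_STEP.get(proc.returncode, "ask_human")
    return step, payload


step, payload = run_and_decide(FAKE_CLI)
print(step, payload["hint"])
```

Compare this with a bare "permission denied" string, where the agent has to infer whether the cause is a rule, a path, or a missing login.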
Here’s the InsForge GitHub Repo: http://github.com/InsForge/InsForge…
(don’t forget to star it)
The video below depicts the final build, comparing Firebase and InsForge.
To dive deeper, I recently published a full walkthrough building the same RAG app on both backends and inspected them end-to-end.
Read it below.
InsForge/InsForge
Source: https://github.com/InsForge/InsForge
⭐ Help us reach more developers and grow the InsForge community. Star this repo!
InsForge
The all-in-one, open-source backend platform for agentic coding. InsForge gives your coding agent database, auth, storage, compute, hosting, and AI gateway to ship full-stack apps end-to-end.
https://github.com/user-attachments/assets/345efbc6-ca63-4189-bde0-12ef3bda561b
How it works
Coding agents interact with InsForge through one of two interfaces:
- MCP Server (self-hosted and cloud): exposes InsForge’s operations as tools any MCP-compatible agent can call.
- CLI + Skills (cloud only): a command-line interface paired with Skills that agents invoke directly from the terminal.
Both interfaces let coding agents operate the backend like backend engineers:
- Read backend context and state: Pull documentation, schemas, metadata (deployed functions, bucket contents, auth config), and runtime logs, so the agent has what it needs to write code, verify what it built, and debug when something breaks.
- Configure primitives: Deploy edge functions, run database migrations, create storage buckets, set up auth providers, and configure other backend resources directly.
graph TB
subgraph TOP[" "]
AG[AI Coding Agents]
end
subgraph MID[" "]
SL[InsForge]
end
AG --> SL
SL --> AUTH[Authentication]
SL --> DB[Database]
SL --> ST[Storage]
SL --> EF[Edge Functions]
SL --> MG[Model Gateway]
SL --> CP[Compute]
SL --> DEP[Deployment]
classDef bar fill:#0b0f14,stroke:#30363d,stroke-width:1px,color:#ffffff
classDef card fill:#161b22,stroke:#30363d,stroke-width:1px,color:#ffffff
class AG,SL bar
class AUTH,DB,ST,EF,MG,CP,DEP card
style TOP fill:transparent,stroke:transparent
style MID fill:transparent,stroke:transparent
linkStyle default stroke:#30363d,stroke-width:1px
Core Products:
- Authentication: User management, authentication, and sessions
- Database: Postgres relational database
- Storage: S3 compatible file storage
- Model Gateway: OpenAI compatible API across multiple LLM providers
- Edge Functions: Serverless code running on the edge
- Compute (private preview): Long-running container services
- Site Deployment: Site build and deployment
⭐️ Star the Repository
If you find InsForge useful or interesting, a GitHub Star ⭐️ would be greatly appreciated.
Quickstart
Cloud-hosted: insforge.dev
Self-hosted: Docker Compose
Prerequisites: Docker + Node.js
1. Setup
You can run InsForge locally using Docker Compose. This will start a local InsForge instance on your machine.
To run from source, clone the repository and start the stack:
# Run with Docker
git clone https://github.com/InsForge/InsForge.git
cd InsForge
cp .env.example .env
docker compose -f docker-compose.prod.yml up
2. Connect InsForge MCP
Follow the steps to connect InsForge MCP Server
3. Verify installation
To verify the connection, send the following prompt to your agent:
I'm using InsForge as my backend platform, call InsForge MCP's fetch-docs tool to learn about InsForge instructions.
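Under the hood, an MCP client turns that prompt into a `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds the message; `fetch-docs` is the tool named in the prompt, while the empty `arguments` object is an assumption, since the tool's actual parameters are not described here.

```python
import json


def build_tool_call(request_id, tool_name, arguments=None):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# `fetch-docs` is the tool named in the prompt; empty arguments are an assumption.
request = build_tool_call(1, "fetch-docs")
print(json.dumps(request, indent=2))
```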
4. Running Multiple Projects
You can run multiple InsForge projects on the same host by using different ports and project names.
# Create a separate env file for each project
cp .env.example .env.project1
cp .env.example .env.project2
Edit .env.project2 with different ports:
POSTGRES_PORT=5442
POSTGREST_PORT=5440
APP_PORT=7230
AUTH_PORT=7231
DENO_PORT=7233
Start each project with a unique name:
docker compose -f docker-compose.prod.yml --env-file .env.project1 -p project1 up -d
docker compose -f docker-compose.prod.yml --env-file .env.project2 -p project2 up -d
Each project gets its own isolated database, storage, and configuration. Manage them with:
docker compose -f docker-compose.prod.yml --env-file .env.project1 -p project1 ps # status
docker compose -f docker-compose.prod.yml --env-file .env.project1 -p project1 logs -f # logs
docker compose -f docker-compose.prod.yml --env-file .env.project1 -p project1 down # stop
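Before starting several projects on one host, it can help to confirm that no two env files claim the same port. A small sketch, assuming every port variable ends in `_PORT` as in the example above; the project2 values come from that example, while the project1 values are made up for illustration.

```python
def parse_ports(env_text):
    """Extract NAME_PORT=number pairs from the text of an env file."""
    ports = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().endswith("_PORT") and value.strip().isdigit():
            ports[name.strip()] = int(value.strip())
    return ports


def find_collisions(projects):
    """Return {port: [(project, variable), ...]} for ports claimed more than once."""
    seen = {}
    for project, env_text in projects.items():
        for name, port in parse_ports(env_text).items():
            seen.setdefault(port, []).append((project, name))
    return {port: users for port, users in seen.items() if len(users) > 1}


# project2 uses the ports from the example above; project1's values are assumed.
project1 = "POSTGRES_PORT=5432\nPOSTGREST_PORT=5430\nAPP_PORT=7130\nAUTH_PORT=7131\nDENO_PORT=7133"
project2 = "POSTGRES_PORT=5442\nPOSTGREST_PORT=5440\nAPP_PORT=7230\nAUTH_PORT=7231\nDENO_PORT=7233"

print(find_collisions({"project1": project1, "project2": project2}))
```

In practice the strings would be read from `.env.project1` and `.env.project2`; an empty result means the projects can run side by side.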
One-click Deployment
In addition to running InsForge locally, you can also launch InsForge using a pre-configured setup. This allows you to get up and running quickly with InsForge without installing Docker on your local machine.
Pre-configured deployments are available for Railway, Zeabur, and Sealos.
Contributing
Contributing: If you’re interested in contributing, check our guide in CONTRIBUTING.md. We truly appreciate pull requests, and all types of help are welcome!
Support: If you need help or support, we’re responsive on our Discord channel, and you can also email us at [email protected].
Documentation & Support
Documentation
- Official Docs - Comprehensive guides and API references
Community
Contact
- Email: [email protected]
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Badges
Show your project is built with InsForge.
Made with InsForge
Markdown:
[![Made with InsForge](https://insforge.dev/badge-made-with-insforge.svg)](https://insforge.dev)
HTML:
<a href="https://insforge.dev">
<img
width="168"
height="30"
src="https://insforge.dev/badge-made-with-insforge.svg"
alt="Made with InsForge"
/>
</a>
Made with InsForge (dark)
Markdown:
[![Made with InsForge](https://insforge.dev/badge-made-with-insforge-dark.svg)](https://insforge.dev)
HTML:
<a href="https://insforge.dev">
<img
width="168"
height="30"
src="https://insforge.dev/badge-made-with-insforge-dark.svg"
alt="Made with InsForge"
/>
</a>
⭐ Star us on GitHub to get notified about new releases!
