@DeRonin_: i ran Fable 5 the whole day and still haven't touched my limits why? i stopped paying surgeon rates for small talk here…


Summary

A user shares a detailed workflow strategy for efficiently using multiple AI models (Fable, Opus, Codex, DeepSeek, GLM, Qwen, Kimi) by delegating tasks based on cost and capability, using a single CLAUDE.md routing table, and avoiding small talk to reduce token usage.


Cached at: 07/02/26, 08:27 PM

i ran Fable 5 the whole day and still haven’t touched my limits

why?

i stopped paying surgeon rates for small talk

here’s how i actually run it:

  1. i don’t small-talk it

every “thanks!” makes it re-read the whole conversation at its price point

it’s an architect, not a roommate. Fable treats Opus 4.8 like the new Haiku now, so i push everyday tasks (standups, cleanup, small edits) to Opus and save Fable for the real problem
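rough sketch of why a "thanks!" isn't free, assuming the full conversation is re-sent as input on every turn and ignoring any prompt caching. the function name and all token counts are made up for illustration, not real Fable figures:

```python
# assumption: every new message re-sends the whole conversation as input tokens.
# all numbers here are illustrative, not measured.

def input_tokens_billed(history_tokens: int, message_tokens: list[int], reply_tokens: int) -> int:
    """Total input tokens read across a run of follow-up messages."""
    total = 0
    context = history_tokens
    for msg in message_tokens:
        context += msg            # the new message joins the context
        total += context          # the whole context is read again
        context += reply_tokens   # the model's reply joins the context too
    return total

# a 50k-token conversation, then one 5-token "thanks!"
print(input_tokens_billed(50_000, [5], 20))  # 50005
```

so under that assumption the 5-token message is billed like a 50k-token one, at the expensive model's rate.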

  2. i route everything from one CLAUDE.md

this is the big one. i keep a single routing table in my CLAUDE.md and let Fable act as the orchestrator that reads it and dispatches

rough shape of the file:

  • Fable → planning, architecture, reviewing every stage
  • Opus / Codex → implementation labor
  • DeepSeek + GLM + Qwen (dirt cheap) → bulk grunt work: boilerplate, test writing, data cleaning, translations, first-draft docs
  • Kimi / long-context models → reading huge files so Fable never spends its tokens on it

Fable never touches the cheap work directly. it plans, delegates to the right tier per task, then checks it against the plan. the expensive brain only spends tokens deciding

that one file is why my bill went DOWN while my output went up
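the routing idea above can be sketched as a lookup table. this is a hypothetical illustration, not the actual CLAUDE.md: the task-kind keys, the `route` and `group_by_tier` helpers, and the fallback to Fable for unknown work are all assumptions:

```python
# hypothetical routing table mirroring the tiers listed above.
# keys and the fallback rule are assumptions, not the author's real file.
CHEAP = "DeepSeek / GLM / Qwen"

ROUTES = {
    "planning": "Fable",
    "architecture": "Fable",
    "review": "Fable",
    "implementation": "Opus / Codex",
    "boilerplate": CHEAP,
    "tests": CHEAP,
    "data_cleaning": CHEAP,
    "translation": CHEAP,
    "draft_docs": CHEAP,
    "long_read": "Kimi / long-context",
}

def route(task_kind: str) -> str:
    """Return the tier for a task kind; unknown kinds go back to the orchestrator."""
    return ROUTES.get(task_kind, "Fable")

def group_by_tier(task_kinds: list[str]) -> dict[str, list[str]]:
    """Group a plan's tasks by the tier that should handle them."""
    plan: dict[str, list[str]] = {}
    for kind in task_kinds:
        plan.setdefault(route(kind), []).append(kind)
    return plan

print(group_by_tier(["planning", "implementation", "tests", "long_read"]))
```

the point of the table is that the expensive tier only shows up on the deciding and reviewing rows.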

  3. one big brief, not twenty tiny prompts

it holds hours of context in its head, so i hand it the whole messy thing at once: full context, the constraints, what i’m actually scared of

i gave it a refactor i’d been dreading for weeks in a single brief and it came back done. drip-feeding it line-by-line wastes the one thing it’s best at
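a minimal sketch of packing everything into one brief, using the three parts named above (context, constraints, what you're scared of). the `build_brief` helper, the section labels, and the example text are all invented for illustration:

```python
def build_brief(context: str, constraints: list[str], worries: list[str]) -> str:
    """Assemble one brief from context, constraints, and worries."""
    lines = ["CONTEXT", context.strip(), "", "CONSTRAINTS"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "WHAT I'M WORRIED ABOUT"]
    lines += [f"- {w}" for w in worries]
    return "\n".join(lines)

# made-up example content
brief = build_brief(
    "legacy billing module, 40 files, no tests",
    ["public API must not change"],
    ["silent rounding changes in invoices"],
)
print(brief)
```

one message built this way replaces the twenty small ones, so the long history is paid for once instead of on every follow-up.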

  4. i frame requests defensively to dodge the classifier

the new one is jumpy and government-mandated. harmless prompts (especially security or bio stuff) get silently downgraded to Opus

learned this the hard way. now i phrase it defensively: “review this for compliance” instead of “find vulnerabilities”

and if it trips anyway, i don’t argue with it. fresh chat, rephrase neutrally, move on

  5. i never ask it to explain its reasoning

that one request can trip the same filter, and your work quietly gets handled by a weaker model while you think you’re still on Fable

  6. i give it a finish line it can’t fake

instead of “make it work” i write “run the tests, paste the output, or stop after 25 turns”

i skipped the brake once and watched it burn through my afternoon. someone else got billed $960 on a single prompt. the pasted-proof line also kills fake “done” reports
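if you drive the agent from a script instead of a chat box, the same brake can live in code. a minimal sketch, assuming a hypothetical `step` callable that stands in for one agent turn and returns the raw test output once the tests pass, or None when there's no proof yet:

```python
from typing import Callable, Optional

MAX_TURNS = 25  # the cap from the prompt above

def run_until_proven(
    step: Callable[[int], Optional[str]],
    max_turns: int = MAX_TURNS,
) -> tuple[str, int, Optional[str]]:
    """Call step() until it returns pasted test output or the turn cap is hit.

    `step` is a hypothetical stand-in for one agent turn, not a real API.
    """
    for turn in range(1, max_turns + 1):
        proof = step(turn)
        if proof:  # only pasted output counts as done
            return ("done", turn, proof)
    return ("stopped", max_turns, None)

# fake agent that produces passing test output on turn 4
fake_agent = lambda turn: "3 passed in 0.12s" if turn == 4 else None
print(run_until_proven(fake_agent))  # ('done', 4, '3 passed in 0.12s')
```

a bare "done" with no output never ends the loop, and a run that never produces proof stops at the cap instead of eating the afternoon.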

  7. i save it for the hardest thing on my desk

if you use it like a cheaper model, it performs like one and just costs more

the gap only shows up on problems hard enough to reveal it. so i bring it the thing i’ve been postponing for 3 weeks

use the specialist like a specialist and your limit lasts all week

gl
