@FinanceYF5: 3/ Build the compound stack from the bottom up in four layers. The bottom layer is primitives: Fable 5, sub-agents, worktree; most people only encounter this layer. The second layer is orchestration: goal loops, dynamic workflows, cloud Routines. The third layer is memory: state files, Skills, knowledge bases. The top layer is self-improvement: visual self-...
Summary
This tweet describes the four-layer compound stack structure of the AI agent system: bottom layer primitives (Fable 5, sub-agents, worktree), orchestration layer (goal loops, dynamic workflows, cloud Routines), memory layer (state files, Skills, knowledge bases), and top layer self-improvement (visual self-inspection, evaluation loops, rule distillation).
Cached at: 06/16/26, 05:38 PM
3/ 🏗 Build the compound stack from the bottom up with four layers
The bottom layer is primitives: Fable 5, sub-agents, worktree—most people only encounter this layer. The second layer is orchestration: goal loops, dynamic workflows, cloud-based Routines. The third layer is memory: state files, Skills, knowledge bases. The top layer is self-improvement: visual self-checks, evaluation loops, rule distillation. https://t.co/iTu1dGA3Gb
Regret not using Fable 5 before it was shut down? One agent ran continuously for 6 days
1/ Most people treat Fable 5 as a faster chat box
Someone let a Fable 5 agent run continuously for 6 days with no one at the helm, and only then wrote down the conclusion: 90% of people only use 10% of its capabilities. It was built to run for days, yet people use it for just a few minutes.
2/ The real dividing line: self-learning vs self-improvement
Self-learning is the model changing its own weights. Fable 5 does not do this, and currently no production model does. Self-improvement is the system outside the model compounding: each run writes lessons into memory, skills become sharper with use. The model stays the same, but the environment gets smarter with every run.
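A minimal sketch of "each run writes lessons into memory", assuming a JSON state file on disk. The function names (`load_lessons`, `record_lesson`, `run_once`) and the file layout are hypothetical and not taken from the thread; the model call itself is left out, so the sketch only shows how the environment accumulates lessons while the model stays the same.

```python
import json
from pathlib import Path
from typing import List


def load_lessons(path: Path) -> List[str]:
    """Return lessons recorded by earlier runs (empty on the first run)."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def record_lesson(path: Path, lesson: str) -> List[str]:
    """Append one lesson to the state file, skipping exact duplicates."""
    lessons = load_lessons(path)
    if lesson not in lessons:
        lessons.append(lesson)
        path.write_text(json.dumps(lessons, indent=2), encoding="utf-8")
    return lessons


def run_once(path: Path, task: str, new_lesson: str) -> str:
    """One run: build the prompt from stored lessons, then write back what was learned.

    In a real system the prompt would go to the model and the lesson would
    come from the run's outcome; both are stubbed here (assumption).
    """
    lessons = load_lessons(path)
    lines = ["Known lessons:"] + [f"- {item}" for item in lessons] + [f"Task: {task}"]
    record_lesson(path, new_lesson)
    return "\n".join(lines)
```

The second run of the same task sees the lesson the first run wrote, which is the compounding effect described above.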
3/ Build the compound stack from the bottom up with four layers
The bottom layer is primitives: Fable 5, sub-agents, worktree—most people only encounter this layer. The second layer is orchestration: goal loops, dynamic workflows, cloud-based Routines. The third layer is memory: state files, Skills, knowledge bases. The top layer is self-improvement: visual self-checks, evaluation loops, rule distillation.
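The four layers can be written down as an ordered structure, which makes it easy to ask which layer is still missing. This is only an illustrative sketch: the layer and component names come from the thread, but the helper `lowest_missing_layer` and the rule "a layer counts as present if at least one of its components is in place" are assumptions.

```python
from typing import List, Optional, Set, Tuple

# Layers listed bottom to top, with the components named in the thread.
STACK: List[Tuple[str, List[str]]] = [
    ("primitives", ["Fable 5", "sub-agents", "worktree"]),
    ("orchestration", ["goal loops", "dynamic workflows", "cloud-based Routines"]),
    ("memory", ["state files", "Skills", "knowledge bases"]),
    ("self-improvement", ["visual self-checks", "evaluation loops", "rule distillation"]),
]


def lowest_missing_layer(in_place: Set[str]) -> Optional[str]:
    """Return the lowest layer with no component in place, or None if all are covered."""
    for name, components in STACK:
        if not any(component in in_place for component in components):
            return name
    return None
```

For example, a setup that only has `"Fable 5"` and `"sub-agents"` would get `"orchestration"` back as the next layer to build.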
4/ Don’t run everything on Fable 5
Its cost per token is about 5 times that of Opus 4.8 ($10 per million input, $50 per million output). Let Fable 5 be the orchestrator, Sonnet 4.6 do the heavy lifting, Haiku 4.5 act as the scorer, and automatically fall back to Opus 4.8 when blocked by the safety classifier.
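The role split above can be sketched as a small routing table with a fallback. Everything here is hypothetical scaffolding: `SafetyBlocked`, `call_with_fallback`, and the injected `call_model` function are illustration names, not a real SDK, and applying the fallback to every role (not just the orchestrator) is an assumption.

```python
from typing import Callable, Dict, Tuple

# Role assignment as described in the thread.
ROLES: Dict[str, str] = {
    "orchestrator": "Fable 5",
    "worker": "Sonnet 4.6",
    "scorer": "Haiku 4.5",
}
FALLBACK = "Opus 4.8"


class SafetyBlocked(Exception):
    """Hypothetical signal that the safety classifier refused a request."""


def call_with_fallback(
    role: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Send the prompt to the model for this role; retry once on the fallback if blocked.

    Returns (model_used, response). `call_model(model, prompt)` is supplied by
    the caller and is expected to raise SafetyBlocked on a refusal.
    """
    model = ROLES[role]
    try:
        return model, call_model(model, prompt)
    except SafetyBlocked:
        return FALLBACK, call_model(FALLBACK, prompt)
```

Passing the model call in as a parameter keeps the routing logic testable without any network access.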
5/ Never let the model grade itself
In Anthropic’s own experiments, the version with an independent validator was willing to make bigger changes and turned a failed experiment into its biggest win; the self-scoring version only dared to tweak one safety parameter and quit early. The agent that writes code should never be the one that grades it.
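One way to enforce the rule in code is to refuse any review where the author and the grader are the same agent. This is a sketch under assumptions: `accept_change`, the agent identifiers, the injected `grade` function, and the 0.8 threshold are all hypothetical and do not describe Anthropic's experimental setup.

```python
from typing import Callable


def accept_change(
    author: str,
    grader: str,
    change: str,
    grade: Callable[[str], float],
    threshold: float = 0.8,
) -> bool:
    """Accept a change only if a different agent scores it at or above the threshold.

    `grade(change)` is supplied by the caller and returns a score in [0, 1].
    """
    if author == grader:
        raise ValueError("the author must not grade its own change")
    return grade(change) >= threshold
```

Raising an error, instead of silently accepting a self-graded change, makes a misconfigured pipeline fail loudly.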
6/ The five stages of memory: Failure → Investigation → Verification → Distillation → Retrieval
Sonnet 4.6 mostly stops at the first step, piling up failure notes no one ever reads. Fable 5 can complete the entire process; at its peak, verification coverage reached over 70%, distilling facts into reusable rules. The gap is not in the model, but in whether you have state files.
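The five stages can be modeled as an ordered enum attached to each memory entry. The stage names come from the thread; the `MemoryEntry` class and the definition of verification coverage used here (the share of entries that have reached the Verification stage or later) are assumptions, since the thread does not say how coverage was measured.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    FAILURE = 1
    INVESTIGATION = 2
    VERIFICATION = 3
    DISTILLATION = 4
    RETRIEVAL = 5


@dataclass
class MemoryEntry:
    note: str
    stage: Stage = Stage.FAILURE  # every entry starts as a raw failure note

    def advance(self) -> None:
        """Move the entry one stage forward; entries at RETRIEVAL stay there."""
        if self.stage < Stage.RETRIEVAL:
            self.stage = Stage(self.stage + 1)


def verification_coverage(entries: List[MemoryEntry]) -> float:
    """Fraction of entries at VERIFICATION or later (assumed definition)."""
    if not entries:
        return 0.0
    verified = sum(1 for entry in entries if entry.stage >= Stage.VERIFICATION)
    return verified / len(entries)
```

A state file holding only failure notes has a coverage of 0.0 under this definition, which matches the pile of unread notes described above.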
7/ Self-improvement is a property of the system, not the model
In every experiment that proves this, the models on both sides are identical. What changes is the outer system: the verifier, state files, evaluation loops. Pick a layer you haven’t implemented yet, add it tomorrow, then add the next one.