Claude Fable 5 distilled
Summary
Qwable-v1 is an open-weights agentic coding model (35B MoE, 3B active) built by chaining distills from Claude Opus 4.7 reasoning and Claude Fable-5 agentic tool-use traces. It can think in explicit CoT chains and act as a Claude-Code-style agent when prompted.
Cached at: 06/16/26, 03:07 AM
lordx64/Qwable-v1 · Hugging Face
Source: https://huggingface.co/lordx64/Qwable-v1
Qwen + Fable · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.
TL;DR
Qwable-v1 is a chained distill: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:
- Thinks in explicit <think>…</think> chains-of-thought (inherited from the Opus 4.7 prior)
- Acts like a Claude-Code-style agent when prompted as one: it emits <tool_use> XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is system-prompt-conditional: it appears when you give the model an agent-style system prompt or supply a preceding <tool_result> turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See Usage for the recipe.
- Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization
Versioning: this is v1, more iterations planned
This is the first iteration. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available: each new corpus that materializes will feed a Qwable-v2, Qwable-v3, etc., with the chained provenance documented at every step.
Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source (Glint-Research/Fable-5-traces) is a frozen historical corpus, so no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we’ll incorporate them. If they don’t, v1 stays the latest.
In either case, follow this model repo for updates, or check the source repo for v2+ training runs.
Honest scope
This model is not a pure single-teacher distillation. It’s a chained warm-start:
Qwen3.6-35B-A3B (vanilla, Apache 2.0)
└─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
└─SFT─▶ Qwable-v1 ← you are here
The Fable-5 SFT data is narrowly distributed (one developer’s week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Evaluate and use this model accordingly:
- For pure reasoning (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what’s doing the work. Qwable-v1 is not expected to beat it on those benchmarks; it should match.
- For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the <tool_use> XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7’s reasoning. This is where Qwable is expected to outperform a vanilla Qwen3.6.
- For chat / general assistant: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).
Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly formatted <tool_use> XML; multi-turn conversations with a prior <tool_result> continue in XML. See Limitations for the format details.
What’s in the box
- 26 model-0000{1..26}-of-00026.safetensors shards: merged bf16 weights (~70 GB total)
- tokenizer.json, chat_template.jinja, config.json: Qwen3.6 chat template, unchanged from the base
- Adapter-only variant published at lordx64/Qwable-v1-adapter for composability with the Opus 4.7 base (~50-100 MB)
GGUF quants at lordx64/Qwable-v1-GGUF:
- IQ4_XS (~18 GB): runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
- Q5_K_M (~25 GB): better quality, fits 32-48 GB workstations
- Q8_0 (~37 GB): near-lossless, for reproducibility checks
Training recipe
| Setting | Value |
| --- | --- |
| Base (warm-start) | lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled |
| SFT dataset | lordx64/agentic-distill-fable-5-sft (4,659 rows, ~12.2M Qwen tokens, single text column in Qwen chat template) |
| Library | Unsloth FastLanguageModel + TRL SFTTrainer |
| LoRA | r=16, alpha=16, attention-only (q_proj, k_proj, v_proj, o_proj), dropout 0.0 |
| Loss masking | train_on_responses_only (gradients only flow through assistant turns, including <think> block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× nvidia-h200 x1 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples × 2 epochs ÷ effective batch 16; 11 of 4,659 dropped during prep for label-all-masked rows) |
| Wall-clock | 14.1h actual (vs 7-8h projected; see note below) |
| Cost | **$70** at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | merged_16bit via Unsloth |
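The step count and cost in the table follow from simple arithmetic. A quick check, assuming the trainer rounds a partial final batch up to a full optimizer step in each epoch (that rounding rule is an assumption, but it is the one that reproduces the reported 582):

```python
import math

examples = 4_659 - 11        # rows left after dropping label-all-masked rows
effective_batch = 1 * 16     # per-device batch x gradient accumulation
epochs = 2

# 4648 / 16 = 290.5, so each epoch is assumed to round up to 291 steps.
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs

wall_clock_h = 14.1
cost_usd = wall_clock_h * 5.0                    # $5/hr H200 rate from the table
avg_step_s = wall_clock_h * 3600 / total_steps   # averaged over the whole run

print(total_steps, round(cost_usd, 2), round(avg_step_s, 1))
```

The whole-run average comes out a little above the ~83s/step steady-state figure quoted in the training notes, which would be expected if the 14.1h wall-clock also covers startup and the final merge (an assumption; the README does not break the time down).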
The training script is training/train.py in the source repo; the submitter is training/endpoint/deploy_fable.py. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.
Training notes: slower than projected
The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container’s flash-linear-attention + causal-conv1d builds did not compile against the runtime CUDA toolkit, so Qwen3.6’s GatedDeltaNet layers fell back to a PyTorch reference implementation (the startup log noted "The fast path is not available because one of the required library is not installed. Falling back to torch implementation."). The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. Step rate at full context worked out to ~83s/step instead of the ~36s/step the smoke test implied.
This is a known toolchain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be pre-baking compatible fla / causal-conv1d / triton wheels into training/endpoint/requirements.txt. We left it for v2: the slowdown affects only wall-clock time, the model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.
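A cheap preflight check before launching a long run can at least catch the case where the fast-path packages are missing outright. This is a minimal sketch, not part of the actual training scripts; the import names `fla` and `causal_conv1d` are assumptions about how those packages install, and finding a package does not prove its kernels compiled for the local CUDA toolkit:

```python
import importlib.util

# Assumed import names: flash-linear-attention as `fla`,
# causal-conv1d as `causal_conv1d`.
FAST_PATH_MODULES = ("fla", "causal_conv1d", "triton")


def missing_modules(names=FAST_PATH_MODULES):
    """Return the top-level module names that cannot be found here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("fast path likely unavailable; missing:", ", ".join(missing))
    else:
        print("fast-path packages found (kernel compilation is not verified)")
```

A short smoke run at full sequence length remains the more reliable signal, since the slowdown described above only showed up in the measured step rate.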
Dataset provenance
The SFT dataset (lordx64/agentic-distill-fable-5-sft) is a reformatted derivative of Glint-Research/Fable-5-traces. Provenance chain:
TeichAI ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
│ (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
│ under U.S. export-control directives)
▼
Glint-Research ────── extracted chain-of-thought reasoning into a per-turn `cot` field
│ (added post-hoc; the underlying Anthropic API redacted cleartext
│ thinking blocks via signature-only delivery on Fable-5 preview)
▼
lordx64/agentic- ────── reformatted into Qwen chat template, `<tool_use>` / `<tool_result>` XML
distill-fable-5-sft serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
│ (204 active Groq API keys redacted from upstream's session JSONLs).
▼
Qwable-v1 ────── SFT'd over the Opus 4.7 distill (this model)
Composition: 4,659 rows, ~12.2M Qwen tokens.
- 3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
- 866 rows (19%) end in a pure text response
Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. Narrow: this is essentially one developer’s Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.
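The deduplication and secret-scrubbing steps in the provenance chain can be illustrated roughly as follows. This is a sketch under stated assumptions, not the actual prep script: the row layout (`user` / `assistant` keys) is hypothetical, and the key pattern assumes Groq keys start with `gsk_` followed by alphanumerics.

```python
import hashlib
import re

# Assumed key shape; the real scrubber's pattern is not given in this README.
GROQ_KEY = re.compile(r"gsk_[A-Za-z0-9]{20,}")


def scrub(text: str) -> str:
    """Replace anything that looks like a Groq API key with a placeholder."""
    return GROQ_KEY.sub("[REDACTED_GROQ_KEY]", text)


def dedupe_by_user_content(rows):
    """Keep the first row for each distinct SHA-256 of the user content.

    Each row is a hypothetical dict with "user" and "assistant" strings.
    """
    seen = set()
    kept = []
    for row in rows:
        digest = hashlib.sha256(row["user"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same user content already kept once
        seen.add(digest)
        kept.append({key: scrub(value) for key, value in row.items()})
    return kept


rows = [
    {"user": "fix the port", "assistant": "export GROQ_API_KEY=gsk_" + "a" * 24},
    {"user": "fix the port", "assistant": "duplicate user turn"},
    {"user": "add a test", "assistant": "ok"},
]
clean = dedupe_by_user_content(rows)
```

Hashing only the user content means two rows with the same request but different assistant replies collapse into one, which is the behavior the chain above describes.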
Evaluation
🚧 **Evals are in progress.** This table will fill in as each suite completes; nothing here is published until verified.
| Benchmark | Setup | Tests | Score | Status |
| --- | --- | --- | --- | --- |
| GSM8K-CoT | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | pending | 🚧 in progress |
| MMLU-Pro | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | pending | 🚧 in progress |
| MMLU-Pro (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | pending | 🚧 in progress |
| GPQA Diamond | 0-shot CoT | Graduate-level STEM | pending | 🚧 in progress |
| MATH-500 | 0-shot, math_verify metric | Competition math; tests reasoning depth | pending | 🚧 in progress |
| AIME 2024 / 2025 | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | pending | 🚧 in progress |
| HumanEval / MBPP | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | pending | 🚧 in progress |
| IFEval | 0-shot | Instruction-following adherence | pending | 🚧 in progress |
| SWE-bench Lite (or BCB-Hard) | with agent harness + tool registry | The key test: agentic coding ability vs Opus 4.7 base | pending | 🚧 in progress |
| qwen3-6-distill-eval Space | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | pending | 🚧 in progress |
Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):
- vLLM serving at 64k context so reasoning chains never truncate before answering
- <think>…</think> stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
- Per-task num_fewshot (lm-eval’s single global value can’t handle GSM8K-8shot + GPQA-0shot together)
- fewshot_as_multiturn=True for chat-template fidelity
- math_verify metric for MATH-500 and AIME (catches semantic equivalence; raw strict-match against \boxed{N} returns 0% even on correct answers because the model says **Answer: N**)
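To illustrate why the reasoning is stripped first, here is a minimal sketch of the idea. The answer regex is a hypothetical extractor for the `**Answer: N**` style mentioned above, not the project's actual lm-eval filter:

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Hypothetical extractor for the "**Answer: N**" style.
ANSWER = re.compile(r"\*\*Answer:\s*(-?\d+(?:\.\d+)?)\*\*")


def strip_think(text: str) -> str:
    """Drop every closed <think>...</think> span."""
    return THINK_BLOCK.sub("", text).strip()


def extract_answer(text: str):
    """Return the last numeric answer outside the reasoning, or None."""
    matches = ANSWER.findall(strip_think(text))
    return matches[-1] if matches else None


sample = "<think>Maybe **Answer: 7**? No, 3 * 4 = 12.</think>\n**Answer: 12**"
print(extract_answer(sample))  # prints 12
```

Running the same regex on the raw `sample` would pick up the discarded `7` from inside the reasoning first. Note that this sketch does not handle an unclosed `<think>` tag, which is one reason the 64k serving context matters.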
Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug we couldn’t cleanly fix, the row says so and we omit the score rather than publish a misleading one.
Usage
Transformers (full bf16, ~70 GB)
Important: Qwable-v1 emits <tool_use> XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Agent-style system prompt: this is what elicits the <tool_use> XML format.
SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]

# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True makes temperature / top_p take effect explicitly.
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.9)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
Output starts with <think>…</think> followed by a <tool_use name="…" id="…">{json}</tool_use> block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks), which is usable but not agentic.
For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic "You are a helpful AI assistant." The model will then produce reasoning plus a text answer, like the underlying Opus 4.7 distill.
vLLM serving
vllm serve lordx64/Qwable-v1 \
--max-model-len 16384 \
--tensor-parallel-size 2 \
--trust-remote-code
llama.cpp / LM Studio (GGUF)
# Pick IQ4_XS for 24 GB VRAM, Q5_K_M for 32-48 GB, Q8_0 for 64+ GB
llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."
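The VRAM guidance in the comment above can be written as a small lookup. A minimal sketch, using only the thresholds quoted there; `pick_quant` is a hypothetical helper, and the thresholds are rough guidance rather than measured limits:

```python
from typing import Optional

# (quant name, minimum VRAM in GB), highest quality first.
# Thresholds are the ones quoted in the shell comment above.
TIERS = [("Q8_0", 64), ("Q5_K_M", 32), ("IQ4_XS", 24)]


def pick_quant(vram_gb: float) -> Optional[str]:
    """Return the highest-quality listed quant for the given VRAM, or None."""
    for name, min_vram in TIERS:
        if vram_gb >= min_vram:
            return name
    return None


print(pick_quant(24), pick_quant(48), pick_quant(80), pick_quant(16))
# prints: IQ4_XS Q5_K_M Q8_0 None
```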
Adapter-only (compose on top of the Opus 4.7 distill)
If you already have the Opus 4.7 distill loaded:
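import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the intermediate Opus 4.7 distill, then attach the Fable-5 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")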
Tool-use format
The Fable-5 SFT data uses a custom XML envelope for tool calls, not Qwen’s native <tool_call> token format. Properly elicited outputs look like:
<think>
The user wants me to change the port from 8000 to 8080. I should Read the file first
to see the current configuration, then Edit it.
</think>
<tool_use name="Read" id="toolu_01ABC...">
{
"file_path": "/tmp/server.py"
}
</tool_use>
Tool results come back as:
<tool_result id="toolu_01ABC..." is_error="false">
{file contents}
</tool_result>
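A harness needs to pull these blocks out of the model output and wrap tool output on the way back. The following is a minimal sketch of both directions, assuming the attributes always appear in the order `name` then `id` and that the body is valid JSON; `parse_tool_calls` and `format_tool_result` are hypothetical helper names, not part of any published Qwable tooling:

```python
import json
import re

TOOL_USE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>\s*'
    r"(?P<body>.*?)\s*</tool_use>",
    re.DOTALL,
)


def parse_tool_calls(text: str):
    """Return a list of {"name", "id", "input"} dicts found in model output."""
    calls = []
    for m in TOOL_USE.finditer(text):
        try:
            payload = json.loads(m.group("body"))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than crash the agent loop
        calls.append({"name": m.group("name"), "id": m.group("id"), "input": payload})
    return calls


def format_tool_result(call_id: str, content: str, is_error: bool = False) -> str:
    """Wrap a tool's output in the <tool_result> envelope shown above."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{call_id}" is_error="{flag}">\n{content}\n</tool_result>'


output = (
    "<think>\nI should Read the file first.\n</think>\n"
    '<tool_use name="Read" id="toolu_01ABC">\n'
    '{\n  "file_path": "/tmp/server.py"\n}\n'
    "</tool_use>"
)
calls = parse_tool_calls(output)
```

The lazy match stops at the first `</tool_use>`, so a JSON string that itself contains that closing tag would be cut short; a stricter parser would be needed for that case.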
Eliciting the format reliably
Two paths produce the XML format consistently:
1. Agent system prompt: the simplest, works in one-shot:
system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>
Do NOT respond with markdown code blocks. Always use <tool_use> XML.
2. Multi-turn conversation: supply a prior <tool_result> and the model continues in XML for the rest of the conversation, no system prompt needed.
Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format is learned (verified at smoke + full-run spot-check); it just only appears when the conversation distribution looks agentic.
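The multi-turn path (option 2) amounts to seeding the history with one complete tool round-trip. A minimal sketch follows; `with_tool_history` is a hypothetical helper, and placing the `<tool_result>` block in a `user` turn is an assumption, since this README does not say which role carried tool results in the training data:

```python
def with_tool_history(task: str, tool_name: str, tool_input_json: str,
                      tool_output: str, call_id: str = "toolu_01abc"):
    """Build a message list whose history already contains one tool round-trip.

    Assumption: the <tool_result> block is sent in a "user" turn.
    """
    tool_use = (
        f'<tool_use name="{tool_name}" id="{call_id}">\n'
        f"{tool_input_json}\n</tool_use>"
    )
    tool_result = (
        f'<tool_result id="{call_id}" is_error="false">\n'
        f"{tool_output}\n</tool_result>"
    )
    return [
        {"role": "user", "content": task},
        {"role": "assistant", "content": tool_use},
        {"role": "user", "content": tool_result},
    ]


messages = with_tool_history(
    task="Change the port in /tmp/server.py from 8000 to 8080.",
    tool_name="Read",
    tool_input_json='{"file_path": "/tmp/server.py"}',
    tool_output="PORT = 8000",
)
```

The resulting `messages` list can be passed to `tok.apply_chat_template(..., add_generation_prompt=True)` in the same way as in the Transformers example.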
Tool names are not bound to the Claude Code inventory
The training data uses Claude Code’s tool names (Read, Edit, Bash, WebFetch, mcp__*, etc.). The merged model emits sensible-but-invented names like read_file, Replace, write_file instead. The XML envelope transferred; the vocabulary didn’t bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but anything that routes calls by exact tool name needs a normalizer (e.g. read_file → Read).
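Such a normalizer can be a plain alias table. In this minimal sketch only the `read_file` → `Read` mapping comes from this README; the other aliases are illustrative guesses and should be replaced with names actually observed in your own runs:

```python
# Only read_file -> Read is documented; the rest are assumed mappings.
ALIASES = {
    "read_file": "Read",
    "write_file": "Write",  # assumed mapping
    "replace": "Edit",      # assumed mapping
}
CANONICAL = {"Read", "Write", "Edit", "Bash", "WebFetch"}


def normalize_tool_name(name: str) -> str:
    """Map an emitted tool name onto the registry's canonical name.

    Unknown names are returned unchanged so the caller can reject them.
    """
    if name in CANONICAL:
        return name
    return ALIASES.get(name.strip().lower(), name)


print(normalize_tool_name("read_file"), normalize_tool_name("Replace"))
# prints: Read Edit
```

Returning unknown names unchanged keeps the failure visible at dispatch time instead of silently routing to the wrong tool.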
Native Qwen tool calling
This format is chat-template-agnostic and parses with a small regex. Downstream consumers wanting native Qwen <tool_call> JSON calling will need either (a) a wrapper that converts the XML to <tool_call> JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
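Option (a) could look roughly like the sketch below. It assumes the target is a `{"name": ..., "arguments": {...}}` JSON payload inside `<tool_call>` tags; that payload shape is an assumption, so check the chat template you are targeting before relying on it. The `id` attribute is dropped because the assumed target shape has no field for it.

```python
import json
import re

TOOL_USE = re.compile(
    r'<tool_use\s+name="([^"]+)"\s+id="[^"]+"\s*>\s*(.*?)\s*</tool_use>',
    re.DOTALL,
)


def to_tool_call(match: re.Match) -> str:
    """Rewrite one <tool_use> block as a <tool_call> JSON block.

    Assumed target shape: {"name": ..., "arguments": {...}}.
    """
    name, body = match.group(1), match.group(2)
    try:
        arguments = json.loads(body)
    except json.JSONDecodeError:
        return match.group(0)  # leave malformed blocks untouched
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"


def convert(text: str) -> str:
    """Convert every well-formed <tool_use> block in text."""
    return TOOL_USE.sub(to_tool_call, text)


xml = '<tool_use name="Read" id="toolu_01abc">\n{"file_path": "/tmp/server.py"}\n</tool_use>'
print(convert(xml))
```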
Limitations
- **Tool-use format is system-prompt-conditional.** With a generic prompt ("Fix this bug for me"), Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead of emitting <tool_use> XML. With either (a) an explicit system prompt asking for tool calls in <tool_use name="X" id="Y">…</tool_use> format, or (b) a preceding <tool_result>…</tool_result> turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
- **Tool names don’t bind to the original Claude Code inventory.** The model emits XML with sensible-but-invented tool names like read_file, Replace, etc., rather than the exact Claude Code tool names (Read, Edit, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
- **Narrow training distribution.** ~5k rows from one developer’s Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren’t in the training data) will be hit-or-miss.
- **Custom tool envelope.** <tool_use> XML doesn’t slot into vLLM’s tool-calling API automatically. You need a parser wrapper to convert to <tool_call> JSON if you want vLLM’s native tool-call detection.
- **Persona drift.** Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn’t refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
- **Reasoning is from Opus 4.7, not Fable-5.** Don’t expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
- **No formal evals at v1 ship time.** Pending.
License & terms
This model is released under AGPL-3.0, inherited from the upstream Glint-Research/Fable-5-traces dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).
The underlying Fable-5 thinking traces are derivative content from Anthropic’s claude-fable-5 preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic’s usage policies for their specific use case before fine-tuning further or building commercial products on this model.
The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1’s AGPL designation supersedes those due to the Fable-5 data’s AGPL upstream.
Citation
@misc{lordx64_qwable_v1_2026,
title = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
author = {lordx64},
year = {2026},
howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}
Acknowledgements
- **Glint-Research** for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT, the only viable source after Anthropic’s API-side redaction policy.
- **TeichAI** for the upstream 953-trace collection that Glint-Research built on.
- **Anthropic** for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
- **Qwen team** for releasing Qwen3.6-35B-A3B under Apache 2.0.
- **Unsloth** for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR #601.
- **HuggingFace** for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.