Claude Fable 5 distilled

Reddit r/LocalLLaMA 06/16/26, 01:21 AM Models

open-weights agentic coding distillation mixture-of-experts tool-use reasoning

Summary

Qwable-v1 is an open-weights agentic coding model (35B MoE, 3B active) built by chaining distills from Claude Opus 4.7 reasoning and Claude Fable-5 agentic tool-use traces. It can think in explicit CoT chains and act as a Claude-Code-style agent when prompted.

Releasing Qwable-v1: an open-weights Qwen3.6-35B-A3B distilled from Claude Fable-5, Anthropic's Mythos-class preview model that was briefly public for ~4 days (2026-06-09 → 2026-06-12) before being suspended globally under U.S. export-control directives. Fable-5 was Anthropic's most powerful model when it shipped: 80.3% on SWE-bench Pro, $50/M output tokens, with an anti-distillation classifier baked into the API that redacted thinking blocks on the fly. Qwable-v1 captures what survived: 4,659 cleartext agentic-coding traces (re-packed from Glint-Research/Fable-5-traces, the only public corpus where the CoT made it through), distilled onto Qwen3.6 over ~14h on a single H200. Given an agent system prompt, the model emits properly formatted <tool_use> XML calling actual Claude-flavored tools like str_replace_editor; Fable's tool surface leaked into the weights, not just its style. Model, GGUFs (IQ4_XS / Q4_K_M / Q5_K_M / Q8_0), and the SFT dataset are all public on HF (AGPL-3.0 from upstream). https://huggingface.co/lordx64/Qwable-v1

Original Article

Cached at: 06/16/26, 03:07 AM

lordx64/Qwable-v1 · Hugging Face

Source: https://huggingface.co/lordx64/Qwable-v1

Qwen + Fable · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

TL;DR

Qwable-v1 is a chained distill: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:

Thinks in explicit `<think>…</think>` chains of thought (inherited from the Opus 4.7 prior)
Acts like a Claude-Code-style agent when prompted as one: it emits `<tool_use>` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is system-prompt-conditional: it appears when you give the model an agent-style system prompt or supply a preceding `<tool_result>` turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See Usage for the recipe.
Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization

Versioning: this is v1, more iterations planned

This is the first iteration. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available. Each new corpus that materializes will feed a Qwable-v2, Qwable-v3, etc., with the chained provenance documented at every step.

Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source (`Glint-Research/Fable-5-traces`) is a frozen historical corpus, so no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we’ll incorporate them. If they don’t, v1 stays the latest.

In either case, follow this model repo for updates, or check the source repo for v2+ training runs.

Honest scope

This model is not a pure single-teacher distillation. It’s a chained warm-start:

Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
           └─SFT─▶ Qwable-v1  ← you are here

The Fable-5 SFT data is narrowly distributed (one developer’s week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Eval and use this model accordingly:

For pure reasoning (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what’s doing the work. Qwable-v1 won’t beat it on those benchmarks; it’ll match.
For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the `<tool_use>` XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7’s reasoning. This is where Qwable outperforms a vanilla Qwen3.6.
For chat / general assistant: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).

Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly formatted `<tool_use>` XML; multi-turn conversations with a prior `<tool_result>` continue in XML. See Limitations for the format details.

What’s in the box

26 `model-0000{1..26}-of-00026.safetensors` shards: merged bf16 weights (~70 GB total)
`tokenizer.json`, `chat_template.jinja`, `config.json`: Qwen3.6 chat template, unchanged from the base
Adapter-only variant published at `lordx64/Qwable-v1-adapter` for composability with the Opus 4.7 base (~50-100 MB)

GGUF quants at `lordx64/Qwable-v1-GGUF`:

IQ4_XS (~18 GB): runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
Q5_K_M (~25 GB): better quality, fits 32-48 GB workstations
Q8_0 (~37 GB): near-lossless, for reproducibility checks

Training recipe

| Setting | Value |
| --- | --- |
| Base (warm-start) | `lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled` |
| SFT dataset | `lordx64/agentic-distill-fable-5-sft` (4,659 rows, ~12.2M Qwen tokens, single `text` column in Qwen chat template) |
| Library | Unsloth `FastLanguageModel` + TRL `SFTTrainer` |
| LoRA | r=16, alpha=16, attention-only (`q_proj`, `k_proj`, `v_proj`, `o_proj`), dropout 0.0 |
| Loss masking | `train_on_responses_only` (gradients only flow through assistant turns, including the `<think>` block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× NVIDIA H200 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples ÷ effective batch 16 ≈ 291 steps per epoch × 2 epochs; 11 of 4,659 rows dropped during prep because all labels were masked) |
| Wall-clock | 14.1h actual (vs 7-8h projected; see the training notes) |
| Cost | ~$70 at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | `merged_16bit` via Unsloth |

The training script is `training/train.py` in the source repo; the submitter is `training/endpoint/deploy_fable.py`. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.
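The step count and cost in the recipe can be re-derived from the other rows. The sketch below is only a sanity check of that arithmetic; it assumes the trainer rounds a partial final batch up to a full optimizer step in each epoch, which is what makes the numbers land on 582.

```python
import math

# Figures taken from the training recipe
examples_kept = 4659 - 11        # rows left after dropping label-all-masked rows
effective_batch = 1 * 16         # per-device batch size x gradient accumulation
epochs = 2

# Assumption: a partial final batch still counts as one optimizer step per epoch
steps_per_epoch = math.ceil(examples_kept / effective_batch)
total_steps = steps_per_epoch * epochs

wall_clock_hours = 14.1
hourly_rate_usd = 5.0
cost_usd = wall_clock_hours * hourly_rate_usd

print(f"{steps_per_epoch} steps/epoch, {total_steps} steps total, ${cost_usd:.2f}")
```

Under that rounding assumption the result agrees with the reported 582 steps and the roughly $70 cost.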

Training notes: slower than projected

The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container’s `flash-linear-attention` + `causal-conv1d` builds did not compile against the runtime CUDA toolkit, so Qwen3.6’s GatedDeltaNet layers fell back to a PyTorch reference implementation (the startup log noted `The fast path is not available because one of the required library is not installed. Falling back to torch implementation.`). The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. Step rate at full context worked out to ~83s/step instead of the ~36s/step the smoke test implied.

This is a known toolchain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be pre-baking compatible fla / causal-conv1d / triton wheels into `training/endpoint/requirements.txt`. We left it for v2: the slowdown is reported as-is, the model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.
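To put the two step rates in perspective, the sketch below converts them into pure stepping time over the 582 optimizer steps. It is a back-of-the-envelope estimate only: the difference between these figures and the quoted wall-clock or projection is assumed here to be non-step overhead (startup, checkpointing, merging), which the source does not break down.

```python
TOTAL_STEPS = 582            # from the training recipe
FALLBACK_S_PER_STEP = 83     # observed with the PyTorch fallback path
FAST_S_PER_STEP = 36         # rate implied by the smoke test

def stepping_hours(steps: int, seconds_per_step: float) -> float:
    """Hours spent purely on optimizer steps at a constant step rate."""
    return steps * seconds_per_step / 3600

fallback_hours = stepping_hours(TOTAL_STEPS, FALLBACK_S_PER_STEP)
fast_hours = stepping_hours(TOTAL_STEPS, FAST_S_PER_STEP)
slowdown = FALLBACK_S_PER_STEP / FAST_S_PER_STEP

print(f"fallback: {fallback_hours:.1f} h, fast path: {fast_hours:.1f} h, "
      f"ratio: {slowdown:.1f}x")
```

At a constant 83 s/step the stepping alone accounts for most of the 14.1 h run, and the whole-step ratio of about 2.3× sits inside the ~2-3× per-layer slowdown described above.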

Dataset provenance

The SFT dataset (`lordx64/agentic-distill-fable-5-sft`) is a reformatted derivative of `Glint-Research/Fable-5-traces`. Provenance chain:

TeichAI            ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
   │                       (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
   │                        under U.S. export-control directives)
   ▼
Glint-Research     ────── extracted chain-of-thought reasoning into a per-turn `cot` field
   │                       (added post-hoc; the underlying Anthropic API redacted cleartext
   │                        thinking blocks via signature-only delivery on Fable-5 preview)
   ▼
lordx64/agentic-   ────── reformatted into Qwen chat template, `<tool_use>` / `<tool_result>` XML
distill-fable-5-sft        serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
   │                       (204 active Groq API keys redacted from upstream's session JSONLs).
   ▼
Qwable-v1          ────── SFT'd over the Opus 4.7 distill (this model)

Composition: 4,659 rows, ~12.2M Qwen tokens.

3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
866 rows (19%) end in a pure text response

Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. Narrow: this is essentially one developer’s Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.
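The provenance chain mentions deduplication by SHA-256 of user content. A minimal sketch of that idea follows; the row schema (a `messages` list of `role` / `content` dicts) and the function names are hypothetical, since the actual prep script is not shown here.

```python
import hashlib

def user_content_key(row: dict) -> str:
    """SHA-256 hex digest over the concatenated user-turn text of one row.

    Assumption: each row holds a `messages` list of {"role", "content"} dicts.
    """
    user_text = "\n".join(
        m["content"] for m in row["messages"] if m["role"] == "user"
    )
    return hashlib.sha256(user_text.encode("utf-8")).hexdigest()

def dedupe(rows: list) -> list:
    """Keep the first row seen for each distinct user-content hash."""
    seen = set()
    kept = []
    for row in rows:
        key = user_content_key(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Hashing only the user side means two rows with the same request but different assistant answers collapse to one, which is presumably the intent when the goal is to avoid over-weighting repeated prompts.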

Evaluation

🚧 **Evals are in progress.** This table will fill in as each suite completes; nothing here is published until verified.

| Benchmark | Setup | Tests | Score | Status |
| --- | --- | --- | --- | --- |
| GSM8K-CoT | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | pending | 🚧 in progress |
| MMLU-Pro | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | pending | 🚧 in progress |
| MMLU-Pro (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | pending | 🚧 in progress |
| GPQA Diamond | 0-shot CoT | Graduate-level STEM | pending | 🚧 in progress |
| MATH-500 | 0-shot, `math_verify` metric | Competition math; tests reasoning depth | pending | 🚧 in progress |
| AIME 2024 / 2025 | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | pending | 🚧 in progress |
| HumanEval / MBPP | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | pending | 🚧 in progress |
| IFEval | 0-shot | Instruction-following adherence | pending | 🚧 in progress |
| SWE-bench Lite (or BCB-Hard) | with agent harness + tool registry | The key test: agentic coding ability vs Opus 4.7 base | pending | 🚧 in progress |
| **`qwen3-6-distill-eval` Space** | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | pending | 🚧 in progress |

Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):

vLLM serving at 64k context so reasoning chains never truncate before answering
`<think>…</think>` stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
Per-task `num_fewshot` (lm-eval’s single global value can’t handle GSM8K-8shot + GPQA-0shot together)
`fewshot_as_multiturn=True` for chat-template fidelity
`math_verify` metric for MATH-500 and AIME (catches semantic equivalence; raw `strict-match` against `\boxed{N}` returns 0% even on correct answers because the model says `**Answer: N**`)

Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug we couldn’t cleanly fix, the row says so and we omit the score rather than publish a misleading one.
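The `<think>`-stripping step in the methodology is easy to get wrong, so here is a small sketch of the idea. The regexes and function names are illustrative assumptions, not the project's actual extractor; the only detail taken from the text above is that the model reports results as `**Answer: N**`.

```python
import re

# Non-greedy match so several <think> blocks are removed independently
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Illustrative extractor for the "**Answer: N**" style mentioned above
ANSWER = re.compile(r"\*\*Answer:\s*([^*\n]+?)\s*\*\*")

def strip_think(text: str) -> str:
    """Drop reasoning blocks so extractors only see the final answer text."""
    return THINK_BLOCK.sub("", text).strip()

def extract_answer(text: str):
    """Return the final answer string, or None if no answer marker is found."""
    match = ANSWER.search(strip_think(text))
    return match.group(1) if match else None

sample = "<think>Maybe **Answer: 7**? No, recompute.</think>\n\n**Answer: 42**"
print(extract_answer(sample))
```

Without the stripping step, the same extractor would pick up the tentative `7` from inside the reasoning rather than the final `42`.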

Usage

Transformers (full bf16, ~70 GB)

Important: Qwable-v1 emits `<tool_use>` XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
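inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True,
).to(model.device)
# Set do_sample=True explicitly so temperature / top_p apply even if the
# model's generation config defaults to greedy decoding
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.9)
# Decode only the newly generated tokens; keep special tokens visible
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))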

Output starts with `<think>…</think>` followed by a `<tool_use name="…" id="…">{json}</tool_use>` block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks), which is usable but not agentic.

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic `"You are a helpful AI assistant."`; the model will produce reasoning plus a text answer, like the underlying Opus 4.7 distill.

vLLM serving

vllm serve lordx64/Qwable-v1 \
    --max-model-len 16384 \
    --tensor-parallel-size 2 \
    --trust-remote-code

llama.cpp / LM Studio (GGUF)

# Pick IQ4_XS for 24 GB VRAM, Q5_K_M for 32-48 GB, Q8_0 for 64+ GB
llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."

Adapter-only (compose on top of the Opus 4.7 distill)

If you already have the Opus 4.7 distill loaded:

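import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the intermediate Opus 4.7 distill, then attach the Fable-5 LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")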

Tool-use format

The Fable-5 SFT data uses a custom XML envelope for tool calls, not Qwen’s native `<tool_call>` token format. Properly elicited outputs look like:

<think>
The user wants me to change the port from 8000 to 8080. I should Read the file first
to see the current configuration, then Edit it.
</think>

<tool_use name="Read" id="toolu_01ABC...">
{
  "file_path": "/tmp/server.py"
}
</tool_use>

Tool results come back as:

<tool_result id="toolu_01ABC..." is_error="false">
{file contents}
</tool_result>
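Since the envelope is plain XML wrapped around a JSON body, a harness can pull calls out of the model output with a small regex. The following is a minimal sketch, not the project's own parser: it assumes the attributes always appear in the order `name` then `id`, as in the example above, and it silently skips calls whose body is not valid JSON.

```python
import json
import re

# Assumption: attributes appear as name="..." id="..." in that order
TOOL_USE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)">'
    r"\s*(?P<body>.*?)\s*</tool_use>",
    re.DOTALL,
)

def parse_tool_calls(text: str) -> list:
    """Return one dict per well-formed <tool_use> block found in `text`."""
    calls = []
    for match in TOOL_USE.finditer(text):
        try:
            arguments = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # malformed JSON body: skip rather than guess
        calls.append(
            {"name": match.group("name"), "id": match.group("id"), "input": arguments}
        )
    return calls

output = (
    "<think>\nRead the file first.\n</think>\n\n"
    '<tool_use name="Read" id="toolu_01ABC">\n'
    '{\n  "file_path": "/tmp/server.py"\n}\n'
    "</tool_use>"
)
print(parse_tool_calls(output))
```

A stricter harness might prefer to surface malformed bodies as errors back to the model instead of dropping them.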

Eliciting the format reliably

Two paths produce the XML format consistently:

1. Agent system prompt (the simplest; works in one shot):

system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>
Do NOT respond with markdown code blocks. Always use <tool_use> XML.

2. Multi-turn conversation: supply a prior `<tool_result>` and the model continues in XML for the rest of the conversation, no system prompt needed.

Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format is learned (verified at smoke test + full-run spot-check); it only appears when the conversation distribution looks agentic.
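For the multi-turn path, a harness has to feed each tool's output back in the `<tool_result>` envelope. A rough sketch is below; the helper names are hypothetical, and placing the result in a `user`-role turn is an assumption, since the card does not say which role carries tool results.

```python
def format_tool_result(call_id: str, content: str, is_error: bool = False) -> str:
    """Wrap raw tool output in the <tool_result> envelope shown earlier."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{call_id}" is_error="{flag}">\n{content}\n</tool_result>'

def append_tool_result(messages: list, call_id: str, content: str,
                       is_error: bool = False) -> list:
    """Return a new message list with the tool result added as the next turn.

    Assumption: tool results travel in a user-role message.
    """
    turn = {"role": "user", "content": format_tool_result(call_id, content, is_error)}
    return messages + [turn]

history = [
    {"role": "user", "content": "Change the port from 8000 to 8080."},
    {"role": "assistant",
     "content": '<tool_use name="Read" id="toolu_01abc">\n'
                '{"file_path": "/tmp/server.py"}\n</tool_use>'},
]
history = append_tool_result(history, "toolu_01abc", "PORT = 8000")
print(history[-1]["content"])
```

The content is passed through unescaped, matching the `{file contents}` placeholder in the format example; a production harness may want to guard against tool output that itself contains a closing `</tool_result>` tag.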

Tool names are not bound to the Claude Code inventory

The training data uses Claude Code’s tool names (`Read`, `Edit`, `Bash`, `WebFetch`, `mcp__*`, etc.). The merged model emits sensible-but-invented names like `read_file`, `Replace`, `write_file` instead. The XML envelope transferred; the vocabulary didn’t bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but anything that routes calls by exact tool name needs a normalizer (e.g. `read_file` → `Read`).
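A normalizer of the kind mentioned above can be a simple lookup table. In this sketch only the `read_file` → `Read` pair comes from the text; the other aliases and the registry contents are illustrative guesses that a real harness would replace with its own mapping.

```python
# Registry of tool names the harness actually implements (illustrative)
KNOWN_TOOLS = {"Read", "Write", "Edit", "Bash", "WebFetch"}

# Lower-cased model-invented names mapped to registry names.
# Only read_file -> Read is taken from the model card; the rest are guesses.
ALIASES = {
    "read_file": "Read",
    "write_file": "Write",
    "replace": "Edit",
}

def normalize_tool_name(name: str):
    """Map an emitted tool name onto the registry, or return None if unknown."""
    if name in KNOWN_TOOLS:
        return name
    return ALIASES.get(name.lower())

for emitted in ("read_file", "Replace", "Bash", "frobnicate"):
    print(emitted, "->", normalize_tool_name(emitted))
```

Returning `None` for unknown names lets the harness report an error back to the model in a `<tool_result>` turn instead of silently dispatching to the wrong tool.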

Native Qwen tool calling

This format is chat-template-agnostic and parses with a small regex. Downstream consumers wanting native Qwen `<tool_call>` JSON calling will need either (a) a wrapper that converts the XML to `<tool_call>` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
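Option (a) could look roughly like the wrapper below. It is a sketch under a stated assumption: that the target `<tool_call>` block holds a JSON object with `name` and `arguments` keys (the Hermes-style layout used by earlier Qwen chat templates). Check the Qwen3.6 chat template before relying on that shape.

```python
import json
import re

# Assumption: attributes appear as name="..." id="..." in that order
TOOL_USE = re.compile(
    r'<tool_use\s+name="([^"]+)"\s+id="[^"]+">\s*(.*?)\s*</tool_use>',
    re.DOTALL,
)

def _to_tool_call(match: re.Match) -> str:
    """Re-serialize one <tool_use> block as a <tool_call> JSON block."""
    # Assumed target shape: {"name": ..., "arguments": {...}}
    payload = {"name": match.group(1), "arguments": json.loads(match.group(2))}
    return "<tool_call>\n" + json.dumps(payload) + "\n</tool_call>"

def convert_tool_use_to_tool_call(text: str) -> str:
    """Rewrite every <tool_use> block in `text`; other text is left untouched."""
    return TOOL_USE.sub(_to_tool_call, text)

xml = '<tool_use name="Read" id="toolu_01abc">\n{"file_path": "/tmp/server.py"}\n</tool_use>'
print(convert_tool_use_to_tool_call(xml))
```

Note that the call `id` is dropped in this conversion and that a malformed JSON body raises `json.JSONDecodeError`; a real wrapper would need to decide how to carry the id and how to handle bad bodies.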

Limitations

**Tool-use format is system-prompt-conditional.** With a generic prompt ("Fix this bug for me"), Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead of emitting `<tool_use>` XML. With either (a) an explicit system prompt asking for tool calls in `<tool_use name="X" id="Y">…</tool_use>` format, or (b) a preceding `<tool_result>…</tool_result>` turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
**Tool names don’t bind to the original Claude Code inventory.** The model emits XML with sensible-but-invented tool names like `read_file`, `Replace`, etc., rather than the exact Claude Code tool names (`Read`, `Edit`, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
**Narrow training distribution.** ~5k rows from one developer’s Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren’t in the training data) will be hit-or-miss.
**Custom tool envelope.** `<tool_use>` XML doesn’t slot into vLLM’s tool-calling API automatically. You need a parser wrapper to convert it to `<tool_call>` JSON if you want vLLM’s native tool-call detection.
**Persona drift.** Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn’t refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
**Reasoning is from Opus 4.7, not Fable-5.** Don’t expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
**No formal evals at v1 ship time.** Pending.

License & terms

This model is released under AGPL-3.0, inherited from the upstream `Glint-Research/Fable-5-traces` dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic’s `claude-fable-5` preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic’s usage policies for their specific use case before fine-tuning further or building commercial products on this model.

The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1’s AGPL designation supersedes those due to the Fable-5 data’s AGPL upstream.

Citation

@misc{lordx64_qwable_v1_2026,
  title  = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
  author = {lordx64},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}

Acknowledgements

**Glint-Research** for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT, the only viable source after Anthropic’s API-side redaction policy.
**TeichAI** for the upstream 953-trace collection that Glint-Research built on.
**Anthropic** for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
**Qwen team** for releasing Qwen3.6-35B-A3B under Apache 2.0.
**Unsloth** for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR#601.
**HuggingFace** for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.