@cjzafir: A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > De…
Summary
VibeThinker, a 3B parameter model fine-tuned on Qwen 2.5, achieves performance comparable to Claude Opus 4.5 and much larger models like DeepSeek v3 through innovative post-training that includes multi-path thinking and staged training on math, coding, and science.
Cached at: 06/18/26, 12:05 AM
A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance.
Same performance as:
> DeepSeek v3 (671B parameters) but 220x smaller.
> Kimi k2.5 (1T parameters) but 330x smaller.
> GLM-5 (744B parameters) but 248x smaller.
You’ll be able to run this model on your Macs.
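A quick back-of-the-envelope check of the size ratios quoted above, plus a rough estimate of why a 3B model can fit on a laptop. This is only a sketch: the memory figure counts the weights alone (no KV cache or runtime overhead), and the function names are illustrative rather than taken from any library.

```python
# Parameter counts in billions, as quoted in the post.
SMALL_B = 3
LARGE_MODELS_B = {"DeepSeek v3": 671, "Kimi k2.5": 1000, "GLM-5": 744}


def size_ratio(large_b: float, small_b: float = SMALL_B) -> float:
    """How many times more parameters the large model has."""
    return large_b / small_b


def weight_memory_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params_b in LARGE_MODELS_B.items():
        print(f"{name}: about {size_ratio(params_b):.0f}x larger than 3B")
    # 2 bytes per parameter for 16-bit weights, 0.5 bytes for 4-bit quantization.
    print(f"16-bit weights: {weight_memory_gb(SMALL_B, 2):.1f} GB")
    print(f"4-bit weights:  {weight_memory_gb(SMALL_B, 0.5):.1f} GB")
```

The ratios come out to roughly 224x, 333x, and 248x, so the post's 220x and 330x are rounded figures.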
It's not a fluke. Their post-training work is very interesting.
- They trained in a smart order: math first → coding second → science third.
- For each problem, they made the model think in multiple different ways before picking the best answer.
- They trained in two stages: first on many normal problems, then on hard, long reasoning problems.
- They focused on a verifiable, high-quality synthetic dataset and heavily filtered out bad examples.
- They focused on long-horizon tasks, training on long text all at once (instead of gradually increasing the length) so the model can reason for a long time without getting confused.
- At the end, they added training to make the model give shorter but still correct answers (more efficient).
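The filtering and multi-path steps in the list above can be illustrated with a toy sketch. This is not VibeThinker's actual pipeline, which the post does not detail: majority voting (self-consistency) stands in for "picking the best answer", an exact arithmetic check stands in for the verifier, and every name here (`keep_verifiable`, `majority_vote`, `verify_sum`, the sample data) is hypothetical.

```python
from collections import Counter


def keep_verifiable(examples, verify):
    """Keep only synthetic examples whose answer passes the verifier."""
    return [ex for ex in examples if verify(ex["question"], ex["answer"])]


def majority_vote(candidate_answers):
    """Return the most common final answer across several reasoning paths.

    Ties resolve to the answer that appeared first.
    """
    if not candidate_answers:
        raise ValueError("need at least one candidate answer")
    return Counter(candidate_answers).most_common(1)[0][0]


def verify_sum(question, answer):
    """Toy verifier (assumption): the question is a pair of numbers to add."""
    a, b = question
    return a + b == answer


if __name__ == "__main__":
    examples = [
        {"question": (2, 3), "answer": 5},
        {"question": (4, 4), "answer": 9},  # wrong answer, filtered out
    ]
    print(keep_verifiable(examples, verify_sum))

    # Final answers extracted from five independently sampled reasoning paths.
    paths = ["42", "41", "42", "42", "40"]
    print(majority_vote(paths))
```

In a real setup the candidate answers would come from sampling the model several times per problem, and the verifier would be a unit test runner or a math answer checker rather than a one-line sum.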
Post-training (fine-tuning) innovation is very important, and what happened to Fable 5 should make you realize how important it is to own your intelligence.
I’ll be testing this model and sharing my findings.
Francesco Bertolotti (@f14bertolotti): Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5-Coder. The paper doesn’t provide many details, but it appears they distill from RL ckpts and then do a final RL-based instruct RL.
Similar Articles
@TheAhmadOsman: 3B model with Opus 4.5 performance VibeThinker 3B (based on Qwen 2.5)
Ahmad Osman announces VibeThinker 3B, a 3-billion-parameter model based on Qwen 2.5 that claims performance comparable to Claude Opus 4.5, predicting local deployment on consumer hardware.
@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…
This technical report introduces VibeThinker-3B, a 3B parameter model that achieves frontier-level verifiable reasoning performance through post-training refinements on Qwen2.5-Coder, including curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation, matching or exceeding much larger models like DeepSeek V3.2.
WeiboAI/VibeThinker-3B
VibeThinker-3B is a 3B-parameter model that achieves frontier-level reasoning performance on math, coding, and STEM benchmarks by optimizing the Spectrum-to-Signal Principle (SSP) post-training pipeline, reaching performance comparable to much larger models.
@rasbt: Crazy model! It actually uses the old Qwen2.5-Coder-3B stack and got really great performance with their post-training …
A 3B parameter model using the Qwen2.5-Coder-3B stack achieves coding benchmark scores comparable to Claude Opus 4.5, with detailed post-training techniques including synthetic data, filtering, two-stage SFT, and a novel RL method (MGPO).
@cjzafir: Qwen 3.5 4B model and 8B are too good. I fine-tuned a 4B model today and got 98% accuracy on full precision and Q8 quan…
A developer reports achieving high accuracy with fine-tuned Qwen 3.5 4B and 8B models using Unsloth, suggesting a shift towards specialized Expert Language Models (ELMs) for niche tasks.