@cjzafir: A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > De…

X AI KOLs Timeline 06/17/26, 03:39 PM Models

small-language-model qwen fine-tuning post-training reasoning efficient

Summary

VibeThinker, a 3B parameter model fine-tuned on Qwen 2.5, achieves performance comparable to Claude Opus 4.5 and much larger models like DeepSeek v3 through innovative post-training that includes multi-path thinking and staged training on math, coding, and science.

A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > Deepseek v3 (671B parameters) but 220x smaller. > Kimi k2.5 (1T parameters) but 330x smaller. > GLM-5 (744B parameters) but 248x smaller. You'll be able to run this model on your Macs. Its not a fluke. Their post training work is very interesting. - They trained in a smart order: math first → coding second → science third. - For each problem, they made the model think in multiple different ways before picking the best answer. - They trained in two stages: first on many normal problems, then on hard, long reasoning problems. - They focused on verifiable high quality synthetic dataset. Also heavily filtered all bad examples. - They focused on long horizon tasks. Trained it on long text all at once (instead of gradually making it longer) so it can think for a long time without getting confused. - At the end, they added training to make the model give shorter but still correct answers (more efficient). Post-training (finetuning) innovation is very important and what happened to Fable 5 should make you realize how important it is to own your intelligence. I'll be testing this model and share my findings.

Original Article

View Cached Full Text

Cached at: 06/18/26, 12:05 AM

A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance.

Same performance as:

Deepseek v3 (671B parameters) but 220x smaller. Kimi k2.5 (1T parameters) but 330x smaller. GLM-5 (744B parameters) but 248x smaller.

You’ll be able to run this model on your Macs.

Its not a fluke. Their post training work is very interesting.

They trained in a smart order: math first → coding second → science third.
For each problem, they made the model think in multiple different ways before picking the best answer.
They trained in two stages: first on many normal problems, then on hard, long reasoning problems.
They focused on verifiable high quality synthetic dataset. Also heavily filtered all bad examples.
They focused on long horizon tasks. Trained it on long text all at once (instead of gradually making it longer) so it can think for a long time without getting confused.
At the end, they added training to make the model give shorter but still correct answers (more efficient).

Post-training (finetuning) innovation is very important and what happened to Fable 5 should make you realize how important it is to own your intelligence.

I’ll be testing this model and share my findings.

Francesco Bertolotti (@f14bertolotti): Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5-Coder. The paper doesn’t provide many details, but it appears they distill from RL ckpts and then do a final RL-based instruct RL.

🔗

@cjzafir: A 3B parameter SLM: VibeThinker (fine-tuned on Qwen 2.5) matches Claude Opus 4.5 performance. Same performance as: > De…

Similar Articles

@TheAhmadOsman: 3B model with Opus 4.5 performance VibeThinker 3B (based on Qwen 2.5)

@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…

WeiboAI/VibeThinker-3B

@rasbt: Crazy model! It actually uses the old Qwen2.5-Coder-3B stack and got really great performance with their post-training …

@cjzafir: Qwen 3.5 4B model and 8B are too good. I fine-tuned a 4B model today and got 98% accuracy on full precision and Q8 quan…

Submit Feedback

Similar Articles

@TheAhmadOsman: 3B model with Opus 4.5 performance VibeThinker 3B (based on Qwen 2.5)

@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…

@rasbt: Crazy model! It actually uses the old Qwen2.5-Coder-3B stack and got really great performance with their post-training …

@cjzafir: Qwen 3.5 4B model and 8B are too good. I fine-tuned a 4B model today and got 98% accuracy on full precision and Q8 quan…