@witcheer: this is the first Qwen3.6-27B coding tune I've measured that improves real bug-fixing (!!!). - quality (MMLU/ARC/HellaS…


Summary

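A community fine-tune of Qwen3.6-27B resolves 20 of 30 sampled SWE-bench Verified bugs versus 19 for the base model, while general quality and agentic scores stay roughly flat. Two other tunes distilled on synthetic agent traces regressed on the same real-bug test.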


Cached at: 06/17/26, 04:00 PM

this is the first Qwen3.6-27B coding tune I’ve measured that improves real bug-fixing (!!!).

  • quality (MMLU/ARC/HellaSwag/GSM8K/HumanEval): 93.3 vs base 94.0. flat.

  • agentic score (native tool-calling, 40 tasks): 98.0 vs base 98.6. flat.

  • real bugs (SWE-bench Verified, 30, official harness): 20/30 vs base 19/30. up. it solves 2 the base can’t and gives up less (6 empty patches vs 8).

  • MTP drafter: 2.0 to 2.4x vs base 1.8 to 2.2x. the fine-tune kept its drafter.
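The gaps behind the "flat" and "up" labels above can be worked out directly from the quoted figures. The sketch below is only an illustration: the dictionary layout and the helper names (`results`, `pct_of_tasks`, `deltas`) are assumptions made for this example, not part of the author's benchmark harness.

```python
# Figures quoted in the post; the data layout is an illustrative assumption.
SWE_TASKS = 30  # size of the SWE-bench Verified sample used in the post

results = {
    "pi-tune": {"quality": 93.3, "agentic": 98.0, "resolved": 20, "empty_patches": 6},
    "base": {"quality": 94.0, "agentic": 98.6, "resolved": 19, "empty_patches": 8},
}


def pct_of_tasks(count, total=SWE_TASKS):
    """Express a task count as a percentage of the sample, to one decimal."""
    return round(100 * count / total, 1)


def deltas(tuned, base):
    """Return tuned-minus-base differences for each reported metric."""
    return {
        "quality_pts": round(tuned["quality"] - base["quality"], 1),
        "agentic_pts": round(tuned["agentic"] - base["agentic"], 1),
        "resolve_rate_pts": pct_of_tasks(tuned["resolved"] - base["resolved"]),
        "empty_patch_rate_pts": pct_of_tasks(
            tuned["empty_patches"] - base["empty_patches"]
        ),
    }


d = deltas(results["pi-tune"], results["base"])
print(d)
```

Under these assumptions, quality and agentic scores move by less than one point, while the resolve rate rises by one task in 30 and the empty-patch rate falls by two tasks in 30.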

this is the third Qwen3.6-27B coding tune I’ve benched. the other two were distilled on synthetic agent traces and both regressed on real bugs.

across all three tunes, the synthetic agentic score sits in a 2.4pt band (97.6 to 100) while real SWE-bench resolves span 11 to 20. the cheap axis can't tell them apart.
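To make the contrast between the two axes concrete, each range can be expressed as a share of its full scale. This is a rough sketch under two assumptions that the post does not state outright: that the agentic score is on a 0 to 100 scale, and that 11 and 20 are resolved counts out of the same 30 tasks. The helper name `spread_pct` is hypothetical.

```python
# Ranges quoted in the post. The scales are assumptions:
# the agentic score is treated as 0-100, SWE counts as out of 30 tasks.
AGENTIC_RANGE = (97.6, 100.0)
SWE_RANGE = (11, 20)
SWE_TASKS = 30


def spread_pct(low, high, scale):
    """Width of a range as a percentage of its full scale, to one decimal."""
    return round(100 * (high - low) / scale, 1)


agentic_spread = spread_pct(*AGENTIC_RANGE, scale=100)
swe_spread = spread_pct(*SWE_RANGE, scale=SWE_TASKS)

print(f"agentic spread: {agentic_spread} points of scale")
print(f"SWE spread: {swe_spread} points of scale")
```

If those assumptions hold, the synthetic axis separates the three tunes by a few points of its scale, while the real-bug axis separates them by a much larger share.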

pi-tune even has the lowest quality of the group and the best real resolve. real capability tracks the training data, not the agentic coder label: real traces improved it, synthetic distill narrowed it. 

only the reality anchor could see the difference.

> **Tongyi Lab (@Ali_TongyiLab):**
> We are pleased to highlight an excellent community model from developer : Qwen3.6-27B-MTP-pi-reasoning-GGUF.
> 
> Built on our Qwen3.6-27B base model, this release focuses on optimizing automated programming and debugging workflows for local coding agents.
> 
> If you are exploring local

Similar Articles

bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF


bytkim releases a 4-bit QLoRA SFT Multi-Token Prediction fine-tune of Qwen3.6-27B, packaged as GGUF for local agentic coding. The no-thinking tune is designed for low-latency direct output in agent loops.