@witcheer: this is the first Qwen3.6-27B coding tune I've measured that improves real bug-fixing (!!!). - quality (MMLU/ARC/HellaS…
Summary
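A community fine-tune of Qwen3.6-27B improves real bug-fixing on a 30-task SWE-bench Verified subset (20/30 vs 19/30 for the base) while general quality and agentic scores stay roughly flat, unlike two synthetic-trace distillations that regressed on real bugs.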
Cached at: 06/17/26, 04:00 PM
this is the first Qwen3.6-27B coding tune I’ve measured that improves real bug-fixing (!!!).
- quality (MMLU/ARC/HellaSwag/GSM8K/HumanEval): 93.3 vs base 94.0. flat.
- agentic score (native tool-calling, 40 tasks): 98.0 vs base 98.6. flat.
- real bugs (SWE-bench Verified, 30, official harness): 20/30 vs base 19/30. up. it solves 2 the base can’t and gives up less (6 empty patches vs 8).
- MTP drafter: 2.0 to 2.4x vs base 1.8 to 2.2x. the fine-tune kept its drafter.
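To make the comparison above easier to check, here is a minimal sketch that turns the reported counts into rates and deltas. The numbers are copied from the post; the variable names are illustrative, and the last step assumes both models were run on the same 30 tasks.

```python
# Figures copied from the post; names are illustrative, not from any harness.
N_TASKS = 30

tune = {"resolved": 20, "empty_patches": 6, "quality": 93.3, "agentic": 98.0}
base = {"resolved": 19, "empty_patches": 8, "quality": 94.0, "agentic": 98.6}


def rate(count, total=N_TASKS):
    """Share of the task subset as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


resolve_rate_tune = rate(tune["resolved"])
resolve_rate_base = rate(base["resolved"])
empty_rate_tune = rate(tune["empty_patches"])
empty_rate_base = rate(base["empty_patches"])

quality_delta = round(tune["quality"] - base["quality"], 1)
agentic_delta = round(tune["agentic"] - base["agentic"], 1)

# The tune solves 2 tasks the base cannot, yet the net gain is only 1.
# Assuming both runs cover the same 30 tasks, the base must therefore
# solve 1 task that the tune misses.
tune_only = 2
net_gain = tune["resolved"] - base["resolved"]
base_only = tune_only - net_gain

print(f"resolve rate: {resolve_rate_tune}% vs {resolve_rate_base}%")
print(f"empty patches: {empty_rate_tune}% vs {empty_rate_base}%")
print(f"quality delta: {quality_delta}, agentic delta: {agentic_delta}")
print(f"tasks only the base solves: {base_only}")
```

Under that assumption, the one-task net gain is made of two wins and one loss against the base, not a strict superset of the base's solves.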
this is the third Qwen3.6-27B coding tune I’ve benched. the other two were distilled on synthetic agent traces and both regressed on real bugs.
across all three the synthetic agentic score sits in a 2.4pt band (97.6 to 100) while real SWE-bench resolves span 11 to 20 out of 30.
the cheap axis can't tell them apart.
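A rough sketch of why the two axes differ so much in resolving power, using only the ranges quoted above. It assumes the agentic score is on a 0 to 100 scale and that all three tunes were scored on the same 30 SWE-bench Verified tasks; the variable names are illustrative.

```python
# Ranges copied from the post; the scale and task count are assumptions.
N_TASKS = 30
agentic_low, agentic_high = 97.6, 100.0  # synthetic agentic score range
swe_low, swe_high = 11, 20               # resolved tasks out of N_TASKS

# Spread on the synthetic axis, in percentage points.
agentic_spread = round(agentic_high - agentic_low, 1)

# The same spread on the real-bug axis, converted to percentage points.
swe_low_pct = round(100 * swe_low / N_TASKS, 1)
swe_high_pct = round(100 * swe_high / N_TASKS, 1)
swe_spread = round(100 * (swe_high - swe_low) / N_TASKS, 1)

# How many times wider the real-bug spread is than the synthetic one.
spread_ratio = round(swe_spread / agentic_spread, 1)

print(f"agentic spread: {agentic_spread} pts")
print(f"SWE spread: {swe_low_pct}% to {swe_high_pct}% ({swe_spread} pts)")
print(f"ratio: {spread_ratio}x")
```

On these numbers the real-bug axis separates the three tunes by about 30 percentage points, roughly 12.5 times the spread of the synthetic score.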
pi-tune even has the lowest quality score of the group and the best real resolve rate. real capability tracks the training data, not the "agentic coder" label: real traces improved it, synthetic distill narrowed it.
only the reality anchor could see the difference.
> **Tongyi Lab (@Ali_TongyiLab):**
> We are pleased to highlight an excellent community model from developer : Qwen3.6-27B-MTP-pi-reasoning-GGUF.
>
> Built on our Qwen3.6-27B base model, this release focuses on optimizing automated programming and debugging workflows for local coding agents.
>
> If you are exploring local
Similar Articles
I can't get Qwen3.6 27B to outperform Qwen-Coder-Next and I'm not sure why
A user reports that Qwen-Coder-Next outperforms Qwen3.6 27B in both real-world tests and synthetic benchmarks, despite others praising 27B, and seeks advice on possible setup issues.
bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF
bytkim releases a 4-bit QLoRA SFT Multi-Token Prediction fine-tune of Qwen3.6-27B, packaged as GGUF for local agentic coding. The no-thinking tune is designed for low-latency direct output in agent loops.
@populartourist: Having worked consistently with Qwen3.6 27B NVFP4 on repos - it's clear that this quant is not reliable, at least for c…
The user reports that the Qwen3.6 27B NVFP4 quantization is unreliable for coding, with inconsistent quality despite high throughput, and suggests that Q4_K_M may be more consistent.
@songjunkr: SuperQwen3.6-35B-DFlash-MLX is ready. Benchmark: Comparison of original vs. tuned versions on 100 actual items from com…
A fine-tuned 35B-parameter Qwen model optimized for MLX shows benchmark gains on GPQA Diamond, MMLU-Pro, IFEval, HumanEval+ and MBPP+ and ships without censorship.
Any news (or hope) of Qwen-3.6 14B and 9B distills for local coding ?
The author inquires about potential distilled 9B and 14B variants of the Qwen-3.6 model for local coding, citing specific tool-calling and file structure issues encountered with Qwen-3.5 9B on limited hardware.