Qwen 3.6 27B on DeepSWE
Summary
Qwen 3.6 27B scored 2% on the DeepSWE benchmark, placing 18th of 20 models and ahead of only Haiku 4.5 and Minimax M2.7. The result highlights the gap between local and leading-edge models.
Similar Articles
Qwen 3.7 Max scores 60.6% on SWE-Bench Pro
Qwen 3.7 Max achieves a score of 60.6% on SWE-Bench Pro, demonstrating competitive performance on software engineering tasks.
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room
Qwen3.7 Max ranks 5th on Artificial Analysis benchmarks, matching GPT-5.4 and outperforming Gemini 3.5 Flash, while Qwen3.6 27B trails significantly.
Qwen3.7: The Agent Frontier
Alibaba's Qwen team has released Qwen3.7-Max, a proprietary agent-foundation model that achieves top scores on several benchmarks, including Terminal-Bench 2.0, SWE-Pro, and GPQA Diamond, and performs consistently across different code environments.
Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
The Qwen3.6-35B-A3B and Qwen3.5-9B models are now officially on the Terminal-Bench 2.0 leaderboard. With the 35B variant, little-coder achieves 24.6%, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B model shows that sub-10B local models can compete on hard agentic benchmarks.
Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B
A user reports that Qwen 3.5 122B significantly outperforms Qwen 3.6 35B on multi-step tasks despite benchmark claims, and asks whether quantization or setup issues are to blame.