Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
Summary
Qwen3.6-35B-A3B and Qwen3.5-9B are now officially on the public Terminal-Bench 2.0 leaderboard. little-coder reaches 24.6% with the 35B variant, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B result shows that sub-10B local models can compete on hard agentic benchmarks.
Similar Articles
The Qwen 3.6 35B A3B hype is real!!!
The author benchmarks small local LLMs, highlighting Qwen 3.6 35B A3B for its superior ability to map academic code to research papers compared to models like Gemma 4 and Nemotron 3 Nano.
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room
Qwen3.7 Max ranks 5th on Artificial Analysis benchmarks, matching GPT-5.4 and outperforming Gemini 3.5 Flash, while Qwen3.6 27B trails significantly.
Qwen 3.6 27B on DeepSWE
Qwen 3.6 27B scored 2% on the DeepSWE benchmark, placing 18th of 20, above Haiku 4.5 and Minimax M2.7, and highlighting the gap between local and leading-edge models.
gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint
Qwen3.5-9B outperforms gemma-4-12b-it on 5 of 8 benchmarks despite having a smaller footprint, with gemma only slightly better at coding.
Qwen3.6-35B becomes competitive with cloud models when paired with the right agent
Paired with the little-coder agent scaffold, Qwen3.6-35B hits 78.7% on the Polyglot coding benchmark, placing in the public top 10 and rivaling cloud models.