Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!

Reddit r/LocalLLaMA 05/16/26, 07:19 AM Models

qwen terminal-bench open-source agentic-benchmark little-coder local-models

Summary

Qwen3.6-35B-A3B and Qwen3.5-9B are now on the public Terminal-Bench 2.0 leaderboard. Paired with the little-coder scaffold, the 35B model scores 24.6% (±3.2), above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). The 9B model scores 9.2%, showing that sub-10B local models can now be measured on a hard agentic benchmark.

little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and **now lands above Gemini 2.5 Pro on Gemini CLI (19.6%)** and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn't expect the scaffold-model gap from Polyglot to hold on a benchmark this hard, but it did!

little-coder × Qwen3.5-9B came in at 9.2%, which is more modest. Still, it shows again that **sub-10B local models are now measurable on a hard agentic benchmark**, not assumed unworthy of a slot.

I felt it was right to follow up here as you requested, and to say a genuine thanks to this community. It really is the place currently driving innovation toward less compute, and this run exists because you pushed for it.

Now it's time to head for the top of the leaderboard 👀 Let's go, open source!

https://github.com/itayinbarr/little-coder


Similar Articles

The Qwen 3.6 35B A3B hype is real!!!

Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room
Qwen3.7 Max ranks 5th on Artificial Analysis benchmarks, matching GPT-5.4 and outperforming Gemini 3.5 Flash, while Qwen3.6 27B trails significantly.

Qwen 3.6 27B on DeepSWE

gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint
Qwen3.5-9B outperforms gemma-4-12b-it on 5 of 8 benchmarks despite having a smaller footprint, with gemma only slightly better at coding.

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent