Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!

Reddit r/LocalLLaMA Models

Summary

Qwen3.6-35B-A3B and Qwen3.5-9B models are officially on the Terminal-Bench 2.0 leaderboard, with little-coder achieving 24.6% on the 35B variant, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B model shows that sub-10B local models can compete on hard agentic benchmarks.

Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard! little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2), and **now land above Gemini 2.5 Pro on Gemini CLI (19.6%)** and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn’t expect the scaffold-model gap from Polyglot to hold on a benchmark this hard but it did! little-coder × Qwen3.5-9B came in at 9.2% which is more humble. Yet, it also shows again that **sub-10B local models are now measurable on a hard agentic benchmark**, not assumed unworthy of a slot. Just felt it was right to follow up here as you requested, and say a genuine thanks to this community. It really is the place currently driving innovation toward less compute, and this run exists there because you pushed for it. Now it’s time to head for the top of the leaderboard 👀 let’s go open source! https://github.com/itayinbarr/little-coder
Original Article

Similar Articles

The Qwen 3.6 35B A3B hype is real!!!

Reddit r/LocalLLaMA

The author benchmarks small local LLMs, highlighting Qwen 3.6 35B A3B for its superior ability to map academic code to research papers compared to models like Gemma 4 and Nemotron 3 Nano.

Qwen 3.6 27B on DeepSWE

Reddit r/LocalLLaMA

Qwen 3.6 27B scored 2% on the DeepSWE benchmark, placing 18/20 above Haiku 4.5 and Minimax M2.7, highlighting the gap between local and leading-edge models.