Qwen3.6-35B becomes competitive with cloud models when paired with the right agent

Reddit r/LocalLLaMA 04/22/26, 11:22 AM Models

Summary

Paired with the little-coder agent scaffold, Qwen3.6-35B hits 78.7% on the Polyglot coding benchmark, placing in the public top 10 and rivaling cloud models.

A short follow-up to my previous post, where I showed that changing the scaffold around the same 9B Qwen model moved benchmark performance from 19.11% to 45.56%: https://www.reddit.com/r/LocalLLaMA/s/JMHuAGj1LV

After feedback from people here, I tried little-coder with Qwen3.6 35B. It now lands in the public Polyglot top 10 with a success rate of 78.7%, making it actually competitive with the best models out there for this benchmark!

At this point I’m increasingly convinced that part of the performance gap to cloud models is harness mismatch: we may have been testing local coding models inside scaffolds built for a different class of model.

Next up is Terminal Bench, then likely GAIA for research capabilities. Would love to hear your feedback here!

Full write-up: https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent
GitHub: https://github.com/itayinbarr/little-coder
Full benchmark results: https://github.com/itayinbarr/little-coder/blob/main/docs/benchmark-qwen3.6-35b-a3b.md
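To put the scaffold effect from the earlier 9B run in numbers, here is a minimal sketch that turns the two quoted scores (19.11% and 45.56%) into an absolute and a relative gain. The function name `scaffold_gain` is hypothetical and is not part of little-coder; it only does the arithmetic on the figures quoted above.

```python
def scaffold_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, ratio after/before).

    Hypothetical helper for illustration only; not part of little-coder.
    """
    points = round(after_pct - before_pct, 2)  # difference in percentage points
    ratio = round(after_pct / before_pct, 2)   # how many times higher the new score is
    return points, ratio


# Scores quoted in the post for the same 9B model under two scaffolds.
points, ratio = scaffold_gain(19.11, 45.56)
print(f"+{points} percentage points, {ratio}x the original score")
```

So the scaffold change alone is worth about 26 percentage points, or roughly 2.4 times the original score, with the model weights unchanged.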

Similar Articles

Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!

Reddit r/LocalLLaMA

Qwen3.6-35B-A3B and Qwen3.5-9B models are officially on the Terminal-Bench 2.0 leaderboard, with little-coder achieving 24.6% on the 35B variant, surpassing Gemini 2.5 Pro and Qwen3-Coder-480B, while the 9B model shows that sub-10B local models can compete on hard agentic benchmarks.

Qwen3.7: The Agent Frontier (15 minute read)

Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with CloudeCode, well I can't believe but it is usable

Qwen 3.6 27B is the sweet spot for local development

Qwen3.6-27B
