The Qwen 3.6 35B A3B hype is real!!!

Reddit r/LocalLLaMA 05/11/26, 07:51 AM Models

local-llm qwen benchmark code-understanding open-weight long-context

Summary

The author benchmarks small local LLMs, highlighting Qwen 3.6 35B A3B for its superior ability to map academic code to research papers compared to models like Gemma 4 and Nemotron 3 Nano.

My personal test for small local LLM intelligence is to check whether a model has any ability to understand the code that I write for my own academic research. My research is on some pretty niche topics and I doubt that anything like it is substantively present in the training sets for LLMs. A few months ago, small local models' ability to understand my code was nominal at best with [Devstral Small 2 being the top performer](https://www.reddit.com/r/LocalLLaMA/comments/1ry93gz/devstral_small_2_24b_severely_underrated/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button). However, several small open weight models now have methods of accommodating fairly **long contexts** (gated delta net, hybrid Mamba2, sliding window attention) which makes them ***extremely*** **smarter**. I can now feed a model an entire academic paper along with accompanying code and ask it to use the paper to work out what the code is doing. I just spent a couple days experimenting with: * Qwen 3.6 35B A3B * Qwen 3.6 27B * Gemma 4 26B A4B * Nemotron 3 Nano **All** of them were able to comprehend my code significantly better than what any *small* local model could do a few months ago. I did try Devstral Small 2 since I recently went from a single 16GB graphics card to two; however, I simply couldn't fit the long context in 32GB of ram. I hope Mistral releases a new small model with a gated delta net, because I think it could take the throne. [These are my detailed findings](https://github.com/nathanlgabriel/paper_code_mapping_assessment/blob/main/README.md) from asking local models to explain how my code maps to the research paper it corresponds to. TLDR: All four models listed above are incredibly capable local models, with Qwen 3.6 35B A3B standing out as the best. I'm also inclined to think that an intelligent human with *any* of these four models is more capable than something like Opus 4.7 on its own (see the detailed findings). Please let me know your thoughts!

Original Article

The Qwen 3.6 35B A3B hype is real!!!

Similar Articles

I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Submit Feedback

Similar Articles

I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context