The Qwen 3.6 35B A3B hype is real!!!

Reddit r/LocalLLaMA Models

Summary

The author benchmarks small local LLMs, highlighting Qwen 3.6 35B A3B for its superior ability to map academic code to research papers compared to models like Gemma 4 and Nemotron 3 Nano.

My personal test for small local LLM intelligence is to check whether a model has any ability to understand the code that I write for my own academic research. My research is on some pretty niche topics, and I doubt that anything like it is substantively present in the training sets for LLMs. A few months ago, small local models' ability to understand my code was nominal at best, with [Devstral Small 2 being the top performer](https://www.reddit.com/r/LocalLLaMA/comments/1ry93gz/devstral_small_2_24b_severely_underrated/). However, several small open-weight models now have methods of accommodating fairly **long contexts** (gated delta net, hybrid Mamba2, sliding window attention), which makes them ***dramatically*** **smarter**. I can now feed a model an entire academic paper along with accompanying code and ask it to use the paper to work out what the code is doing (sketched below).

I just spent a couple of days experimenting with:

* Qwen 3.6 35B A3B
* Qwen 3.6 27B
* Gemma 4 26B A4B
* Nemotron 3 Nano

**All** of them were able to comprehend my code significantly better than any *small* local model could a few months ago. I did try Devstral Small 2, since I recently went from a single 16GB graphics card to two; however, I simply couldn't fit the long context in 32GB of VRAM. I hope Mistral releases a new small model with a gated delta net, because I think it could take the throne.

[These are my detailed findings](https://github.com/nathanlgabriel/paper_code_mapping_assessment/blob/main/README.md) from asking local models to explain how my code maps to the research paper it corresponds to.

TLDR: All four models listed above are incredibly capable local models, with Qwen 3.6 35B A3B standing out as the best. I'm also inclined to think that an intelligent human with *any* of these four models is more capable than something like Opus 4.7 on its own (see the detailed findings). Please let me know your thoughts!
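For readers who want to try this kind of test themselves: a minimal sketch of the paper-plus-code prompt, assuming a local OpenAI-compatible server (both llama.cpp's `llama-server` and LM Studio expose one). The endpoint URL, model name, and file paths are placeholders, not the author's actual setup.

```python
# Sketch: ask a local model to map research code to its paper, via any
# OpenAI-compatible server (llama-server, LM Studio, etc.).
# Endpoint, model name, and file paths below are placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

paper = Path("paper.md").read_text()      # paper text, e.g. converted from PDF
code = Path("simulation.py").read_text()  # the accompanying research code

response = client.chat.completions.create(
    model="qwen3.6-35b-a3b",  # whatever name the local server registers
    messages=[
        {"role": "system",
         "content": "You are given an academic paper and the code that "
                    "implements it. Explain how the code maps to the paper."},
        {"role": "user",
         "content": f"PAPER:\n{paper}\n\nCODE:\n```python\n{code}\n```\n\n"
                    "Work out, section by section, what each part of the "
                    "code is doing in terms of the paper."},
    ],
)
print(response.choices[0].message.content)
```

Long-context support matters here precisely because the paper and the code have to fit in the prompt together; with a short context window, one or the other gets truncated and the mapping task falls apart.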

Similar Articles

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it

Reddit r/LocalLLaMA

A user compares Qwen3.6 35B-A3B and Gemma 4 26B-A4B-IT running locally on a 16GB VRAM GPU via LM Studio, finding Qwen3.6 produces more detailed outputs while both run at comparable speeds. The post is an informal community comparison using quantized models.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B

Reddit r/LocalLLaMA

A user reports that Qwen 3.5 122B A10B significantly outperforms Qwen 3.6 35B A3B on multi-step tasks despite benchmark claims, and asks whether quantization or setup issues are to blame.

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed

Reddit r/LocalLLaMA

A developer benchmarked 21 local LLMs on MacBook Air M5 using HumanEval+ and found Qwen 3.6 35B-A3B (MoE) leads at 89.6% with 16.9 tok/s, while Qwen 2.5 Coder 7B offers the best RAM-to-performance ratio at 84.2% in 4.5 GB. Notably, Gemma 4 models significantly underperformed expectations (31.1% for 31B), possibly due to Q4_K_M quantization effects.
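The summary doesn't show the benchmark's actual harness, but as a rough illustration, a tok/s figure like the ones quoted can be measured locally with llama-cpp-python along these lines (the model path, prompt, and settings are placeholders):

```python
# Sketch: measure generation speed (tok/s) for a local GGUF model
# with llama-cpp-python. Model path, prompt, and settings are placeholders.
import time

from llama_cpp import Llama

llm = Llama(model_path="qwen3.6-35b-a3b-Q4_K_M.gguf",
            n_ctx=4096, n_gpu_layers=-1, verbose=False)

prompt = "Write a Python function that checks whether a string is a palindrome."

start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]  # tokens actually produced
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Reporting speed alongside pass rate is what makes comparisons like the RAM-to-performance ratio above meaningful, since a quantized MoE can win on accuracy while a small dense model wins on footprint.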

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
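That post relies on a modified llama.cpp build (TurboQuant), which the sketch below does not reproduce. As a hedged approximation with stock llama-cpp-python, the relevant knobs are partial GPU offload and a large context window; the layer count and settings here are illustrative, not the poster's exact configuration.

```python
# Sketch: large-context setup on limited VRAM with stock llama-cpp-python.
# Values are illustrative; the original post depends on a modified llama.cpp
# build (TurboQuant), which this plain configuration does not reproduce.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.6-35b-a3b-Q4_K_M.gguf",
    n_ctx=190_000,      # context length comparable to the post's ~190k
    n_gpu_layers=12,    # offload only some layers to fit an 8GB GPU
    flash_attn=True,    # reduces attention memory use where supported
    verbose=False,
)
print(llm.n_ctx())  # confirm the context size actually allocated
```

The general trade-off is the same one the post exploits: with a sparse MoE, only a few experts are active per token, so keeping most weights in system RAM and offloading a slice of layers to the GPU can still yield usable speeds.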