YouTuber tries Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 27B to reverse engineer some large JS, with good results for Qwen 3.6

Reddit r/LocalLLaMA Models

Summary

Qwen 3.6 35B achieves near-perfect 283/285 line recall on a 108 k-token JS file, outperforming Gemma 4 27B (6/16 passes) and fixing long-context weaknesses of earlier Qwen versions.

Found this interesting and thought I'd share. A big problem I've had with Qwen 3 MoE is how bad it was at instruction following; also, its 'dumb point' in the context window was really low. I was so turned off by it that I never tried Qwen 3.5 and kept using SEED OSS 36B for coding. 3.6 appears to have better instruction following than prior models. Do you find this to be the case yourself?

Cached at: 04/22/26, 05:10 AM

TL;DR: A head-to-head recall test on a 108 k-token JS file shows Qwen 3.6 35B remembering 283 of 285 target lines while Gemma 4 27B tops out at 6 of 16 tries, proving the new Qwen release fixes the “dumb point” that plagued earlier versions.

## The challenge: reverse-engineering 8000 lines of minified JavaScript

The author needs a **local LLM** that can digest a 336 KB `service.js` (beautified to 108 k tokens) and extract the login-plus-API sequence for an LTE modem’s signal-strength scraper. The file contains 8,000+ lines of repetitive boilerplate, ideal torture material for context-window stress testing.

## Test design: 16 spot-checks on exact line recall

To avoid IDE helpers skewing results, a standalone client feeds the entire file plus a single prompt: “From the function starting at line X, quote the 20 lines that immediately follow its opening brace.” 1300 functions exist; 16 are sampled at random. A run is scored “pass” if ≥8 lines match the ground truth. All models use 8-bit KV-cache (Q8) to stay within 24 GB VRAM.

## Round 1 – Gemma 4 27B (A4B)

### Unsloth Q4K-XL

- 6 / 16 passes
- Giant return statements consistently truncated
- Several commands silently dropped

### LM-Studio Q4KM

- 2 / 16 passes
- Same sliding-window 1 k-token limitation evident

## Round 2 – Qwen 3.5 35B (DeltaNet)

### LM-Studio community build

- 11 / 16 passes
- 245 correct lines recalled, 98 bonus lines also accurate, only 50 truncated
- No failures on large return blocks

### Unsloth Q4KM

- 10 / 16 passes
- Slightly worse; confirms quantization choice matters

## Round 3 – Qwen 3.6 35B (A3B)

### LM-Studio

- 15 / 16 perfect recalls, total miss count: 9 lines

### llama.cpp same quant

- 283 / 285 lines exact, only 2 hallucinated
- Effectively zero context degradation at 108 k tokens

## Take-away

Gemma 4’s 1 k sliding-window attention makes long-file reverse engineering unreliable. Qwen 3.6 35B delivers **near-perfect positional recall** under the same memory budget, finally erasing the “dumb point” that discouraged many from the Qwen 3-series MoE models.

Source: [YouTube – mr_zerolith](https://www.youtube.com/watch?v=ONQcX9s6_co)
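
The pass rule from the test design (a run passes when at least 8 quoted lines match the ground truth) could be scored with something like the sketch below. This is not the author's harness: the function names are hypothetical, and the matching rule is an assumption (lines compared after stripping surrounding whitespace, blank lines ignored, each ground-truth line usable only once). The write-up does not say how a "match" was actually decided.

```python
from collections import Counter

# From the write-up: a run passes if at least 8 lines match the ground truth.
PASS_THRESHOLD = 8


def count_matching_lines(quoted: list[str], truth: list[str]) -> int:
    """Count quoted lines that also occur in the ground truth.

    Assumed matching rule (not stated in the source): compare lines after
    stripping surrounding whitespace, skip blank lines, and let each
    ground-truth line be matched at most once.
    """
    remaining = Counter(line.strip() for line in truth if line.strip())
    matched = 0
    for line in quoted:
        key = line.strip()
        if key and remaining[key] > 0:
            remaining[key] -= 1
            matched += 1
    return matched


def is_pass(quoted: list[str], truth: list[str],
            threshold: int = PASS_THRESHOLD) -> bool:
    """Apply the pass rule to one spot-check."""
    return count_matching_lines(quoted, truth) >= threshold


def pass_rate(passes: int, total: int) -> float:
    """Fraction of spot-checks passed, e.g. pass_rate(6, 16) for a 6 / 16 run."""
    return passes / total
```

Counting with a multiset rather than a plain set keeps a model from scoring repeatedly by quoting the same boilerplate line many times, which matters in a file described as highly repetitive.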
