Dynamically allocating compute budget to a hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE
Summary
A method that dynamically allocates compute budget to hard problems using Qwen-35B-A3B achieves performance near GPT-5.4-xHigh on the HLE benchmark.
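The article's own pipeline is not reproduced here, but the core idea (spend extra samples only on the problems the model finds hard) can be illustrated. Below is a minimal Python sketch, assuming self-consistency disagreement as the difficulty signal and majority voting as the aggregator; the `solve_once` callable, the probe size, and the voting scheme are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def answer_with_dynamic_budget(problems, solve_once, total_budget, probe_k=2):
    """Probe every problem cheaply, then spend the leftover budget on
    whichever problem's sampled answers currently disagree the most.

    solve_once: callable taking a problem and returning one sampled
    answer string (e.g. one Qwen-35B-A3B generation). Hypothetical
    interface, not from the article.
    """
    samples = {p: [solve_once(p) for _ in range(probe_k)] for p in problems}
    leftover = total_budget - probe_k * len(problems)

    def disagreement(p):
        # 0.0 when all samples agree; approaches 1.0 as answers scatter.
        top_count = Counter(samples[p]).most_common(1)[0][1]
        return 1.0 - top_count / len(samples[p])

    while leftover > 0:
        hardest = max(problems, key=disagreement)
        if disagreement(hardest) == 0.0:
            break  # every problem's samples agree; no need to spend more
        samples[hardest].append(solve_once(hardest))
        leftover -= 1

    # Final answer per problem: majority vote over all collected samples.
    return {p: Counter(s).most_common(1)[0][0] for p, s in samples.items()}
```

The exact difficulty signal and budget schedule the article uses may differ; the point of the sketch is only that easy problems stop consuming samples early, freeing budget for the hard tail.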
Similar Articles
On a difficult new SWE benchmark, ProgramBench, GPT-5.5 high/xhigh solves a task for the first time, significantly outperforming Opus 4.7
GPT-5.5 achieved the first solve of a task on the difficult ProgramBench SWE benchmark, significantly outperforming Opus 4.7.
Running Qwen3.6 35B A3B on 8 GB VRAM and 32 GB RAM with ~190k context
The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
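The modified llama.cpp build and TurboQuant quantization from the post are not reproduced here; as a rough sketch of the general shape of such a setup, here is an analogous configuration using the stock llama-cpp-python bindings, with partial GPU offload so most weights sit in system RAM. The model filename, layer count, and thread count are assumptions to be tuned per machine, not the poster's settings.

```python
from llama_cpp import Llama

# Sketch only: stock llama-cpp-python, not the modified TurboQuant build
# described in the post. Filename, n_gpu_layers, and n_threads are guesses
# for an 8 GB VRAM / 32 GB RAM machine.
llm = Llama(
    model_path="Qwen3.6-35B-A3B-Q4_K_M.gguf",  # hypothetical quant file
    n_ctx=190_000,     # ~190k context window, as claimed in the post
    n_gpu_layers=12,   # offload only the layers that fit in 8 GB VRAM
    n_threads=8,       # CPU threads for the layers left in system RAM
)

out = llm("Summarize mixture-of-experts routing in two sentences.",
          max_tokens=128)
print(out["choices"][0]["text"])
```

With an A3B-style MoE, only about 3B parameters are active per token, which is part of why CPU-heavy splits like this can still reach the double-digit tok/sec range the post reports.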
I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090
A hands-on benchmark of four local LLMs (Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B, and Gemma 4) on a 20k-token architecture-writing task shows Qwen3.6-27B delivering the best overall balance of clarity, completeness, and usefulness on an RTX 5090.
Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code; I can hardly believe it, but it is usable
User reports surprisingly usable coding performance from Qwen3.6-27B-UD-Q6_K_XL.gguf running locally on an RTX 5090 at ~50 tok/s with 200K context, marking a significant leap in local model quality.
Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules
A user benchmarks three Qwen models (Qwen3.5-27B dense, Qwen3.5-122B-A10B MoE, Qwen3.6-35B-A3B MoE) on 4x RTX 3090 GPUs under real agentic workloads. Despite their speed advantage, the MoE models consistently underperform the dense 27B at following strict global rules, with Qwen3.6-35B-A3B leading in generation throughput.