Dynamically allocating compute budget to a hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE
Summary
A method that dynamically allocates compute budget to hard problems using Qwen-35B-A3B achieves performance near GPT-5.4-xHigh on the HLE benchmark.
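The article's own pipeline is not reproduced here, but the core idea (spend extra samples only on the problems the model finds hard) can be illustrated. Below is a minimal Python sketch, assuming self-consistency disagreement as the difficulty signal and majority voting as the aggregator; the `solve_once` callable, the probe size, and the voting scheme are illustrative assumptions, not the authors' method.

```python
from collections import Counter

def answer_with_dynamic_budget(problems, solve_once, total_budget, probe_k=2):
    """Probe every problem cheaply, then spend the leftover budget on
    whichever problem's sampled answers currently disagree the most.

    solve_once: callable taking a problem and returning one sampled
    answer string (e.g. one Qwen-35B-A3B generation). Hypothetical
    interface, not from the article.
    """
    samples = {p: [solve_once(p) for _ in range(probe_k)] for p in problems}
    leftover = total_budget - probe_k * len(problems)

    def disagreement(p):
        # 0.0 when all samples agree; approaches 1.0 as answers scatter.
        top_count = Counter(samples[p]).most_common(1)[0][1]
        return 1.0 - top_count / len(samples[p])

    while leftover > 0:
        hardest = max(problems, key=disagreement)
        if disagreement(hardest) == 0.0:
            break  # every problem's samples agree; no need to spend more
        samples[hardest].append(solve_once(hardest))
        leftover -= 1

    # Final answer per problem: majority vote over all collected samples.
    return {p: Counter(s).most_common(1)[0][0] for p, s in samples.items()}
```

The exact difficulty signal and budget schedule the article uses may differ; the point of the sketch is only that easy problems stop consuming samples early, freeing budget for the hard tail.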
Similar Articles
On a difficult new SWE benchmark, ProgramBench, GPT-5.5 high/xhigh solves a task for the first time, significantly outperforming Opus 4.7
GPT-5.5 achieved the first solve of a task on the difficult ProgramBench SWE benchmark, significantly outperforming Opus 4.7.
Running Qwen3.6 35B A3B on 8 GB VRAM and 32 GB RAM with ~190k context
The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
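The modified llama.cpp build and TurboQuant quantization from the post are not reproduced here; as a rough sketch of the general shape of such a setup, here is an analogous configuration using the stock llama-cpp-python bindings, with partial GPU offload so most weights sit in system RAM. The model filename, layer count, and thread count are assumptions to be tuned per machine, not the poster's settings.

```python
from llama_cpp import Llama

# Sketch only: stock llama-cpp-python, not the modified TurboQuant build
# described in the post. Filename, n_gpu_layers, and n_threads are guesses
# for an 8 GB VRAM / 32 GB RAM machine.
llm = Llama(
    model_path="Qwen3.6-35B-A3B-Q4_K_M.gguf",  # hypothetical quant file
    n_ctx=190_000,     # ~190k context window, as claimed in the post
    n_gpu_layers=12,   # offload only the layers that fit in 8 GB VRAM
    n_threads=8,       # CPU threads for the layers left in system RAM
)

out = llm("Summarize mixture-of-experts routing in two sentences.",
          max_tokens=128)
print(out["choices"][0]["text"])
```

With an A3B-style MoE, only about 3B parameters are active per token, which is part of why CPU-heavy splits like this can still reach the double-digit tok/sec range the post reports.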
I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090
A hands-on benchmark of four local LLMs (Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B, and Gemma 4) on a 20k-token architecture-writing task shows Qwen3.6-27B delivering the best overall balance of clarity, completeness, and usefulness on an RTX 5090.
Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code; I can hardly believe it, but it is usable
User reports surprisingly usable coding performance from Qwen3.6-27B-UD-Q6_K_XL.gguf running locally on an RTX 5090 at ~50 tok/s with 200K context, marking a significant leap in local model quality.
Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules
A user benchmarks three Qwen models (Qwen3.5-27B dense, Qwen3.5-122B-A10B MoE, Qwen3.6-35B-A3B MoE) on 4x RTX 3090 GPUs under real agentic workloads. Despite their speed advantage, the MoE models consistently underperform the dense 27B at following strict global rules, with Qwen3.6-35B-A3B leading in generation throughput.