Dynamically allocating compute budget to a hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE

Reddit r/LocalLLaMA Papers

Summary

A method that dynamically allocates compute budget to hard problems using Qwen-35B-A3B achieves performance near GPT-5.4-xHigh on the HLE benchmark.
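The post gives no implementation details, so as a purely hypothetical sketch, dynamic budget allocation can be pictured as splitting a fixed sampling budget across problems in proportion to an estimated difficulty score (all names and the proportional scheme here are assumptions, not the method from the post):

```python
# Hypothetical sketch: split a fixed sample budget across problems in
# proportion to an estimated difficulty score, with a floor of one
# sample per problem. This is NOT the method from the post, which is
# not described in detail here.

def allocate_budget(difficulties, total_samples):
    """Return per-problem sample counts summing to total_samples."""
    n = len(difficulties)
    assert total_samples >= n, "need at least one sample per problem"
    # Guarantee every problem a floor of one sample.
    counts = [1] * n
    remaining = total_samples - n
    total_difficulty = sum(difficulties)
    # Distribute the leftover budget proportionally to difficulty.
    for i, d in enumerate(difficulties):
        counts[i] += int(d / total_difficulty * remaining)
    # Hand any rounding remainder to the hardest problems first.
    leftover = total_samples - sum(counts)
    for i in sorted(range(n), key=lambda i: -difficulties[i])[:leftover]:
        counts[i] += 1
    return counts

if __name__ == "__main__":
    # Easy, medium, and hard problems sharing a 20-sample budget.
    print(allocate_budget([0.1, 0.3, 0.6], 20))
```

Under this toy scheme the hardest problem receives the largest share of samples while every problem still gets at least one attempt.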


Similar Articles

Running Qwen3.6 35b a3b on 8gb vram and 32gb ram ~190k context

Reddit r/LocalLLaMA

The author shares a high-performance local inference configuration for running Qwen3.6 35B A3B on limited hardware (8GB VRAM, 32GB RAM) using a modified llama.cpp with TurboQuant support, achieving ~37-51 tok/sec with ~190k context.
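The post does not include the exact command, but a plausible llama.cpp invocation for partial GPU offload with a very long context might look like the following (the model file name, layer count, and context size are assumptions; TurboQuant support is specific to the poster's modified build and is not a stock llama.cpp feature):

```shell
# Hypothetical invocation: file name and offload settings are assumptions.
# Tune -ngl (number of GPU-offloaded layers) down until the model fits
# in 8 GB of VRAM; the remaining layers run from system RAM.
./llama-server \
  -m qwen3.6-35b-a3b.gguf \
  -c 190000 \
  -ngl 24 \
  --flash-attn
```

With a MoE model (3B active parameters per token), per-token compute stays small even though the full weights are large, which is why CPU/GPU split inference can still reach the tens of tokens per second the poster reports.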