Tag
A tweet recommending --ddtree-budget 36 for Nvidia RTX 4090, claiming 2.5x speedup during decoding for Qwen3.6_27B.
The author demonstrates how to reduce RTX 4090 power consumption by up to 40% while running quantized Qwen models via llama.cpp, without sacrificing inference speed. By capping GPU power limits through nvidia-smi and adjusting llama-server parameters, users can significantly lower heat, noise, and extend hardware lifespan.