Found a way to cool the DGX

Reddit r/LocalLLaMA News

Summary

A user reports successfully cooling a DGX server with tap water while running the Qwen3.5-122b model at high GPU utilization, maintaining safe temperatures.

Tap water keeps the temperature below 68 degrees Celsius at 95% GPU utilization running Qwen3.5-122b-a10B at Q6_K precision: 110 GB memory usage, 80k context window, 18.77 tokens/second for continuous vision analysis. Not sure how often I'll have to change the water, but so far so good.
Original Article
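
If you want to watch the same numbers the poster is quoting, a minimal monitoring sketch using nvidia-smi's built-in polling (the query fields are standard nvidia-smi options; the 5-second interval is an arbitrary choice, not from the post):

```bash
# Poll temperature, utilization, and power draw every 5 seconds while
# the model is serving; Ctrl-C to stop.
nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu,power.draw \
  --format=csv -l 5
```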

Similar Articles

Stop wasting electricity

Reddit r/LocalLLaMA

The author demonstrates how to cut RTX 4090 power consumption by up to 40% while running quantized Qwen models via llama.cpp, without sacrificing inference speed. By capping the GPU power limit through nvidia-smi and adjusting llama-server parameters, users can significantly reduce heat and noise and extend hardware lifespan.
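
For readers who want to try the capping step themselves, a minimal sketch assuming a single RTX 4090 at index 0 (the 270 W value is illustrative, roughly 40% below the card's 450 W default; the post's exact settings aren't reproduced here):

```bash
# Enable persistence mode so the limit holds between CUDA jobs.
sudo nvidia-smi -pm 1

# Inspect the current, default, and min/max enforceable power limits.
nvidia-smi -q -d POWER

# Cap GPU 0 to 270 W (~40% below the 4090's 450 W default). Requires
# root; the cap resets on reboot unless reapplied at startup.
sudo nvidia-smi -i 0 -pl 270
```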

Finding the 4x 3090 Sweet Spot

Reddit r/LocalLLaMA

A user shares power-limit testing on a 4x RTX 3090 setup running Qwen3.6-27B with vLLM, finding 220 W to be the sweet spot: peak efficiency with minimal throughput loss.
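
A rough way to run that kind of sweep on your own box; a minimal sketch assuming four GPUs at indices 0-3, with the benchmark step left manual (the candidate wattages are illustrative, not the poster's test points):

```bash
#!/usr/bin/env bash
# Sweep candidate power limits across a 4x 3090 box. The wattages below
# are illustrative; run your own throughput benchmark at each step and
# record tokens/s before moving on.
for watts in 350 300 250 220 200; do
  for gpu in 0 1 2 3; do
    sudo nvidia-smi -i "$gpu" -pl "$watts"
  done
  echo "All GPUs capped at ${watts} W -- run your benchmark, then continue."
  read -r -p "Press Enter for the next limit... "
done
```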

RTX Pro 4500 Blackwell - Qwen 3.6 27B?

Reddit r/LocalLLaMA

A developer shares local inference benchmarks and systemd configurations for running the Qwen3.6-27B model on an NVIDIA RTX Pro 4500 Blackwell GPU using llama.cpp. The post requests optimization tips for throughput and explores potential use cases for larger models.
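
Since the post centers on a systemd setup, a minimal sketch of what such a unit can look like (every path, the model filename, and all flag values below are illustrative assumptions, not the author's configuration):

```ini
# /etc/systemd/system/llama-server.service -- illustrative sketch only
[Unit]
Description=llama.cpp server (Qwen3.6-27B)
After=network-online.target

[Service]
# Paths and flag values are placeholders: -ngl offloads layers to the
# GPU, -c sets the context window.
ExecStart=/usr/local/bin/llama-server \
  -m /opt/models/Qwen3.6-27B-Q6_K.gguf \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -c 16384
Restart=on-failure
User=llama

[Install]
WantedBy=multi-user.target
```

After saving the unit, enable it with `sudo systemctl daemon-reload && sudo systemctl enable --now llama-server`.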