If you're using Windows, disable memory compression to stop bottlenecks!

Reddit r/LocalLLaMA Tools

Summary

A user shares a fix for performance bottlenecks when running AI models on AMD GPUs in Windows 11 by disabling memory compression via the command `Disable-mmagent -mc`.

This is a follow-up to this post: [https://www.reddit.com/r/LocalLLaMA/comments/1ta3ben/dont\_you\_have\_issues\_in\_w11\_with\_amd\_gpu\_where/](https://www.reddit.com/r/LocalLLaMA/comments/1ta3ben/dont_you_have_issues_in_w11_with_amd_gpu_where/)

I fixed this never-ending issue by simply disabling memory compression from an admin terminal: `Disable-mmagent -mc`

All issues have been resolved. I can open any game and my AI won't slow down at all like before (even when the games are closed)!
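For reference, a minimal PowerShell session (run as Administrator) that checks the current state before applying the fix, and shows how to revert it later. `-mc` is shorthand for the `-MemoryCompression` switch of the built-in `MMAgent` cmdlets:

```shell
# Check whether memory compression is currently enabled (True/False)
Get-MMAgent | Select-Object MemoryCompression

# Disable memory compression (the fix described above)
Disable-MMAgent -mc

# Re-enable it later if you change your mind
Enable-MMAgent -mc
```

The change persists across reboots, so you only need to run it once.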

Similar Articles

Running local models on an M4 with 24GB memory

Hacker News Top

A guide on running local AI models like Qwen 3.5-9B on an M4 MacBook with 24GB RAM using tools like LM Studio, Ollama, and pi, including specific configuration tips for optimal performance.

Stop wasting electricity

Reddit r/LocalLLaMA

The author demonstrates how to reduce RTX 4090 power consumption by up to 40% while running quantized Qwen models via llama.cpp, without sacrificing inference speed. By capping GPU power limits through nvidia-smi and adjusting llama-server parameters, users can significantly lower heat, noise, and extend hardware lifespan.
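A sketch of the approach the summary describes, on Linux with the NVIDIA driver installed. The 280 W cap, GPU index, model filename, and `llama-server` flags are illustrative placeholders, not the author's exact values:

```shell
# Query the supported power-limit range for GPU 0 before picking a cap
nvidia-smi -q -d POWER -i 0

# Cap the card's power draw (RTX 4090 defaults to 450 W); requires root
# and resets to the default on reboot
sudo nvidia-smi -i 0 -pl 280

# Then serve a quantized model as usual; flags here are examples
./llama-server -m qwen2.5-7b-instruct-q4_k_m.gguf -ngl 99 --port 8080
```

Inference on local LLMs is usually memory-bandwidth-bound rather than compute-bound, which is why a substantial power cap can cost little or no tokens-per-second while cutting heat and noise noticeably.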