Big News for AMD / Strix Halo+ Owners

Reddit r/LocalLLaMA 06/24/26, 03:16 PM News

amd strix-halo npu hybrid-model rocm lemonade ai-inference

Summary

The NPU on AMD Strix Halo devices is now usable for AI inference, enabling hybrid mode that combines NPU and iGPU for faster prompt processing. Tools like Lemonade and AMD's ROCm software make this possible.

Admittedly this is news for me, but I'm hoping it could be of some use to others here as well! So, THE NPU IS USABLE!! I've owned an AMD Ryzen 395 Max AI+ (or whatever the naming is lol) for about a year now and have relied solely on GGUFs and Vulkan. I acknowledge that the AMD Ryzen AI team has been working hard to get their ROCm software up to speed w/ their hardware. https://kyuz0.github.io/amd-strix-halo-toolboxes/ This database did NOT look so ROCm friendly 6 months ago. Why should I care? If you own a device w/ both an NPU and a iGPU (like the strix halo series) then you WANT hybrid models. The NPU is CRAZY FAST at PromptProcessing, and can run parallel to gpu firing. Okay, What is Hybrid Mode? So, LLMs can run through the NPU only. If they're built for it. Check out "FastFlowLM NPU" models for examples that do that. BUT HYBRID mode combines the best of both, and FINALLY utilizes the hardware purchased nearly a year go (for some, more than that). What can i do to test this? Download Lemonade! Thanks to their efforts that focus primarily on Ryzen AI and working directly w AMD, I've FINALLY got my machine working in ways it couldn't a year ago and Lemonade made it happen. It's GUI is ultra bare-bones and I wouldn't recommend it for any actual agentic/chat/harness usage BUT being able to sanity-test software without investing days or weeks into it? 10/10 Here's the link: lemonade-server.ai Speaking of links, read more about Hybrid Mode and making your own Hybrid Models here: https://ryzenai.docs.amd.com/en/latest/llm/overview.htmlhttps://ryzenai.docs.amd.com/en/latest/llm/overview.html --- So, that's it. Just wanted to share. REALLY EXCITED that my year old computer is still advancing in the software science of it all. I have a single wishlist/request now: MTP-supported Hybrid Models. Qwen 3.6 has that speedup tech introduced by Unsloth, and AMD has a guide for "new processor shapes" since 3.6 GGUF can't simply be "converted to ONNX". Here's that guide: https://ryzenai.docs.amd.com/en/latest/oga_op_prepare.html If anyone attempts it, please share on huggingface! This was all written by hand btw, no llm assistance, just passionate dev obsessed w "new shiny".

Original Article

Big News for AMD / Strix Halo+ Owners

Similar Articles

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work

ROCm 7.13 nightly adds strix halo optimizations

@pupposandro: https://x.com/pupposandro/status/2054241934164492328

@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …

Scrambling to max StrixHalo (+NVLink dual eGPU 3090 mod)

Submit Feedback

Similar Articles

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work

ROCm 7.13 nightly adds strix halo optimizations
AMD's ROCm 7.13 tech preview adds optimizations for Strix Halo (Ryzen AI Max 300) and open-sources the ROCprof Trace Decoder.

@pupposandro: https://x.com/pupposandro/status/2054241934164492328

@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …

Scrambling to max StrixHalo (+NVLink dual eGPU 3090 mod)