Tag
The NPU on AMD Strix Halo devices is now usable for AI inference, enabling hybrid mode that combines NPU and iGPU for faster prompt processing. Tools like Lemonade and AMD's ROCm software make this possible.
Reverse engineering the Qualcomm NPU compiler reveals undocumented VTCM memory management, MILP-based placement, automatic precision alteration, and a hidden analytical simulator (Hextimate) for edge deployment optimization.
Gemma 4 E2B achieves 1.3x faster prefill and 2.8x better performance-per-watt on Intel AI PCs using OpenVINO with LiteRT NPU support, enabling efficient background LLM tasks.
xdna-top is a terminal monitor that shows both NPU and iGPU activity on Ryzen AI Max/Strix Halo systems, providing an honest view of NPU counter deltas instead of fake utilization percentages.
This paper presents the first end-to-end RAG pipeline running entirely on a mobile NPU (Qualcomm Hexagon on Snapdragon X Elite), achieving up to 18x faster LLM prefilling and 4x lower energy vs. CPU, with no quality regression.
Sipeed's new K3 RISC-V single-board computers feature 32GB LPDDR5 and a 60 TOPS NPU, enabling local inference of large language models at up to 15 tokens per second.
A technical deep-dive into achieving peak TOPS performance on the AMD Ryzen AI 7 350 NPU, comparing it to Xilinx AIE-ML v2 AI engines and explaining the hardware architecture for matrix multiplication workloads.
A tweet highlights how coding agents can clarify complex ideas, using GPU vs NPU memory competition on devices as an example demonstrated through code.