npu

#npu

Big News for AMD / Strix Halo+ Owners

Reddit r/LocalLLaMA · 6d ago

The NPU on AMD Strix Halo devices can now be used for AI inference, enabling a hybrid mode that combines the NPU and iGPU for faster prompt processing. Tools such as Lemonade and AMD's ROCm software make this possible.
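
To see why offloading prompt processing helps, here is a back-of-the-envelope latency model. It assumes a split in which prefill (prompt processing) runs on the NPU and decode (token generation) runs on the iGPU. The `Device` class, the `request_latency` function, and every throughput figure are hypothetical, chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A compute unit with hypothetical throughput figures (tokens per second)."""
    name: str
    prefill_tps: float  # prompt tokens processed per second
    decode_tps: float   # output tokens generated per second


def request_latency(prompt_tokens: int, output_tokens: int,
                    prefill_dev: Device, decode_dev: Device) -> float:
    """Seconds to serve one request: prefill on one device, then decode on another."""
    prefill_s = prompt_tokens / prefill_dev.prefill_tps
    decode_s = output_tokens / decode_dev.decode_tps
    return prefill_s + decode_s


# Illustrative numbers only; not measurements of any real Strix Halo system.
npu = Device("npu", prefill_tps=1000.0, decode_tps=10.0)
igpu = Device("igpu", prefill_tps=500.0, decode_tps=40.0)

gpu_only = request_latency(2000, 200, igpu, igpu)  # 4.0 s prefill + 5.0 s decode
hybrid = request_latency(2000, 200, npu, igpu)     # 2.0 s prefill + 5.0 s decode
print(f"iGPU only: {gpu_only:.1f} s, hybrid: {hybrid:.1f} s")
```

With these made-up numbers the hybrid split cuts latency from 9.0 s to 7.0 s. Only the prefill term changes, so the gain grows with prompt length.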

Reverse Engineering the Qualcomm NPU Compiler

Lobsters Hottest · 2026-06-20

Reverse engineering the Qualcomm NPU compiler reveals undocumented VTCM (vector tightly coupled memory) management, placement based on mixed-integer linear programming (MILP), automatic precision changes, and a hidden analytical simulator (Hextimate) used to optimize edge deployments.
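
MILP-based placement decides which tensors live in the small on-chip VTCM. As a hedged illustration of that kind of decision, and not the compiler's actual formulation, the sketch below treats it as a 0/1 knapsack: each tensor has a size and an estimated cycle saving if kept in VTCM, and the goal is to maximize total savings within a capacity. It swaps in exact dynamic programming for an MILP solver. The tensor names, sizes, savings, and the 1024 KB capacity are all hypothetical.

```python
from typing import List, Tuple


def place_in_vtcm(tensors: List[Tuple[str, int, int]],
                  capacity_kb: int) -> Tuple[int, List[str]]:
    """Choose tensors to keep in on-chip memory to maximize estimated savings.

    tensors: (name, size_kb, cycles_saved) triples; all values are hypothetical.
    Solved exactly as a 0/1 knapsack with dynamic programming (not MILP).
    """
    n = len(tensors)
    # best[i][c] = max savings using only the first i tensors within c KB
    best = [[0] * (capacity_kb + 1) for _ in range(n + 1)]
    for i, (_, size, gain) in enumerate(tensors, start=1):
        for c in range(capacity_kb + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave tensor i in DRAM
            if size <= c:                # option 2: place tensor i in VTCM
                best[i][c] = max(best[i][c], best[i - 1][c - size] + gain)
    # Walk back through the table to recover which tensors were chosen.
    chosen: List[str] = []
    c = capacity_kb
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = tensors[i - 1]
            chosen.append(name)
            c -= size
    return best[n][capacity_kb], sorted(chosen)


tensors = [("weights_a", 512, 900), ("act_b", 256, 500),
           ("weights_c", 768, 1200), ("act_d", 128, 350)]
print(place_in_vtcm(tensors, capacity_kb=1024))
```

Here the solver keeps `weights_a`, `act_b`, and `act_d` (896 KB, 1750 cycles saved) rather than the larger `weights_c`. A production formulation would add constraints this sketch ignores, such as tensor lifetimes that let buffers share space over time.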

@googlegemma: Gemma 4 E2B goes super fast on Intel AI PCs thanks to LiteRT NPU support on OpenVINO! 1.3x faster prefill performance o…

X AI KOLs Timeline · 2026-06-16

Gemma 4 E2B achieves 1.3x faster prefill and 2.8x better performance-per-watt on Intel AI PCs using OpenVINO with LiteRT NPU support, enabling efficient background LLM tasks.
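
The two headline ratios also imply a power figure. Performance per watt is throughput divided by power, so a 1.3x speedup paired with 2.8x better performance per watt means the NPU run draws roughly 1.3 / 2.8 ≈ 0.46 of the baseline's power. This assumes both ratios were measured on the same workload against the same baseline, which the summary does not state; the function name is illustrative.

```python
def implied_power_ratio(speedup: float, perf_per_watt_gain: float) -> float:
    """Power draw of the new configuration relative to the baseline.

    perf/W = throughput / power, so
    perf_per_watt_gain = speedup / power_ratio, hence
    power_ratio = speedup / perf_per_watt_gain.
    Assumes both ratios come from the same workload and baseline.
    """
    return speedup / perf_per_watt_gain


ratio = implied_power_ratio(1.3, 2.8)
print(f"power ratio about {ratio:.2f} ({1 - ratio:.0%} less power)")
```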

xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work

Reddit r/LocalLLaMA · 2026-06-11

xdna-top is a terminal monitor that shows both NPU and iGPU activity on Ryzen AI Max (Strix Halo) systems, reporting raw NPU counter deltas instead of fabricated utilization percentages.
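
Counter deltas are straightforward to compute from cumulative hardware counters. The sketch below is not xdna-top's code: it assumes a counter sampled as (timestamp, raw value) pairs with a hypothetical 32-bit width, and converts consecutive readings into events per second, handling a single wraparound with modular arithmetic.

```python
from typing import List, Tuple

COUNTER_BITS = 32  # assumed counter width; real hardware counters may differ


def counter_rates(samples: List[Tuple[float, int]],
                  bits: int = COUNTER_BITS) -> List[float]:
    """Convert (timestamp_s, raw_counter) samples into events per second.

    Each rate is the delta between consecutive readings divided by the
    elapsed time. Taking the delta modulo 2**bits means a reading smaller
    than the previous one is treated as a single wraparound.
    """
    modulus = 1 << bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = (c1 - c0) % modulus  # stays correct across one wraparound
        rates.append(delta / (t1 - t0))
    return rates


samples = [(0.0, 4_294_967_000), (0.5, 100), (1.5, 2_100)]
print(counter_rates(samples))
```

The first interval crosses the 32-bit limit, yet the modular difference still yields 396 events over 0.5 s. Reporting rates this way shows what the counters actually measured without converting them into a utilization percentage.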

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

arXiv cs.CL · 2026-06-11

This paper presents the first end-to-end RAG pipeline running entirely on a mobile NPU (Qualcomm Hexagon on Snapdragon X Elite), achieving up to 18x faster LLM prefill and 4x lower energy consumption than the CPU, with no loss in output quality.

Sipeed's K3 RISC-V SBCs can run 30B-parameter LLMs: 60 TOPS (INT4), supports BF16/FP16/INT4

Reddit r/LocalLLaMA · 2026-05-13

Sipeed's new K3 RISC-V single-board computers feature 32GB LPDDR5 and a 60 TOPS NPU, enabling local inference of large language models at up to 15 tokens per second.
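
A quick footprint check shows why INT4 support matters for the 30B headline: at 16 bits per weight the parameters alone exceed the board's 32 GB, while at 4 bits they take about 15 GB. This counts weights only (1 GB = 10^9 bytes) and ignores the KV cache, activations, and the operating system, which all share the same memory.

```python
PARAMS = 30e9  # 30B parameters, as in the headline
BITS_PER_WEIGHT = {"BF16": 16, "FP16": 16, "INT4": 4}
RAM_GB = 32    # board memory from the summary


def weight_gb(params: float, bits: int) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes); excludes KV cache and runtime overhead."""
    return params * bits / 8 / 1e9


for fmt, bits in BITS_PER_WEIGHT.items():
    size = weight_gb(PARAMS, bits)
    print(f"{fmt}: {size:.0f} GB of weights, fits in {RAM_GB} GB: {size < RAM_GB}")
```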

Getting peak TOPS on a Ryzen AI 7 350 NPU

Lobsters Hottest · 2026-05-08

A technical deep-dive into achieving peak TOPS performance on the AMD Ryzen AI 7 350 NPU, comparing it to Xilinx AIE-ML v2 AI engines and explaining the hardware architecture for matrix multiplication workloads.
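
Chasing peak TOPS comes down to comparing two numbers: the theoretical peak (compute tiles × MACs per tile per cycle × 2 ops per MAC × clock rate) and the throughput a real matrix multiplication achieves (2·M·N·K operations divided by runtime). The helper names and every value below are hypothetical, picked for round arithmetic, and are not the Ryzen AI 7 350's specifications.

```python
def peak_tops(tiles: int, macs_per_tile_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in tera-ops/s, counting each multiply-accumulate as 2 ops."""
    return tiles * macs_per_tile_per_cycle * 2 * clock_ghz * 1e9 / 1e12


def achieved_tops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of an (m x k) @ (k x n) matmul, which performs 2*m*n*k ops."""
    return 2 * m * n * k / seconds / 1e12


# Hypothetical values for illustration only.
peak = peak_tops(tiles=10, macs_per_tile_per_cycle=1000, clock_ghz=1.0)
got = achieved_tops(2000, 2000, 2000, seconds=0.001)
print(f"peak {peak:.2f} TOPS, achieved {got:.2f} TOPS ({got / peak:.0%} of peak)")
```

The ratio of the two is the efficiency figure that kernel tuning tries to push toward 100%.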

@agupta: some ideas are much clearer when you can use coding agents to show a proof of concept. eg I hadn’t really understood ho…

X AI KOLs Following · 2026-04-20

A tweet on how coding agents can make ideas clearer by producing a working proof of concept, using the contention between GPU and NPU for memory on devices as its example.
