RAG on Snapdragon X2 Laptop, 200K documents.

Reddit r/LocalLLaMA 05/15/26, 09:02 PM Tools

rag snapdragon-x2 vecml ai-pc on-device-ai ai-database qualcomm

Summary

VecML demonstrates its AI-PC software running RAG over a collection of about 200K documents (roughly 100K indexed in the demo run) on a new Snapdragon X2 laptop, with low-token and low-memory retrieval. The software integrates six database functions into one platform, and the macOS version is now open for controlled testing.

Qualcomm recently released the new Snapdragon X2 laptop chipset. I immediately ordered one: ASUS Zenbook A16 16" 3K OLED Touchscreen Laptop, Snapdragon X2 Elite Extreme (2026).

A few things I really like about this machine:

1. Extremely light. Recently, I carried it single-handedly across Hong Kong Airport from customs all the way to Gate G46 while still running programs before boarding. I felt I was holding a big cell phone.
2. Very portable power adaptor. Compared to the heavy power brick required by RTX laptops, the adaptor is dramatically lighter. Nevertheless, its power consumption still exceeds the in-flight charging limit on United.
3. Strong NPU performance. When the NPU is properly utilized, performance is good. For example, embedding/indexing speed reaches roughly 50% of an RTX 5060 laptop, while operating in a much lighter and quieter form factor.

The attached video demonstrates VecML's AI-PC software running on this laptop.

Highlights:

• Massive document collection: ~200,000 files being indexed (~100,000 completed in this run)
• Low-token retrieval: only ~1,200 retrieval tokens used in this experiment
• Low-memory RAG: most data offloaded to disk with only a 128-shard active buffer
• Fast and accurate RAG performance on-device

Behind the scenes, VecML's all-in-one AI database plays a key role. Enterprise-scale AI systems typically require multiple databases working together:

• Vector database
• Graph database
• Relational database
• Key-value store
• Search database
• Document database

We developed an in-house AI database platform that integrates the core functionality of all six systems into a unified architecture for enterprise AI and agent systems. This enables joint optimization across indexing, retrieval, graph traversal, storage, and memory management, helping achieve low-token, low-memory, fast, and accurate AI systems on both cloud and AI-PC deployments.

The demo shown here runs on a Snapdragon X2 Windows laptop.
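
For readers wondering what a "128-shard active buffer" with most data offloaded to disk might look like, here is a minimal sketch. It is not VecML's implementation, which the post does not describe. It assumes a least-recently-used eviction policy, and the names `ShardBuffer` and `load_shard` are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class ShardBuffer:
    """Keep at most `capacity` index shards in memory (assumed policy: LRU eviction)."""

    def __init__(self, load_shard: Callable[[int], List[str]], capacity: int = 128):
        # `load_shard` is a hypothetical callback that reads one shard from disk.
        self._load_shard = load_shard
        self._capacity = capacity
        self._active: "OrderedDict[int, List[str]]" = OrderedDict()
        self.disk_reads = 0  # counts cache misses, for illustration only

    def get(self, shard_id: int) -> List[str]:
        if shard_id in self._active:
            # Cache hit: mark the shard as most recently used.
            self._active.move_to_end(shard_id)
            return self._active[shard_id]
        # Cache miss: fetch the shard from disk and make it active.
        shard = self._load_shard(shard_id)
        self.disk_reads += 1
        self._active[shard_id] = shard
        if len(self._active) > self._capacity:
            # Over capacity: drop the least recently used shard from memory.
            self._active.popitem(last=False)
        return shard

    def __len__(self) -> int:
        return len(self._active)
```

With a scheme like this, peak memory scales with the buffer size rather than the size of the whole collection, at the cost of a disk read whenever a query touches a shard that is not currently active.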
Our macOS AI-PC software is now open for controlled testing.
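
On the low-token point: one common way to keep retrieval context near a fixed size such as the ~1,200 tokens mentioned above is to pack ranked chunks greedily under a token budget. The sketch below illustrates that general idea only. The post does not say how VecML selects context, `pack_context` and `count_tokens` are hypothetical names, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(ranked_chunks: List[Tuple[float, str]], budget: int = 1200) -> List[str]:
    """Greedily keep the highest-scoring chunks whose total token count fits the budget."""
    selected: List[str] = []
    used = 0
    # Visit chunks from the highest relevance score to the lowest.
    for _score, chunk in sorted(ranked_chunks, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(chunk)
        if used + cost > budget:
            continue  # this chunk would overflow the budget; a smaller one may still fit
        selected.append(chunk)
        used += cost
    return selected
```

The budget caps how many tokens are passed to the language model per query, regardless of how many documents sit in the index.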

Similar Articles

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

@vintcessun: Feeding too many documents into RAG causes retrieval quality to drop from 75% to 40%? Vector search is diluted by a large amount of irrelevant content, causing a sharp drop in hit rate in real deployment. Root cause: heterogeneous documents are retrieved together, noise drowns out signal. Multi-agent orchestration seems intelligent but actually introduces a precision-fidelity paradox—poor configuration leads to failure in both aspects. The paper proposes MA…

Radxa Dragon Q8B: A Laptop Cosplaying as an SBC?

@PrajwalTomar_: https://x.com/PrajwalTomar_/status/2069409824824316060

@techwith_ram: A 10M document corpus eats 31 GB of RAM as float32 Most teams hit that wall & reach for a managed vector database. $400…