RAG on Snapdragon X2 Laptop, 200K documents.

Reddit r/LocalLLaMA Tools

Summary

VecML demonstrates its AI-PC software running RAG over a collection of roughly 200K documents (about 100K indexed in the demo run) on a new Snapdragon X2 laptop, with low-token, low-memory retrieval. The software integrates the core functions of six database types into one platform, and the macOS version is now open for controlled testing.

Qualcomm recently released the new Snapdragon X2 laptop chipset. I immediately ordered one: ASUS Zenbook A16 16" 3K OLED Touchscreen Laptop, Snapdragon X2 Elite Extreme (2026).

A few things I really like about this machine:

1. Extremely light. Recently, I carried it single-handedly across Hong Kong Airport from customs all the way to Gate G46 while still running programs before boarding. I felt I was holding a big cell phone.
2. Very portable power adaptor. Compared to the heavy power brick required by RTX laptops, the adaptor is dramatically lighter. Nevertheless, its power consumption still exceeds the in-flight charging limit on United.
3. Strong NPU performance. When the NPU is properly utilized, performance is good. For example, embedding/indexing speed reaches roughly 50% of an RTX 5060 laptop, while operating in a much lighter and quieter form factor.

The attached video demonstrates VecML's AI-PC software running on this laptop.

Highlights:

• Massive document collection: ~200,000 files being indexed (~100,000 completed in this run)
• Low-token retrieval: only ~1200 retrieval tokens used in this experiment
• Low-memory RAG: most data offloaded to disk with only a 128-shard active buffer
• Fast and accurate RAG performance on-device

Behind the scenes, VecML's all-in-one AI database plays a key role. Enterprise-scale AI systems typically require multiple databases working together:

• Vector database
• Graph database
• Relational database
• Key-value store
• Search database
• Document database

We developed an in-house AI database platform that integrates the core functionality of all six systems into a unified architecture for enterprise AI and agent systems. This enables joint optimization across indexing, retrieval, graph traversal, storage, and memory management, helping achieve low-token, low-memory, fast, and accurate AI systems on both cloud and AI-PC deployments.

The demo shown here runs on a Snapdragon X2 Windows laptop. Our macOS AI-PC software is now open for controlled testing.
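
The post does not say how the 128-shard active buffer from the "Low-memory RAG" highlight is implemented. As a rough illustration only, one common way to keep a bounded number of index shards in memory while the rest stay on disk is a least-recently-used (LRU) buffer. The sketch below assumes that approach. `ShardBuffer`, `load_shard`, and the shard contents are hypothetical names made up for this example and are not VecML's API.

```python
from collections import OrderedDict
from typing import Callable, List


class ShardBuffer:
    """Keep at most `capacity` shards in memory; everything else stays on disk.

    Illustrative LRU sketch only, not VecML's actual design.
    """

    def __init__(self, load_shard: Callable[[int], List[str]], capacity: int = 128):
        # `load_shard` is a hypothetical callback that reads one shard from disk.
        self._load_shard = load_shard
        self._capacity = capacity
        self._active: "OrderedDict[int, List[str]]" = OrderedDict()
        self.loads = 0  # counts disk reads, for illustration

    def get(self, shard_id: int) -> List[str]:
        if shard_id in self._active:
            # Cache hit: mark the shard as most recently used.
            self._active.move_to_end(shard_id)
            return self._active[shard_id]
        # Cache miss: read the shard from disk and make it active.
        shard = self._load_shard(shard_id)
        self.loads += 1
        self._active[shard_id] = shard
        if len(self._active) > self._capacity:
            # Evict the least recently used shard to stay within the budget.
            self._active.popitem(last=False)
        return shard

    def active_ids(self) -> List[int]:
        """Shard ids currently in memory, least recently used first."""
        return list(self._active)


if __name__ == "__main__":
    # Stand-in for disk access so the example runs without any files.
    def fake_disk(shard_id: int) -> List[str]:
        return [f"doc-{shard_id}-{i}" for i in range(3)]

    buffer = ShardBuffer(load_shard=fake_disk, capacity=2)
    for shard_id in (0, 1, 0, 2):
        buffer.get(shard_id)
    print(buffer.active_ids(), buffer.loads)
```

With a capacity of 128, memory use is bounded by 128 shards regardless of how many shards the full index has. The trade-off is a disk read whenever a query touches a shard that is not currently active.
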

Similar Articles

@vintcessun: Feeding too many documents into RAG causes retrieval quality to drop from 75% to 40%? Vector search is diluted by a large amount of irrelevant content, causing a sharp drop in hit rate in real deployment. Root cause: heterogeneous documents are retrieved together, so noise drowns out signal. Multi-agent orchestration seems intelligent but actually introduces a precision-fidelity paradox: poor configuration leads to failure on both counts. The paper proposes MA…

X AI KOLs Timeline

This paper identifies 'vector search dilution' in RAG systems when scaling to large heterogeneous document collections, where accuracy dropped from 75% to 40% in a real-world deployment. The proposed MASDR-RAG method uses domain scoping via organizational metadata before retrieval, improving P@10 from 0.77 to 0.86 with low cost and easy deployment.

Radxa Dragon Q8B: A Laptop Cosplaying as an SBC?

Hacker News Top

Radxa announces the Dragon Q8B single-board computer powered by a Qualcomm Snapdragon 8cx Gen 3 SoC, with up to 32GB RAM. Early benchmarks show it outperforming the Raspberry Pi 5, though software is still maturing.