@Andy_ShuoYang: FlashLib update: we now support ANN search with IVF-Flat — up to 6.5× faster than cuVS on real-world vector workloads (…

X AI KOLs Following 06/03/26, 07:11 PM Tools

vector-search ann ivf-flat flashlib performance open-source

Summary

FlashLib now supports ANN search with IVF-Flat, running up to 6.5× faster than cuVS on a real-world vector workload (SIFT-1M) at matching recall. LEANN now integrates FlashLib as a backend, reporting 26× faster build, 29× faster single-query search, and 298× faster batch search.

Cached at: 06/05/26, 05:11 AM

FlashLib update: we now support ANN search with IVF-Flat — up to 6.5× faster than cuVS on real-world vector workloads (SIFT-1M) while matching recall.
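
For readers unfamiliar with IVF-Flat, here is a minimal pure-Python sketch of the general technique. It is not FlashLib's or cuVS's API or implementation. Every name in it (`build_ivf_flat`, `search_ivf_flat`, `n_lists`, `n_probe`) is an illustrative assumption, and the data is random rather than SIFT-1M. The idea: cluster the vectors with k-means, store each vector's index in the list of its nearest centroid, and at query time scan only the `n_probe` lists whose centroids are closest to the query. Recall is then measured against an exact brute-force search.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(p, centroids):
    """Index of the centroid closest to vector p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def train_centroids(points, n_lists, iters=5, seed=0):
    """Plain Lloyd's k-means; returns n_lists centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_lists)]
    for _ in range(iters):
        members = [[] for _ in range(n_lists)]
        for p in points:
            members[nearest_centroid(p, centroids)].append(p)
        for c, group in enumerate(members):
            if group:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def build_ivf_flat(points, n_lists, seed=0):
    """Build the inverted file: one list of point indices per centroid."""
    centroids = train_centroids(points, n_lists, seed=seed)
    lists = [[] for _ in range(n_lists)]
    for idx, p in enumerate(points):
        lists[nearest_centroid(p, centroids)].append(idx)
    return centroids, lists


def search_ivf_flat(query, points, centroids, lists, k, n_probe):
    """Scan only the n_probe lists nearest to the query, then rank exactly."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, points[i]))[:k]


def search_exact(query, points, k):
    """Brute-force baseline used as ground truth for recall."""
    return sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:k]


def recall_at_k(approx, exact):
    """Fraction of the true top-k neighbours that the approximate search found."""
    return len(set(approx) & set(exact)) / len(exact)


# Synthetic data (an assumption for illustration; not SIFT-1M).
rng = random.Random(42)
DIM, N_POINTS, N_LISTS, K = 8, 1000, 16, 10
points = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_POINTS)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(20)]
centroids, lists = build_ivf_flat(points, N_LISTS)

mean_recall = {}
for n_probe in (1, 4, N_LISTS):
    scores = [
        recall_at_k(search_ivf_flat(q, points, centroids, lists, K, n_probe),
                    search_exact(q, points, K))
        for q in queries
    ]
    mean_recall[n_probe] = sum(scores) / len(scores)
    print(f"n_probe={n_probe:2d}  mean recall@{K} = {mean_recall[n_probe]:.3f}")
```

In this sketch, raising `n_probe` scans more lists, so recall can only stay the same or rise while the number of distance computations grows. Setting `n_probe` equal to `n_lists` reduces to exact search. Speed comparisons between ANN libraries are usually made at a matched recall level for this reason.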

LEANN now supports FlashLib as a backend: 26× faster build, 29× faster single-query, and 298× faster batch search. Huge thanks to @YichuanM for the help!

We’re also opening Discord / Slack channels — join us to suggest new operators you want to see, and hardware backends you want FlashLib to support next!

Slack: https://join.slack.com/t/flashml/shared_invite/zt-3zpdh5j10-9dwTXrgLiqpVxizhA9KVbA…

Discord: https://discord.gg/ce5Xa5pf

Similar Articles

@Andy_ShuoYang: Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib — a GPU library for f…

X AI KOLs Following

The Flash-KMeans team releases FlashLib, a GPU library for classical ML operators that achieves up to 208x speedups over cuML on Hopper GPUs, with a focus on fast, predictable performance for agentic AI workloads.

@neural_avb: Deep learning bros and sisters, don't sleep on this. You can cluster millions of documents in embedding space, mass-ann…

X AI KOLs Timeline

Shuo Yang and team release FlashLib, a GPU library that accelerates classical ML operators like KMeans, KNN, HDBSCAN, PCA, and t-SNE, claiming speedups up to 208x.

@pupposandro: 2.5x faster than llama.cpp on Strix Halo. We just shipped DFlash + PFlash for the AMD Ryzen AI MAX+ 395 iGPU (gfx1151, …

X AI KOLs Following

A new toolset (DFlash + PFlash) achieves 2.5x faster inference than llama.cpp on AMD Ryzen AI MAX+ 395 iGPU, demonstrating significant speedups for Qwen3.6-27B with 128 GiB unified memory.

@davideciffa: Huge thanks to @csujun, now Luce DFlash is 10-15% faster, by implementing per-layer K/V truncation in the draft graph f…

X AI KOLs Timeline

Luce DFlash has achieved a 10-15% speedup by implementing per-layer K/V truncation in the draft graph for SWA layers.

@vintcessun: Compressing 10 million vectors from 31GB to 4GB, with search even faster than FAISS — sounds crazy, but Turbovec actually did it. The core is Google's TurboQuant data-independent quantization: no training, no parameter tuning, just add vectors and index. Handwritten NEON/AVX-512 implementations are genuinely 12-20% faster, supporting filtered search by ID, saving a ton of post-processing hassle. Rust under the hood + pip install, minimal maintenance cost.

X AI KOLs Timeline

Turbovec, based on Google's TurboQuant algorithm, compresses 10 million vectors from 31GB to 4GB, with search speed 12-20% faster than FAISS, supports filtered search, and offers a Rust implementation with a Python package.
