@antoine_chaffin: Whether you are GPU poor or GPU rich, today's release of PyLate has something for you! GPU maxxers: MaxSim kernels grea…
Summary
The release of PyLate introduces MaxSim kernels for GPU-accelerated training with lower memory requirements and TACHIOM for fast multi-vector indexing and search on CPU.
Cached at: 06/11/26, 07:41 PM
Whether you are GPU poor or GPU rich, today’s release of PyLate has something for you!
GPU maxxers: MaxSim kernels greatly speed up training while lowering the memory requirements
CPU enjoyers: TACHIOM enables lightning fast multi-vector indexing and search directly on CPU
https://t.co/GJ3HGtWZws
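For readers new to the term, MaxSim is the late-interaction scoring step used by ColBERT-style models: each query token embedding is matched to its most similar document token embedding, and those per-token maxima are summed. The sketch below is a naive pure-Python reference for that definition only. It is not PyLate's kernel or API; the function name `maxsim` and the toy vectors are made up for illustration, and dot product is assumed as the similarity.

```python
def maxsim(query_vecs, doc_vecs):
    """Naive MaxSim: sum over query tokens of the best dot product
    against any document token. Illustrative only, not PyLate code."""
    total = 0.0
    for qv in query_vecs:
        # Similarity of this query token to every document token.
        sims = [sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs]
        # Keep only the best-matching document token.
        total += max(sims)
    return total


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

In this toy case doc_a scores higher than doc_b because it has an exact match for each query token. The naive loop builds every query-token by document-token similarity before reducing, which is the memory cost that fused or tiled GPU kernels are designed to avoid.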
Similar Articles
@ErikKaum: Releasing my first kernel on @huggingface: MaxSim Late-interaction retrieval (ColBERT / PyLate) bottlenecks on material…
Releases a kernel on Hugging Face that accelerates MaxSim late-interaction retrieval by using tiled scoring with SIMD group matrix operations (Metal and WMMA), achieving 3–5× speedup over the naive implementation.
@raphaelsrty: Computing max similarity (scoring step of colbert, colpali) on gpus can be optimized and this is what @tonywu_71 did. I…
Tony Wu released late-interaction-kernels (LIK): fused Triton kernels for MaxSim, the scoring step behind ColBERT and ColPali, integrated into PyLate and colpali-engine, offering memory efficiency and performance gains.
@cosimorulli1: Happy to share that our recent work, TACHIOM, got integrated into the PyLate ecosystem! https://arxiv.org/pdf/2604.2814…
TACHIOM, a multivector retrieval system with token-aware clustering and hierarchical indexing, has been integrated into the PyLate ecosystem. It achieves up to 247x faster clustering and 9.8x retrieval speedup over state-of-the-art systems while maintaining comparable effectiveness.
@charles_irl: New articles in the GPU Glossary for CuTe DSL, CUTLASS, and CuTe -- the tools used to write some of the highest-perform…
New articles in the GPU Glossary cover CuTe DSL, CUTLASS, and CuTe – tools for writing high-performance GPU kernels on data center GPUs, with examples in Python.
Block-sparse GPU kernels
OpenAI releases block-sparse GPU kernels, a tool for efficient sparse matrix multiplication on GPUs that reduces computation and memory requirements for neural network operations.