@jobergum: You know me as the BM25 guy, but embeddings are cool too. New post from the @HornetDev team just dropped. ANN tuning at…
Summary
The HornetDev team published a post on tuning approximate-nearest-neighbor (ANN) search at 100M scale, covering embedding bias, graph connectivity, and the limits of quantization.
Similar Articles
Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]
The author describes switching from semantic embeddings back to BM25 for tool selection in agents. On a set of 200 query-tool pairs, BM25 reached 81% top-1 accuracy versus 64% for embeddings. The author attributes the gap to tool descriptions being short and keyword-driven rather than semantically rich like documents.
@mixedbreadai: By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows. But they contain t…
Single-vector embedding models can be used to extract sparse latent terms, and BM25 can turn this vocabulary into a strong retriever.
@philipkiely: https://x.com/philipkiely/status/2069212319746506968
Baseten announced what it calls the world's fastest API for the open GLM-5.2 model. The API exceeds 280 tokens per second using NVFP4 quantization, disaggregated inference, and other optimizations.
@garrytan: My newest gbrain-evals just dropped - this is how gbrain does vs other options. http://ZeroEntropy.dev is SOTA for rera…
Garry Tan released new gbrain-evals benchmarks. They show ZeroEntropy.dev as state of the art for reranking and embeddings on cost, speed, and retrieval success, outperforming MemPalace and Vector RAG.
@gordic_aleksa: new in-depth blog post time: Inside the Transformer: The Life of a Token a deep dive into a modern dense transformer, i…
An in-depth blog post on the inner workings of modern dense transformers. It covers YaRN for positional information, hybrid attention for long contexts, soft capping, and QK normalization, plus transformer math such as FLOPs-per-token formulas and cluster sizing.