@bo_wangbo: We casually trained a lot of SOTA search models internally, shall we make some small release from time to time
Summary
Hints at an upcoming low-key release of a strong open-source multilingual ColBERT search model.
Cached at: 05/19/26, 08:45 AM
We casually trained a lot of SOTA search models internally, shall we make some small release from time to time 🤣🤣
Antoine Chaffin (@antoine_chaffin): @bo_wangbo stealth releasing probably the strongest open multilingual ColBERT (and it’s an encoder-based one 🫶)
Very happy to see this, I’ve played with @perplexity_ai’s Qwen based encoder in PyLate, it’s really cool to see it works just with trust_remote_code=True!
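For context on what a ColBERT-style (late-interaction) model does at scoring time: it keeps one vector per token and ranks documents with MaxSim, where each query token takes its best cosine match among the document tokens and those maxima are summed. The following is a minimal pure-Python sketch under stated assumptions: the toy 2-dimensional vectors are made up, and the function names (`normalize`, `maxsim_score`) are hypothetical helpers for illustration, not part of PyLate or of any Perplexity release.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for every query token vector, take the highest
    cosine similarity against any document token vector, then sum the maxima.
    Assumes doc_tokens is non-empty."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

# Toy token embeddings (made up for illustration, not real model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

# Rank documents by descending MaxSim score.
ranked = sorted(
    [("doc_a", doc_a), ("doc_b", doc_b)],
    key=lambda item: maxsim_score(query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

In this toy case `doc_a` ranks first because every query token finds an exact directional match, while `doc_b` offers only one averaged vector. A real model would produce the per-token vectors with an encoder; the scoring step is the same idea.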
Similar Articles
@bo_wangbo: okay maybe it's a good time? We have a small colbert model trained at pplx, it is a continue-training of pplx-embed-0.6…
Perplexity AI releases pplx-embed-v1-late-0.6b, a small ColBERT late-interaction embedding model for retrieval, fine-tuned from their existing embedding model and optimized for MaxSim scoring, now open-source on HuggingFace.
@antoine_chaffin: Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger Not bad fo…
Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing SOTA and models 54× larger, then Agent-ModernColBERT further improves with minimal training.
@raphaelsrty: We're releasing LateOn and DenseOn today. Two open retrieval models, 149M parameters each. LateOn (ColBERT, multi-vecto…
Raphael released two open-source retrieval models, LateOn (ColBERT multi-vector) and DenseOn (single-vector), each 149M parameters and outperforming 4× larger models on BEIR.
@Honcia13: Open-source TTS is going crazy! New weapons for industrial park scams? Tsinghua OpenBMB just released VoxCPM2: 20 billion parameters + 2 million hours of multilingual data training, 48kHz studio-quality sound! The most intense part is—no Tokenizer needed at all, performing diffusion autoregression directly in continuous latent space, maximizing detail retention!
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 20 billion parameters. It supports continuous latent space diffusion autoregressive generation without a Tokenizer, offering 48kHz studio-quality audio and powerful voice cloning and design capabilities.
@lxfater: NetEase Youdao open-sourced ZiYue 4 model, within 27B parameters, SOTA in math and science. But what really interests me is its voice feature!! Cloning a voice is nothing new, ElevenLabs could do it long ago. But they all share a common flaw: cross-language accent. Take your Chinese voice and use it to speak Japanese — it has a Chinese accent, you can tell it's a foreigner struggling...
NetEase Youdao open-sourced the ZiYue 4 model with 27B parameters, achieving SOTA in math and science; its voice feature supports 3-second cross-language voice cloning across 14 languages with no accent issue, along with open-sourcing the all-scenario intelligent agent 'Longxia' (Lobster).