@bo_wangbo: We causally trained a lot of SOTA search models internally, shall we make some small release from time to time

X AI KOLs Following 05/18/26, 06:05 PM Models

search colbert multilingual open-source encoder-model model-release

Summary

Hints at an upcoming low-key release of a powerful open-source multilingual ColBERT search model.

We causally trained a lot of SOTA search models internally, shall we make some small release from time to time 🤣🤣

Original Article

Cached at: 05/19/26, 08:45 AM

Antoine Chaffin (@antoine_chaffin): @bo_wangbo stealth releasing probably the strongest open multilingual ColBERT (and it’s an encoder-based one 🫶)

Very happy to see this, I’ve played with @perplexity_ai’s Qwen based encoder in PyLate, it’s really cool to see it works just with trust_remote_code=True!
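For readers new to ColBERT-style late interaction: one of the related posts below describes such a model as optimized for MaxSim scoring. As a rough illustration only (this is not code from PyLate or from any released model), MaxSim can be sketched in plain Python. Each query token keeps its best cosine similarity against any document token, and those maxima are summed. The function names `normalize` and `maxsim` and the toy 2-dimensional embeddings are made up for this example.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine similarity to any document token."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy per-token embeddings (hypothetical values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # has a matching token for both query tokens
doc_b = [[1.0, 0.0], [3.0, 0.0]]  # matches only the first query token

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

In this toy case `doc_a` scores higher than `doc_b` because every query token finds a close match in it. Real systems compute the same quantity over batched tensors and typically use an index to avoid scoring every document.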

Similar Articles

@bo_wangbo: okay maybe it's a good time? We have a small colbert model trained at pplx, it is a continue-training of pplx-embed-0.6…

X AI KOLs Following

Perplexity AI releases pplx-embed-v1-late-0.6b, a small ColBERT late-interaction embedding model for retrieval, fine-tuned from their existing embedding model and optimized for MaxSim scoring, now open-source on HuggingFace.

@antoine_chaffin: Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger Not bad fo…

X AI KOLs Following

Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing the previous SOTA and models 54× larger; Agent-ModernColBERT then improves on it further with minimal training.

@raphaelsrty: We're releasing LateOn and DenseOn today. Two open retrieval models, 149M parameters each. LateOn (ColBERT, multi-vecto…

X AI KOLs Following

Raphael released two open-source retrieval models, LateOn (ColBERT multi-vector) and DenseOn (single-vector), each 149M parameters and outperforming 4× larger models on BEIR.

@Honcia13: Open-source TTS is going crazy! New weapons for industrial park scams? Tsinghua OpenBMB just released VoxCPM2: 2 billion parameters, trained on 2 million hours of multilingual data, 48kHz studio-quality sound! The wildest part: no tokenizer needed at all; it performs diffusion autoregression directly in a continuous latent space, maximizing detail retention!

X AI KOLs Timeline

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It generates speech by tokenizer-free diffusion autoregression in a continuous latent space, offering 48kHz studio-quality audio along with voice cloning and voice design capabilities.

@lxfater: NetEase Youdao open-sourced ZiYue 4 model, within 27B parameters, SOTA in math and science. But what really interests me is its voice feature!! Cloning a voice is nothing new, ElevenLabs could do it long ago. But they all share a common flaw: cross-language accent. Take your Chinese voice and use it to speak Japanese — it has a Chinese accent, you can tell it's a foreigner struggling...

X AI KOLs Timeline

NetEase Youdao open-sourced the ZiYue 4 model with 27B parameters, achieving SOTA in math and science; its voice feature supports 3-second cross-language voice cloning across 14 languages with no accent issue, along with open-sourcing the all-scenario intelligent agent 'Longxia' (Lobster).
