@bo_wangbo: OK, maybe this is a good time? At pplx we trained a small ColBERT model, a continued training of pplx-embed-0.6…

X AI KOLs Following 2026/05/18 17:07 Models

colbert embedding retrieval perplexity open-source multilingual maxsim

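Summary

Perplexity AI has released pplx-embed-v1-late-0.6b, a small ColBERT-style late-interaction embedding model for retrieval. It is fine-tuned from their existing embedding model, optimized for MaxSim scoring, and is now open-sourced on HuggingFace.

OK, maybe this is a good time? At pplx we trained a small ColBERT model. It is a continued training of pplx-embed-0.6b, so it is natively multilingual. We just open-sourced it and added a section on how to use the MaxSim kernel: https://huggingface.co/perplexity-ai/pplx-embed-v1-late-0.6b…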


Cached at: 2026/05/19 00:39

perplexity-ai/pplx-embed-v1-late-0.6b · Hugging Face

Source: https://huggingface.co/perplexity-ai/pplx-embed-v1-late-0.6b

pplx-embed-v1-late-0.6b: Late-Interaction Embeddings

pplx-embed-v1-late-0.6b is a token-level late-interaction embedding model for retrieval, scored with MaxSim. It continues training from pplx-embed-v1-0.6b (https://huggingface.co/perplexity-ai/pplx-embed-v1-0.6b) with ContrastiveLoss to optimize token-level MaxSim.

The token-level embedding dimension is 128, which triggers the fast path of the optional erikkaum/maxsim (https://huggingface.co/kernels/erikkaum/maxsim) MaxSim kernel.

Usage

With PyLate (indexing + retrieval)

from pylate import indexes, models, retrieve

model = models.ColBERT(
    model_name_or_path="perplexity-ai/pplx-embed-v1-late-0.6b",
    trust_remote_code=True,
)

documents_ids = ["1", "2", "3"]
documents = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
]

index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="pplx-embed-v1-late-0.6b",
    override=True,
)
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["What motivates scientific discovery?"], is_query=True)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=3)
print(scores)

With the erikkaum/maxsim kernel (fast MaxSim scoring)

Fused MaxSim for reranking, pairwise scoring, or evaluation. Supports CUDA (sm_80/86/89) and Metal (Apple Silicon); inputs are fp32/fp16/bf16, output is fp32; forward-only.

import torch
from kernels import get_kernel
from pylate import models

device = "cuda" if torch.cuda.is_available() else "mps"
model = models.ColBERT(
    model_name_or_path="perplexity-ai/pplx-embed-v1-late-0.6b",
    trust_remote_code=True,
    device=device,
)
maxsim = get_kernel("erikkaum/maxsim", version=1, trust_remote_code=True)

q_emb = model.encode(["What motivates scientific discovery?"], is_query=True, convert_to_tensor=True)
d_emb = model.encode([
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
], is_query=False, convert_to_tensor=True)

# Pad to [B=1, n_candidates, Ld_max, dim] for score_candidates_padded.
Lq, dim = q_emb[0].shape
n, Ld_max = len(d_emb), max(d.shape[0] for d in d_emb)
queries_pad = q_emb[0].unsqueeze(0).to(device, torch.float16)
documents_pad = torch.zeros(1, n, Ld_max, dim, device=device, dtype=torch.float16)
for i, d in enumerate(d_emb):
    documents_pad[0, i, : d.shape[0]] = d.to(device, torch.float16)
query_lengths = torch.tensor([Lq], dtype=torch.int32, device=device)
doc_lengths = torch.tensor([[d.shape[0] for d in d_emb]], dtype=torch.int32, device=device)

scores = maxsim.score_candidates_padded(queries_pad, documents_pad, query_lengths, doc_lengths)
print(scores[0].tolist())  # one fp32 score per candidate

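For ragged, variable-length pairwise scoring (evaluation, distillation, hard-negative mining), use maxsim.score_pairs_packed(...) instead; see the kernel card for details on the packed API.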

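Performance

We evaluate pplx-embed-v1-late-0.6b on two standard late-interaction retrieval benchmarks and report average nDCG@10:

BEIR: average over 15 English retrieval tasks.
MIRACL: average over 18 languages.

Benchmark	pplx-embed-v1-late-0.6b	Reference
BEIR (15 tasks)	56.61	colbert-zero: 55.43
MIRACL (18 languages)	66.62	jina-colbert-v2: 62.28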
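
For readers unfamiliar with the metric, here is a minimal sketch of nDCG@10 for a single query. It assumes the linear-gain formulation and that the input list holds the relevance labels of all judged documents in ranked order; the function names are illustrative and this is not the evaluation code behind the table. The table's values appear to be on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Relevant documents at ranks 1 and 3, non-relevant elsewhere (toy labels).
ranked = [1, 0, 1, 0, 0]
print(round(ndcg_at_k(ranked), 4))
```

A benchmark score is then the mean of this per-query value over all queries in a task, averaged again over tasks or languages.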

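Technical details

The model uses late interaction: queries and documents are encoded into token-level vectors and scored with MaxSim, rather than being pooled into a single vector.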
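
To make the scoring rule concrete, the following is a minimal NumPy sketch of MaxSim for one query and one document. It assumes dot-product similarity between token embeddings; the function name `maxsim` and the toy 2-dimensional vectors are illustrative only and are not the kernel's or PyLate's API.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """For each query token, take its best-matching document token
    (max dot product), then sum those maxima over the query tokens."""
    sim = query_emb @ doc_emb.T          # [Lq, Ld] token-to-token similarities
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens

# Toy token embeddings (illustrative values, not model outputs).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b = np.array([[0.5, 0.5]])

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

Here doc_a scores higher because each query token finds an exact match among its tokens, while doc_b offers only one partially matching token for both.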

For background on the base embedding family, see the pplx-embed-v1-0.6b (https://huggingface.co/perplexity-ai/pplx-embed-v1-0.6b) model card and the technical report: https://arxiv.org/abs/2602.11151.

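Erik Kaunismäki (@ErikKaum): Releasing my first kernel on @huggingface: MaxSim

The bottleneck in late-interaction retrieval (ColBERT / PyLate) is materializing the full similarity matrix. The kernel avoids this by scoring in tiles, using simdgroup_matrix (Metal) and WMMA.

3 to 5x faster than a naive implementation.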
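
The tiling idea can be sketched on the CPU with NumPy. This is not the kernel's code (which runs on the GPU with simdgroup_matrix or WMMA); it only illustrates, under the assumption of dot-product similarity, how a running per-query-token maximum removes the need to hold the full [Lq, Ld] matrix at once. The function names and the tile size of 64 are hypothetical.

```python
import numpy as np

def maxsim_naive(q: np.ndarray, d: np.ndarray) -> float:
    # Materializes the full [Lq, Ld] similarity matrix.
    return float((q @ d.T).max(axis=1).sum())

def maxsim_tiled(q: np.ndarray, d: np.ndarray, tile: int = 64) -> float:
    """Visit document tokens tile by tile, keeping only a running
    maximum per query token instead of the whole similarity matrix."""
    running_max = np.full(q.shape[0], -np.inf)
    for start in range(0, d.shape[0], tile):
        block = q @ d[start:start + tile].T  # [Lq, <=tile] slice only
        running_max = np.maximum(running_max, block.max(axis=1))
    return float(running_max.sum())

# Random stand-ins for token embeddings with dim=128, as in the model card.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 128)).astype(np.float32)
d = rng.standard_normal((300, 128)).astype(np.float32)

naive = maxsim_naive(q, d)
tiled = maxsim_tiled(q, d, tile=64)
print(naive, tiled)
```

Both functions compute the same score up to floating-point rounding; the tiled variant only ever holds one [Lq, tile] block in memory.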


