LiquidAI/LFM2.5-Embedding-350M

Hugging Face Models Trending 2026/05/05 15:00 Model

embedding retrieval multilingual liquid-ai rag open-source

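Summary

Liquid AI has released LFM2.5-Embedding-350M, a dense bi-encoder for multilingual retrieval that supports 11 languages and can serve as a drop-in replacement in existing RAG pipelines.

Task: Sentence Similarity
Tags: sentence-transformers, safetensors, lfm2, liquid, lfm2.5, edge, sentence-similarity, feature-extraction, custom_code, en, es, de, fr, it, pt, ar, sv, no, ja, ko, arxiv:2511.23404, base_model:LiquidAI/LFM2.5-350M-Base, base_model:finetune:LiquidAI/LFM2.5-350M-Base, license:other, endpoints_compatible, region:us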

Cached: 2026/06/20 14:21

Source: https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M, Liquid AI

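We are releasing two new best-in-class multilingual retrieval models:

LFM2.5-Embedding-350M: a dense bi-encoder that produces one vector per document. Smallest index, fastest.
LFM2.5-ColBERT-350M (https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M): a late-interaction model that produces one vector per token and matches them with MaxSim. Higher accuracy and better generalization, at the cost of a larger index.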
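
The difference between the two scoring schemes can be sketched in a few lines of NumPy. This is an illustrative sketch on random vectors, not the models' actual implementation; the dimensions (1024 for the dense vector, 128 per token for late interaction) follow the model details, while the function names and token counts are made up for the example.

```python
import numpy as np

def cosine_score(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Single-vector scoring: cosine similarity between two 1-D vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vec / np.linalg.norm(doc_vec)
    return float(q @ d)

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction scoring: for each query token, keep its best-matching
    document token (max cosine), then sum over the query tokens."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())

# Random stand-ins for real model outputs (hypothetical sizes: 8 query tokens, 40 doc tokens).
rng = np.random.default_rng(0)
dense_q, dense_d = rng.normal(size=1024), rng.normal(size=1024)
tok_q, tok_d = rng.normal(size=(8, 128)), rng.normal(size=(40, 128))

dense = cosine_score(dense_q, dense_d)
late = maxsim_score(tok_q, tok_d)
```

The dense score compares two vectors once, whereas MaxSim has to keep every document token vector around, which is why the late-interaction index is larger.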

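Both models have 350M parameters and are the first bidirectional members of the LFM family, built on LFM2.5-350M-Base (https://huggingface.co/LiquidAI/LFM2.5-350M-Base). They are drop-in replacements for your existing RAG pipeline, delivering fast, low-cost, and reliable multilingual and cross-lingual search across 11 languages.

For more details on the bidirectional architecture and the training recipe, see our blog post (https://www.liquid.ai/blog/lfm2-5-retrievers).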

bienc (https://cdn-uploads.huggingface.co/production/uploads/63f389fda096536aeaae0a66/LjpFnq59BbuhKLVTExtcU.png)

📄 Model details

Property	LFM2.5-Embedding-350M	LFM2.5-ColBERT-350M (https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M)
Type	Dense bi-encoder (single vector)	Late interaction (per-token vectors)
Total parameters	~354M	~353M
Backbone	LFM2.5-350M-Base (https://huggingface.co/LiquidAI/LFM2.5-350M-Base) + bidirectional patch	LFM2.5-350M-Base (https://huggingface.co/LiquidAI/LFM2.5-350M-Base) + bidirectional patch
Layers	17 (10 conv + 6 attention + 1 pooling)	17 (10 conv + 6 attention + 1 dense)
Vocabulary size	65,536	64,402
Output	1024-dim CLS vector	128 dims per token
Similarity	Cosine similarity	MaxSim
Training precision	BF16	BF16
License	LFM Open License v1.0	LFM Open License v1.0

Document length: 512 tokens
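
To make the index-size trade-off concrete, here is a back-of-the-envelope calculation based on the output dimensions and the 512-token document length listed above. It assumes uncompressed float16 storage and a document that fills all 512 tokens; real indexes often apply quantization or compression, and most documents are shorter, so treat the numbers as an upper bound rather than a measurement.

```python
BYTES_PER_FP16 = 2  # assumption: vectors stored as uncompressed float16

def dense_bytes_per_doc(dim: int = 1024) -> int:
    """One vector per document."""
    return dim * BYTES_PER_FP16

def late_bytes_per_doc(n_tokens: int, dim: int = 128) -> int:
    """One vector per token."""
    return n_tokens * dim * BYTES_PER_FP16

dense = dense_bytes_per_doc()        # 1024 * 2 = 2048 bytes
late_max = late_bytes_per_doc(512)   # 512 * 128 * 2 = 131072 bytes
ratio = late_max / dense             # 64.0
```

Under these assumptions a full-length document costs 2 KiB in the dense index and up to 128 KiB in the late-interaction index.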

Supported languages: English, Spanish, German, French, Italian, Portuguese, Arabic, Swedish, Norwegian, Japanese, Korean.

Architecture:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: Lfm2BidirectionalModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False})
)
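
The Pooling module above has `pooling_mode_cls_token` enabled, so the sentence vector comes from a single token position rather than an average over tokens. A minimal NumPy sketch of that step, assuming the CLS token sits in the first position (which is how sentence-transformers' CLS pooling reads it) and using random values in place of real hidden states:

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Return the first token's hidden state for each sequence.

    token_embeddings: (batch, seq_len, hidden) -> (batch, hidden)
    """
    return token_embeddings[:, 0, :]

# Random stand-in for the transformer output: 2 sequences, 16 tokens, 1024 dims.
hidden = np.random.default_rng(0).normal(size=(2, 16, 1024)).astype(np.float32)
sentence_vectors = cls_pool(hidden)
```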

Asymmetric prompts: use query: for queries and document: for passages. Both are stored in the model config and applied automatically through prompt_name.
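
Conceptually, a prompt is a string prefix prepended to the text before tokenization. The sketch below illustrates that idea only; the `PROMPTS` dictionary and `apply_prompt` helper are hypothetical, and the exact prefix strings (including the trailing space) are an assumption that should be checked against the model config.

```python
# Hypothetical mirror of the prompts stored in the model config.
PROMPTS = {"query": "query: ", "document": "document: "}

def apply_prompt(texts: list[str], prompt_name: str) -> list[str]:
    """Prepend the prefix registered under prompt_name to every text."""
    prefix = PROMPTS[prompt_name]
    return [prefix + text for text in texts]

prefixed_queries = apply_prompt(["What is the capital of France?"], "query")
prefixed_docs = apply_prompt(["Paris is the capital of France."], "document")
```

With sentence-transformers you do not need such a helper: passing `prompt_name` to `encode` applies the stored prefix for you.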

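We recommend LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M for short-context retrieval use cases such as:

E-commerce: find products across many languages with semantic search at scale.
FAQ and knowledge-base support: reliably retrieve the right answer in customer-facing interfaces.
On-device semantic search: search files, emails, and notes locally on consumer hardware.
Enterprise knowledge assistants: retrieve internal legal, financial, and technical documents across languages.

🏃 How to run

First, install sentence-transformers:

pip install -U sentence-transformers

Encoding queries and documents

Load LFM2.5-Embedding-350M and encode queries and documents separately, using the matching prompt name on each side. Documents are ranked by cosine similarity (or, equivalently, by dot product on normalized embeddings):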

from sentence_transformers import SentenceTransformer

# Load the model (trust_remote_code applies the bidirectional patch)
model = SentenceTransformer(
    "LiquidAI/LFM2.5-Embedding-350M",
    trust_remote_code=True,
)

queries = [
    "What is the capital of France?",
    "Which city is Japan's capital?",
]
documents = [
    "Paris is the capital and largest city of France. Located on the Seine River in northern France, it serves as the country's political, economic, and cultural center.",
    "Tokyo, officially the Tokyo Metropolis, is the capital of Japan. It is the most populous metropolitan area in the world and serves as Japan's administrative, financial, and commercial hub.",
    "Berlin is the capital and largest city of Germany. Reunified in 1990 after the fall of the Berlin Wall, it now serves as a major cultural and political center in Europe.",
]

# Encode with the matching prompt names; after normalization, dot product equals cosine similarity
q_emb = model.encode(queries,   prompt_name="query",    normalize_embeddings=True)
d_emb = model.encode(documents, prompt_name="document", normalize_embeddings=True)

scores = q_emb @ d_emb.T  # shape: (n_queries, n_documents)

Always pass prompt_name="query" for queries and prompt_name="document" for passages. The model was trained with these prefixes, and omitting them silently degrades retrieval quality.
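
Once you have a score matrix of shape (n_queries, n_documents), ranking is an argsort per row. The snippet below is a small sketch with made-up scores rather than real model output; `top_k` is a hypothetical helper name.

```python
import numpy as np

def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring documents for each query, best first."""
    return np.argsort(-scores, axis=1)[:, :k]

# Made-up similarity scores for 2 queries and 3 documents.
scores = np.array([
    [0.82, 0.31, 0.40],
    [0.28, 0.79, 0.35],
])
ranking = top_k(scores, k=2)
```

For large collections a full sort is wasteful; an approximate nearest-neighbor index is the usual choice there.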

Flash Attention 2 (optional)

LFM2.5-Embedding-350M can run with FlashAttention-2 (requires the flash-attn package):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "LiquidAI/LFM2.5-Embedding-350M",
    trust_remote_code=True,
    model_kwargs={"attn_implementation": "flash_attention_2", "dtype": torch.bfloat16},
)

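This has been verified to match the default implementation within bf16 noise (multilingual NanoBEIR ndcg@10 differs by less than 0.002 across the 11 languages). At the model's 512-token maximum length the speed-up is small (about 5%); FA2 mainly helps memory use and throughput when you fine-tune the backbone or run longer contexts.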
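
Because FlashAttention-2 is optional, one possible pattern is to request it only when the `flash_attn` package can be imported and otherwise fall back to the model's default attention. This is a sketch, not part of the official instructions: `attention_kwargs` is a hypothetical helper, and it checks only for the package, not for a compatible GPU.

```python
import importlib.util

def attention_kwargs() -> dict:
    """Request FlashAttention-2 only if the flash_attn package is importable."""
    if importlib.util.find_spec("flash_attn") is not None:
        return {"attn_implementation": "flash_attention_2"}
    return {}  # empty dict: keep the model's default attention implementation

model_kwargs = attention_kwargs()
```

The resulting dictionary can be passed as `model_kwargs` to `SentenceTransformer`, together with a bf16 dtype entry when FlashAttention-2 is selected.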

Fine-tuning

Standard sentence-transformers training works out of the box. For example, with MultipleNegativesRankingLoss:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("LiquidAI/LFM2.5-Embedding-350M", trust_remote_code=True)
loss = MultipleNegativesRankingLoss(model)

train_ds = Dataset.from_dict({
    "query":    [...],
    "positive": [...],
    # optional: "negative": [...],
})

args = SentenceTransformerTrainingArguments(
    output_dir="out",
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    bf16=True,
    prompts={"query": "query: ", "positive": "document: "},
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_ds, loss=loss)
trainer.train()

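Notes:

Always pass the asymmetric prompts during training (the model was trained with them).
For a larger effective batch size without running out of memory, swap MultipleNegativesRankingLoss for CachedMultipleNegativesRankingLoss.
Save with model.save_pretrained(...); the modeling file and auto_map are preserved, so the patched behavior survives a reload.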
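
For intuition about why batch size matters here, the following NumPy sketch shows the in-batch negatives objective that MultipleNegativesRankingLoss is built around: each query's positive is the correct "class" and every other positive in the batch acts as a negative. It is a simplified illustration on random vectors, not the library's code; the scale of 20 with cosine similarity is assumed to match the library default.

```python
import numpy as np

def in_batch_negatives_loss(q: np.ndarray, p: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities, with pair i as the target for query i."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                           # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 32))
aligned_loss = in_batch_negatives_loss(queries, queries)           # positives match their queries
misaligned_loss = in_batch_negatives_loss(queries, queries[::-1])  # positives paired with the wrong queries
```

A larger batch supplies more negatives per query, which is the motivation for the cached variant mentioned in the notes.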

📈 Performance

For each metric, we mark the best bi-encoder and the best late-interaction retrieval model in bold.

NanoBEIR Multilingual Extended: NDCG@10

LiquidAI/nanobeir-multilingual-extended (https://huggingface.co/datasets/LiquidAI/nanobeir-multilingual-extended). Multilingual retrieval capability.

Model	Type	avg	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LiquidAI/LFM2.5-ColBERT-350M	late	0.605	0.551	0.606	0.687	0.607	0.622	0.606	0.614	0.590	0.570	0.613	0.586
LiquidAI/LFM2.5-Embedding-350M	dense	0.577	0.529	0.581	0.644	0.581	0.592	0.583	0.575	0.563	0.557	0.581	0.566
Qwen/Qwen3-Embedding-0.6B	dense	0.556	0.514	0.560	0.649	0.568	0.565	0.565	0.551	0.530	0.516	0.571	0.525
LiquidAI/LFM2-ColBERT-350M	late	0.540	0.491	0.563	0.661	0.563	0.564	0.543	0.557	0.527	0.449	0.547	0.480
Alibaba-NLP/gte-multilingual-base	dense	0.528	0.477	0.523	0.624	0.537	0.542	0.528	0.511	0.494	0.516	0.534	0.526
lightonai/GTE-ModernColBERT-v1	late	0.489	0.309	0.499	0.680	0.525	0.546	0.516	0.459	0.368	0.465	0.530	0.483
lightonai/LateOn	late	0.484	0.307	0.505	0.690	0.531	0.537	0.514	0.442	0.326	0.465	0.533	0.475
lightonai/DenseOn	dense	0.432	0.178	0.474	0.676	0.496	0.520	0.487	0.378	0.197	0.422	0.493	0.433
Alibaba-NLP/gte-modernbert-base	dense	0.383	0.112	0.449	0.666	0.448	0.475	0.408	0.275	0.180	0.376	0.431	0.391
BAAI/bge-large-en-v1.5	dense	0.359	0.059	0.419	0.642	0.445	0.475	0.431	0.198	0.132	0.358	0.434	0.353
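
For readers unfamiliar with the metric, here is a compact sketch of NDCG@10 for a single query. It uses the linear-gain form of DCG; evaluation harnesses may use a different gain function or tie handling, so this is not necessarily the exact computation behind the table. The document IDs and relevance labels are made up.

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Made-up example: two relevant documents, one retrieved at rank 1 and one at rank 3.
relevance = {"d1": 1.0, "d3": 1.0}
score = ndcg_at_k(["d1", "d2", "d3"], relevance)
perfect = ndcg_at_k(["d1", "d3", "d2"], relevance)
```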

MKQA-11: Recall@20

MKQA (https://github.com/apple/ml-mkqa). Cross-lingual capability (restricted to our target-language subset).

Model	Type	avg	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LiquidAI/LFM2.5-ColBERT-350M	late	0.694	0.608	0.709	0.748	0.711	0.715	0.707	0.703	0.640	0.689	0.703	0.700
LiquidAI/LFM2.5-Embedding-350M	dense	0.691	0.610	0.709	0.738	0.708	0.715	0.703	0.685	0.630	0.691	0.710	0.708
Alibaba-NLP/gte-multilingual-base	dense	0.675	0.567	0.692	0.741	0.705	0.703	0.697	0.655	0.563	0.698	0.700	0.699
LiquidAI/LFM2-ColBERT-350M	late	0.646	0.554	0.696	0.754	0.711	0.710	0.667	0.658	0.558	0.541	0.669	0.589
Qwen/Qwen3-Embedding-0.6B	dense	0.638	0.520	0.671	0.723	0.678	0.672	0.671	0.635	0.543	0.620	0.667	0.620
lightonai/GTE-ModernColBERT-v1	late	0.459	0.092	0.532	0.754	0.552	0.615	0.510	0.275	0.166	0.503	0.524	0.524
lightonai/LateOn	late	0.454	0.157	0.492	0.755	0.537	0.577	0.481	0.316	0.209	0.472	0.502	0.501
lightonai/DenseOn	dense	0.435	0.165	0.482	0.751	0.491	0.553	0.457	0.325	0.222	0.438	0.443	0.453
BAAI/bge-large-en-v1.5	dense	0.413	0.133	0.471	0.748	0.450	0.531	0.461	0.208	0.172	0.456	0.443	0.467
Alibaba-NLP/gte-modernbert-base	dense	0.295	0.060	0.333	0.736	0.273	0.417	0.291	0.100	0.052	0.332	0.326	0.330
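
Recall@20 asks how much of the relevant material appears in the top 20 results. The sketch below uses one common definition (fraction of relevant documents retrieved in the top k); the exact protocol used for the MKQA numbers above is not described here and may differ, for example by counting a query as a hit when any answer-bearing passage is retrieved. IDs are made up.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 20) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

# Made-up example: 2 of the 4 relevant documents are in the top 3.
ranked = ["d5", "d1", "d9", "d2"]
relevant = {"d1", "d2", "d5", "d7"}
recall_top3 = recall_at_k(ranked, relevant, k=3)
recall_top20 = recall_at_k(ranked, relevant, k=20)
```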

Inference speed: llama.cpp

End-to-end latency measured with llama.cpp in fp16 on a MacBook Pro M4 Max, using 32-token queries and 256-token documents. "Docs cached" means the document embeddings are precomputed and looked up from the index.

Model	Stage	Docs cached	p50	p95
LFM2.5-Embedding-350M	Query embedding	Yes	7.3 ms	9.6 ms
LFM2.5-ColBERT-350M	Query embedding	Yes	8.1 ms	8.5 ms
LFM2.5-ColBERT-350M	Query embedding + MaxSim	Yes	8.2 ms	15.2 ms
LFM2.5-ColBERT-350M	Query embedding + document embedding + MaxSim	No	34.3 ms	36.3 ms
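
If you want to collect comparable p50/p95 figures on your own hardware, a simple timing loop is enough. This is a generic sketch, not the harness used for the table: `measure_latency_ms` is a hypothetical helper, and `dummy_workload` stands in for a real call such as encoding one query.

```python
import time
import numpy as np

def measure_latency_ms(fn, n_runs: int = 200, warmup: int = 20) -> np.ndarray:
    """Time fn repeatedly and return its [p50, p95] latency in milliseconds."""
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return np.percentile(samples, [50, 95])

def dummy_workload() -> None:
    """Placeholder for a real model call."""
    sum(i * i for i in range(1000))

p50, p95 = measure_latency_ms(dummy_workload)
```

When timing GPU inference, remember to synchronize the device before stopping the clock, otherwise asynchronous kernels make the numbers look better than they are.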

Both models are available on Hugging Face as GGUF builds for llama.cpp in several quantization formats: LiquidAI/LFM2.5-ColBERT-350M-GGUF (https://huggingface.co/LiquidAI/LFM2.5-ColBERT-350M-GGUF/) and LiquidAI/LFM2.5-Embedding-350M-GGUF (https://huggingface.co/LiquidAI/LFM2.5-Embedding-350M-GGUF/).

Inference speed: enterprise GPU

For large-scale, production-grade enterprise deployments, we also ran experiments on our internal GPU stack to serve at very low latency under highly concurrent load. We observed latencies as low as 1 ms.

GPU serving latency (https://cdn-uploads.huggingface.co/production/uploads/63f389fda096536aeaae0a66/WTdmKJ2LpG07-iAqXYGDe.png)

Model	Setup	p50	p95	p99
LFM2.5-Embedding-350M	Query embedding	1.5 ms	1.6 ms	1.7 ms
LFM2.5-ColBERT-350M	Query embedding	1.3 ms	1.4 ms	1.5 ms
LFM2.5-ColBERT-350M	Query embedding + MaxSim	2.5 ms	2.7 ms	2.8 ms
LFM2.5-ColBERT-350M	Query embedding + document embedding + MaxSim	22.8 ms	24.1 ms	26.4 ms

📬 Contact

Have questions or want to get in touch? Join our Discord community (https://discord.com/invite/liquid-ai).
If you are interested in custom solutions for edge deployment, contact our sales team (https://www.liquid.ai/contact).

Citation

@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
