semantic-embeddings

Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

arXiv cs.CL · 2026-07-01

This paper presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for LLMs that uses both contextual and token-level embeddings to make watermarks more robust to paraphrasing and translation. In the reported experiments, detection after paraphrasing and translation improves over prior methods.

Examining the Limits of Word2Vec with Toki Pona

arXiv cs.CL · 2026-06-17

This paper investigates whether Word2Vec can generate meaningful semantic embeddings for Toki Pona, a constructed language with only ~130 words, using a corpus of 1.4 million sentences, and examines the effect of non-Toki Pona tokens on embedding quality.

Why I stopped using semantic embeddings for tool selection and switched back to BM25 [D]

Reddit r/MachineLearning · 2026-06-08

The author describes switching from semantic embeddings back to BM25 for tool selection in agents. On a corpus of 200 query-tool pairs, BM25 reaches 81% top-1 accuracy versus 64% for embeddings, which the author attributes to tool descriptions being short and keyword-driven rather than semantically rich, as longer documents are.

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Hugging Face Daily Papers · 2026-05-16

This paper proposes Evidence-Calibrated Query Clustering (ECC), an algorithm that aligns semantic embeddings with latent LLM capability demands using posterior model comparisons and Bradley-Terry modeling, significantly improving capability ranking quality for LLM evaluation.
