tokenizer-optimization

Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

Hugging Face Daily Papers · 2026-05-28

This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model built via cross-lingual tokenizer surgery and offline distillation. It achieves strong performance on Turkish benchmarks at a favorable cost-quality trade-off.
