Tag
This paper develops a geometric framework for measuring the semantic content of texts with sentence embeddings, proposing a three-coordinate semantic profile (novelty, breadth, integration) and a scalar trade-off triangle, and validating the framework across synthetic categories and novels.
This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model built via cross-lingual tokenizer surgery and offline distillation, achieving strong performance on Turkish benchmarks with a cost-quality trade-off.