This paper presents a controlled, multi-seed study testing whether adapting frozen sentence embeddings to input difficulty improves performance. It finds that conditioning on per-sentence complexity does not help, whereas a pair-level residual gated by a cross-encoder difficulty signal yields consistent gains on semantic similarity tasks.
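The pair-level gating idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's method: the sigmoid `gate`, its `slope` and `threshold` parameters, the linear form of `pair_residual`, and the reduction of the cross-encoder signal to a scalar `difficulty` in [0, 1] are all hypothetical. It shows only the structure: the frozen cosine score is kept as is, and a pair-level correction is added in proportion to how hard the pair is judged to be.

```python
import math


def cosine(u, v):
    """Cosine similarity between two frozen sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def gate(difficulty, slope=4.0, threshold=0.5):
    """Hypothetical sigmoid gate: close to 0 for easy pairs, close to 1 for hard ones."""
    return 1.0 / (1.0 + math.exp(-slope * (difficulty - threshold)))


def pair_residual(u, v, weights):
    """Hypothetical pair-level residual: a linear map over element-wise products."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def gated_score(u, v, difficulty, weights):
    """Frozen cosine score plus a residual correction scaled by pair difficulty.

    Assumption: `difficulty` is a scalar in [0, 1] derived from a cross-encoder,
    e.g. from its disagreement with the frozen bi-encoder score.
    """
    return cosine(u, v) + gate(difficulty) * pair_residual(u, v, weights)


u, v, w = [1.0, 0.0], [0.6, 0.8], [0.5, 0.5]
easy = gated_score(u, v, difficulty=0.1, weights=w)
hard = gated_score(u, v, difficulty=0.9, weights=w)
```

With these toy vectors the frozen cosine is 0.6 and the residual is 0.3, so the correction contributes little for an easy pair and most of its value for a hard one. Because the embeddings themselves are never modified, the sketch keeps the frozen encoder intact and confines adaptation to the pair score.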
This paper introduces jina-embeddings-v5-omni, a suite of multimodal embedding models that extend text embeddings to image, audio, and video through frozen-tower composition. Only 0.35% of the total weights are trained, which preserves the geometry of the text embedding space while achieving performance competitive with the state of the art at significantly lower computational cost.
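One way to picture frozen-tower composition is sketched below. It is a minimal sketch under stated assumptions, not the paper's architecture: the `Component` records, the per-modality projectors as the only trainable parts, the `embed` routing function, and every parameter count are hypothetical, and video is omitted for brevity. It illustrates two points from the summary: the trainable share is a small fraction of all weights, and a text input never passes through a trainable weight, so its embedding cannot drift.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Component:
    name: str
    num_params: int
    trainable: bool


def trainable_fraction(components: List[Component]) -> float:
    """Share of all weights that receive gradient updates."""
    total = sum(c.num_params for c in components)
    trained = sum(c.num_params for c in components if c.trainable)
    return trained / total


def embed(modality: str, features: Vector,
          towers: Dict[str, Callable[[Vector], Vector]],
          projectors: Dict[str, Callable[[Vector], Vector]]) -> Vector:
    """Encode with the frozen tower; only non-text outputs go through a projector."""
    hidden = towers[modality](features)
    if modality == "text":
        # The text path touches no trainable weight, so text embeddings stay unchanged.
        return hidden
    # Hypothetical trainable projector mapping into the text embedding space.
    return projectors[modality](hidden)


# Illustrative sizes only (assumption), chosen so the trainable share lands near the
# reported 0.35%; these are not the actual jina-embeddings-v5-omni component sizes.
MODEL = [
    Component("text_tower", 600_000_000, trainable=False),
    Component("image_tower", 300_000_000, trainable=False),
    Component("audio_tower", 100_000_000, trainable=False),
    Component("image_projector", 2_000_000, trainable=True),
    Component("audio_projector", 1_500_000, trainable=True),
]

print(f"trainable share: {trainable_fraction(MODEL):.2%}")  # prints "trainable share: 0.35%"
```

Keeping the towers frozen means only the small projectors need gradients and optimizer state, which is one plausible source of the lower training cost the summary mentions.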