multimodal-embeddings


jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

arXiv cs.CL · 2d ago

This paper introduces jina-embeddings-v5-omni, a suite of multimodal embedding models that extends text embeddings to image, audio, and video using frozen-tower composition. The method trains only 0.35% of the total weights, preserving text geometry while achieving performance competitive with the state of the art at significantly lower computational cost.
