FAST-GOAL is a fine-tuning method that improves CLIP's ability to align global and local semantics between images and long texts. It introduces the FLISM and TSL modules along with the GLIT100k dataset, and it reports improvements on long-caption datasets.
This paper proposes a data-driven framework that uses embeddings from multilingual LLMs to detect lexical gaps between languages, achieving high accuracy on the Korean-English language pair.
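The summary does not state how a gap is decided. One simple heuristic, assumed here for illustration and not necessarily the paper's criterion, is to flag a source word as a lexical gap when its nearest target-language word in a shared multilingual embedding space falls below a similarity threshold. The function names, the 0.5 threshold, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_lexical_gaps(source_emb, target_emb, threshold=0.5):
    """Flag source words whose best match in the target vocabulary is weak.

    source_emb / target_emb: dict mapping word -> embedding (list of floats).
    Both are assumed to come from the same multilingual model, so the
    vectors live in one shared space. Returns {gap word: (closest target
    word, similarity)}.
    """
    gaps = {}
    for word, vec in source_emb.items():
        # Nearest target word by cosine similarity.
        best_word, best_sim = max(
            ((t, cosine(vec, tv)) for t, tv in target_emb.items()),
            key=lambda pair: pair[1],
        )
        if best_sim < threshold:
            gaps[word] = (best_word, best_sim)
    return gaps


# Toy example with hypothetical 3-d vectors (Korean words romanized).
korean = {"sagwa": [1.0, 0.0, 0.0], "nunchi": [0.0, 0.2, 1.0]}
english = {"apple": [0.9, 0.1, 0.0], "sense": [0.0, 1.0, 0.1]}
gaps = find_lexical_gaps(korean, english)
print(gaps)
```

Here "sagwa" matches "apple" closely, while "nunchi" has no close English neighbor and is flagged. A real system would also need to decide how to set the threshold, which this sketch leaves as a fixed parameter.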
SemBridge is an embedding initialization method that uses multilingual bridge models to semantically align source and target vocabularies, improving the cross-lingual adaptation of sparse encoders and their retrieval performance across multiple languages.
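A common way to initialize embeddings for new target-language tokens, used here as an assumed stand-in rather than SemBridge's exact procedure, is to set each one to a similarity-weighted average of existing source-token embeddings, with the similarities supplied by a bridge model. The `similarity` callable, `top_k`, `temperature`, and the toy scores below are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    # Numerically stable softmax; a lower temperature gives sharper weights.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_target_embeddings(target_tokens, source_emb, similarity,
                           top_k=2, temperature=0.1):
    """Initialize each target token as a weighted mix of source embeddings.

    source_emb: dict source token -> embedding (list of floats).
    similarity: callable(target_token, source_token) -> float, standing in
    for scores from a multilingual bridge model (hypothetical interface).
    """
    dim = len(next(iter(source_emb.values())))
    init = {}
    for tok in target_tokens:
        # Keep the top_k most similar source tokens.
        scored = sorted(((similarity(tok, s), s) for s in source_emb),
                        reverse=True)[:top_k]
        weights = softmax([score for score, _ in scored], temperature)
        vec = [0.0] * dim
        for w, (_, s) in zip(weights, scored):
            for i, x in enumerate(source_emb[s]):
                vec[i] += w * x
        init[tok] = vec
    return init


# Toy example: German "Hund" should land close to English "dog".
source_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "car": [0.5, 0.5]}
bridge_scores = {("Hund", "dog"): 0.9, ("Hund", "cat"): 0.3,
                 ("Hund", "car"): 0.1}
init = init_target_embeddings(
    ["Hund"], source_emb, lambda t, s: bridge_scores.get((t, s), 0.0))
print(init["Hund"])
```

The design choice here is to start new tokens near semantically related existing ones instead of at random, so adaptation begins from a meaningful point in the encoder's space.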
Qwen-Image-VAE-2.0 is a suite of high-compression variational autoencoders (VAEs) that improves reconstruction fidelity and diffusability through an enhanced architecture, large-scale training, and semantic alignment strategies.