Tag
This paper proposes a cross-modal knowledge distillation framework that requires no paired data: it aligns feature and label distributions, comes with theoretical guarantees, and outperforms prior methods on multimodal benchmarks.