OpenMedQ is a fully open medical vision-language model pretrained on 14 datasets (~3.35M samples). It achieves state-of-the-art results on medical VQA and classification benchmarks.
This paper introduces LLaVA-UHD v4, which improves visual encoding efficiency in multimodal large language models by using slice-based encoding and intra-ViT early compression. It reduces computational costs by over 55% while maintaining or improving performance on high-resolution image tasks.