Tag
ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs by using text-aligned pre-training and proximal representation learning.