Tuna-2 is a unified multimodal model that performs visual understanding and generation directly from pixel embeddings, achieving state-of-the-art performance without a pretrained vision encoder.
LLaDA2.0-Uni unifies multimodal understanding and generation within a single diffusion-based large language model architecture.