Tag
Introduces Semantic Generative Tuning (SGT), a paradigm that uses image segmentation as a generative proxy to align visual understanding and generation in unified multimodal models, improving both comprehension and fidelity.