LLaDA2.0-Uni unifies multimodal understanding and generation within a single diffusion-based large language model architecture.
Unsloth releases a GGUF-quantized version of Baidu's ERNIE-Image-Turbo model, produced with its Unsloth Dynamic 2.0 quantization method, enabling efficient text-to-image generation in 8 inference steps on consumer GPUs with 24 GB of VRAM.
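To illustrate what "GGUF-quantized" means at the tensor level, here is a minimal sketch of block-wise 8-bit quantization in the style of the GGUF `Q8_0` type: weights are grouped into blocks of 32, and each block stores one half-precision scale plus 32 signed 8-bit integers. This is plain Python written for illustration, not llama.cpp's or Unsloth's code. Unsloth Dynamic 2.0 additionally picks different quantization types for different layers, which this sketch does not model; the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 32  # Q8_0-style blocks hold 32 weights each


def to_fp16(value):
    """Round a Python float through IEEE half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_q8_0(weights):
    """Split a flat list of floats into (scale, int8 list) blocks, Q8_0-style."""
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        # One scale per block, stored at half precision as in the file format.
        scale = to_fp16(amax / 127.0) if amax > 0 else 0.0
        if scale == 0.0:
            qs = [0] * len(block)
        else:
            # Round to nearest and clamp to the int8 range actually used.
            qs = [max(-127, min(127, round(w / scale))) for w in block]
        blocks.append((scale, qs))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate weights as scale * q for every stored integer."""
    return [scale * q for scale, qs in blocks for q in qs]


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Each 32-weight block costs 2 bytes of scale plus 32 bytes of integers, roughly 8.5 bits per weight, and the per-element rounding error stays within about half a block's scale.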
Comfy-Org has repackaged Baidu's ERNIE-Image and ERNIE-Image-Turbo models for ComfyUI, providing ready-to-use model files organized for the node-based image-generation framework.
Netflix releases VOID, a video inpainting model that removes objects from videos while realistically simulating physical interactions (e.g., objects falling when a person is removed), built on CogVideoX and fine-tuned with interaction-aware quadmask conditioning.
OpenAI presents improved techniques for consistency training, in which consistency models learn directly from data without distilling a pretrained diffusion model. Changes including a Pseudo-Huber loss, a lognormal noise schedule, and removal of the EMA teacher bring single-step FID to 2.51 on CIFAR-10 and 3.25 on ImageNet 64×64.
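The core of consistency training can be sketched in a few functions. This is a minimal, framework-free illustration, assuming the boundary-condition parameterization from the original consistency-models work (σ_min = 0.002, σ_data = 0.5), the Pseudo-Huber constant c = 0.00054·√d, and the weighting 1/(σ_{i+1} − σ_i). The network `net` is a hypothetical stand-in; gradients, the stop-gradient on the target branch, the lognormal noise schedule, and the discretization curriculum are all omitted.

```python
import math
import random

SIGMA_MIN = 0.002  # smallest noise level; f(x, SIGMA_MIN) must return x
SIGMA_DATA = 0.5   # assumed standard deviation of the data


def pseudo_huber(x, y, c):
    """Pseudo-Huber distance sqrt(||x - y||^2 + c^2) - c between two vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(sq + c * c) - c


def consistency_fn(net, x, sigma):
    """f(x, sigma) = c_skip(sigma) * x + c_out(sigma) * net(x, sigma)."""
    c_skip = SIGMA_DATA ** 2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA ** 2)
    c_out = (sigma - SIGMA_MIN) * SIGMA_DATA / math.sqrt(SIGMA_DATA ** 2 + sigma ** 2)
    out = net(x, sigma)
    return [c_skip * xi + c_out * oi for xi, oi in zip(x, out)]


def consistency_training_loss(net, x, sigma_i, sigma_next, rng):
    """One-sample loss between adjacent noise levels sigma_i < sigma_next."""
    z = [rng.gauss(0.0, 1.0) for _ in x]  # the same noise is used at both levels
    student = consistency_fn(net, [a + sigma_next * b for a, b in zip(x, z)], sigma_next)
    # Target comes from the same network (no pretrained teacher, no EMA);
    # a real training loop would wrap this branch in stop-gradient.
    target = consistency_fn(net, [a + sigma_i * b for a, b in zip(x, z)], sigma_i)
    c = 0.00054 * math.sqrt(len(x))
    weight = 1.0 / (sigma_next - sigma_i)
    return weight * pseudo_huber(student, target, c)


def sample_one_step(net, dim, sigma_max, rng):
    """Single-step generation: map noise at sigma_max directly to a sample."""
    noise = [sigma_max * rng.gauss(0.0, 1.0) for _ in range(dim)]
    return consistency_fn(net, noise, sigma_max)


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_net(x, sigma):
        # Hypothetical untrained network that always predicts zeros.
        return [0.0 for _ in x]

    print(consistency_training_loss(toy_net, [0.1, -0.2, 0.3], 0.5, 0.6, rng))
    print(sample_one_step(toy_net, 3, 80.0, rng))
```

The Pseudo-Huber distance behaves like a scaled squared L2 error for small differences and like plain L2 for large ones, which is why it can stand in for learned metrics such as LPIPS while staying robust to outliers.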