Tag
Jordi Pons announces Stable Audio 3, a family of open-weight models for generating instrumental music and sound effects, supporting fast generation and editing on licensed audio.
This paper proposes TAP, a tabular augmentation policy that couples diffusion inpainting with a learner-conditioned policy to improve downstream model performance under data scarcity; it outperforms strong baselines on real-world datasets.
This paper analyzes zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, providing information-theoretic guarantees and proposing a projected-Langevin initialization method.
Netflix releases VOID, a video inpainting model that removes objects from videos while realistically simulating physical interactions (e.g., objects falling when a person is removed), built on CogVideoX and fine-tuned with interaction-aware quadmask conditioning.