Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation

Meta AI Blog 12/15/25, 04:00 PM Models

Summary

SAM Audio is introduced as the first unified multimodal model for audio separation, enabling users to isolate specific sounds from complex mixtures using text, visual, or temporal prompts.

SAM Audio transforms audio processing by making it easy to isolate any sound from complex audio mixtures using natural, multimodal prompts — whether through text, visual cues, or marking time segments.

Original Article

Similar Articles

SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning

Meta AI Blog

Meta AI releases SAM 3.1, an update to the Segment Anything Model that enhances real-time video detection and tracking through multiplexing and global reasoning capabilities.

MultiLinguahah : A New Unsupervised Multilingual Acoustic Laughter Segmentation Method

arXiv cs.CL

This paper introduces MultiLinguahah, an unsupervised multilingual method for acoustic laughter segmentation using Isolation Forests on BYOL-A encoder representations. The authors demonstrate that their approach outperforms state-of-the-art supervised methods in non-English settings by treating laughter detection as an anomaly detection task.

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Hugging Face Daily Papers

The paper introduces JoyAI-Image, a unified multimodal foundation model that integrates a spatially enhanced MLLM with MMDiT to achieve state-of-the-art performance in visual understanding, text-to-image generation, and instruction-guided editing.

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

arXiv cs.CL

SAMoRA introduces a semantic-aware router and task-adaptive scaling to improve expert specialization and dynamic weighting in MoE-LoRA fine-tuning, outperforming prior methods on multi-task benchmarks.

@lillyguisnet: WEEE!!! I had not had the opportunity to try SAM3.1 yet, but simply prompting for "worm" perfectly segmented my images!…

X AI KOLs Following

A user shares enthusiastic feedback about SAM 3.1's ability to accurately segment images using simple text prompts like 'worm', highlighting significant improvements over SAM 1.