Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation

Meta AI Blog · Models

Summary

SAM Audio is introduced as the first unified multimodal model for audio separation, enabling users to isolate specific sounds from complex mixtures using text, visual, or temporal prompts.

SAM Audio makes it easy to isolate any sound from a complex audio mixture using natural, multimodal prompts: text descriptions, visual cues, or marked time segments.
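The article does not describe SAM Audio's internals, but the general idea behind separation can be illustrated with a toy, self-contained sketch: mix two tones, then isolate one of them by masking its frequency band in the spectrum. This is a deliberately crude stand-in (a fixed frequency mask rather than a learned, prompt-conditioned one) and is not SAM Audio's actual method.

```python
import numpy as np

# Toy illustration of mask-based source separation (NOT SAM Audio's
# actual method): mix two pure tones, then isolate the low tone by
# zeroing everything above a cutoff in the frequency domain.
sr = 8000
t = np.arange(sr) / sr                     # 1 second of audio
low = np.sin(2 * np.pi * 220 * t)          # "target" source: 220 Hz tone
high = np.sin(2 * np.pi * 2000 * t)        # interfering source: 2 kHz tone
mixture = low + high

spec = np.fft.rfft(mixture)
freqs = np.fft.rfftfreq(mixture.size, d=1 / sr)
mask = freqs < 1000                        # crude "prompt": keep below 1 kHz
separated = np.fft.irfft(spec * mask, n=mixture.size)

# The masked reconstruction should closely match the low tone alone.
err = float(np.max(np.abs(separated - low)))
```

In a learned system, the binary mask above would be replaced by a mask predicted by a neural network conditioned on the user's text, visual, or temporal prompt.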

Similar Articles

MultiLinguahah: A New Unsupervised Multilingual Acoustic Laughter Segmentation Method

arXiv cs.CL

This paper introduces MultiLinguahah, an unsupervised multilingual method for acoustic laughter segmentation using Isolation Forests on BYOL-A encoder representations. The authors demonstrate that their approach outperforms state-of-the-art supervised methods in non-English settings by treating laughter detection as an anomaly detection task.
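The anomaly-detection framing can be sketched in a few lines: treat most frames as in-distribution (speech) and flag the rare, shifted frames (laughter) with an Isolation Forest. In this sketch, random vectors stand in for the paper's BYOL-A frame embeddings, and the hyperparameters are illustrative, not the authors'.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Sketch of laughter detection as anomaly detection. Random vectors
# stand in for BYOL-A embeddings; the shifted cluster plays the role
# of laughter frames, which the forest should score as outliers.
rng = np.random.default_rng(0)
speech = rng.normal(0.0, 1.0, size=(500, 32))    # in-distribution frames
laughter = rng.normal(6.0, 1.0, size=(20, 32))   # shifted "anomalous" frames
frames = np.vstack([speech, laughter])

forest = IsolationForest(contamination=0.05, random_state=0).fit(frames)
labels = forest.predict(frames)                  # -1 = anomaly, 1 = inlier
laughter_rate = float(np.mean(labels[500:] == -1))
speech_rate = float(np.mean(labels[:500] == -1))
```

Because the method is unsupervised, no labeled laughter is needed at training time, which is what lets it transfer to non-English settings where labeled data is scarce.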