Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation
Summary
SAM Audio is introduced as the first unified multimodal model for audio separation, enabling users to isolate specific sounds from complex mixtures using text, visual, or temporal prompts.
Similar Articles
SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning
Meta AI releases SAM 3.1, an update to the Segment Anything Model that enhances real-time video detection and tracking through multiplexing and global reasoning capabilities.
MultiLinguahah: A New Unsupervised Multilingual Acoustic Laughter Segmentation Method
This paper introduces MultiLinguahah, an unsupervised multilingual method for acoustic laughter segmentation using Isolation Forests on BYOL-A encoder representations. The authors demonstrate that their approach outperforms state-of-the-art supervised methods in non-English settings by treating laughter detection as an anomaly detection task.
Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation
The paper introduces JoyAI-Image, a unified multimodal foundation model that integrates a spatially enhanced MLLM with MMDiT to achieve state-of-the-art performance in visual understanding, text-to-image generation, and instruction-guided editing.
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning
SAMoRA introduces a semantic-aware router and task-adaptive scaling to improve expert specialization and dynamic weighting in MoE-LoRA fine-tuning, outperforming prior methods on multi-task benchmarks.
@lillyguisnet: WEEE!!! I had not had the opportunity to try SAM3.1 yet, but simply prompting for "worm" perfectly segmented my images!…
A user shares enthusiastic feedback about SAM 3.1's ability to accurately segment images using simple text prompts like 'worm', highlighting significant improvements over SAM 1.