Introducing SAM Audio: The First Unified Multimodal Model for Audio Separation
Summary
SAM Audio is presented as the first unified multimodal model for audio separation. It lets users isolate specific sounds from complex mixtures using text, visual, or temporal prompts.
Similar Articles
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
AuralSAM2 integrates audio into SAM2 via an AuralFuser module that generates sparse and dense prompts from audio-visual features, enhancing cross-modal segmentation while maintaining interactive efficiency.
SAM 3: Segment Anything with Concepts
SAM 3 introduces a unified model for promptable concept segmentation and tracking, achieving state-of-the-art performance with a decoupled recognition and localization architecture and a scalable data engine.
Audio Interaction Model
This paper introduces Audio-Interaction, a unified streaming audio model that combines offline task execution with real-time audio instruction following in an end-to-end framework. It proposes SoundFlow for the perceive-decide-respond loop and reports competitive performance across benchmarks.
@multimodalart: Stable Audio 3 by @StabilityAI is just out It mainly comes with 3 open source variants: - Stable Audio 3 Medium (2B) - …
Stability AI released Stable Audio 3 with open-source variants for music and VFX, offering fast, high-quality audio generation.
SAM 3.1: Faster and More Accessible Real-Time Video Detection and Tracking With Multiplexing and Global Reasoning
Meta AI releases SAM 3.1, an update to the Segment Anything Model that enhances real-time video detection and tracking through multiplexing and global reasoning capabilities.