audio-language-model

Audio Interaction Model

Hugging Face Daily Papers · 2026-06-03

This paper introduces Audio-Interaction, a unified streaming audio model that combines offline task execution with real-time audio instruction following in a single end-to-end framework. It proposes SoundFlow to handle the perceive-decide-respond loop and reports competitive performance across benchmarks.

StepAudio 2.5 Technical Report

Hugging Face Daily Papers · 2026-05-22

StepAudio 2.5 is a unified audio-language model that achieves state-of-the-art results across ASR, TTS, and real-time spoken interaction by leveraging task-tailored reinforcement learning from human feedback to optimize shared representations.
