audio-ai

#audio-ai

Higgs Audio v3 TTS 4B. Built for voice chat. Support 100 languages and inline control.

Reddit r/LocalLLaMA · 2026-06-04

Higgs Audio v3 is a 4B-parameter TTS model built for voice chat applications. It supports 100 languages and inline control.

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

arXiv cs.AI · 2026-06-04

SpurAudio is a new benchmark for evaluating shortcut learning and spurious correlations in few-shot audio classification. It shows that state-of-the-art methods, including large pretrained audio foundation models, suffer significant performance degradation when background correlations are disrupted.

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

Hugging Face Daily Papers · 2026-06-03

SpeechEditBench is a bilingual multi-attribute benchmark for evaluating instruction-guided speech editing across seven atomic tasks and compositional tasks, using an anchor-based evaluation protocol with three metrics. An evaluation of mainstream Speech LLMs shows that no single model excels across all dimensions and that compositional editing remains highly challenging.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Hugging Face Daily Papers · 2026-05-29

OpenSTBench is a unified multidimensional evaluation framework for speech translation systems that jointly assesses translation quality, speech quality, speaker preservation, emotion fidelity, and latency across both S2TT and S2ST systems in offline and streaming settings. The framework addresses the gap left by fragmented evaluation protocols and provides a reproducible benchmark for comparing heterogeneous speech translation systems.

new AI music model dropped, demos sound surprisingly real

Reddit r/ArtificialInteligence · 2026-05-26

A new AI music model has been released, with demos that sound surprisingly realistic.

Spotify takes on Google’s NotebookLM with its new app

TechCrunch AI · 2026-05-21

Spotify has debuted a new desktop app, Studio by Spotify Labs, that uses AI to generate personalized podcasts from users' email, calendar, and documents, competing directly with Google's NotebookLM.

@juberti: gpt-realtime-2 shows a 15pp improvement (vs 1.5) on Big Bench Audio, and is now close to saturation.

X AI KOLs Following · 2026-05-07

gpt-realtime-2 shows a 15 percentage point improvement over version 1.5 on the Big Bench Audio benchmark and is now close to saturating it.

APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

Hugging Face Daily Papers · 2026-05-05

APEX is a large-scale multi-task learning framework that predicts both popularity and aesthetic quality of AI-generated music using frozen audio embeddings. The model demonstrates strong generalization across different generative architectures by jointly predicting engagement signals and perceptual quality dimensions.

Build with Lyria 3, our newest music generation model

Google AI Blog · 2026-03-25

Google has released Lyria 3, its newest music generation model, available to developers through the Gemini API and Google AI Studio. The model comes in two variants, Lyria 3 Pro for full songs and Lyria 3 Clip for shorter clips, with controls for tempo and lyrics and support for image-to-music multimodal input.

DolphinGemma: How Google AI is helping decode dolphin communication

Google DeepMind Blog · 2025-04-14

Google has developed DolphinGemma, a large language model designed to learn and generate dolphin vocalizations, in collaboration with Georgia Tech and the Wild Dolphin Project. The aim is to advance understanding of dolphin communication patterns and enable potential interspecies dialogue.
