speaker-diarization

@FeitengLi: Once speaker labeling and speech generation are added next week, the early-bird price will no longer be available.

X AI KOLs Timeline · yesterday Cached

EdgeSpeak has officially launched. It is a local-first, privacy-preserving transcription tool that supports semantic segmentation and timestamps and is compatible with the OpenAI Audio API, among others. Speaker labeling and speech generation are planned for a later release.

@uniswap12: Microsoft open-sourced a voice AI that can transcribe 60 minutes of audio in one go and handle 4 people speaking simultaneously. It is called VibeVoice and has 24.8k stars, yet I only found out about it today. I've been using Whisper to convert recordings to text, but it often times out on long meeting recordings and struggles with multi-speaker recognition...

X AI KOLs Timeline · 2026-06-04 Cached

Microsoft has open-sourced VibeVoice, a speech AI framework that supports single-pass transcription of 60-minute audio, multi-speaker diarization, and timestamp labeling, and also provides multi-speaker TTS synthesis. It is based on Qwen2.5 and includes a lightweight 0.5B real-time version. The project has 24.8k stars on GitHub.

MUSCAT: MUltilingual, SCientific ConversATion Benchmark

arXiv cs.CL · 2026-04-20 Cached

MUSCAT is a new multilingual scientific conversation benchmark for evaluating ASR systems in challenging multilingual scenarios, including code-switching, domain-specific vocabulary, and mixed-language input. The dataset consists of bilingual discussions of scientific papers between speakers using different languages, and the results show that current state-of-the-art systems struggle with these challenges.
