@CopyRebeldia: The business of charging you every month to turn your meetings into a summary just had a very bad day. Microsoft droppe…
Summary
Microsoft released VibeVoice, an open-source model that processes a full hour of audio in one pass and returns a structured transcript with speaker identification and timestamps. The post argues that this undercuts paid transcription services.
Cached at: 06/09/26, 08:48 AM
The business of charging you every month to turn your meetings into a summary just had a very bad day.
Microsoft just put a model on GitHub, for free, that swallows an entire hour of audio in a single pass and spits it back neatly organized: this person said this at minute 12, that person said that at minute 34. Who, when, and what.
Without chopping up the audio. Without anyone spending the night transcribing it.
Half the industry that made its living off this drudgery has spent the day staring at the repo in silence.
It’s called VibeVoice.
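The "who, when, and what" above describes a speaker-attributed, timestamped transcript. The Python sketch below shows one way such a record could be held and printed. The `Segment` fields, the `render` and `talk_time` helpers, and the MM:SS layout are hypothetical illustrations, not VibeVoice's actual output schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str    # who: a diarization label such as "Speaker 1" (hypothetical)
    start_s: float  # when: offset from the start of the recording, in seconds
    end_s: float
    text: str       # what: the transcribed words


def fmt_time(seconds: float) -> str:
    """Format an offset in seconds as MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render(segments: list[Segment]) -> str:
    """Return the segments in chronological order as '[start-end] speaker: text' lines."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return "\n".join(
        f"[{fmt_time(s.start_s)}-{fmt_time(s.end_s)}] {s.speaker}: {s.text}"
        for s in ordered
    )


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total seconds attributed to each speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end_s - s.start_s)
    return totals


if __name__ == "__main__":
    meeting = [
        Segment("Speaker 2", 2040.0, 2052.0, "Agreed, Friday works."),
        Segment("Speaker 1", 720.0, 735.5, "Let's move the launch to Friday."),
    ]
    print(render(meeting))
    # [12:00-12:15] Speaker 1: Let's move the launch to Friday.
    # [34:00-34:12] Speaker 2: Agreed, Friday works.
```

Keeping speaker, start, end, and text together in one record means that follow-up steps such as search, per-speaker talk time, or summarization can work directly on structured data instead of re-parsing free text.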
Similar Articles
@uniswap12: Microsoft open-sourced a voice AI that can transcribe 60 minutes of audio in one go and handle 4 people speaking simultaneously. It's VibeVoice, open-sourced by Microsoft with 24.8k stars, and I only found out about it today. I've been using Whisper to convert recordings to text, but it often times out on long meeting recordings and struggles to tell multiple speakers apart...
Microsoft open-sourced the VibeVoice speech AI framework, which supports single-pass transcription of 60-minute recordings, multi-speaker diarization, and timestamp labeling, and also offers multi-speaker TTS synthesis. It is based on Qwen2.5 and includes a lightweight 0.5B real-time version. It has received 24.8k stars on GitHub.
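The contrast between Whisper and single-pass transcription comes down to chunking. Whisper models process audio in 30-second windows, so an hour-long recording has to be cut into many pieces and stitched back together. The sketch below shows simple fixed-window chunking. The function name and the 5-second overlap are illustrative assumptions; real tools choose their cut points more carefully.

```python
def chunk_windows(duration_s: float, window_s: float = 30.0,
                  overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, duration_s] into fixed-size windows that overlap by overlap_s seconds."""
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap must be non-negative and smaller than the window")
    step = window_s - overlap_s
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)  # the last window may be shorter
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


if __name__ == "__main__":
    hour = 60 * 60
    print(len(chunk_windows(hour, overlap_s=0.0)))  # 120 windows without overlap
    print(len(chunk_windows(hour)))                 # 144 windows with a 5 s overlap
```

Every internal boundary (143 of them in the overlapping case) is a place where a sentence can be cut in half and where speaker labels from neighbouring windows have to be reconciled. A model that reads the whole hour at once does not create those boundaries.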
VibeVoice Technical Report
VibeVoice is a new model from Microsoft that synthesizes long-form multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer. It achieves superior fidelity and compression, supporting up to 90 minutes of audio with multiple speakers.
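Simple arithmetic shows why tokenizer compression matters for the 90-minute figure. The number of tokens a model has to handle is the audio length multiplied by the tokenizer's frame rate. The sketch below assumes a 7.5 Hz frame rate and 24 kHz input audio. These figures are commonly cited for VibeVoice's tokenizer but do not appear in this summary, so check them against the technical report. The 50 Hz rate is a hypothetical comparison point. The count also ignores text tokens and any other per-frame overhead.

```python
# Assumptions (not stated in the summary above): 7.5 Hz tokenizer frame rate,
# 24 kHz input audio. 50 Hz below is a hypothetical higher-rate tokenizer.
ASSUMED_FRAME_RATE_HZ = 7.5
ASSUMED_SAMPLE_RATE_HZ = 24_000


def tokens_for(minutes: float, frame_rate_hz: float) -> int:
    """Speech tokens (one per frame) needed to represent `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


def compression_ratio(sample_rate_hz: float, frame_rate_hz: float) -> float:
    """Raw audio samples represented by each token."""
    return sample_rate_hz / frame_rate_hz


if __name__ == "__main__":
    print(tokens_for(90, ASSUMED_FRAME_RATE_HZ))  # 40500 tokens for 90 minutes
    print(tokens_for(60, ASSUMED_FRAME_RATE_HZ))  # 27000 tokens for one hour
    print(tokens_for(90, 50))                     # 270000 at a hypothetical 50 Hz
    print(compression_ratio(ASSUMED_SAMPLE_RATE_HZ, ASSUMED_FRAME_RATE_HZ))  # 3200.0
```

Under these assumptions, a lower frame rate shrinks the sequence a long recording produces by a large constant factor. That is the kind of reduction that makes fitting an hour or more of audio into a single model pass plausible.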
@tom_doerr: Captures, transcribes, and summarizes meetings entirely locally https://github.com/Zackriya-Solutions/meeting-minutes…
Meetily is a privacy-first, open-source AI meeting assistant that captures, transcribes, and summarizes meetings entirely locally on the user's infrastructure.
@tom_doerr: Transcribes audio at 70x real-time speed https://github.com/m-bain/whisperX
WhisperX is a tool for fast automatic speech recognition with word-level timestamps and speaker diarization, offering 70x real-time transcription with Whisper large-v2.
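Word-level timestamps and speaker diarization come from separate steps, so their outputs must be joined. One common approach gives each recognized word the speaker whose diarization turn overlaps it the most in time. The sketch below is a generic version of that idea, not WhisperX's implementation. The `Word` and `Turn` types and the `assign_speakers` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds, from the ASR word-level timestamps
    end: float


@dataclass
class Turn:
    speaker: str  # label produced by the diarization step
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if they do not touch)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], turns: list[Turn]) -> list[tuple[str, Optional[str]]]:
    """Label each word with the speaker of the turn it overlaps most; None if no overlap."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labeled.append((w.text, best_speaker))
    return labeled
```

Picking the largest overlap rather than the first overlap handles words that straddle a speaker change. Words that fall outside every turn are left unlabeled instead of being guessed.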
@dhaber: https://x.com/dhaber/status/2064711613714735141
The article argues that recording all workplace conversations is becoming the norm, driven by AI's need for context and the productivity advantages for individuals and leaders. It predicts a new category of enterprise software organized around voice data, where AI learns company culture by attending meetings.