@CopyRebeldia: The business of charging you every month to turn your meetings into a summary just had a very bad day. Microsoft droppe…


Summary

Microsoft released VibeVoice, an open-source model that processes a full hour of audio in a single pass and returns a structured transcript with speaker identification and timestamps. The post frames the release as a threat to paid transcription services.

Original Article

Cached at: 06/09/26, 08:48 AM

The business of charging you every month to turn your meetings into a summary just had a very bad day.

Microsoft just dropped a model on GitHub, for free, that swallows an entire hour of audio in a single pass and spits it back neatly organized: this person said this at minute 12, that one at minute 34. Who, when, and what.

Without chopping up the audio. Without anyone spending the night transcribing it.

Half the industry that made its living off this drudgery has spent the day staring at the repo in silence.

It’s called VibeVoice.

Similar Articles

@uniswap12: Microsoft open-sourced a voice AI that can transcribe 60 minutes of audio in one go and handle 4 people speaking at once. VibeVoice, open-sourced by Microsoft, has 24.8k stars, and I only found out about it today. I've been using Whisper to convert recordings to text, but it often times out on long meeting recordings and struggles to tell multiple speakers apart...

X AI KOLs Timeline

Microsoft open-sourced the VibeVoice speech AI framework, which supports single-pass transcription of 60-minute audio with multi-speaker diarization and timestamps, and also provides multi-speaker text-to-speech (TTS) synthesis. It is based on Qwen2.5, comes with a lightweight 0.5B real-time version, and has received 24.8k stars on GitHub.

VibeVoice Technical Report

Papers with Code Trending

VibeVoice is a new Microsoft model that synthesizes long-form, multi-speaker speech using next-token diffusion and a highly efficient continuous speech tokenizer. The report claims superior fidelity and compression, with support for generating up to 90 minutes of audio with multiple speakers.

@dhaber: https://x.com/dhaber/status/2064711613714735141

X AI KOLs Following

The article argues that recording all workplace conversations is becoming the norm, driven by AI's need for context and the productivity advantages for individuals and leaders. It predicts a new category of enterprise software organized around voice data, where AI learns company culture by attending meetings.