@NielsRogge: Just added this blog as a project page to https://paperswithcode.co/paper/2410.00037…! Hope that more people can learn …

Summary

NielsRogge added a blog post by rohit (@bicro_) that explains the architecture of Moshi, an open-source full-duplex voice model, as a project page on Papers With Code, hoping it helps more people learn about state-of-the-art full-duplex voice models.

Just added this blog as a project page to https://t.co/6yQIMR6Ltn! Hope that more people can learn about state-of-the-art full-duplex voice models this way :) https://t.co/IcZwVCjIp2

Cached at: 06/18/26, 06:10 PM

rohit (@bicro_): Moshi is one of the best open source full-duplex voice models out there. The architecture is dense, so we spent a few days studying it and wrote up what we learned, with diagrams to make it click faster.

Let us know if it was helpful 🤠

Similar Articles

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach leverages natural temporal gaps in conversation to retrieve external knowledge without disrupting the natural flow of dialogue.

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL

This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.