@NielsRogge: Just added this blog as a project page to https://paperswithcode.co/paper/2410.00037…! Hope that more people can learn …
Summary
NielsRogge added a blog post explaining the Moshi full-duplex voice model as a project page on Papers With Code, hoping to help more people learn about state-of-the-art full-duplex voice models.
Cached at: 06/18/26, 06:10 PM
Just added this blog as a project page to https://t.co/6yQIMR6Ltn!
Hope that more people can learn about state-of-the-art full-duplex voice models this way :) https://t.co/IcZwVCjIp2
rohit (@bicro_): Moshi is one of the best open-source full-duplex voice models out there. The architecture is dense, so we spent a few days studying it and wrote up what we learned, with diagrams to make it click faster.
Let us know if it was helpful 🤠
Similar Articles
@kyutai_labs: New paper: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models We use RL to post-train speech models (Mo…
Kyutai Labs released a new paper on using reinforcement learning to post-train speech models (Moshi and PersonaPlex) for more human-like interaction, including when to respond, wait, or give listening cues.
@NielsRogge: Introducing a revival of PapersWithCode! As @ilyasut said, we're back to the "age of research". Hence, it's important t…
NielsRogge announces a revival of PapersWithCode, featuring SOTA per domain, leaderboards, and methods parsed at scale using AI agents.
MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models
MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach uses natural temporal gaps in conversation to retrieve external knowledge without disrupting the flow of dialogue.
@NielsRogge: Introducing Conference Papers on http://paperswithcode.co! Browse all accepted @CVPR 2026 papers easily by domain, from…
Papers With Code introduces a feature to browse all CVPR 2026 accepted papers by domain, with links to code, project pages, and Hugging Face artifacts, including Oral and Spotlight papers.
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.