Full duplex vs half duplex - the spectrum of AI voice models [D]

Reddit r/MachineLearning 06/01/26, 10:56 PM News

Summary

A discussion of half-duplex versus full-duplex architectures in AI voice models, focusing on three behaviors (overlap, backchannels, and barge-in) whose absence makes voice agents sound robotic.

It seems that there are two ways to build voice AI:

* Half-duplex: strict turn-taking. You speak, the other side waits until you’re done, one direction of speech at a time. ← This is how almost every voice assistant works today.
* Full-duplex: two channels, both sides can talk at any time - no more waiting for your “turn”. ← This is the way humans actually talk.

In fact, there are three crucial things half-duplex voice models can't really do:

* Overlap - talking and listening at the same time without falling apart
* Backchannels - the "mhms," "rights," and "yeahs" you drop in while the other person is still going
* Barge-in - getting interrupted mid-sentence and recovering gracefully

These three features are a big reason why voice agents still feel “robotic” to this day.

But what exactly is the spectrum from half-duplex to full-duplex? Is a Moshi-style architecture the only way to approach full-duplex natural voice conversations? What are ways half-duplex systems could imitate full-duplex?

Would love to hear others' thoughts on this.
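To make the three behaviors above a bit more concrete, here is a toy frame-by-frame sketch in Python. It is only an illustration under loud assumptions: the `Agent` class, the `run` helper, the thresholds `barge_in_frames` and `backchannel_every`, and the event strings are all made up for this example, and user speech is reduced to one boolean per frame. It is a rule-based approximation, not how Moshi or any real full-duplex model works.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy voice agent; all names and thresholds here are hypothetical."""
    full_duplex: bool
    barge_in_frames: int = 2      # user frames heard before the agent yields
    backchannel_every: int = 3    # emit "mhm" every N heard user frames
    pending: list = field(default_factory=list)  # words the agent still wants to say
    user_run: int = 0             # consecutive user frames the agent has heard

    def step(self, user_speaking: bool) -> str:
        agent_speaking = bool(self.pending)
        # Half-duplex: the agent is effectively deaf while it is speaking.
        # Full-duplex: it keeps listening during its own speech (overlap).
        heard = user_speaking and (self.full_duplex or not agent_speaking)
        self.user_run = self.user_run + 1 if heard else 0

        if agent_speaking:
            # Barge-in: sustained user speech makes the agent drop its turn.
            if heard and self.user_run >= self.barge_in_frames:
                self.pending.clear()
                return "yield"
            return "say:" + self.pending.pop(0)

        # Backchannel: short acknowledgement while the user keeps talking.
        if heard and self.full_duplex and self.user_run % self.backchannel_every == 0:
            return "backchannel:mhm"
        return "listen" if heard else "idle"


def run(full_duplex: bool, user_frames: list, words: list) -> list:
    """Feed one boolean per frame (is the user speaking?) and collect events."""
    agent = Agent(full_duplex=full_duplex, pending=list(words))
    return [agent.step(u) for u in user_frames]


if __name__ == "__main__":
    user = [False, True, True, True, False]  # user interrupts on frames 1-3
    print(run(False, user, ["a", "b", "c", "d"]))
    # ['say:a', 'say:b', 'say:c', 'say:d', 'idle']
    print(run(True, user, ["a", "b", "c", "d"]))
    # ['say:a', 'say:b', 'yield', 'backchannel:mhm', 'idle']
```

In the half-duplex run the agent talks straight through the interruption, while in the full-duplex run it overlaps for one frame, yields, and then backchannels. Points between those two extremes (for example, listening during speech but never backchanneling) are one rough way to picture the spectrum the post asks about.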


Similar Articles

Building voice AI agents that take turns like humans — the gotchas nobody warns you about

How AI voice agents actually work

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

The voice of the AI

Voice feels like the underrated output layer for AI agents
