Full duplex vs half duplex - the spectrum of AI voice models [D]
Summary
An analysis of half-duplex vs full-duplex architectures in AI voice models, discussing conversational features such as overlap, backchannels, and barge-in, whose absence makes voice agents sound robotic.
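To make the distinction concrete, here is a minimal sketch, not taken from the article, of how the two designs might react when the user talks over the agent. It assumes audio is split into fixed-size frames, that a hypothetical voice-activity flag is available per frame, and that a barge-in is any run of at least `barge_in_frames` consecutive user-active frames. The function name, the threshold, and the frame patterns are all illustrative assumptions.

```python
from typing import List


def agent_frames_spoken(
    user_active: List[bool],
    full_duplex: bool,
    barge_in_frames: int = 3,  # assumed threshold, chosen only for illustration
) -> int:
    """Return how many frames the agent speaks before finishing or yielding.

    user_active[i] is True when the user is talking during frame i of the
    agent's utterance (a hypothetical voice-activity signal).
    """
    if not full_duplex:
        # Half-duplex: the agent does not listen while speaking,
        # so it always plays its whole utterance.
        return len(user_active)

    run = 0  # length of the current run of overlapping user speech
    for i, talking in enumerate(user_active):
        run = run + 1 if talking else 0
        if run >= barge_in_frames:
            # Sustained overlap is treated as a barge-in: yield the floor.
            return i + 1
    # Short overlaps (backchannels such as "mm-hm") never reached the threshold.
    return len(user_active)


# One-frame backchannel versus a sustained interruption, over 8 frames.
BACKCHANNEL = [False, False, True, False, False, False, False, False]
BARGE_IN = [False, False, True, True, True, True, True, True]

print(agent_frames_spoken(BACKCHANNEL, full_duplex=True))
print(agent_frames_spoken(BARGE_IN, full_duplex=True))
print(agent_frames_spoken(BARGE_IN, full_duplex=False))
```

In this toy model the full-duplex agent talks through the backchannel but stops early on the sustained interruption, while the half-duplex agent finishes its utterance in both cases. Real systems make this decision with learned models over raw audio, not a fixed frame counter.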
Similar Articles
How AI voice agents actually work
A detailed explainer on the five-layer architecture of AI voice agents, including speech-to-text, LLM, text-to-speech, orchestrator, and telephony, all operating under a 500ms latency constraint to maintain natural conversation flow.
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
MTR-DuplexBench introduces a comprehensive benchmark for evaluating Full-Duplex Speech Language Models in multi-round conversations, addressing challenges like blurred turn boundaries and context inconsistency while assessing conversational features, dialogue quality, instruction following, and safety.
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.
EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions
EchoChain is a new benchmark for evaluating AI models' ability to revise in-progress responses when users interrupt mid-generation. The benchmark identifies three failure patterns (contextual inertia, interruption amnesia, objective displacement) and finds that none of the evaluated real-time voice models exceeds a 50% pass rate.
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
This paper introduces Omni-DuplexEval, a benchmark and automatic evaluation framework for real-time duplex interaction in multimodal large language models, assessing continuous response generation and proactive event detection in streaming scenarios.