full-duplex

#full-duplex

Full duplex vs half duplex - the spectrum of AI voice models [D]

Reddit r/MachineLearning · 2026-06-01

An analysis of half-duplex vs full-duplex architectures in AI voice models, discussing how poor handling of overlap, backchannels, and barge-in makes voice agents sound robotic.

Raon-Speech Technical Report

arXiv cs.CL · 2026-05-26

Raon-Speech is a 9B-parameter speech language model for English and Korean, supporting understanding, answering, and generation, with a full-duplex extension Raon-SpeechChat for natural real-time conversation. It achieves strong performance across 42 benchmarks and is fully open-sourced.

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL · 2026-05-21

This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment with centered kernel alignment (CKA) and predicting turn boundaries with LSTM probes.

@rohanpaul_ai: Just a few days back, Thinking Machines Lab (TML), showcased a way of making AI interaction continuous instead of turn-…

X AI KOLs Following · 2026-05-17

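According to the post, OpenBMB released MiniCPM-o 4.5, a 9B full-duplex omnimodal model whose Omni-Flow framework enables continuous, time-aligned real-time video and voice interaction. The model is open source and reportedly surpasses previous models. The post frames the release as following Thinking Machines Lab's recent showcase of continuous rather than turn-based AI interaction.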

@miramurati: Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time inter…

X AI KOLs Following · 2026-05-11

Mira Murati's team previewed a new interaction model. Trained from scratch, it natively supports full-duplex real-time audio and video conversation, instant interruption, multilingual translation, and dynamic multitasking. The demonstration showed low-latency streaming interaction, multimodal perception, and concurrent task execution.

EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions

arXiv cs.CL · 2026-04-21

EchoChain is a new benchmark for evaluating AI models' ability to revise in-progress responses when users interrupt mid-generation. The benchmark identifies three failure patterns (contextual inertia, interruption amnesia, and objective displacement) and finds that none of the evaluated real-time voice models exceeds a 50% pass rate.

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach uses natural temporal gaps in conversation to retrieve external knowledge without disrupting the flow of dialogue.

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MTR-DuplexBench introduces a comprehensive benchmark for evaluating Full-Duplex Speech Language Models in multi-round conversations, addressing challenges like blurred turn boundaries and context inconsistency while assessing conversational features, dialogue quality, instruction following, and safety.
