full-duplex

#full-duplex

TurnNat: Automatic Evaluation of Turn-Taking Naturalness in Dyadic Spoken Dialogue

arXiv cs.CL · 2d ago

TurnNat is a likelihood-based framework for automatically evaluating turn-taking naturalness in dyadic spoken dialogue, using a causal turn-taking prediction model trained on natural conversations to measure timing atypicality via negative log-likelihood.

BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM

arXiv cs.CL · 2026-06-15

BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.

Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering

arXiv cs.CL · 2026-06-11

This paper identifies 'state inertia' in full-duplex spoken language models, where the model's internal predictive focus lags during user interruptions, and proposes a training-free activation steering method to improve interruption handling.

@kyutai_labs: New paper: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models We use RL to post-train speech models (Mo…

X AI KOLs Following · 2026-06-10

Kyutai Labs released a new paper on using reinforcement learning to post-train speech models (Moshi and PersonaPlex) for more human-like interaction, including when to respond, wait, or give listening cues.

Full duplex vs half duplex - the spectrum of AI voice models [D]

Reddit r/MachineLearning · 2026-06-01

An analysis of half-duplex versus full-duplex architectures in AI voice models, covering the conversational behaviors (overlap, backchannels, and barge-in) whose absence makes voice agents sound robotic.

Raon-Speech Technical Report

arXiv cs.CL · 2026-05-26

Raon-Speech is a 9B-parameter speech language model for English and Korean, supporting understanding, answering, and generation, with a full-duplex extension Raon-SpeechChat for natural real-time conversation. It achieves strong performance across 42 benchmarks and is fully open-sourced.

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL · 2026-05-21

This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.

@rohanpaul_ai: Just a few days back, Thinking Machines Lab (TML), showcased a way of making AI interaction continuous instead of turn-…

X AI KOLs Following · 2026-05-17

A few days after Thinking Machines Lab showcased continuous rather than turn-based AI interaction, OpenBMB released MiniCPM-o 4.5, a 9B full-duplex omnimodal model. Its Omni-Flow framework enables continuous, time-aligned real-time video and voice interaction, and the model is available as open source.

@miramurati: Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time inter…

X AI KOLs Following · 2026-05-11

Mira Murati's team previewed its work on interaction models, a new class of model trained from scratch for real-time interaction. The model natively supports full-duplex real-time audio and video conversation, instant interruption, multilingual translation, and dynamic multitasking. The demonstration highlighted its core capabilities: low-latency streaming interaction, multimodal perception, and concurrent task execution.

EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions

arXiv cs.CL · 2026-04-21

EchoChain is a new benchmark for evaluating AI models' ability to revise in-progress responses when users interrupt mid-generation. It identifies three failure patterns (contextual inertia, interruption amnesia, and objective displacement) and finds that none of the evaluated real-time voice models exceeds a 50% pass rate.

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach leverages natural temporal gaps in conversation to retrieve external knowledge without disrupting the natural flow of dialogue.

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MTR-DuplexBench introduces a comprehensive benchmark for evaluating Full-Duplex Speech Language Models in multi-round conversations, addressing challenges like blurred turn boundaries and context inconsistency while assessing conversational features, dialogue quality, instruction following, and safety.
