full-duplex

#full-duplex

Instruct-FD: Can Your Full-Duplex Speech System Follow Turn-Taking Instructions?

arXiv cs.CL · yesterday

Introduces Instruct-FD, a benchmark for evaluating whether full-duplex speech systems can follow explicit turn-taking instructions. The best model achieves only 64.4% adherence, highlighting a significant gap in instruction-following for turn management.

Agentic coding goes hands-free as OpenAI brings GPT-Live's full duplex voice control to Codex and ChatGPT on the desktop (5 minute read)

TLDR AI · 2d ago

OpenAI integrates GPT-Live's full-duplex voice control into Codex and the ChatGPT desktop app, enabling hands-free agentic coding with multi-threaded task execution.

A Reliability Assessment of LALM Audio Judges for Full-Duplex Voice Agents

arXiv cs.CL · 2026-07-10

This paper evaluates how reliably Gemini models serve as audio judges for scoring full-duplex voice agent conversations. Gemini 2.5 Flash shows strong agreement with human raters on most dimensions, but swapping in a different judge model requires re-validation.

GPT-Live

Hacker News Top · 2026-07-08

OpenAI announces GPT-Live, a new full-duplex voice model that enables more natural, real-time conversations by allowing simultaneous listening and speaking, with GPT-5.5 as the backend model.

OpenAI releases new voice models for more natural live conversations

TechCrunch AI · 2026-07-08

OpenAI released two full-duplex voice models, GPT-Live-1 and GPT-Live-1 mini, which can speak and listen at the same time for more natural live conversations. They improve turn-taking and context handling and replace Advanced Voice Mode in ChatGPT.

Hierarchical Acoustic-Semantic Modeling: Modality Separation and Semantic Coherence for Full-Duplex SLMs

arXiv cs.CL · 2026-07-08

This paper introduces Lychee-FD, a native end-to-end full-duplex spoken language model that mitigates modality interference through a hierarchical parameter separation strategy, achieving significant improvements in speech intelligence and interaction fluidity.

TurnNat: Automatic Evaluation of Turn-Taking Naturalness in Dyadic Spoken Dialogue

arXiv cs.CL · 2026-07-03

TurnNat is a likelihood-based framework for automatically evaluating turn-taking naturalness in dyadic spoken dialogue, using a causal turn-taking prediction model trained on natural conversations to measure timing atypicality via negative log-likelihood.
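As a rough illustration of likelihood-based timing scoring, the sketch below assumes a hypothetical causal predictor that outputs, for each frame after a turn ends, the probability (a discrete-time hazard) that the next speaker starts at that frame given they have not started yet. The negative log-likelihood of the observed onset then grows as the timing becomes less typical. The hazard formulation, the function names `onset_nll` and `mean_atypicality`, and the averaging over transitions are assumptions for illustration, not TurnNat's published implementation.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-12  # guards against log(0) for hazards of exactly 0 or 1


def onset_nll(hazard: Sequence[float], onset_frame: int) -> float:
    """NLL of a next-speaker onset at `onset_frame` under a per-frame hazard.

    hazard[t] is the (hypothetical) predicted probability that the next
    speaker starts at frame t, given that they have not started before t.
    """
    if not 0 <= onset_frame < len(hazard):
        raise ValueError("onset_frame must index into hazard")
    nll = 0.0
    for t in range(onset_frame):
        # No onset at earlier frames: contributes -log(1 - h_t).
        nll -= math.log(max(1.0 - hazard[t], EPS))
    # Onset at the observed frame: contributes -log(h_k).
    nll -= math.log(max(hazard[onset_frame], EPS))
    return nll


def mean_atypicality(transitions: Sequence[Tuple[Sequence[float], int]]) -> float:
    """Average onset NLL over a dialogue's turn transitions (higher = less natural)."""
    if not transitions:
        raise ValueError("need at least one transition")
    return sum(onset_nll(h, k) for h, k in transitions) / len(transitions)


hazard = [0.05, 0.10, 0.60, 0.80]
print(round(onset_nll(hazard, 2), 3))  # onset where the predictor expects it -> 0.667
print(round(onset_nll(hazard, 0), 3))  # abrupt, early onset -> 2.996
```

A hazard is used here so that frames where nobody has started yet also contribute evidence; a model that scores each frame independently would call for a different likelihood.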

BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM

arXiv cs.CL · 2026-06-15

BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.

Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering

arXiv cs.CL · 2026-06-11

This paper identifies 'state inertia' in full-duplex spoken language models, where the model's internal predictive focus lags during user interruptions, and proposes a training-free activation steering method to improve interruption handling.
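The summary does not say how the steering direction is built. One widely used training-free recipe, assumed here purely for illustration, takes the difference between mean hidden activations in two conditions (for example, frames where the model correctly yields to an interruption versus frames where it keeps its earlier course) and adds a scaled copy of that vector to the hidden state at inference time. The functions `steering_vector` and `steer`, the toy activations, and the scale `alpha` are hypothetical; the paper's actual method may differ.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(acts: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty batch of activation vectors."""
    if not acts:
        raise ValueError("need at least one activation vector")
    n = len(acts)
    return [sum(a[j] for a in acts) / n for j in range(len(acts[0]))]


def steering_vector(target_acts: Sequence[Sequence[float]],
                    baseline_acts: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: points from the baseline state toward the target state."""
    mt, mb = mean_vector(target_acts), mean_vector(baseline_acts)
    return [t - b for t, b in zip(mt, mb)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Shift one hidden state along the steering direction; no weights are updated."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-D activations standing in for one layer's hidden states (hypothetical).
yield_acts = [[1.0, 1.0], [3.0, 3.0]]    # model correctly yields to the user
inertia_acts = [[0.0, 0.0], [0.0, 2.0]]  # model keeps its pre-interruption course
v = steering_vector(yield_acts, inertia_acts)
print(v)                          # [2.0, 1.0]
print(steer([0.0, 0.0], v, 0.5))  # [1.0, 0.5]
```

In a sketch like this, `alpha` trades off how strongly the state is pushed toward the target condition against drift from the model's normal behavior, so it would need tuning on held-out interruptions.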

@kyutai_labs: New paper: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models We use RL to post-train speech models (Mo…

X AI KOLs Following · 2026-06-10

Kyutai Labs released a new paper on using reinforcement learning to post-train speech models (Moshi and PersonaPlex) for more human-like interaction, including when to respond, wait, or give listening cues.

Full duplex vs half duplex - the spectrum of AI voice models [D]

Reddit r/MachineLearning · 2026-06-01

An analysis of half-duplex versus full-duplex architectures in AI voice models, discussing features such as overlapping speech, backchannels, and barge-in, whose absence makes voice agents sound robotic.

Raon-Speech Technical Report

arXiv cs.CL · 2026-05-26

Raon-Speech is a 9B-parameter speech language model for English and Korean, supporting understanding, answering, and generation, with a full-duplex extension Raon-SpeechChat for natural real-time conversation. It achieves strong performance across 42 benchmarks and is fully open-sourced.

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL · 2026-05-21

This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.
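CKA compares two sets of representations recorded over the same inputs. Below is a minimal sketch of standard linear CKA, assuming each matrix holds one row per time-aligned frame and one column per hidden unit from each Moshi instance; the summary does not state the paper's CKA variant, layer choice, or frame alignment, and the helper names and toy matrices are illustrative.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def center_columns(m: Matrix) -> List[List[float]]:
    """Subtract each column's mean so every feature has zero mean over frames."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_fro_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A^T B, where A and B share the row (frame) axis."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must have the same, non-zero number of frames")
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_fro_sq(xc, xc) * cross_fro_sq(yc, yc))
    return cross_fro_sq(xc, yc) / denom if denom > 0 else 0.0


# Hypothetical hidden states: 4 frames x 2 units per model instance.
speaker_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [4.0, 4.0]]
speaker_b = [[2.0 * v for v in row] for row in speaker_a]  # rescaled copy
print(round(linear_cka(speaker_a, speaker_b), 6))  # 1.0
```

Linear CKA equals 1 for representations that differ only by an isotropic rescaling or an orthogonal rotation, which makes it suitable for comparing two model instances whose hidden units need not line up one-to-one.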

@rohanpaul_ai: Just a few days back, Thinking Machines Lab (TML), showcased a way of making AI interaction continuous instead of turn-…

X AI KOLs Following · 2026-05-17

Thinking Machines Lab and OpenBMB released MiniCPM-o 4.5, a 9B full-duplex omnimodal model with the Omni-Flow framework that enables continuous, time-aligned real-time video and voice interaction, surpassing previous models and available as open source.

@miramurati: Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time inter…

X AI KOLs Following · 2026-05-11

Mira Murati's team showcased a preview of a new interaction model. Trained from scratch, it natively supports full-duplex real-time audio and video conversation, instant interruptions, multilingual translation, and dynamic multitasking. The demonstration highlighted its core capabilities: low-latency streaming interaction, multimodal perception, and concurrent task execution.

EchoChain: A Full-Duplex Benchmark for State-Update Reasoning Under Interruptions

arXiv cs.CL · 2026-04-21

EchoChain is a new benchmark for evaluating whether AI models can revise in-progress responses when users interrupt mid-generation. It identifies three failure patterns (contextual inertia, interruption amnesia, and objective displacement) and finds that no evaluated real-time voice model exceeds a 50% pass rate.

MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach leverages natural temporal gaps in conversation to retrieve external knowledge without disrupting the natural flow of dialogue.

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MTR-DuplexBench introduces a comprehensive benchmark for evaluating Full-Duplex Speech Language Models in multi-round conversations, addressing challenges like blurred turn boundaries and context inconsistency while assessing conversational features, dialogue quality, instruction following, and safety.

This is the new ChatGPT Voice, powered by GPT-Live

YouTube AI Channels · 2026-07-09

OpenAI has released a new generation of ChatGPT's voice feature, powered by GPT-Live. It supports full-duplex real-time conversation, including interruptions, along with reasoning, web search, and real-time translation, making interactions more natural.
