turn-taking

#turn-taking

Evaluating Large Language Models Abilities for Addressee, Turn-change, and Next Speaker Prediction in Meetings

arXiv cs.CL · 18h ago

This paper evaluates the abilities of large language models (LLMs) and multimodal LLMs for addressee detection, turn-change prediction, and next speaker prediction in multi-party meeting conversations. Results show that text-based LLMs outperform both supervised models and humans at next speaker prediction, while multimodal LLMs improve over text-only models on the other tasks but remain below human performance.


#turn-taking

BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM

arXiv cs.CL · 2d ago

BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.


#turn-taking

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

arXiv cs.CL · 2026-05-21

This paper analyzes synchronization and turn-taking dynamics in full-duplex speech dialogue models by simulating conversations between two instances of the Moshi model, measuring representational alignment via CKA and predicting turn boundaries with LSTM probes.


#turn-taking

When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models

arXiv cs.CL · 2026-05-08

When2Speak is a synthetic dataset and pipeline for training LLMs to decide when to speak in multi-party conversations. Fine-tuning on this dataset significantly improves turn-taking, with reinforcement learning reducing missed interventions from 50% to ~20%.

