speech-language-models

#speech-language-models

ParaBridge: Bridging Paralinguistic Perception and Dialogue Behavior in Speech Language Models

arXiv cs.CL · 2026-06-10

ParaBridge is an on-policy self-distillation method that bridges the gap between paralinguistic perception and dialogue behavior in speech language models, significantly improving safety and empathy without external rewards.



Do Factual Recall Mechanisms Carry over from Text to Speech in Multimodal Language Models?

arXiv cs.CL · 2026-05-22

This paper investigates whether factual recall mechanisms learned in text-based language models transfer to speech modalities in multimodal speech-language models. Using causal mediation analysis on SpiritLM, it finds that the mechanisms are only partially carried over, highlighting differences between text and speech processing.



@wsl8297: A GitHub repository, Awesome-SpeechLM-Survey, lays out the research lineage of Speech Language Models (SpeechLM). It organizes classification frameworks, representative models, training datasets, and evaluation benchmarks into a single 'knowledge map', which saves time when looking up materials, filling in background knowledge, and finding benchmarks.

X AI KOLs Timeline · 2026-05-14

The Awesome-SpeechLM-Survey repository on GitHub systematically organizes the research lineage of speech language models, including classification frameworks, representative models, training datasets, and evaluation benchmarks. It serves as a knowledge map for understanding the field.



MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MoshiRAG combines a compact full-duplex speech language model with asynchronous retrieval-augmented generation to improve factuality while maintaining real-time interactivity. The approach leverages natural temporal gaps in conversation to retrieve external knowledge without disrupting the natural flow of dialogue.



MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

arXiv cs.CL · 2026-04-20

MTR-DuplexBench introduces a comprehensive benchmark for evaluating Full-Duplex Speech Language Models in multi-round conversations, addressing challenges like blurred turn boundaries and context inconsistency while assessing conversational features, dialogue quality, instruction following, and safety.

