speech-to-speech

#speech-to-speech

@Thom_Wolf: Most people should probably update their priors on the state of open-source speech-to-speech. It's honestly kind of min…

X AI KOLs Following · yesterday

Thom Wolf of Hugging Face and Cerebras released a fully open-source real-time voice demo, including models and code, showcasing state-of-the-art speech-to-speech capabilities.

Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems

arXiv cs.CL · 3d ago

This paper proposes a reference-based evaluation protocol for assessing prosody and rhythm in speech-to-speech AI systems, using matched human conversation data to provide interpretable behavioral plausibility checks.

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

Hugging Face Blog · 3d ago

Hugging Face and Cerebras demonstrate a real-time speech-to-speech pipeline that combines open-source models (NVIDIA's Parakeet for speech recognition, Gemma 4 as the language model, and Qwen3TTS for speech synthesis) with Cerebras' fast inference, enabling natural conversational AI and powering robots such as Reachy Mini.

Dziri Voicebot: An End-to-End Low-Resource Speech-to-Speech Conversational System for Algerian Dialect

arXiv cs.CL · 2026-06-25

This paper presents a modular end-to-end speech-to-speech conversational system for the low-resource Algerian dialect, integrating ASR, NLU, RAG, and TTS components with dedicated datasets and fine-tuned models.

Gemma 4 12B native encoder free voice input utilization suggest?

Reddit r/LocalLLaMA · 2026-06-14

The post discusses using Gemma 4 12B's encoder-free architecture for native voice input and asks for out-of-the-box solutions for low-latency streaming audio ingestion.

Gemini 3.5 Live Translate

Product Hunt · 2026-06-09

Gemini 3.5 Live Translate is a new audio model for real-time speech-to-speech translation.

@GoogleDeepMind: 3.5 Live Translate can convert speech into over 70 languages and processes it as it’s streamed - while keeping tone, pa…

X AI KOLs · 2026-06-09

Google DeepMind announces Live Translate, a feature that converts speech into over 70 languages in real-time while preserving tone, pace, and pitch for more natural conversations.

Fluid, natural voice translation with Gemini 3.5 Live Translate

Google DeepMind Blog · 2026-06-09

Google releases Gemini 3.5 Live Translate, an audio model for near real-time speech-to-speech translation in over 70 languages, preserving speaker intonation and pacing. It is rolling out across Google products including the Gemini Live API, Google Meet, and Google Translate.

Benchmarking Speech-to-Speech Translation Models

arXiv cs.CL · 2026-06-03

COMPASS is a unified benchmarking framework for speech-to-speech translation (S2ST) that integrates 46 metrics across eight dimensions and is evaluated on 1,248 model-language configurations. It identifies complementary strengths across architectures and proposes reduced metric subsets that preserve model rankings while cutting evaluation time.

@gdb: OpenAI for realtime translation — speak in any of 70+ input languages and translate into 13 output ones:

X AI KOLs Following · 2026-05-29

OpenAI released a new specialized model, gpt-realtime-translate, that takes speech audio from over 70 input languages and outputs speech in 13 target languages for real-time translation.

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

Hugging Face Daily Papers · 2026-05-29

OpenSTBench is a unified multidimensional evaluation framework for speech translation that jointly assesses translation quality, speech quality, speaker preservation, emotion fidelity, and latency across both speech-to-text translation (S2TT) and speech-to-speech translation (S2ST) systems, in offline and streaming settings. The framework addresses the gap left by fragmented evaluation protocols and provides a reproducible benchmark for comparing heterogeneous speech translation systems.

Build a Realtime Speech Translation (28 minute read)

TLDR AI · 2026-05-11

OpenAI releases gpt-realtime-translate, a low-latency speech-to-speech model optimized for live interpretation, accompanied by a developer cookbook for building multilingual browser, phone, and video applications.

@paulabartabajo_: Advice for AI engineers If you're building voice agents, stop wiring up 3 separate models, for audio-to-text, text-to-a…

X AI KOLs Timeline · 2026-05-08

The post announces liquid-audio, an open-source repository for Liquid AI's end-to-end speech-to-speech LFM models (LFM2-Audio-1.5B and LFM2.5-Audio-1.5B), with interleaved and sequential generation modes and fine-tuning support.

@kwindla: OpenAI shipped a new speech-to-speech model today: gpt-realtime-2 This is the first speech-to-speech model good enough …

X AI KOLs Following · 2026-05-07

OpenAI has released gpt-realtime-2, a new speech-to-speech model optimized for real-time voice agent interactions with low-latency tool calling.

Introducing gpt-realtime and Realtime API updates

OpenAI Blog · 2025-08-28

OpenAI is making the Realtime API generally available with a new advanced speech-to-speech model called gpt-realtime, featuring improved instruction following, tool calling, and natural speech quality. New capabilities include MCP server support, image inputs, SIP phone calling, and two new voices (Cedar and Marin).
