@_philschmid: Build a realtime translation app with the new Gemini Live Translate, Next.js, LiveKit and Cloud Run. What it covers: 1.…


Summary

A tutorial on building a realtime translation app using Gemini Live Translate, Next.js, LiveKit, and Cloud Run, covering audio streaming, translation, and deployment.


Cached at: 06/18/26, 06:10 PM

Build a realtime translation app with the new Gemini Live Translate, Next.js, LiveKit and Cloud Run. What it covers:

  1. Stream host audio via WebRTC to a LiveKit Room
  2. Pipe PCM frames to Gemini Live for on-the-fly translation
  3. Publish translated audio back as separate language tracks
  4. Optimize latency with 100ms frame chunking (50Hz → 10Hz)
  5. Deploy to Cloud Run with Secret Manager and auto-scaling
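
The chunking in item 4 comes down to simple arithmetic: frames arriving at 50Hz are 20ms each, so grouping five of them into one 100ms chunk drops the send rate to 10Hz. Below is a minimal sketch of that buffering step, assuming 16 kHz mono 16-bit PCM. The audio format, the `FrameChunker` class, and its parameters are assumptions for illustration and are not taken from the linked code.

```python
class FrameChunker:
    """Accumulate small PCM frames and emit fixed-duration chunks.

    Hypothetical helper: the defaults (16 kHz, mono, 16-bit) are assumptions.
    """

    def __init__(self, sample_rate=16_000, chunk_ms=100,
                 bytes_per_sample=2, channels=1):
        samples_per_chunk = sample_rate * chunk_ms // 1000
        self.chunk_bytes = samples_per_chunk * bytes_per_sample * channels
        self._buffer = bytearray()

    def push(self, frame: bytes) -> list:
        """Add one incoming frame; return every full chunk now available."""
        self._buffer.extend(frame)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[:self.chunk_bytes]))
            del self._buffer[:self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return leftover audio shorter than one chunk (e.g. at stream end)."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest


# One second of silence delivered as 50 frames of 20 ms (640 bytes each).
chunker = FrameChunker()
frame_20ms = bytes(640)
sent = []
for _ in range(50):
    sent.extend(chunker.push(frame_20ms))

print(len(sent), "chunks of", len(sent[0]), "bytes")  # 10 chunks of 3200 bytes
```

Each chunk returned by `push` would then be forwarded to the translation session in a single send, so one second of audio costs 10 sends instead of 50. The trade-off is that a chunk cannot leave until its last frame arrives, which adds up to 100ms of buffering delay.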

Links below

Blog: https://dev.to/googleai/build-a-realtime-translation-app-with-gemini-live-api-livekit-google-cloud-run-5474…

Code: https://github.com/google-gemini/gemini-live-api-examples/tree/main/gemini-live-translate-livekit…
