@livekit: We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their l…
Summary
LiveKit built a live multilingual video call using Gemini 3.5 Live Translate, allowing participants to speak in their own language and hear translations in real time. The open source code is available on GitHub.
Cached at: 06/10/26, 09:47 AM
We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their language, speaks naturally, and hears each other in real time in their language of choice.
Watch the demo and check out the open source repo: https://github.com/livekit-examples/gemini-live-translate…
livekit-examples/gemini-live-translate
Source: https://github.com/livekit-examples/gemini-live-translate
Live Translate
Multi-language video calls. Everyone picks their language. Translation spins up on demand.
Powered by LiveKit Agents (Python worker) and the Gemini Live API.
What it does
Anyone with the link joins as a peer. Each participant picks one language, which is both the language they speak and the language they want to hear everyone else in. When someone speaks, a Gemini Live session translates their audio into every other distinct language present in the room, on demand. Participants who share a language hear each other’s native audio, which incurs no Gemini cost.
- 8-person rooms by default (configurable)
- 16 supported languages plus “None — native passthrough”
- Camera + mic default off; toggle on when you’re ready
- Captions sidebar (per listener, in their chosen language) with auto-scroll transcripts
- LiveKit Cloud Agents-ready: deploy the Python worker, and the frontend dispatches it via room config when it mints a token
How it works
flowchart LR
Alice(["Alice<br/>EN"])
Bob(["Bob<br/>ES"])
Agent["<b>Translator agent</b><br/>Python worker<br/>one per LiveKit room"]
Alice -- mic --> Agent
Bob -- mic --> Agent
Agent -- "tx:bob:en" --> Alice
Agent -- "tx:alice:es" --> Bob
Each participant’s chosen language lives in their LiveKit attributes.lang. The agent watches participantAttributesChanged and reconciles a map of (speaker, target_lang) sessions — one Gemini Live session per pair, skipping pairs where source == target.
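The reconcile step described above can be sketched as a pure function over the room state. This is a minimal illustration, not the repo's router.py: the names compute_demand and reconcile, the unmuted set, and the handling of the "none" sentinel (speakers and listeners on "none" are simply skipped) are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

NONE_LANG = "none"  # assumed sentinel for "native passthrough"

Pair = Tuple[str, str]  # (speaker identity, target language)


def compute_demand(langs: Dict[str, str], unmuted: Set[str]) -> Set[Pair]:
    """Return the (speaker, target_lang) pairs that need a translation session.

    langs maps participant identity -> chosen language (attributes.lang).
    unmuted is the set of identities currently able to speak (assumption).
    """
    demand: Set[Pair] = set()
    for speaker in unmuted:
        source = langs.get(speaker, NONE_LANG)
        if source == NONE_LANG:
            continue  # assumption: passthrough speakers are never translated
        for listener, target in langs.items():
            # Skip the speaker, passthrough listeners, and same-language pairs.
            if listener == speaker or target == NONE_LANG or target == source:
                continue
            demand.add((speaker, target))  # set semantics dedupe shared targets
    return demand


def reconcile(active: Set[Pair], demand: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (pairs to start, pairs to stop) to move active toward demand."""
    return demand - active, active - demand
```

Because the demand is a set of pairs, two Spanish listeners share one (speaker, "es") session rather than each getting their own.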
For each active pair the agent publishes two things into the room:
- an audio track named tx:<speaker>:<target_lang> carrying the translated speech
- a lk.translation text-stream carrying the matching captions, tagged with target_lang
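As a small illustration of the track naming scheme, the helpers below build and parse tx:<speaker>:<target_lang> names. The function names are hypothetical, and splitting on the last colon is a choice made here so that speaker identities containing colons still parse; the repo may handle this differently.

```python
from typing import Optional, Tuple

TX_PREFIX = "tx:"


def tx_track_name(speaker: str, target_lang: str) -> str:
    """Build the name of the translated audio track for one pair."""
    return f"{TX_PREFIX}{speaker}:{target_lang}"


def parse_tx_track_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (speaker, target_lang), or None if name is not a tx track."""
    if not name.startswith(TX_PREFIX):
        return None
    # Split on the LAST colon so the speaker identity may itself contain colons.
    speaker, sep, target_lang = name[len(TX_PREFIX):].rpartition(":")
    if not sep or not speaker or not target_lang:
        return None
    return speaker, target_lang
```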
The frontend subscribes to either the native mic or the matching tx:* track for each peer, based on the same (listener_lang, speaker_lang) predicate.
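The subscription predicate can be written down in a few lines. The real logic lives in the TypeScript hook useTranslationRouting.ts; this Python sketch only mirrors the rule as described, and both the function name and the treatment of "none" as "always native" are assumptions.

```python
from typing import Optional


def translated_track_for(
    listener_lang: str, speaker_identity: str, speaker_lang: str
) -> Optional[str]:
    """Return the tx track a listener should subscribe to for one speaker.

    None means "use the speaker's native microphone track".
    """
    if listener_lang == "none" or listener_lang == speaker_lang:
        return None  # passthrough listener or same language: native audio
    return f"tx:{speaker_identity}:{listener_lang}"
```

For example, a Spanish listener facing an English speaker named alice would subscribe to tx:alice:es, while an English listener would stay on alice's microphone track.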
Quick start
You need:
- Node.js 20+, pnpm (or run corepack enable and let the repo’s packageManager field pin it)
- Python 3.11+, uv
- A LiveKit Cloud project (free tier works)
- A Gemini API key
# 1. Install deps and seed env files
pnpm run setup
# 2. Fill in credentials in .env.local and translator/.env.local
# LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET (both files)
# GEMINI_API_KEY (translator/.env.local only)
# 3. Run frontend + agent worker together
pnpm run dev
Open http://localhost:3000, click Create session, open the session URL in a second browser, pick a different language in each, and unmute.
Repo layout
gemini-live-translate-livekit/
├── src/                              # Next.js 16 frontend
│   ├── app/
│   │   ├── page.tsx                  # Landing
│   │   ├── api/token/route.ts        # Mints token + dispatches translator agent
│   │   └── session/[id]/
│   │       ├── page.tsx              # Pre-flight (name + language)
│   │       └── room/                 # In-call UI
│   │           ├── RoomClient.tsx
│   │           ├── InCall.tsx
│   │           ├── VideoGrid.tsx     + ParticipantTile, SelfView
│   │           ├── ControlBar.tsx    + LanguagePill
│   │           ├── CaptionsSidebar.tsx
│   │           └── useTranslationRouting.ts
│   └── lib/
│       ├── languages.ts              # 16 languages + "none" sentinel
│       └── config.ts                 # Caps, attribute keys
└── translator/                       # Python LiveKit Agents worker
    ├── src/
    │   ├── agent.py                  # @server.rtc_session(agent_name="gemini-translator")
    │   ├── router.py                 # TranslationRouter (reconcile loop)
    │   ├── session.py                # GeminiSession (one per speaker→target pair)
    │   ├── audio.py                  # PCM glue
    │   └── config.py                 # Model id, debounce, grace, etc.
    ├── tests/test_router.py          # Demand-set computation
    ├── pyproject.toml
    ├── Dockerfile                    # For LiveKit Cloud Agents deploy
    └── livekit.toml
Deploy
Agent — to LiveKit Cloud Agents:
cd translator
lk agent create --secrets-file .env.local . # first time
lk agent deploy # subsequent deploys
Frontend: anywhere that runs Next.js. The repo includes a Dockerfile for container deploys (Cloud Run, Fly.io, Render, etc.). On Vercel, no special configuration is needed, since the only API route is /api/token and it is stateless.
Set on the frontend host:
LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET
Set on the agent host:
LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, GEMINI_API_KEY
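A startup check along these lines can catch a missing variable before the worker tries to connect. It is a hypothetical helper, not something the repo is known to ship; only the variable names come from the lists above.

```python
import os
from typing import Iterable, List, Mapping

FRONTEND_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
AGENT_VARS = FRONTEND_VARS + ("GEMINI_API_KEY",)


def missing_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]
```

On the agent host, one might call missing_vars(AGENT_VARS) at startup and exit with a clear message if the returned list is non-empty.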
Configuration
Caps live in src/lib/config.ts and translator/src/config.py; adjust them together:
| Setting | Default | Where |
|---|---|---|
| Max participants per room | 8 | MAX_PARTICIPANTS (token route) |
| Session TTL | 4h | token route ttl |
| Empty-room timeout | 60s | token route |
| Session grace on mute | 10s | SESSION_GRACE_SEC (agent) |
| Reconcile debounce | 250ms | RECONCILE_DEBOUNCE_SEC (agent) |
| Gemini model | gemini-3.5-live-translate-preview | GEMINI_MODEL (agent) |
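To show what a reconcile debounce of this kind does, here is a minimal asyncio sketch: a burst of attribute changes within the window collapses into a single callback. The Debouncer class is an illustration written for this note, not the repo's implementation, and the 0.25 s value simply mirrors the RECONCILE_DEBOUNCE_SEC default from the table.

```python
import asyncio
from typing import Callable, Optional

RECONCILE_DEBOUNCE_SEC = 0.25  # mirrors the table default (assumed to be seconds)


class Debouncer:
    """Run callback once, delay_sec after the most recent trigger()."""

    def __init__(self, delay_sec: float, callback: Callable[[], None]) -> None:
        self._delay = delay_sec
        self._callback = callback
        self._task: Optional[asyncio.Task] = None

    def trigger(self) -> None:
        """Restart the timer; must be called from inside a running event loop."""
        if self._task is not None and not self._task.done():
            self._task.cancel()  # drop the pending run, a newer event arrived
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        self._callback()
```

An agent could call trigger() from each attribute-change handler and do the actual reconcile in the callback, so several participants changing language at once cause one pass instead of many.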
Tech stack
- Frontend: Next.js 16 (Turbopack), React 19, @livekit/components-react, livekit-client
- Token mint: livekit-server-sdk (RoomAgentDispatch + RoomConfiguration)
- Agent runtime: livekit-agents 1.5 with AgentServer.rtc_session()
- Translation: Gemini Live API (raw v1beta BidiGenerateContent WebSocket with translationConfig)
- Audio I/O: livekit.rtc.AudioStream (16 kHz mono in) + AudioSource (24 kHz mono out)
- Typography: Instrument Serif (display), DM Sans (body), DM Mono (status)
- Package management: pnpm + uv
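The two sample rates in the audio I/O line imply different buffer sizes for the same chunk duration. The arithmetic below assumes 16-bit signed PCM (2 bytes per sample) and a 20 ms chunk; neither figure is stated in this README, so treat both as illustrative.

```python
def pcm_chunk_bytes(
    sample_rate_hz: int,
    duration_ms: int,
    channels: int = 1,
    bytes_per_sample: int = 2,  # assumption: 16-bit PCM
) -> int:
    """Return the byte size of one PCM chunk of the given duration."""
    samples_per_channel = sample_rate_hz * duration_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


IN_CHUNK = pcm_chunk_bytes(16_000, 20)   # 320 samples -> 640 bytes
OUT_CHUNK = pcm_chunk_bytes(24_000, 20)  # 480 samples -> 960 bytes
```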
License
MIT
Google AI Developers (@googleaidevs): Our latest audio model, Gemini 3.5 Live Translate, takes real-time speech translation to the next level for developers by delivering low-latency translation across 70+ languages.
By processing speech as it streams in near real time, the model enables devs to build low-latency