@livekit: We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their l…


Summary

LiveKit built a live multilingual video call using Gemini 3.5 Live Translate, allowing participants to speak in their own language and hear translations in real time. The open source code is available on GitHub.


Cached at: 06/10/26, 09:47 AM

We built a live multilingual, multi-person video call with Gemini 3.5 Live Translate on LiveKit. Everyone picks their language, speaks naturally, and hears each other in real time in their language of choice.

Watch the demo and check out the open source repo: https://github.com/livekit-examples/gemini-live-translate…


livekit-examples/gemini-live-translate

Source: https://github.com/livekit-examples/gemini-live-translate

Live Translate

Multi-language video calls. Everyone picks their language. Translation spins up on demand.

Powered by LiveKit Agents (Python worker) and the Gemini Live API.



What it does

Anyone with the link joins as a peer. Each participant picks one language — that’s what they speak and what they want to hear everyone else in. When someone speaks, a Gemini Live session translates their audio into every other distinct language present in the room, on demand. Same-language pairs hear each other natively, no Gemini cost.

  • 8-person rooms by default (configurable)
  • 16 supported languages plus “None — native passthrough”
  • Camera + mic default off; toggle on when you’re ready
  • Captions sidebar (per listener, in their chosen language) with auto-scroll transcripts
  • LiveKit Cloud Agents-ready: deploy the Python worker, the frontend dispatches it via room config on token mint

How it works

flowchart LR
    Alice(["Alice<br/>EN"])
    Bob(["Bob<br/>ES"])
    Agent["<b>Translator agent</b><br/>Python worker<br/>one per LiveKit room"]

    Alice -- mic --> Agent
    Bob -- mic --> Agent
    Agent -- "tx:bob:en" --> Alice
    Agent -- "tx:alice:es" --> Bob

Each participant’s chosen language lives in their LiveKit attributes.lang. The agent watches participantAttributesChanged and reconciles a map of (speaker, target_lang) sessions — one Gemini Live session per pair, skipping pairs where source == target.
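The reconcile step described above can be sketched as plain set arithmetic. This is a minimal illustration, not the repo's router.py: the names compute_demand and reconcile are hypothetical, and the sketch assumes every participant has picked a real language (it does not model the "none" passthrough option).

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (speaker identity, target language)


def compute_demand(langs: Dict[str, str]) -> Set[Pair]:
    """Return every (speaker, target_lang) pair that needs a translation session.

    `langs` maps participant identity -> chosen language (their attributes.lang).
    """
    demand: Set[Pair] = set()
    for speaker, source in langs.items():
        # Languages the other participants want to hear, deduplicated.
        targets = {lang for ident, lang in langs.items() if ident != speaker}
        for target in targets:
            if target != source:  # same-language listeners use the native mic
                demand.add((speaker, target))
    return demand


def reconcile(active: Set[Pair], demand: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (to_start, to_stop) so that `active` converges on `demand`."""
    return demand - active, active - demand
```

With Alice on en and both Bob and Carol on es, the demand set has three pairs: a single (alice, es) session serves both Spanish listeners, and Bob and Carol each need one session into en.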

For each active pair the agent publishes two things into the room:

  • an audio track named tx:<speaker>:<target_lang> carrying the translated speech
  • a lk.translation text-stream carrying the matching captions, tagged with target_lang

The frontend subscribes to either the native mic or the matching tx:* track for each peer, based on the same (listener_lang, speaker_lang) predicate.
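The track naming and the subscription predicate can be sketched as follows. The repo does this routing in the TypeScript frontend (useTranslationRouting.ts); the Python below only illustrates the rule, and the helper names are hypothetical. Handling of the "none" passthrough option is left out.

```python
from typing import Optional, Tuple


def translation_track_name(speaker: str, target_lang: str) -> str:
    """Build the track name used for translated audio: tx:<speaker>:<target_lang>."""
    return f"tx:{speaker}:{target_lang}"


def parse_translation_track(name: str) -> Optional[Tuple[str, str]]:
    """Split a tx:* track name into (speaker, target_lang), or None if it is not one."""
    if not name.startswith("tx:"):
        return None
    # Split on the last colon so a speaker identity containing ":" still parses.
    speaker, sep, target_lang = name[3:].rpartition(":")
    if not sep or not speaker or not target_lang:
        return None
    return speaker, target_lang


def track_for(listener_lang: str, speaker: str, speaker_lang: str) -> Optional[str]:
    """Return the tx:* track a listener should subscribe to for this speaker.

    None means "use the speaker's native mic track" (same-language pair).
    """
    if listener_lang == speaker_lang:
        return None
    return translation_track_name(speaker, listener_lang)
```

For the diagram above, track_for("en", "bob", "es") yields "tx:bob:en", which is the track the agent sends to Alice.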

Quick start

You need:

  • Node.js 20+, pnpm (or run corepack enable and let the repo’s packageManager field pin it)
  • Python 3.11+, uv
  • A LiveKit Cloud project (free tier works)
  • A Gemini API key

# 1. Install deps and seed env files
pnpm run setup

# 2. Fill in credentials in .env.local and translator/.env.local
#    LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET (both files)
#    GEMINI_API_KEY (translator/.env.local only)

# 3. Run frontend + agent worker together
pnpm run dev

Open http://localhost:3000, click Create session, share the URL with another browser, pick different languages, unmute.

Repo layout

gemini-live-translate-livekit/
├── src/                                # Next.js 16 frontend
│   ├── app/
│   │   ├── page.tsx                    # Landing
│   │   ├── api/token/route.ts          # Mints token + dispatches translator agent
│   │   └── session/[id]/
│   │       ├── page.tsx                # Pre-flight (name + language)
│   │       └── room/                   # In-call UI
│   │           ├── RoomClient.tsx
│   │           ├── InCall.tsx
│   │           ├── VideoGrid.tsx       # + ParticipantTile, SelfView
│   │           ├── ControlBar.tsx      # + LanguagePill
│   │           ├── CaptionsSidebar.tsx
│   │           └── useTranslationRouting.ts
│   └── lib/
│       ├── languages.ts                # 16 languages + "none" sentinel
│       └── config.ts                   # Caps, attribute keys
└── translator/                         # Python LiveKit Agents worker
    ├── src/
    │   ├── agent.py                    # @server.rtc_session(agent_name="gemini-translator")
    │   ├── router.py                   # TranslationRouter (reconcile loop)
    │   ├── session.py                  # GeminiSession (one per speaker→target pair)
    │   ├── audio.py                    # PCM glue
    │   └── config.py                   # Model id, debounce, grace, etc.
    ├── tests/test_router.py            # Demand-set computation
    ├── pyproject.toml
    ├── Dockerfile                      # For LiveKit Cloud Agents deploy
    └── livekit.toml

Deploy

Agent — to LiveKit Cloud Agents:

cd translator
lk agent create --secrets-file .env.local .   # first time
lk agent deploy                               # subsequent deploys

Frontend — anywhere that runs Next.js. The repo includes a Dockerfile for container deploys (Cloud Run, Fly.io, Render, etc.). For Vercel, no special config needed since the only API route is /api/token and it’s stateless.

Set on the frontend host:

  • LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET

Set on the agent host:

  • LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, GEMINI_API_KEY

Configuration

Caps in src/lib/config.ts and translator/src/config.py — adjust together:

| Setting | Default | Where |
| Max participants per room | 8 | MAX_PARTICIPANTS (token route) |
| Session TTL | 4h | token route ttl |
| Empty-room timeout | 60s | token route |
| Session grace on mute | 10s | SESSION_GRACE_SEC (agent) |
| Reconcile debounce | 250ms | RECONCILE_DEBOUNCE_SEC (agent) |
| Gemini model | gemini-3.5-live-translate-preview | GEMINI_MODEL (agent) |
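A reconcile debounce like the 250 ms default can be sketched with asyncio. This is an assumption about the general technique, not the repo's code: the Debouncer class is hypothetical, and it simply collapses a burst of attribute changes into one callback once the room has been quiet for the configured delay.

```python
import asyncio
from typing import Callable, Optional


class Debouncer:
    """Run `callback` once, `delay` seconds after the most recent trigger()."""

    def __init__(self, delay: float, callback: Callable[[], None]) -> None:
        self._delay = delay
        self._callback = callback
        self._task: Optional[asyncio.Task] = None

    def trigger(self) -> None:
        # Cancel any pending run so that only the latest trigger counts.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        self._callback()
```

Calling trigger() from each attribute-change handler, with delay set to the RECONCILE_DEBOUNCE_SEC value, would mean several participants switching languages at once cause a single reconcile pass rather than one per event.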

Tech stack

  • Frontend: Next.js 16 (Turbopack), React 19, @livekit/components-react, livekit-client
  • Token mint: livekit-server-sdk (RoomAgentDispatch + RoomConfiguration)
  • Agent runtime: livekit-agents 1.5 with AgentServer.rtc_session()
  • Translation: Gemini Live API (raw v1beta BidiGenerateContent WebSocket with translationConfig)
  • Audio I/O: livekit.rtc.AudioStream (16 kHz mono in) + AudioSource (24 kHz mono out)
  • Typography: Instrument Serif (display), DM Sans (body), DM Mono (status)
  • Package management: pnpm + uv
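The two sample rates in the Audio I/O entry imply different buffer sizes on the way in and on the way out. A small worked calculation, assuming 16-bit PCM samples and 20 ms frames (both are assumptions for illustration; the repo's audio.py may use other values):

```python
def frame_bytes(sample_rate_hz: int, duration_ms: int,
                channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of one PCM frame of the given duration."""
    samples_per_channel = sample_rate_hz * duration_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


# 16 kHz mono input: 320 samples per 20 ms frame, 640 bytes.
INPUT_FRAME_BYTES = frame_bytes(16_000, 20)

# 24 kHz mono output: 480 samples per 20 ms frame, 960 bytes.
OUTPUT_FRAME_BYTES = frame_bytes(24_000, 20)
```

Under these assumptions the translated output carries 1.5 times as many bytes per unit of time as the microphone input.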

License

MIT

Google AI Developers (@googleaidevs): Our latest audio model, Gemini 3.5 Live Translate, takes real-time speech translation to the next level for developers by delivering low-latency translation across 70+ languages.

By processing speech as it streams in near real time, the model enables devs to build low-latency
