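@berryxia: Guys, this one is good! Install it now! Kevin Lin, an Oxford postdoc and former Meta and Microsoft researcher, just released Violin, an open-source video translation Skill. Video is now the dominant content format on the internet. Yet most high-quality lectures, talks, and podcasts are…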

X AI KOLs Timeline 2026/05/15 01:09 Tools

video-translation open-source mit-license developer-tool asr llm tts

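Summary

Violin is an open-source video translation tool that combines speech recognition, large-language-model translation, and speech synthesis. It supports 30+ languages and can be used three ways: as a CLI, as a web app, and from Claude Code.

Guys, this one is good! Install it now! Kevin Lin, an Oxford postdoc and former Meta and Microsoft researcher, just released Violin, an open-source video translation Skill. Video is now the dominant content format on the internet, yet most high-quality lectures, talks, and podcasts are locked into a single language, out of reach for global audiences. Violin chains ASR, LLM translation, and TTS into one seamless pipeline: "Feed it a video and it automatically handles speech recognition, multilingual translation, and natural speech synthesis." The two most practical features: you can personalize the translation style, for example turning an academic talk into a version a child can follow, and you can chat with the video directly, with every answer grounded in the video's content. It supports a web app, a CLI, and an Agent Skill, all open source under MIT. High-quality content will no longer belong to just one language; it can truly go global. The demo, blog, and GitHub links are in the original post. If you work in content, education, or cross-language communication, or are building multimodal agents, this Skill is worth trying right away. What do you think AI should tackle next: content creation or content globalization? Project: https://github.com/shang-zhu/violin…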


Cached: 2026/05/15 04:56

shang-zhu/violin

Source: https://github.com/shang-zhu/violin

🎻 Violin

Open-source Video Translation Skill.

🌐 Live demo · 📝 Blog post · 📜 MIT License

Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.

Available as a CLI, a FastAPI web app, and a Claude Code skill.

✨ Features

33 target languages with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
In-video Q&A — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
Natural-language voice picker — describe the voice you want, an LLM picks from the catalog
6 style profiles (experimental) — standard / kids / academic / casual / storyteller / news
Pluggable stack: Together and OpenAI are interchangeable for every stage, and ElevenLabs can be swapped in for TTS, all from one YAML file
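The in-video Q&A feature grounds answers in nearby subtitles plus sampled frames. A minimal sketch of the "nearby subtitles" half, assuming subtitles are held as (start, end, text) records in seconds and a hypothetical ±`window`; this is not Violin's retrieval code, and frame sampling and the LLM call are left out.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str


def nearby_subtitles(subs: list[Subtitle], t: float, window: float = 15.0) -> list[Subtitle]:
    """Return subtitles overlapping [t - window, t + window], in time order.

    `window` is a hypothetical parameter chosen for illustration.
    """
    lo, hi = t - window, t + window
    hits = [s for s in subs if s.end >= lo and s.start <= hi]
    return sorted(hits, key=lambda s: s.start)


def build_context(subs: list[Subtitle], t: float, window: float = 15.0) -> str:
    """Format the nearby subtitles as timestamped lines for an LLM prompt."""
    return "\n".join(
        f"[{s.start:.1f}-{s.end:.1f}s] {s.text}" for s in nearby_subtitles(subs, t, window)
    )
```

A question asked at 22 s with a 15 s window would see only cues overlapping 7 to 37 s; a real implementation would likely also cap the context by token count.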

🚀 Quick start

Try it without installing anything

The live demo runs at https://www.violin-ai.com — drop a short clip in, get a dubbed video out in a few minutes.

Run locally

Requires Python 3.10+ and ffmpeg on PATH.

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv if you don't have it
uv tool install violin                            # recommended — faster, isolated
# or: pip install violin                          # if you'd rather install into your current Python env

export TOGETHER_API_KEY=...                       # get one at https://api.together.ai (add to ~/.zshrc to persist)
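Before the first run, a quick preflight can confirm the prerequisites listed above: Python 3.10+, ffmpeg on PATH, and TOGETHER_API_KEY set. A minimal standard-library sketch; the `preflight` function is hypothetical and not part of Violin.

```python
from __future__ import annotations

import os
import shutil
import sys
from collections.abc import Mapping


def preflight(env: Mapping[str, str] | None = None) -> list[str]:
    """Return human-readable problems with the local setup; empty means ready to run."""
    env = os.environ if env is None else env
    problems: list[str] = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("TOGETHER_API_KEY"):
        problems.append("TOGETHER_API_KEY is not set (the default config needs it)")
    return problems
```

Calling `print(preflight())` with everything in place prints an empty list; passing an explicit `env` mapping makes the key check easy to exercise without touching the real environment.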

Three ways to use it:

1. CLI — translate one file:

violin lecture.mp4 lecture_zh.mp4 --language Chinese

2. Web app — full REST API + browser UI:

violin-api
# → http://127.0.0.1:8000           (browser UI)
# → http://127.0.0.1:8000/docs      (interactive API docs)

3. Claude Code skill — invoke from any Claude Code session:

violin --install-skill          # one-time: copies the skill into ~/.claude/skills/
claude
> please use the violin skill to translate path/to/video.mp4 into Chinese

Run from source (for hacking on the pipeline)

git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env             # then fill in TOGETHER_API_KEY
uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese

To run the violin / violin-api commands globally while edits to your local source take effect immediately, install in editable mode:

uv tool uninstall violin     # if you've installed the PyPI version
uv tool install --editable .

After this, violin / violin-api run from your local checkout — edit any file and the next invocation picks it up; no rebuild needed. To switch back to PyPI: uv tool uninstall violin && uv tool install violin.

🎬 How Violin works

Video
  │
  ├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
  │
  ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
  │
  ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
  │
  ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
  │
  └─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
                                    concat with freeze-frame fallback,
                                    single-pass AAC encode the audio track,
                                    write output mp4 + optional SRT
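The "word-level timestamps → sentence segments" and "optional SRT" steps can be sketched as follows. This is an illustrative reconstruction, not Violin's code: it assumes words arrive as (text, start, end) tuples in seconds, closes a segment on sentence-final punctuation, and uses a hypothetical `max_gap` pause threshold.

```python
from dataclasses import dataclass

SENTENCE_END = (".", "?", "!", "。", "？", "！")


@dataclass
class Segment:
    start: float
    end: float
    text: str


def words_to_segments(words: list[tuple[str, float, float]], max_gap: float = 1.0) -> list[Segment]:
    """Group word-level timestamps into sentence segments.

    A segment closes after sentence-final punctuation, at the last word, or when
    the pause before the next word exceeds `max_gap` seconds (hypothetical).
    Words are joined with spaces, which suits space-delimited languages.
    """
    segments: list[Segment] = []
    buf: list[tuple[str, float, float]] = []
    for i, word in enumerate(words):
        buf.append(word)
        text, _, end = word
        next_start = words[i + 1][1] if i + 1 < len(words) else None
        long_pause = next_start is not None and next_start - end > max_gap
        if text.endswith(SENTENCE_END) or long_pause or next_start is None:
            segments.append(Segment(buf[0][1], buf[-1][2], " ".join(w[0] for w in buf)))
            buf = []
    return segments


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: 1-based cue numbers, cues separated by blank lines."""
    cues = [
        f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(cues)
```

Segmenting at sentence boundaries gives the translation LLM complete thoughts to work with, while the per-segment start/end times are what the later speed-alignment and SRT steps need.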

⚙️ Configuration

Override any default by writing your own YAML and passing it with --config my.yaml — only the keys you want to change need to appear; values deep-merge with the built-in defaults.
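The deep-merge behavior can be illustrated with a small recursive merge. This is a sketch of the general technique, not Violin's loader: nested mappings merge key by key, and any other override value replaces the default. In practice the two dicts would come from parsing YAML (for example with PyYAML's `yaml.safe_load`); that step is omitted to keep the example dependency-free.

```python
import copy
from typing import Any


def deep_merge(defaults: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` layered over `defaults`.

    Nested dicts are merged key by key; any other override value (scalar or
    list) replaces the default outright. Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# An excerpt of the built-in defaults; the override only touches the TTS stage.
defaults = {
    "models": {
        "transcription": {"provider": "together", "model": "openai/whisper-large-v3"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}
config = deep_merge(defaults, override)
```

Because the merge recurses, `my.yaml` can name just `models.tts` and the transcription and translation settings survive untouched.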

Switch providers

# config/default.yaml — pick the stack you want
models:
  transcription:
    provider: together                  # together | openai
    model: openai/whisper-large-v3      # together → openai/whisper-large-v3 | openai → whisper-1
  translation:
    provider: together                  # together | openai
    model: deepseek-ai/DeepSeek-V4-Pro  # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
  tts:
    provider: together                  # together | elevenlabs | openai
    model: cartesia/sonic-3             # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd
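The comments in the YAML above list which providers each stage accepts. A small validation pass can catch a typo before a long job starts. The `ALLOWED_PROVIDERS` table is transcribed from those comments; the `validate_models` helper is hypothetical, not part of Violin.

```python
# Providers each stage accepts, per the comments in config/default.yaml.
ALLOWED_PROVIDERS = {
    "transcription": {"together", "openai"},
    "translation": {"together", "openai"},
    "tts": {"together", "elevenlabs", "openai"},
}


def validate_models(models: dict[str, dict[str, str]]) -> list[str]:
    """Return one error message per stage whose provider is missing or unsupported."""
    errors = []
    for stage, allowed in ALLOWED_PROVIDERS.items():
        provider = models.get(stage, {}).get("provider")
        if provider not in allowed:
            errors.append(f"{stage}: provider {provider!r} not in {sorted(allowed)}")
    return errors
```

For example, setting `translation.provider` to `elevenlabs` yields a single `translation: ...` error, since ElevenLabs is only listed for TTS.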

Production overrides

A starter config/prod.yaml is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included Dockerfile, docker-compose.yml, and Caddyfile are how the live demo is hosted: after filling in .env, running docker compose up -d --build is enough to serve your own copy of Violin with automatic HTTPS on any Docker host.

Environment variables

Variable	When required	Description
`TOGETHER_API_KEY`	Recommended — covers every stage with the default config	Together AI API key
`OPENAI_API_KEY`	Any stage uses `provider: openai`	Covers `whisper-1`, GPT models, and `tts-1`
`ELEVENLABS_API_KEY`	TTS uses `provider: elevenlabs`	ElevenLabs API key
`CORS_ORIGINS`	Optional	Comma-separated allowed origins (default: `*`)

You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on openai) work too, with OPENAI_API_KEY alone. ELEVENLABS_API_KEY is needed only when TTS uses provider: elevenlabs, since ElevenLabs covers just the TTS stage.
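The rule "you only need keys for the providers you actually pick" amounts to a lookup from provider to environment variable. A sketch, assuming the `models` structure from the YAML above and the variable names from the table; `required_keys` and `missing_keys` are hypothetical helpers.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Provider -> environment variable, from the table above.
PROVIDER_ENV = {
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def required_keys(models: dict[str, dict[str, str]]) -> set[str]:
    """Environment variables needed by the providers chosen across all stages.

    An unknown provider raises KeyError, so validate the config first.
    """
    return {PROVIDER_ENV[stage["provider"]] for stage in models.values()}


def missing_keys(models: dict[str, dict[str, str]], env: Mapping[str, str] | None = None) -> list[str]:
    """Required variables that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return sorted(k for k in required_keys(models) if not env.get(k))
```

An all-OpenAI config needs only `OPENAI_API_KEY`; the default stack with ElevenLabs TTS needs both `TOGETHER_API_KEY` and `ELEVENLABS_API_KEY`.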

🎭 Style profiles

Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use --style <name> on the CLI or pass style in API requests.

Style	Tone	TTS speed	Emotion
`standard`	Faithful translation, natural voice	1.0×	—
`kids`	Rewritten for a 7-year-old, plain language	1.0×	excited
`academic`	Formal register, preserves jargon and honorifics	0.95×	calm
`casual`	Spoken slang, contractions, friendly	1.1×	content
`storyteller`	Vivid, dramatic narration	0.9×	enthusiastic
`news`	Concise, declarative, broadcast-style	1.0×	neutral

Add your own by editing prompts/styles.yaml.

See all available styles: violin --style list.
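The style table maps naturally onto a small lookup structure. A sketch that transcribes the TTS speed and emotion columns and shows one consequence of the speed setting: at speed factor s, a clip that takes d seconds at 1.0× takes roughly d / s seconds. That linear scaling is an assumption, not a documented provider guarantee, and the `StyleProfile` type is hypothetical; Violin's real profiles live in prompts/styles.yaml.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StyleProfile:
    tone: str
    tts_speed: float
    emotion: str | None  # None where the table leaves emotion blank


STYLES = {
    "standard": StyleProfile("Faithful translation, natural voice", 1.0, None),
    "kids": StyleProfile("Rewritten for a 7-year-old, plain language", 1.0, "excited"),
    "academic": StyleProfile("Formal register, preserves jargon and honorifics", 0.95, "calm"),
    "casual": StyleProfile("Spoken slang, contractions, friendly", 1.1, "content"),
    "storyteller": StyleProfile("Vivid, dramatic narration", 0.9, "enthusiastic"),
    "news": StyleProfile("Concise, declarative, broadcast-style", 1.0, "neutral"),
}


def estimated_duration(base_seconds: float, style: str) -> float:
    """Approximate dubbed length at the style's TTS speed, assuming linear scaling."""
    return base_seconds / STYLES[style].tts_speed
```

Under that assumption a segment that takes 9 s at 1.0× runs about 10 s with `storyteller` (0.9×), which is the kind of drift the speed-align step has to absorb.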

💻 CLI usage

Examples use the PyPI-installed violin command. If you’re running from a git checkout, substitute uv run main.py for violin (and uv run run_api.py for violin-api).

# Basic
violin lecture.mp4 lecture_es.mp4 --language Spanish

# Pick a style
violin talk.mp4 talk_zh.mp4 --language Chinese --style kids

# Pick a specific voice
violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"

# Skip SRT
violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles

# Full replacement (no original audio underneath)
violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover

# Custom config (e.g. switch to OpenAI/ElevenLabs)
violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml

CLI flags

Flag	Default	Description
`--language` / `-l`	(required)	Target language name (e.g. `Spanish`, `Japanese`)
`--voice` / `-v`	auto	TTS voice. Defaults to the primary native voice for the target language
`--source-language`	`auto-detect`	Source language hint for translation
`--no-subtitles`	off	Skip SRT generation
`--voiceover` / `--no-voiceover`	voiceover on	Keep the original audio underneath the dub (`--voiceover`) or replace it entirely (`--no-voiceover`)
`--style` / `-s`	`standard`	Style profile name. Use `--style list` to see all
`--config` / `-c`	`config/default.yaml`	Path to a YAML override file
`--timings-out`	off	Write per-step wall-clock timings + cost as JSON
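For batch jobs it can be convenient to drive the CLI from Python. A sketch that builds argument lists from the flags in the table above and runs them with `subprocess`; it assumes the PyPI-installed `violin` command is on PATH, and `build_argv` / `translate_many` are hypothetical helpers, not part of Violin.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Any


def build_argv(src: str, dst: str, language: str, *, style: str | None = None,
               voice: str | None = None, subtitles: bool = True,
               voiceover: bool = True, config: str | None = None) -> list[str]:
    """Turn keyword options into a `violin` command line using documented flags only."""
    argv = ["violin", src, dst, "--language", language]
    if style:
        argv += ["--style", style]
    if voice:
        argv += ["--voice", voice]
    if not subtitles:
        argv.append("--no-subtitles")
    if not voiceover:
        argv.append("--no-voiceover")
    if config:
        argv += ["--config", config]
    return argv


def translate_many(videos: list[str], language: str, suffix: str, **opts: Any) -> None:
    """Translate each file to <stem>_<suffix>.mp4 next to the source, one at a time."""
    for video in videos:
        path = Path(video)
        out = path.with_name(f"{path.stem}_{suffix}.mp4")
        # check=True raises CalledProcessError if violin exits non-zero.
        subprocess.run(build_argv(str(path), str(out), language, **opts), check=True)
```

For example, `translate_many(["lecture1.mp4", "lecture2.mp4"], "Spanish", "es", style="academic")` runs the two jobs sequentially. Running them one at a time avoids competing for ffmpeg and API rate limits.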

🛰️ Web app & REST API

violin-api                              # default dev mode
violin-api --host 0.0.0.0 --port 8080   # bind everywhere
violin-api --config config/prod.yaml    # production overrides (requires a git checkout for config/prod.yaml)

Core flow: POST /jobs to start, GET /jobs/{id} to poll, GET /jobs/{id}/video and GET /jobs/{id}/srt to download, and POST /jobs/{id}/chat for in-video Q&A. The full list, with request/response schemas, is at /docs.

Example

# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
  -F "[email protected]" \
  -F "language=Spanish" \
  -F "style=academic" | jq -r .id)

# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'

# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt
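The poll and download steps can also be scripted with Python's standard library. Two things here are assumptions: the set of terminal `status` values (check /docs for the real schema), and the upload step, which this sketch skips because its multipart field name is not legible in the curl example. `wait_for_job`, `TERMINAL`, and the status strings are hypothetical.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable

BASE = "http://localhost:8000"
TERMINAL = {"completed", "failed"}  # hypothetical status values; check /docs


def get_job(job_id: str, base: str = BASE) -> dict:
    """GET /jobs/{id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base}/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id: str, fetch: Callable[[str], dict] = get_job,
                 interval: float = 5.0, timeout: float = 3600.0) -> dict:
    """Poll until the job reports a terminal status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)


def download(job_id: str, kind: str, dest: str, base: str = BASE) -> None:
    """Save GET /jobs/{id}/video or /jobs/{id}/srt (kind = "video" or "srt") to `dest`."""
    urllib.request.urlretrieve(f"{base}/jobs/{job_id}/{kind}", dest)
```

The `fetch` parameter makes the polling loop testable without a running server; against a live instance, `wait_for_job(job_id)` followed by `download(job_id, "video", "out.mp4")` mirrors the curl steps.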

Job data lives under jobs/{id}/. Set api.job_ttl_hours to auto-delete jobs older than N hours (default 0 = disabled; config/prod.yaml uses 24h for the public demo).
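`api.job_ttl_hours` describes an age-based sweep over `jobs/{id}/`. A sketch of that idea, assuming a job's age is its directory's modification time (Violin may track creation time differently); `sweep_jobs` is hypothetical.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def sweep_jobs(jobs_dir: str | Path, ttl_hours: float, now: float | None = None) -> list[str]:
    """Delete job directories older than `ttl_hours`; 0 (the default setting) disables it.

    Returns the sorted ids (directory names) that were removed.
    """
    if ttl_hours <= 0:
        return []
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600
    removed = []
    for job in Path(jobs_dir).iterdir():
        if job.is_dir() and job.stat().st_mtime < cutoff:
            shutil.rmtree(job)
            removed.append(job.name)
    return sorted(removed)
```

With `ttl_hours=24`, as in config/prod.yaml, a job last touched 48 hours ago is removed while one from the last few minutes is kept.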

🌍 Supported languages

Violin supports 33 target languages. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs eleven_v3).

Ordered by native-speaker population.

Language	Cartesia native voice (M / F)	ElevenLabs native voice (M / F)
Chinese	chinese commercial man / chinese female conversational	Lin / Lingyue
Spanish	spanish narrator man / spanish narrator lady	Carlos / Valeria
English	tutorial man / helpful woman	Adam / Sarah
Hindi	hindi narrator man / hindi narrator woman	Yatin / Madhusmita
Arabic	middle eastern woman	Faris / Haneen
Portuguese	friendly brazilian man / pleasant brazilian lady	Medeiros / Luna
Russian	russian narrator man 1 / russian narrator woman	Ivo / Xenia
Japanese	japanese male conversational / japanese woman conversational	Shohei / Maiko
Turkish	turkish narrator man / turkish calm man	Sinan / Aura
German	german reporter man / german conversational woman	Daniel / Sina
Korean	korean narrator man / korean calm woman	Joon-ho / Soo
French	french narrator man / french narrator lady	Lior / Virginie
Italian	italian narrator man / italian narrator woman	Raffaele / Chiara
Polish	polish confident man / polish narrator woman	Gregor / Jola
Dutch	dutch confident man / dutch man	Ronald / Jolanda
Swedish	swedish narrator man / swedish calm lady	Andreas / Louise

The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
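The fallback rule (native voices for 16 languages, the multilingual English catalog for the other 17) is a two-level lookup. A sketch using the Cartesia column of the table; treating the first listed voice as the "primary native voice" is an assumption, as is using the first English voice for fallback languages. `default_voice` is hypothetical.

```python
# Cartesia native voices from the table above, in the order listed (M / F).
CARTESIA_NATIVE = {
    "Chinese": ("chinese commercial man", "chinese female conversational"),
    "Spanish": ("spanish narrator man", "spanish narrator lady"),
    "English": ("tutorial man", "helpful woman"),
    "Hindi": ("hindi narrator man", "hindi narrator woman"),
    "Arabic": ("middle eastern woman",),  # only one Cartesia voice is listed
    "Portuguese": ("friendly brazilian man", "pleasant brazilian lady"),
    "Russian": ("russian narrator man 1", "russian narrator woman"),
    "Japanese": ("japanese male conversational", "japanese woman conversational"),
    "Turkish": ("turkish narrator man", "turkish calm man"),
    "German": ("german reporter man", "german conversational woman"),
    "Korean": ("korean narrator man", "korean calm woman"),
    "French": ("french narrator man", "french narrator lady"),
    "Italian": ("italian narrator man", "italian narrator woman"),
    "Polish": ("polish confident man", "polish narrator woman"),
    "Dutch": ("dutch confident man", "dutch man"),
    "Swedish": ("swedish narrator man", "swedish calm lady"),
}

# Languages that fall back to the multilingual English catalog.
FALLBACK_LANGUAGES = {
    "Vietnamese", "Tamil", "Indonesian", "Malay", "Ukrainian", "Romanian",
    "Thai", "Greek", "Hungarian", "Catalan", "Czech", "Bulgarian", "Danish",
    "Slovak", "Croatian", "Finnish", "Norwegian",
}


def default_voice(language: str) -> str:
    """First native voice if the language has one, else the first English voice."""
    if language in CARTESIA_NATIVE:
        return CARTESIA_NATIVE[language][0]
    if language in FALLBACK_LANGUAGES:
        return CARTESIA_NATIVE["English"][0]
    raise ValueError(f"{language!r} is not one of the 33 supported target languages")
```

Keeping the fallback set explicit means an unsupported language fails fast instead of silently getting an English voice.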

🤝 Contributing

PRs welcome. Got questions or hit a bug? Email [email protected] or open an issue.

⚠️ Disclaimer

This is a personal open-source project, not a Together AI product. Users are responsible for ensuring they have the right to download and translate any content they process. Designed for Creative Commons, public domain, your own recordings, and other content you have permission to use.

📜 License

MIT — use it freely, including commercially.

🙏 Acknowledgements

Built on top of Together AI, Whisper, Cartesia Sonic 3, ElevenLabs, FastAPI, and ffmpeg.

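Similar articles

@rwayne: Video translation has now been cracked end to end by a single Oxford postdoc. Oxford postdoc Kevin Lin open-sourced Violin, a video translation tool that integrates speech recognition, LLM translation, and speech synthesis into an automated pipeline. It combines multilingual translation, customizable translation styles, and chatting with the video in one tool, and can turn an academic talk into a child…

X AI KOLs Timeline

Oxford postdoc Kevin Lin open-sourced Violin, a video translation tool that integrates speech recognition, LLM translation, and speech synthesis into an automated pipeline. It supports translation between many languages and personalized styles, and offers three ways to use it: web, CLI, and Agent.

@KevinQHLin: Introducing Violin, an open-source video translation skill. Video is the dominant medium on the internet, yet most high-quality content (lectures, talks, podcasts) is confined to a single language, shutting out global audiences.

X AI KOLs Timeline

Violin is an open-source video translation skill that combines speech recognition, LLM translation, and speech synthesis into one seamless pipeline, supporting multilingual ASR, personalized translation, and interactive chat about the video content.

@XAMTO_AI: Bookmark this open-source tool now or you will regret it later: automatic video dubbing and translation, 33 languages at once, and you can ask questions about the video content directly. I found a gem on GitHub called Violin, fully open source, and what it does sounds almost too good to be true: drop a video in and it automatically recognizes the speech, …

X AI KOLs Timeline

Violin is an open-source automatic video dubbing and translation tool that supports 33 languages, integrates models such as Whisper and DeepSeek, and provides one-click speech recognition, translation, dubbed-voice synthesis, and in-video Q&A.

@aigclink: An open-source end-to-end video translation + video Q&A Skill: violin. The highlight is that it goes beyond literal translation toward reworking the content. It chains ASR, LLM translation, and TTS into one seamless video Skill pipeline; the three stages connect automatically, so a video goes in and a dubbed, translated video comes out. The translation style is adjustable, for example…

X AI KOLs Timeline

Violin is an open-source end-to-end video translation + video Q&A tool that integrates ASR, LLM translation, and TTS, supports style adjustment and content reworking, and answers questions about the video content.

@yhslgg: Folks, here is another open-source video translation tool: pyVideoTrans, 17,700 stars on GitHub, a must-have for video reposting and localization! In one sentence: drop a video in and it runs the whole pipeline of speech recognition → subtitle translation → AI dubbing → video compositing, and out comes a complete video in another language. Core…

X AI KOLs Timeline

pyVideoTrans is an open-source video translation tool that supports automatic speech recognition, subtitle translation, AI dubbing, and video compositing. It integrates multiple ASR, translation, and TTS engines and suits cross-language video production and localization.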
