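@XAMTO_AI: If you don't bookmark this open-source tool now, you will regret it later: automatic video dubbing and translation, 33 languages supported in one go, and you can ask questions about the video content directly. I found a gem on GitHub called Violin. It is fully open source, and what it does sounds a little outrageous: you drop in a video and it automatically recognizes the speech, …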
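Summary
Violin is an open-source tool for automatic video dubbing and translation. It supports 33 languages, integrates models such as Whisper and DeepSeek, and provides one-click speech recognition, translation, voice-over synthesis, and in-video Q&A.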
Cached: 2026/06/12 08:58
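If you don't bookmark this open-source tool now, you will regret it later: automatic video dubbing and translation, 33 languages supported in one go, and you can ask questions about the video content directly.
I found a gem on GitHub called Violin. It is fully open source, and what it does sounds a little outrageous: you drop in a video, and it automatically recognizes the speech, translates it, synthesizes a voice-over in the target language, and mixes it back into the video with the timeline fully aligned. It also outputs SRT subtitles along the way. The whole flow is end to end, with nothing for you to touch by hand.
Under the hood, Whisper Large v3 handles speech recognition, DeepSeek V4 Pro handles translation, Cartesia Sonic 3 synthesizes the voice-over, and ffmpeg does the pre- and post-processing. The whole pipeline is cleanly designed.
A few of the features strike me as especially interesting:
33 target languages, 16 of which come with handpicked native-speaker voices from Cartesia Sonic 3 and ElevenLabs, so the result does not sound robotic.
In-video Q&A: once dubbing is done, you can ask a question about any moment in the video, and it answers based on nearby subtitles and sampled frames. This design went beyond what I expected.
Natural-language voice selection: describe the style of voice you want, and an LLM picks one from the voice catalog for you, so you don't have to audition them one by one.
Six style presets: standard, kids, academic, casual, storyteller, and news. Each preset comes with speech rate and emotion already tuned, ready to use.
Pluggable architecture: the transcription, translation, and TTS stages can each be swapped, with Together, OpenAI, and ElevenLabs freely combined and everything configured in a single YAML file, so people who don't want to touch code can use it too.
GitHub: https://github.com/shang-zhu/violin… Live demo: https://violin-ai.com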
shang-zhu/violin
Source: https://github.com/shang-zhu/violin
🎻 Violin
Open-source Video Translation Skill.
🌐 Live demo · 📝 Blog post · 📜 MIT License
Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.
Available as a CLI, a FastAPI web app, and a Claude Code skill.
✨ Features
- 33 target languages with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
- In-video Q&A — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
- Natural-language voice picker — describe the voice you want, an LLM picks from the catalog
- 6 style profiles (experimental) — standard / kids / academic / casual / storyteller / news
- Pluggable stack — Together / OpenAI / ElevenLabs interchangeable for every stage, one YAML
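The in-video Q&A feature is described as answering from nearby subtitles plus sampled frames. The README does not show how that context is gathered, so the following is only a minimal sketch of one way it could work. The `Subtitle` class, both function names, and the 15-second default window are hypothetical, not taken from Violin's code.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def nearby_subtitles(subs, t, window=15.0):
    """Return subtitles overlapping [t - window, t + window], in time order."""
    lo, hi = t - window, t + window
    hits = [s for s in subs if s.end >= lo and s.start <= hi]
    return sorted(hits, key=lambda s: s.start)


def frame_sample_times(t, window=15.0, count=3, duration=None):
    """Evenly spaced timestamps around t at which frames could be sampled."""
    lo = max(0.0, t - window)
    hi = t + window if duration is None else min(duration, t + window)
    if count == 1 or hi <= lo:
        # Degenerate range: fall back to the single closest valid timestamp.
        return [min(max(t, lo), hi)]
    step = (hi - lo) / (count - 1)
    return [round(lo + i * step, 3) for i in range(count)]


subs = [
    Subtitle(0.0, 4.0, "Welcome to the lecture."),
    Subtitle(30.0, 34.0, "Gradient descent updates the weights."),
    Subtitle(36.0, 40.0, "The learning rate controls the step size."),
    Subtitle(90.0, 95.0, "Next, we look at momentum."),
]
context = nearby_subtitles(subs, t=35.0, window=10.0)
print([s.text for s in context])
print(frame_sample_times(35.0, window=10.0, count=3))
```

The selected subtitle text and the frames grabbed at those timestamps would then be passed to a multimodal model together with the user's question.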
🚀 Quick start
Try it without installing anything
The live demo runs at https://www.violin-ai.com — drop a short clip in, get a dubbed video out in a few minutes.
Run locally
Requires Python 3.10+ and ffmpeg on PATH.
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv if you don't have it
uv tool install violin # recommended — faster, isolated
# or: pip install violin # if you'd rather install into your current Python env
export TOGETHER_API_KEY=... # get one at https://api.together.ai (add to ~/.zshrc to persist)
Three ways to use it:
1. CLI — translate one file:
violin lecture.mp4 lecture_zh.mp4 --language Chinese
2. Web app — full REST API + browser UI:
violin-api
# → http://127.0.0.1:8000 (browser UI)
# → http://127.0.0.1:8000/docs (interactive API docs)
3. Claude Code skill — invoke from any Claude Code session:
violin --install-skill # one-time: copies the skill into ~/.claude/skills/
claude
> please use the violin skill to translate path/to/video.mp4 into Chinese
Run from source (for hacking on the pipeline)
git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env # then fill in TOGETHER_API_KEY
uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese
To use the violin / violin-api commands globally while edits to your local source reflect immediately, install editable:
uv tool uninstall violin # if you've installed the PyPI version
uv tool install --editable .
After this, violin / violin-api run from your local checkout — edit any file and the next invocation picks it up; no rebuild needed. To switch back to PyPI: uv tool uninstall violin && uv tool install violin.
📝 To Do List
- [ ] Support voice cloning.
- [ ] Lip sync generation.
🎬 How Violin works
Video
│
├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
│
├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
│
├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
│
├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
│
└─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
                                 concat with freeze-frame fallback,
                                 single-pass AAC encode the audio track,
                                 write output mp4 + optional SRT
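Two steps in the diagram above lend themselves to a small illustration: turning word-level timestamps into sentence segments, and computing how much to speed up or slow down a video slice so it matches its dubbed audio. This is a sketch under stated assumptions, not Violin's actual implementation. The punctuation rule, the `max_gap` pause threshold, and the clamp range are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: a sentence ends at common Latin or CJK terminal punctuation.
SENTENCE_END = (".", "!", "?", "。", "!", "?")


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def words_to_segments(words, max_gap=1.0):
    """Group timestamped words into sentence-like segments.

    A segment closes when a word ends with sentence punctuation, when the
    pause before the next word exceeds max_gap seconds, or at the last word.
    """
    segments, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        long_pause = nxt is not None and nxt.start - word.end > max_gap
        if word.text.endswith(SENTENCE_END) or long_pause or nxt is None:
            segments.append({
                "start": current[0].start,
                "end": current[-1].end,
                "text": " ".join(w.text for w in current),
            })
            current = []
    return segments


def video_speed_factor(source_duration, dubbed_duration,
                       min_factor=0.5, max_factor=2.0):
    """Playback-rate multiplier that makes a video slice last as long as its dub.

    A factor below 1 slows the video (the dub is longer); above 1 speeds it up.
    The clamp is an assumption; beyond it, a pipeline would need another
    strategy, such as the freeze-frame fallback named in the diagram.
    """
    if source_duration <= 0 or dubbed_duration <= 0:
        raise ValueError("durations must be positive")
    factor = source_duration / dubbed_duration
    return max(min_factor, min(max_factor, factor))


words = [
    Word("Hello", 0.0, 0.4), Word("world.", 0.5, 0.9),
    Word("How", 1.2, 1.4), Word("are", 1.5, 1.6), Word("you?", 1.7, 2.0),
]
segments = words_to_segments(words)
print([s["text"] for s in segments])
print(video_speed_factor(4.0, 5.0))
```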
⚙️ Configuration
Override any default by writing your own YAML and passing it with --config my.yaml — only the keys you want to change need to appear; values deep-merge with the built-in defaults.
Switch providers
# config/default.yaml — pick the stack you want
models:
transcription:
provider: together # together | openai
model: openai/whisper-large-v3 # together → openai/whisper-large-v3 | openai → whisper-1
translation:
provider: together # together | openai
model: deepseek-ai/DeepSeek-V4-Pro # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
tts:
provider: together # together | elevenlabs | openai
model: cartesia/sonic-3 # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd
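The `--config` behaviour described above, where an override file only lists the keys that change and the values deep-merge with the built-in defaults, can be sketched as a recursive dictionary merge. The `deep_merge` function is hypothetical and only illustrates the general technique; the dictionaries mirror the YAML keys shown above.

```python
import copy


def deep_merge(defaults, override):
    """Return a new dict where override wins and nested dicts merge key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "models": {
        "transcription": {"provider": "together", "model": "openai/whisper-large-v3"},
        "translation": {"provider": "together", "model": "deepseek-ai/DeepSeek-V4-Pro"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}
# Equivalent of a my.yaml that only switches the TTS stage to ElevenLabs.
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}

config = deep_merge(defaults, override)
print(config["models"]["tts"])
print(config["models"]["translation"])
```

Because the merge recurses into nested mappings, the untouched `transcription` and `translation` stages keep their defaults even though the override mentions only `tts`.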
Production overrides
A starter config/prod.yaml is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included Dockerfile + docker-compose.yml + Caddyfile are how the live demo is hosted — docker compose up -d --build after filling .env is enough to put a copy of Violin behind auto-HTTPS on any Docker host.
Environment variables
| Variable | When required | Description |
|---|---|---|
| TOGETHER_API_KEY | Recommended; covers every stage with the default config | Together AI API key |
| OPENAI_API_KEY | Any stage uses provider: openai | Covers whisper-1, GPT models, and tts-1 |
| ELEVENLABS_API_KEY | TTS uses provider: elevenlabs | ElevenLabs API key |
| CORS_ORIGINS | Optional | Comma-separated allowed origins (default: *) |
You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on openai) work too; OPENAI_API_KEY alone is enough. Same idea for ElevenLabs.
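The rule that only the selected providers need keys can be checked before a run. The sketch below assumes the merged config has the `models.<stage>.provider` shape from the YAML above and uses the provider-to-variable mapping from the table. The function names are hypothetical and are not part of Violin.

```python
import os

# Mapping taken from the environment-variable table above.
PROVIDER_ENV = {
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}


def required_env_vars(config):
    """API-key variables needed by the providers chosen in config."""
    providers = {stage["provider"] for stage in config["models"].values()}
    return sorted(PROVIDER_ENV[p] for p in providers)


def missing_env_vars(config, env=None):
    """Subset of required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required_env_vars(config) if not env.get(name)]


mixed = {
    "models": {
        "transcription": {"provider": "together"},
        "translation": {"provider": "together"},
        "tts": {"provider": "elevenlabs"},
    }
}
print(required_env_vars(mixed))
print(missing_env_vars(mixed, env={"TOGETHER_API_KEY": "example-key"}))
```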
🎭 Style profiles
Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use --style <name> on the CLI or pass style in API requests.
| Style | Tone | TTS speed | Emotion |
|---|---|---|---|
| standard | Faithful translation, natural voice | 1.0× | (none) |
| kids | Rewritten for a 7-year-old, plain language | 1.0× | excited |
| academic | Formal register, preserves jargon and honorifics | 0.95× | calm |
| casual | Spoken slang, contractions, friendly | 1.1× | content |
| storyteller | Vivid, dramatic narration | 0.9× | enthusiastic |
| news | Concise, declarative, broadcast-style | 1.0× | neutral |
Add your own by editing prompts/styles.yaml.
See all available styles: violin --style list.
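To make the table concrete, here is a hedged sketch of how a style name could be resolved into TTS delivery settings. The values come from the table above, but the `StyleProfile` class, the `tts_options` function, and the option names `speed` and `emotion` are hypothetical. The real schema lives in prompts/styles.yaml and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StyleProfile:
    tone: str
    tts_speed: float
    emotion: Optional[str]  # None means no emotion hint is sent


STYLES = {
    "standard": StyleProfile("Faithful translation, natural voice", 1.0, None),
    "kids": StyleProfile("Rewritten for a 7-year-old, plain language", 1.0, "excited"),
    "academic": StyleProfile("Formal register, preserves jargon and honorifics", 0.95, "calm"),
    "casual": StyleProfile("Spoken slang, contractions, friendly", 1.1, "content"),
    "storyteller": StyleProfile("Vivid, dramatic narration", 0.9, "enthusiastic"),
    "news": StyleProfile("Concise, declarative, broadcast-style", 1.0, "neutral"),
}


def tts_options(style="standard"):
    """Translate a style name into keyword options for a TTS call."""
    try:
        profile = STYLES[style]
    except KeyError:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STYLES)}") from None
    options = {"speed": profile.tts_speed}
    if profile.emotion is not None:
        options["emotion"] = profile.emotion
    return options


print(tts_options("academic"))
print(tts_options())
```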
💻 CLI usage
Examples use the PyPI-installed violin command. If you're running from a git checkout, substitute uv run main.py for violin (and uv run run_api.py for violin-api).
# Basic
violin lecture.mp4 lecture_es.mp4 --language Spanish
# Pick a style
violin talk.mp4 talk_zh.mp4 --language Chinese --style kids
# Pick a specific voice
violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"
# Skip SRT
violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles
# Full replacement (no original audio underneath)
violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover
# Custom config (e.g. switch to OpenAI/ElevenLabs)
violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml
CLI flags
| Flag | Default | Description |
|---|---|---|
| --language / -l | (required) | Target language name (e.g. Spanish, Japanese) |
| --voice / -v | auto | TTS voice. Defaults to the primary native voice for the target language |
| --source-language | auto-detect | Source language hint for translation |
| --no-subtitles | off | Skip SRT generation |
| --voiceover / --no-voiceover | voiceover on | Keep original audio underneath the dub, or full replacement |
| --style / -s | standard | Style profile name. Use --style list to see all |
| --config / -c | config/default.yaml | Path to a YAML override file |
| --timings-out | off | Write per-step wall-clock timings + cost as JSON |
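For batch work, the flags in the table above can be assembled programmatically. The following is a minimal sketch that shells out to the installed violin command once per target language. The helper names and the output-file naming scheme are hypothetical. Only the flags themselves come from the table.

```python
import subprocess
from pathlib import Path


def violin_command(src, dst, language, style=None, voice=None,
                   subtitles=True, voiceover=True, config=None):
    """Build the argv list for one violin invocation."""
    cmd = ["violin", str(src), str(dst), "--language", language]
    if style:
        cmd += ["--style", style]
    if voice:
        cmd += ["--voice", voice]
    if not subtitles:
        cmd.append("--no-subtitles")
    if not voiceover:
        cmd.append("--no-voiceover")
    if config:
        cmd += ["--config", str(config)]
    return cmd


def dub_into(src, languages, **options):
    """Run violin once per language, writing e.g. lecture_spanish.mp4."""
    src = Path(src)
    for language in languages:
        dst = src.with_name(f"{src.stem}_{language.lower()}{src.suffix}")
        # check=True raises CalledProcessError if violin exits non-zero.
        subprocess.run(violin_command(src, dst, language, **options), check=True)


print(violin_command("lecture.mp4", "lecture_fr.mp4", "French",
                     voice="french narrator man", subtitles=False))
```

Calling `dub_into("lecture.mp4", ["Spanish", "Korean"], style="academic")` would then run the two translations one after the other.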
🛰️ Web app & REST API
violin-api # default dev mode
violin-api --host 0.0.0.0 --port 8080 # bind everywhere
violin-api --config config/prod.yaml # production overrides (requires a git checkout for config/prod.yaml)
Core flow: POST /jobs to start, GET /jobs/{id} to poll, GET /jobs/{id}/video and /srt to download, POST /jobs/{id}/chat for in-video Q&A. Full list with request/response schemas at /docs.
Example
# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
-F "[email protected]" \
-F "language=Spanish" \
-F "style=academic" | jq -r .id)
# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'
# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt
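The poll step above can also be driven from Python. This sketch covers only the GET /jobs/{id} polling loop, with the HTTP call injected so it can be exercised without a server. The terminal status names `completed` and `failed` are assumptions, since the README only shows that a job has `status` and `progress` fields. Check /docs for the real values.

```python
import json
import time
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(get_json, base_url, job_id,
                 done=("completed",), failed=("failed",),
                 interval=2.0, timeout=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll GET /jobs/{id} until the job reaches a terminal status.

    done and failed hold assumed status names; pass the real ones if they differ.
    """
    url = f"{base_url.rstrip('/')}/jobs/{job_id}"
    deadline = clock() + timeout
    while True:
        job = get_json(url)
        status = job.get("status")
        if status in done:
            return job
        if status in failed:
            raise RuntimeError(f"job {job_id} failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)


# Against a running server one might call:
#   job = wait_for_job(http_get_json, "http://localhost:8000", job_id)
```

Injecting `get_json`, `sleep`, and `clock` keeps the loop testable and lets the caller swap in a different HTTP client.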
Job data lives under jobs/{id}/. Set api.job_ttl_hours to auto-delete jobs older than N hours (default 0 = disabled; config/prod.yaml uses 24h for the public demo).
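The `api.job_ttl_hours` rule, where 0 disables cleanup and N deletes jobs older than N hours, reduces to a small age check. The sketch below assumes each job has a known creation time. How Violin actually records job age is not stated, so `is_expired` and its arguments are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, ttl_hours, now=None):
    """True if a job created at created_at is older than ttl_hours.

    A ttl_hours of 0 (the documented default) disables expiry entirely.
    """
    if ttl_hours <= 0:
        return False
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(hours=ttl_hours)


created = datetime(2026, 6, 12, 8, 0, tzinfo=timezone.utc)
later = created + timedelta(hours=25)
print(is_expired(created, ttl_hours=24, now=later))
print(is_expired(created, ttl_hours=0, now=later))
```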
🌍 Supported languages
Violin supports 33 target languages. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs eleven_v3).
Ordered by native-speaker population.
| Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
|---|---|---|
| Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
| Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
| English | tutorial man / helpful woman | Adam / Sarah |
| Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
| Arabic | middle eastern woman | Faris / Haneen |
| Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
| Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
| Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
| Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
| German | german reporter man / german conversational woman | Daniel / Sina |
| Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
| French | french narrator man / french narrator lady | Lior / Virginie |
| Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
| Polish | polish confident man / polish narrator woman | Gregor / Jola |
| Dutch | dutch confident man / dutch man | Ronald / Jolanda |
| Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |
The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
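The split between native-voice languages and fallback languages can be expressed as a simple lookup. The two lists below are copied from this section. The `voice_catalog` function and its return labels are hypothetical and only illustrate the fallback rule.

```python
# Languages with handpicked native-speaker voices (from the table above).
NATIVE_VOICE_LANGUAGES = [
    "Chinese", "Spanish", "English", "Hindi", "Arabic", "Portuguese",
    "Russian", "Japanese", "Turkish", "German", "Korean", "French",
    "Italian", "Polish", "Dutch", "Swedish",
]
# Languages that fall back to the multilingual English voice catalog.
FALLBACK_LANGUAGES = [
    "Vietnamese", "Tamil", "Indonesian", "Malay", "Ukrainian", "Romanian",
    "Thai", "Greek", "Hungarian", "Catalan", "Czech", "Bulgarian", "Danish",
    "Slovak", "Croatian", "Finnish", "Norwegian",
]


def voice_catalog(language):
    """Return which voice catalog a target language would draw from."""
    name = language.strip().title()
    if name in NATIVE_VOICE_LANGUAGES:
        return "native"
    if name in FALLBACK_LANGUAGES:
        return "english-fallback"
    raise ValueError(f"unsupported target language: {language!r}")


print(voice_catalog("korean"))
print(voice_catalog("Thai"))
print(len(NATIVE_VOICE_LANGUAGES) + len(FALLBACK_LANGUAGES))
```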
🤝 Contributing
PRs welcome. Got questions or hit a bug? Email [email protected] or open an issue.
⚠️ Disclaimer
This is a personal open-source project, not a Together AI product. Users are responsible for ensuring they have the right to download and translate any content they process. Designed for Creative Commons, public domain, your own recordings, and other content you have permission to use.
📜 License
MIT — use it freely, including commercially.
🙏 Acknowledgements
Built on top of Together AI, Whisper, Cartesia Sonic 3, ElevenLabs, FastAPI, and ffmpeg.
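Similar articles
@aigclink: An open-source, end-to-end video translation + video Q&A Skill: violin. The highlight is that it aims beyond literal translation, toward re-creating the content. It combines ASR, LLM translation, and TTS into one seamless video pipeline Skill; the three stages hand off automatically, so a video goes in and a translated, dubbed video comes out. The translation style is adjustable, for example…
Violin is an open-source, end-to-end video translation + video Q&A tool that integrates ASR, LLM translation, and TTS, supports style adjustment and content re-creation, and can answer questions about the video content.
@berryxia: Folks, this one is good! Go install it! Kevin Lin, a postdoc at the University of Oxford and former researcher at Meta and Microsoft, has just released Violin, an open-source video translation Skill. Video is already the dominant content format on the internet. Yet the vast majority of high-quality lectures, talks, and podcasts are…
Violin is an open-source video translation tool that integrates speech recognition, LLM translation, and speech synthesis, supports more than 30 languages, and can be used three ways: CLI, web app, and Claude Code.
@yhslgg: Old Yang is sharing yet another gem of an open-source tool: KrillinAI, 10,000 stars on GitHub, well worth a look if you produce multilingual audio and video content! In one sentence: from video download to subtitle translation, AI dubbing, and video compositing, it covers the whole chain, and it can also auto-generate platform covers for Bilibili, Douyin, Xiaohongshu, YouTube…
KrillinAI is an open-source tool that covers the full workflow of video download, subtitle translation, AI dubbing, and video compositing. It supports context-aware translation, voice cloning, automatic layout, and cover generation, is compatible with multiple AI models, and suits multilingual audio and video creation and distribution.
@yhslgg: Folks, here is another open-source video translation tool: pyVideoTrans, 17,700 stars on GitHub, a must-have for video reposting and localization! In one sentence: drop in a video and it automatically runs the full pipeline of speech recognition → subtitle translation → AI dubbing → video compositing, and out comes a complete video in another language. Core…
pyVideoTrans is an open-source video translation tool that supports automatic speech recognition, subtitle translation, AI dubbing, and video compositing. It integrates multiple ASR, translation, and TTS engines and suits cross-language video production and localization.
@rwayne: Video translation has now been cracked end to end by a single Oxford postdoc. Kevin Lin, a postdoc at the University of Oxford, has open-sourced the Violin video translation tool, combining speech recognition, LLM translation, and speech synthesis into an automated pipeline. It supports translation between multiple languages, personalized translation styles, and video chat, three in one, and can turn an academic talk into a kids'…
Kevin Lin, a postdoc at the University of Oxford, has open-sourced the Violin video translation tool, which combines speech recognition, LLM translation, and speech synthesis into an automated pipeline, supports translation between multiple languages and personalized styles, and can be used via Web, CLI, or Agent.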