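@berryxia: Folks, this one is good. Go install it! Kevin Lin, a postdoc at the University of Oxford and a former researcher at Meta and Microsoft, has just released Violin, an open-source video translation Skill. Video is already the dominant content format on the internet. Yet most high-quality lectures, talks, and podcasts are…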

Summary

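Violin is an open-source video translation tool that combines speech recognition, LLM translation, and speech synthesis. It supports more than 30 languages and can be used as a CLI, a web app, or a Claude Code skill.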

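Folks, this one is good. Go install it! Kevin Lin, a postdoc at the University of Oxford and a former researcher at Meta and Microsoft, has just released Violin, an open-source video translation Skill. Video is already the dominant content format on the internet. Yet most high-quality lectures, talks, and podcasts are locked into a single language, so a global audience never reaches them. Violin chains ASR, LLM translation, and TTS into one seamless pipeline. "Feed it a video, and it automatically handles speech recognition, multilingual translation, and natural speech synthesis." Two features are the most practical: you can personalize the translation style, for example turning an academic talk into a version a child can follow; and you can chat with the video directly, with every answer grounded in the video's content. It supports a web app, a CLI, and an Agent Skill, all open source under the MIT license. High-quality content no longer has to belong to one language; it can reach the whole world. The demo, blog, and GitHub links are in the original post. If you work on content, education, or cross-language communication, or you are building multimodal agents, this Skill is worth trying right away. What do you think AI should solve next: content creation, or content globalization? Project: https://github.com/shang-zhu/violin…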

Cached: 2026/05/15 04:56


shang-zhu/violin

Source: https://github.com/shang-zhu/violin

🎻 Violin

Open-source Video Translation Skill.

🌐 Live demo · 📝 Blog post · 📜 MIT License

Violin logo

Upload a video. Violin transcribes the speech, translates it, synthesizes a native-sounding voice-over in the target language, and remuxes it back into the video — fully aligned, with optional SRT subtitles.

Available as a CLI, a FastAPI web app, and a Claude Code skill.


✨ Features

  • 33 target languages with handpicked native-speaker voices for the 16 most-used ones (Cartesia Sonic 3 + ElevenLabs)
  • In-video Q&A — ask questions about any moment in the dubbed video; answers use nearby subtitles plus sampled frames
  • Natural-language voice picker — describe the voice you want, an LLM picks from the catalog
  • 6 style profiles (experimental) — standard / kids / academic / casual / storyteller / news
  • Pluggable stack — Together / OpenAI / ElevenLabs interchangeable for every stage, one YAML

🚀 Quick start

Try it without installing anything

The live demo runs at https://www.violin-ai.com — drop a short clip in, get a dubbed video out in a few minutes.

Run locally

Requires Python 3.10+ and ffmpeg on PATH.

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv if you don't have it
uv tool install violin                            # recommended — faster, isolated
# or: pip install violin                          # if you'd rather install into your current Python env

export TOGETHER_API_KEY=...                       # get one at https://api.together.ai (add to ~/.zshrc to persist)

Three ways to use it:

1. CLI — translate one file:

violin lecture.mp4 lecture_zh.mp4 --language Chinese

2. Web app — full REST API + browser UI:

violin-api
# → http://127.0.0.1:8000           (browser UI)
# → http://127.0.0.1:8000/docs      (interactive API docs)

3. Claude Code skill — invoke from any Claude Code session:

violin --install-skill          # one-time: copies the skill into ~/.claude/skills/
claude
> please use the violin skill to translate path/to/video.mp4 into Chinese

Run from source (for hacking on the pipeline)

git clone https://github.com/shang-zhu/violin.git
cd violin
uv sync
cp .env.example .env             # then fill in TOGETHER_API_KEY
uv run main.py lecture.mp4 lecture_zh.mp4 --language Chinese

To use the violin / violin-api commands globally while edits to your local source reflect immediately, install editable:

uv tool uninstall violin     # if you've installed the PyPI version
uv tool install --editable .

After this, violin / violin-api run from your local checkout — edit any file and the next invocation picks it up; no rebuild needed. To switch back to PyPI: uv tool uninstall violin && uv tool install violin.


🎬 How Violin works

Video
  │
  ├─ ffmpeg ─────────────────────► Extract audio (16 kHz WAV)
  │
  ├─ Whisper Large v3 ────────────► Word-level timestamps → sentence segments
  │
  ├─ LLM (DeepSeek V4 Pro by default) ──► Translate each segment, respecting style profile
  │
  ├─ TTS (Cartesia Sonic 3 by default) ─► Synthesize dubbed audio per segment
  │
  └─ ffmpeg ─────────────────────► Speed-align video to dubbed audio,
                                    concat with freeze-frame fallback,
                                    single-pass AAC encode the audio track,
                                    write output mp4 + optional SRT
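
The first stage of the diagram above can be sketched in Python as follows. This is a minimal illustration, not Violin's actual code: the function names `extract_audio_cmd` and `extract_audio` are hypothetical, and the mono channel and 16-bit PCM codec choices are assumptions (the diagram only states 16 kHz WAV). The ffmpeg flags themselves (`-vn`, `-ac`, `-ar`, `-c:a`) are standard.

```python
import shutil
import subprocess
from pathlib import Path


def extract_audio_cmd(video: str, wav: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg argv that drops the video stream and writes a PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", video,               # input video
        "-vn",                     # no video stream in the output
        "-ac", "1",                # mono (assumption; the diagram only states 16 kHz)
        "-ar", str(sample_rate),   # resample to the requested rate, 16 kHz by default
        "-c:a", "pcm_s16le",       # 16-bit PCM (assumption), the usual WAV payload
        wav,
    ]


def extract_audio(video: str, wav: str) -> Path:
    """Run the command; raises CalledProcessError if ffmpeg exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(extract_audio_cmd(video, wav), check=True)
    return Path(wav)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without a video file.
    print(" ".join(extract_audio_cmd("lecture.mp4", "lecture.wav")))
```

Building the argv as a list and passing it to `subprocess.run` without a shell avoids quoting problems with file names that contain spaces.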

⚙️ Configuration

Override any default by writing your own YAML and passing it with --config my.yaml — only the keys you want to change need to appear; values deep-merge with the built-in defaults.
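
To make "deep-merge" concrete, here is a minimal sketch of how a partial override can be layered over nested defaults. The `deep_merge` helper is hypothetical and only illustrates the general technique; Violin's own merge logic may differ in details such as how lists are handled.

```python
from copy import deepcopy


def deep_merge(defaults: dict, override: dict) -> dict:
    """Return a new dict: nested dicts merge key by key, other values are replaced."""
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Defaults shaped like the provider block in the README (trimmed to two stages).
defaults = {
    "models": {
        "translation": {"provider": "together", "model": "deepseek-ai/DeepSeek-V4-Pro"},
        "tts": {"provider": "together", "model": "cartesia/sonic-3"},
    }
}

# An override file that only mentions the keys it wants to change.
override = {"models": {"tts": {"provider": "elevenlabs", "model": "eleven_v3"}}}

merged = deep_merge(defaults, override)
print(merged["models"]["tts"]["provider"])          # the overridden stage
print(merged["models"]["translation"]["provider"])  # the untouched stage keeps its default
```

Because the merge recurses into nested dictionaries, overriding `models.tts` leaves `models.translation` intact, which is why an override file can stay short.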

Switch providers

# config/default.yaml — pick the stack you want
models:
  transcription:
    provider: together                  # together | openai
    model: openai/whisper-large-v3      # together → openai/whisper-large-v3 | openai → whisper-1
  translation:
    provider: together                  # together | openai
    model: deepseek-ai/DeepSeek-V4-Pro  # together → deepseek-ai/DeepSeek-V4-Pro | openai → gpt-5.5
  tts:
    provider: together                  # together | elevenlabs | openai
    model: cartesia/sonic-3             # together → cartesia/sonic-3 | elevenlabs → eleven_v3 | openai → tts-1-hd

Production overrides

A starter config/prod.yaml is included for public deployments. It adds upload limits, serializes jobs, and caps ffmpeg concurrency. The included Dockerfile + docker-compose.yml + Caddyfile are how the live demo is hosted — docker compose up -d --build after filling .env is enough to put a copy of Violin behind auto-HTTPS on any Docker host.

Environment variables

| Variable | When required | Description |
| --- | --- | --- |
| TOGETHER_API_KEY | Recommended; covers every stage with the default config | Together AI API key |
| OPENAI_API_KEY | Any stage uses provider: openai | Covers whisper-1, GPT models, and tts-1 |
| ELEVENLABS_API_KEY | TTS uses provider: elevenlabs | ElevenLabs API key |
| CORS_ORIGINS | Optional | Comma-separated allowed origins (default: *) |

You only need keys for the providers you actually pick. Pure-OpenAI deployments (all stages on openai) work too — OPENAI_API_KEY alone is enough. Same idea for ElevenLabs.


🎭 Style profiles

Six built-in profiles tune both the translation LLM prompt and the TTS delivery. Use --style <name> on the CLI or pass style in API requests.

| Style | Tone | TTS speed | Emotion |
| --- | --- | --- | --- |
| standard | Faithful translation, natural voice | 1.0× | |
| kids | Rewritten for a 7-year-old, plain language | 1.0× | excited |
| academic | Formal register, preserves jargon and honorifics | 0.95× | calm |
| casual | Spoken slang, contractions, friendly | 1.1× | content |
| storyteller | Vivid, dramatic narration | 0.9× | enthusiastic |
| news | Concise, declarative, broadcast-style | 1.0× | neutral |

Add your own by editing prompts/styles.yaml.

See all available styles: violin --style list.


💻 CLI usage

Examples use the PyPI-installed violin command. If you’re running from a git checkout, substitute uv run main.py for violin (and uv run run_api.py for violin-api).

# Basic
violin lecture.mp4 lecture_es.mp4 --language Spanish

# Pick a style
violin talk.mp4 talk_zh.mp4 --language Chinese --style kids

# Pick a specific voice
violin lecture.mp4 lecture_fr.mp4 --language French --voice "french narrator man"

# Skip SRT
violin lecture.mp4 lecture_ja.mp4 --language Japanese --no-subtitles

# Full replacement (no original audio underneath)
violin lecture.mp4 lecture_ko.mp4 --language Korean --no-voiceover

# Custom config (e.g. switch to OpenAI/ElevenLabs)
violin lecture.mp4 lecture_it.mp4 --language Italian --config config/other_api.yaml

CLI flags

| Flag | Default | Description |
| --- | --- | --- |
| --language / -l | (required) | Target language name (e.g. Spanish, Japanese) |
| --voice / -v | auto | TTS voice. Defaults to the primary native voice for the target language |
| --source-language | auto-detect | Source language hint for translation |
| --no-subtitles | off | Skip SRT generation |
| --voiceover / --no-voiceover | voiceover on | Keep original audio underneath the dub, or full replacement |
| --style / -s | standard | Style profile name. Use --style list to see all |
| --config / -c | config/default.yaml | Path to a YAML override file |
| --timings-out | off | Write per-step wall-clock timings + cost as JSON |

🛰️ Web app & REST API

violin-api                              # default dev mode
violin-api --host 0.0.0.0 --port 8080   # bind everywhere
violin-api --config config/prod.yaml    # production overrides (requires a git checkout for config/prod.yaml)

Core flow: POST /jobs to start, GET /jobs/{id} to poll, GET /jobs/{id}/video and /srt to download, POST /jobs/{id}/chat for in-video Q&A. Full list with request/response schemas at /docs.

Example

# Submit
JOB=$(curl -s -X POST http://localhost:8000/jobs \
  -F "[email protected]" \
  -F "language=Spanish" \
  -F "style=academic" | jq -r .id)

# Poll
curl -s http://localhost:8000/jobs/$JOB | jq '{status, progress}'

# Download
curl -OJ http://localhost:8000/jobs/$JOB/video
curl -OJ http://localhost:8000/jobs/$JOB/srt

Job data lives under jobs/{id}/. Set api.job_ttl_hours to auto-delete jobs older than N hours (default 0 = disabled; config/prod.yaml uses 24h for the public demo).
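
The poll step of the core flow can also be scripted from Python. This is a sketch under stated assumptions: the endpoint paths come from the flow described above, but the terminal status names `"completed"` and `"failed"` are guesses (check `/docs` for the real values), and `job_urls`, `fetch_json`, and `wait_for_job` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable


def job_urls(base: str, job_id: str) -> dict[str, str]:
    """Map the documented job endpoints to full URLs."""
    root = f"{base.rstrip('/')}/jobs/{job_id}"
    return {"status": root, "video": f"{root}/video", "srt": f"{root}/srt", "chat": f"{root}/chat"}


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(
    base: str,
    job_id: str,
    *,
    fetch: Callable[[str], dict] = fetch_json,
    terminal: tuple[str, ...] = ("completed", "failed"),  # assumed status names
    interval: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll GET /jobs/{id} until the status is terminal or the poll budget runs out."""
    url = job_urls(base, job_id)["status"]
    for _ in range(max_polls):
        job = fetch(url)
        if job.get("status") in terminal:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling loop testable without a running server; a caller would typically use `wait_for_job("http://localhost:8000", job_id)` and then download from `job_urls(...)["video"]`.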


🌍 Supported languages

Violin supports 33 target languages. The 16 below ship with handpicked native-speaker voices for each provider; the rest fall back to the English voice catalog (which is multilingual under both Cartesia Sonic 3 and ElevenLabs eleven_v3).

Ordered by native-speaker population.

| Language | Cartesia native voice (M / F) | ElevenLabs native voice (M / F) |
| --- | --- | --- |
| Chinese | chinese commercial man / chinese female conversational | Lin / Lingyue |
| Spanish | spanish narrator man / spanish narrator lady | Carlos / Valeria |
| English | tutorial man / helpful woman | Adam / Sarah |
| Hindi | hindi narrator man / hindi narrator woman | Yatin / Madhusmita |
| Arabic | middle eastern woman | Faris / Haneen |
| Portuguese | friendly brazilian man / pleasant brazilian lady | Medeiros / Luna |
| Russian | russian narrator man 1 / russian narrator woman | Ivo / Xenia |
| Japanese | japanese male conversational / japanese woman conversational | Shohei / Maiko |
| Turkish | turkish narrator man / turkish calm man | Sinan / Aura |
| German | german reporter man / german conversational woman | Daniel / Sina |
| Korean | korean narrator man / korean calm woman | Joon-ho / Soo |
| French | french narrator man / french narrator lady | Lior / Virginie |
| Italian | italian narrator man / italian narrator woman | Raffaele / Chiara |
| Polish | polish confident man / polish narrator woman | Gregor / Jola |
| Dutch | dutch confident man / dutch man | Ronald / Jolanda |
| Swedish | swedish narrator man / swedish calm lady | Andreas / Louise |

The 17 fallback languages (using the English voice catalog), also ordered by native speakers: Vietnamese, Tamil, Indonesian, Malay, Ukrainian, Romanian, Thai, Greek, Hungarian, Catalan, Czech, Bulgarian, Danish, Slovak, Croatian, Finnish, Norwegian.
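
The native-versus-fallback split described above amounts to a simple lookup. The sketch below restates it in Python; the language lists are copied from this section, while the `voice_catalog_for` helper and its case-insensitive matching are illustrative assumptions, not Violin's actual API.

```python
# The 16 languages with handpicked native-speaker voices.
NATIVE_VOICE_LANGUAGES = [
    "Chinese", "Spanish", "English", "Hindi", "Arabic", "Portuguese", "Russian",
    "Japanese", "Turkish", "German", "Korean", "French", "Italian", "Polish",
    "Dutch", "Swedish",
]

# The 17 languages that fall back to the English voice catalog.
FALLBACK_LANGUAGES = [
    "Vietnamese", "Tamil", "Indonesian", "Malay", "Ukrainian", "Romanian", "Thai",
    "Greek", "Hungarian", "Catalan", "Czech", "Bulgarian", "Danish", "Slovak",
    "Croatian", "Finnish", "Norwegian",
]


def voice_catalog_for(language: str) -> str:
    """Return the language whose voice catalog would serve the given target language."""
    name = language.strip().title()
    if name in NATIVE_VOICE_LANGUAGES:
        return name
    if name in FALLBACK_LANGUAGES:
        return "English"  # multilingual English voices cover the fallback set
    raise ValueError(f"unsupported target language: {language!r}")


if __name__ == "__main__":
    for lang in ("Korean", "Thai"):
        print(lang, "->", voice_catalog_for(lang))
```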


🤝 Contributing

PRs welcome. Got questions or hit a bug? Email [email protected] or open an issue.


⚠️ Disclaimer

This is a personal open-source project, not a Together AI product. Users are responsible for ensuring they have the right to download and translate any content they process. Designed for Creative Commons, public domain, your own recordings, and other content you have permission to use.


📜 License

MIT — use it freely, including commercially.


🙏 Acknowledgements

Built on top of Together AI, Whisper, Cartesia Sonic 3, ElevenLabs, FastAPI, and ffmpeg.
