@KevinQHLin: Introducing Violin, an Open-source Video Translation Skill. Video is the dominant medium on the internet, yet most high…

X AI KOLs Timeline 05/14/26, 08:31 PM Tools

video-translation open-source speech-recognition llm tts multilingual agent

Summary

Violin is an open-source video translation skill that combines speech recognition, LLM translation, and speech synthesis into a seamless pipeline, supporting multilingual ASR, personalized translation, and interactive chat with video content.

Introducing Violin, an open-source video translation skill. Video is the dominant medium on the internet, yet most high-quality content (lectures, talks, podcasts) is locked behind a single language, leaving global audiences behind. So we built Violin: a video skill that combines speech recognition, LLM translation, and speech synthesis into one seamless pipeline.

Demo: https://violin-ai.com
Blog: https://together.ai/blog/violin-open-source-translation-skill…
GitHub: https://github.com/shang-zhu/violin…

Key Features:
High-quality multilingual ASR, translation, and TTS.
Personalized translation and voice (turn an academic talk into something children can follow).
Chat with the video: ask any question and get answers grounded in the video.
Supports web app, CLI, and agent skill.
Fully open-source under MIT.

Built with the wonderful @ShangZhu18 and advised by @james_y_zou! All features powered by @togethercompute. Try it and let us know what you think!
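The three-stage pipeline described in the post (speech recognition, then LLM translation, then speech synthesis) might be sketched roughly as below. This is only an illustration of the chaining idea, not Violin's actual code or API: the `Segment` type, the `translate_video` function, the `style` parameter, and the stub stages are all hypothetical names introduced here, and a real system would plug actual ASR, LLM, and TTS backends into the three callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """A timed piece of speech (hypothetical type, for illustration only)."""
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical stage signatures; real backends would be injected here.
Transcribe = Callable[[str], List[Segment]]   # audio path -> timed segments
Translate = Callable[[str, str, str], str]    # text, target language, style -> text
Synthesize = Callable[[str, str], bytes]      # text, target language -> audio bytes


def translate_video(
    audio_path: str,
    target_lang: str,
    transcribe: Transcribe,
    translate: Translate,
    synthesize: Synthesize,
    style: str = "faithful",
) -> List[Tuple[Segment, bytes]]:
    """Chain ASR -> translation -> TTS, keeping the original timestamps."""
    dubbed = []
    for seg in transcribe(audio_path):
        translated = translate(seg.text, target_lang, style)
        audio = synthesize(translated, target_lang)
        dubbed.append((Segment(seg.start, seg.end, translated), audio))
    return dubbed


# Stub stages so the sketch runs without any external service.
def fake_transcribe(path: str) -> List[Segment]:
    return [Segment(0.0, 2.0, "hello"), Segment(2.0, 4.5, "world")]


def fake_translate(text: str, lang: str, style: str) -> str:
    return f"[{lang}/{style}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


result = translate_video(
    "talk.wav", "es", fake_transcribe, fake_translate, fake_synthesize,
    style="for children",
)
```

Passing the stages in as callables keeps the chaining logic independent of any particular provider, and the `style` argument stands in for the personalization the post mentions (for example, rewording an academic talk for children).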

Original Article

Cached at: 05/15/26, 02:55 AM

Violin — Video Narrator

Source: https://www.violin-ai.com/

Vimeo, X/Twitter, and 1000+ sites · max 2 hours · YouTube may not work from cloud servers

Only use URLs you have rights to download — Creative Commons, public domain, or your own content.

Similar Articles

@berryxia: Guys, this is awesome! Install it right away! Kevin Lin, postdoc at Oxford, former Meta and Microsoft researcher, just released Violin, an open-source video translation Skill. Video is already the absolute dominant content form on the internet. Yet most high-quality lectures, speeches, and podcasts are locked by a single language…

X AI KOLs Timeline

Violin is an open-source video translation tool that integrates speech recognition, large language model translation, and text-to-speech. It supports over 30 languages and offers three usage modes: CLI, web app, and Claude Code.

@aigclink: An open-source end-to-end video translation + video Q&A Skill: violin. The highlight is not just literal translation, but the idea of content re-creation. It integrates ASR, LLM translation, and TTS into a seamless pipeline video Skill. The three modules are automatically chained: input a video and get a dubbed translated video. Translation style is adjustable, for example...

X AI KOLs Timeline

Violin is an open-source end-to-end video translation and video Q&A tool, integrating ASR, LLM translation, and TTS. It supports style adjustment and content re-creation, and can answer questions about video content.

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

arXiv cs.CL

VITA-QinYu is an expressive end-to-end spoken language model capable of role-playing and singing, trained on 15.8K hours of data to outperform peers in expressiveness and conversational accuracy.

Build a Realtime Speech Translation (28 minute read)

TLDR AI

OpenAI releases gpt-realtime-translate, a low-latency speech-to-speech model optimized for live interpretation, accompanied by a developer cookbook for building multilingual browser, phone, and video applications.

How Descript engineers multilingual video dubbing at scale

OpenAI Blog

Descript redesigned its translation pipeline using OpenAI reasoning models to optimize multilingual video dubbing at scale, achieving 15% increase in translated video exports and 13-43% improvement in duration adherence across languages by addressing the challenge of matching speech duration to video timing constraints.
