@hank_aibtc: https://x.com/victormustar/status/2058492201261244458/video/1… Holy cow! Meituan crushes commercial closed-source Avatar, open-source free LongCat-Video-Avatar-1.5 is here! …


Summary

Meituan has open-sourced LongCat-Video-Avatar-1.5, a model that generates realistic talking videos from a single photo and a voice recording. According to the post, it handles multiple languages and long videos, and outperforms commercial closed-source solutions.


Cached at: 05/25/26, 04:44 AM

https://x.com/victormustar/status/2058492201261244458/video/1…
Holy shit! Meituan just wrecked commercial closed-source Avatars:
the open-source, free LongCat-Video-Avatar-1.5 is here!

Drop in a photo + a voice clip (Chinese, English, Japanese — any language works),
and it instantly generates a talking video with perfectly synced lips, natural blinking and head movements, and wild hand gestures.

Long videos stay stable, multi-person conversations keep each person separate,
it even handles singing and dancing, and works with anime, animals, and real people!

All those problems HeyGen, Kling, etc. used to have — lip-sync failures, face drift, English-only?
All gone.

Now open-source under MIT, runs locally, batch generation is a breeze!

Content creators, e-commerce sellers, virtual lecturers, YouTubers who don’t want to show their faces, multilingual marketers… this is a productivity jackpot!

Core Idea:
LongCat-Video-Avatar-1.5 is ideal for
Talking Head Avatars (digital human avatars),
especially for e-commerce marketing.

Use case: Input a reference image + an audio clip (recorded script), and generate a product promotion video with natural lip-sync and stable identity consistency.
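For batch runs, the image + audio pairing described above could be prepared ahead of time. Below is a minimal sketch, assuming each reference image and its audio clip share a file stem (for example `shoes.png` and `shoes.wav`). The function names, folder layout, and manifest fields are hypothetical and are not part of the LongCat-Video-Avatar-1.5 project; the sketch only builds a job list and does not call the model.

```python
import json
from pathlib import Path

# Assumed file types; adjust to whatever the actual pipeline accepts.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}


def build_jobs(image_dir, audio_dir, output_dir):
    """Pair each reference image with the audio clip that shares its file stem."""
    images = {p.stem: p for p in sorted(Path(image_dir).iterdir())
              if p.suffix.lower() in IMAGE_EXTS}
    audios = {p.stem: p for p in sorted(Path(audio_dir).iterdir())
              if p.suffix.lower() in AUDIO_EXTS}
    jobs = []
    # Only stems present in both folders become jobs; unmatched files are skipped.
    for stem in sorted(images.keys() & audios.keys()):
        jobs.append({
            "reference_image": str(images[stem]),
            "audio": str(audios[stem]),
            "output": str(Path(output_dir) / f"{stem}.mp4"),
        })
    return jobs


def write_manifest(jobs, manifest_path):
    """Save the job list as JSON (hypothetical format) for a later generation step."""
    Path(manifest_path).write_text(json.dumps(jobs, indent=2), encoding="utf-8")
```

Each entry in the manifest can then be handed to whatever inference script the project actually provides, one job per product video.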

Advantages: Supports long video continuation,
multi-person dialogues, multilingual speech, no identity drift,
perfect for live replay or short video pre-rendering.

Project + HF Demo below:
