A lab released ViMax, an open-source multi-agent system that automates video production from text end to end. It generates scripts, storyboards, and videos while keeping characters consistent across shots, addressing a key problem in long-form AI video generation.
This paper introduces OmniScript, an 8B-parameter omni-modal (audio-visual) language model for a novel video-to-script (V2S) task: generating hierarchical, scene-by-scene scripts from long-form cinematic videos. OmniScript is trained with a progressive pipeline that includes chain-of-thought SFT and reinforcement learning with temporally segmented rewards. It outperforms larger open-source models and rivals proprietary models such as Gemini 3-Pro.
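To make the idea of a temporally segmented reward concrete, here is a minimal Python sketch. It is not OmniScript's actual reward: the `Scene` type, the fixed-length windows, and the bag-of-words F1 scorer are all assumptions chosen for illustration. The sketch splits the video timeline into windows, scores the predicted script against the reference inside each window, and averages the per-window scores, so text that is correct but placed at the wrong time earns little credit.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical scene record: a time span in seconds plus its script text."""
    start: float
    end: float
    text: str


def token_f1(pred: str, ref: str) -> float:
    """Bag-of-words F1 between two strings (a stand-in for a real scorer)."""
    p, r = pred.lower().split(), ref.lower().split()
    if not p or not r:
        return 0.0
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def segmented_reward(pred, ref, duration: float, window: float) -> float:
    """Average per-window similarity between predicted and reference scenes."""
    scores = []
    for i in range(math.ceil(duration / window)):
        lo, hi = i * window, min((i + 1) * window, duration)

        def text_in(scenes):
            # Concatenate the text of every scene overlapping [lo, hi).
            return " ".join(s.text for s in scenes if s.start < hi and s.end > lo)

        pred_text, ref_text = text_in(pred), text_in(ref)
        if pred_text or ref_text:  # skip windows that are empty on both sides
            scores.append(token_f1(pred_text, ref_text))
    return sum(scores) / len(scores) if scores else 0.0


reference = [
    Scene(0, 10, "a man enters the room"),
    Scene(10, 20, "he opens a window"),
]
# Same sentences as the reference, but attached to the wrong time spans.
swapped = [
    Scene(0, 10, "he opens a window"),
    Scene(10, 20, "a man enters the room"),
]

aligned_score = segmented_reward(reference, reference, duration=20, window=10)
swapped_score = segmented_reward(swapped, reference, duration=20, window=10)
print(aligned_score, swapped_score)
```

A single whole-video comparison would score the swapped script as a perfect match, since it contains exactly the same words. The windowed version penalizes it, which is the motivation for segmenting the reward along the timeline under these assumptions.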