JD Open Source releases JoyAI-Echo (Echo-LongVideo), a text-to-audio-video diffusion model that generates minute-scale, multi-shot videos with consistent character identity and voice and uses DMD distillation for a 7.5x speedup.
MSAVBench is the first comprehensive benchmark and adaptive evaluation framework for multi-shot audio-video generation, assessing 19 models across diverse tasks and achieving high alignment with human judgment.