This paper introduces YoCausal, a benchmark based on the Violation of Expectation paradigm from cognitive science, to evaluate whether video diffusion models truly understand causality or merely overfit to temporal patterns. An evaluation of 13 state-of-the-art models reveals a significant gap between these models and human-level causal cognition.
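As a rough illustration of how a Violation of Expectation test can be scored, the sketch below assumes each test item is a matched pair of clips (one causally plausible, one implausible) and that the model yields a scalar "surprise" score per clip. The function name, the score inputs, and the pairwise-accuracy metric are assumptions for illustration, not YoCausal's actual protocol.

```python
from typing import Sequence


def voe_pairwise_accuracy(
    plausible: Sequence[float], implausible: Sequence[float]
) -> float:
    """Fraction of matched pairs where the implausible clip is more surprising.

    Both inputs are hypothetical per-clip surprise scores (higher = more
    unexpected), aligned so that index k refers to the same scenario.
    """
    if len(plausible) != len(implausible):
        raise ValueError("score lists must be paired one-to-one")
    if not plausible:
        raise ValueError("need at least one pair")
    # A pair counts as correct only if the implausible clip scores strictly higher.
    wins = sum(1 for p, i in zip(plausible, implausible) if i > p)
    return wins / len(plausible)


if __name__ == "__main__":
    # Toy scores: the model is "fooled" on the second pair.
    print(voe_pairwise_accuracy([0.1, 0.5, 0.3], [0.4, 0.2, 0.9]))
```

Under this scoring, a model that cannot tell plausible from implausible events would land near 0.5, which gives a simple chance baseline.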
VideoRLVR optimizes video diffusion models for verifiable reasoning tasks using reinforcement learning with rule-based rewards, achieving better performance than supervised methods in constraint-satisfying video generation.
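To make "rule-based rewards" concrete, here is a minimal sketch of a verifiable reward: a sample earns 1.0 only if every programmatic check passes. The toy task (a grid path that must move one cell at a time and end at a fixed goal), the helper names, and the binary reward are all assumptions for illustration, not VideoRLVR's actual tasks or reward design.

```python
from typing import Callable, Iterable, List, Tuple

Path = List[Tuple[int, int]]  # hypothetical stand-in for content decoded from a video


def rule_based_reward(sample: Path, rules: Iterable[Callable[[Path], bool]]) -> float:
    """Return 1.0 if the sample satisfies every rule, otherwise 0.0."""
    return 1.0 if all(rule(sample) for rule in rules) else 0.0


def steps_are_adjacent(path: Path) -> bool:
    """Each move must change exactly one coordinate by exactly one cell."""
    return all(
        abs(x1 - x0) + abs(y1 - y0) == 1
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    )


def reaches_goal(path: Path) -> bool:
    """The path must be non-empty and end at the (assumed) goal cell (2, 0)."""
    return bool(path) and path[-1] == (2, 0)


if __name__ == "__main__":
    rules = [steps_are_adjacent, reaches_goal]
    print(rule_based_reward([(0, 0), (1, 0), (2, 0)], rules))  # valid path
    print(rule_based_reward([(0, 0), (2, 0)], rules))  # skips a cell
```

Because the checks are deterministic code rather than a learned judge, the reward can be verified independently of the model being trained.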