@DengHokin: I am super excited to share that I launch a weekly Video Model Journal Club. Every week we pick one paper and go deep, …
Summary
The author launches a weekly Video Model Journal Club covering video generation, world models, physical reasoning, diffusion, flow matching, etc. The first in-person talk will be by Yilun Du on Embodied Reasoning with World Models.
Cached at: 06/16/26, 11:53 AM
I am super excited to share that I am launching a weekly Video Model Journal Club. Every week we pick one paper and go deep: video generation, world models, physical reasoning, diffusion, flow matching, and everything in between.
This Friday, Yilun Du @du_yilun from @Harvard will give us a talk on Embodied Reasoning with World Models, in person at @moonlake. Really grateful to Fan-yun Sun @sunfanyun, Charlotte @xia_char and Shin @shinshin_oob for hosting.
Register for in-person via Luma: https://luma.com/video-model
#video #AI #SF
Video Model Journal Club · Events Calendar
Source: https://luma.com/video-model
Every week we pick one paper and go deep: video generation, world models, physical reasoning, diffusion, flow matching, and everything in between.
Events
Embodied Reasoning with World Models by Yilun Du
By Hokin Deng, Fan-Yun Sun, Charlotte Xia, Shin & 2 others
San Francisco, United States
Think Visually, Reason Textually: Vision-Language Synergy in ARC by Beichen Zhang
Demystifying Video Reasoning by Ruisi Wang
Video Reasoning Models by Zhongang Cai
Video Models Can Reason with Verifiable Rewards by Tinghui Zhu
Video Models Are Zero-Shot Learners and Reasoners by Thaddäus Wiedemer
Do Joint Audio-Video Generation Models Understand Physics? by Zijun Cui