@EthanHe_42: In @latentspacepod podcast, I shared my view on video generation, world models, LLMs, agents, continual learning and wh…
Summary
Ethan He shares his insights from a Latent Space podcast, discussing key ideas about video generation, world models, LLMs, agents, continual learning, and the next frontiers in AI.
Cached at: 06/02/26, 05:37 PM
In @latentspacepod podcast, I shared my view on video generation, world models, LLMs, agents, continual learning and where the next frontier is.
- Video models get most of their intelligence from language, not from video data.
- Idea-to-code is fast now. The bottleneck is back to having enough compute to try every idea.
- Iteration speed beats almost everything else in model development.
- The next leap won’t be a better video model. It’ll be a video agent.
- Diffusion will be the frontend of AGI, the LLM the backend. Generative UI will replace HTML/CSS: user intent straight to pixels.
- Physical embodiment may become a tool a powerful AI picks up. Robotics may get solved by video-capable LLMs.
- Continual learning may look like models that manage their own context, and even rewrite their own harness at test time.
Thanks @swyx and @vibhuuuus for having me
Apple podcast: https://podcasts.apple.com/us/podcast/latent-space-the-ai-engineer-podcast/id1674008350?i=1000770600564…
Spotify: https://open.spotify.com/episode/1ZUjJ0WBqpp5F2vwZbpVSf…
Transcript on Substack: https://latent.space/p/video-agents
great interview @EthanHe_42 @latentspacepod
Great listen. Nice pod @EthanHe_42
Similar Articles
@swyx: full writeup and links here
A Latent Space podcast episode discusses the thesis that video models derive intelligence from LLMs, and that the next frontier is video agents. Guest Ethan He, who built Grok Imagine at xAI, shares insights on building frontier image and video systems.
@swyx: This pod was an incredible gift to the community: not only our first pod about @xAI, but Ethan really indulged on all o…
A tweet praising a podcast episode where former xAI world model lead Ethan He provides deep insights into training SOTA video generation world models, covering Grok Imagine, Cosmos, and the parallels between video and coding agents.
Why Video Agent models are next — Ethan He, xAI Grok Imagine (98 minute read)
Ethan He from xAI discusses why video agent models are the next frontier, arguing that video models derive intelligence from LLMs and that the evolution of video generation will mirror AI coding, shifting from one-shot output to multi-turn planning and execution.
@bradwmorris: integrating good system design and thinking, not just into software, but into all agentic ai interactions is a massive …
Brad Morris joins the Latent Space podcast to discuss the significant opportunity of applying rigorous system design principles to agentic AI interactions.
@aiDotEngineer: Building Generative Image & Video models at Scale https://youtube.com/watch?v=xOP1PM8fwnk… A lot of interest in image g…
YouTube talk by @sedielem offering a concise state-of-the-art overview of scaling generative image and video models, covering modeling, architecture, distillation and control.