Why Video Agent models are next — Ethan He, xAI Grok Imagine
Latent Space: The AI Engineer Podcast
2 DAYS AGO
Show Notes
We’re announcing AIEWF speakers this week! Take the AI Engineering Survey! Today’s guest Ethan first joined us for the LS Paper Club as the lead on NVIDIA Cosmos World Model, but then joined xAI and built Grok Imagine in 3 months: He comes back on Latent...
Highlights
Ethan He, who built NVIDIA's Cosmos world model and later led the creation of Grok Imagine at xAI in just three months, shares his journey and insights on the current state and future of video generation. He argues that the most significant advances in video models are now coming from language models and agentic systems, not from improvements in diffusion technology itself.
Chapters
00:00 Introduction
01:25 From NVIDIA Cosmos to xAI
03:24 Building Grok Imagine from Zero to One
10:07 How Image and Video Models Are Trained
18:53 Video Compression, VAEs, and Real-Time Tradeoffs
22:10 Generative UI, Flipbook, and Neural OS
32:10 The Cost of Training Large Video Models
37:04 Distillation, GANs, and Fast Video Inference
41:21 Audio-Video Generation and Grok Imagine 0.9
48:34 What Makes a World Model?
55:51 Reference Videos, Long Context, and Video Memory
1:00:11 xAI Culture, Research, and First-Principles Building
1:09:45 AI Safety, Watermarking, and Prompt Rewriting
1:13:10 Video Agents and AI-Assisted Creation
1:27:32 Why Language Models Unlock Better Video
1:31:15 Robotics, Physical AI, and Embodied World Models
1:32:38 Why Ethan Left xAI
1:34:16 Self-Managed Context and the Future of LLMs
1:38:43 Ethan’s Career Path and Closing Thoughts
Transcript
swyx: Okay, we're here in the studio with Ethan He, most recently of xAI. Welcome.
Ethan He: Yes, thank you. Glad to be here.
swyx: We're also here with Vibhu. You first came to us, or joined the Latent Space world, because you were working on Cosm...