After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs
Latent Space: The AI Engineer Podcast
2025/11/25

The conversation dives into the evolution of AI beyond language, exploring how spatial intelligence is reshaping the way machines understand and generate 3D environments. From foundational research to real-world applications, the discussion bridges decades of computer vision progress with a bold new direction in generative modeling.
The podcast explores the emergence of spatial intelligence as a critical frontier in AI, driven by World Labs' new generative world model, Marble. Building on early breakthroughs such as ImageNet and dense captioning, the guests argue that visual and physical understanding reach beyond what language-based models can capture. Marble uses Gaussian splats to create interactive, editable 3D scenes from text or images, enabling precise camera control and rendering across devices.
The discussion emphasizes that while current models excel at pattern recognition, they lack true causal reasoning, such as deriving physical laws like F=ma, which points to the need for embodied, multimodal systems. Spatial intelligence, rooted in perception and interaction, is framed as complementary to language rather than a replacement for it.
Applications span creative fields such as film and design, robotics simulation, and scientific discovery. The vision centers on integrating physics into neural models and democratizing access to world-scale AI, and it urges researchers to pursue high-risk, long-term innovation despite the resource disparities between academia and industry.
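The episode does not describe how Marble renders its splats internally. As a rough illustration of the general idea behind Gaussian-splat rendering, the sketch below assumes each splat has already been projected to screen space as a 2D Gaussian with a color, a peak opacity, and a depth; one pixel is then shaded by depth-sorting the splats and alpha-compositing them front to back. The `Splat2D` class and the function names are hypothetical, and the sketch omits details real renderers add (view-dependent color, alpha clamping, tile-based GPU rasterization).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Splat2D:
    """Hypothetical simplified splat, already projected to screen space."""
    mean: tuple[float, float]         # screen-space center (x, y)
    cov: tuple[float, float, float]   # 2x2 covariance stored as (sxx, sxy, syy)
    color: tuple[float, float, float]  # RGB in [0, 1]
    opacity: float                    # peak opacity in [0, 1]
    depth: float                      # camera-space depth, used for sorting


def gaussian_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity contribution of one splat at pixel (px, py)."""
    dx, dy = px - s.mean[0], py - s.mean[1]
    sxx, sxy, syy = s.cov
    det = sxx * syy - sxy * sxy
    # Inverse of the 2x2 covariance matrix
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    # Squared Mahalanobis distance from the splat center
    mahal = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
    return s.opacity * math.exp(-0.5 * mahal)


def shade_pixel(splats, px, py, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing of depth-sorted splats at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda s: s.depth):
        a = gaussian_alpha(s, px, py)
        for k in range(3):
            color[k] += transmittance * a * s.color[k]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    # Whatever light remains comes from the background
    return tuple(c + transmittance * b for c, b in zip(color, background))
```

For example, a half-transparent red splat in front of a fully opaque blue splat at the same screen position yields an even red/blue mix at their shared center, regardless of the order in which the splats are passed in, because `shade_pixel` sorts by depth before compositing.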
09:12
Academia should focus on new and wacky ideas rather than training large models.
19:39
Pixels may be a less lossy and more general representation of the world than the tokenized representations used in LLMs.
22:26
A model can render realistic scenes without understanding the physical forces behind them.
41:14
Spatial intelligence complements linguistic intelligence and involves reasoning, understanding, moving, and interacting in space.
47:26
Most humans are born with the ability to link perception and motor movement, a capability that remains challenging for AI.
50:11
An LLM may predict movement trajectories accurately yet be unable to derive the Newtonian laws behind them.
57:58
We need intellectual fearlessness: this is a pioneering field that demands bold thinkers.