After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs
Latent Space: The AI Engineer Podcast
5 DAYS AGO
In this episode, Fei-Fei Li and Justin Johnson, pioneers in computer vision and AI, discuss their latest venture, World Labs, and its groundbreaking spatial intelligence model, Marble. Their conversation traces a journey from foundational work in image recognition to the next frontier of AI: understanding and generating 3D worlds.
Marble represents a shift from language-centric AI to spatial intelligence, enabling the creation of editable 3D environments from text, images, and spatial inputs. Built on Gaussian splats, it supports real-time rendering across devices and allows precise camera control, making it valuable for gaming, film, robotics, and design. The discussion emphasizes that while LLMs excel at abstract reasoning, they lack embodied understanding of physics and space—capabilities essential for true world modeling. The team advocates for integrating physics engines into neural networks to move beyond pattern matching toward causal reasoning. They also highlight the importance of academic innovation despite resource disparities with industry, and reframe transformers as set-based models, opening new architectural possibilities. Marble is not meant to replace language models but to complement them in multimodal systems. Current applications span creative industries and synthetic data generation for embodied AI, with long-term potential in science and decision-making. World Labs continues to push boundaries, seeking talent to advance this vision of intelligent, interactive 3D worlds.
10:29
There's a scaling limit in performance per watt from Hopper to Blackwell, opening room for new architectures.
18:04
Vision and language modeling might not be very different.
27:43
Things invented for fun can end up enabling serious breakthroughs, like the AI revolution from misused graphics chips.
31:40
Marble enables precise camera placement by understanding 3D space, unlike most video generative models.
33:29
Gaussian splats can be enhanced with physical properties for physics simulation.
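To make the idea concrete, here is a minimal sketch of what attaching physical properties to a splat might look like. Everything in it is an assumption made for illustration: the `PhysicalSplat` class, its field names, the `step` function, and the choice of a simple Euler-style update are hypothetical and are not taken from Marble or from any World Labs code.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vec3 = Tuple[float, float, float]

GRAVITY: Vec3 = (0.0, -9.81, 0.0)  # m/s^2, assuming a y-up coordinate system


@dataclass(frozen=True)
class PhysicalSplat:
    """Hypothetical splat record: rendering attributes plus physical ones."""
    # Attributes a renderer typically needs for one Gaussian.
    position: Vec3
    scale: Vec3
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    opacity: float
    color: Vec3
    # Extra attributes assumed here so a simulator can move the splat.
    mass: float = 1.0
    velocity: Vec3 = (0.0, 0.0, 0.0)


def step(splat: PhysicalSplat, dt: float, force: Vec3 = (0.0, 0.0, 0.0)) -> PhysicalSplat:
    """Advance one splat by dt seconds: update velocity first, then position."""
    accel = tuple(g + f / splat.mass for g, f in zip(GRAVITY, force))
    velocity = tuple(v + a * dt for v, a in zip(splat.velocity, accel))
    position = tuple(p + v * dt for p, v in zip(splat.position, velocity))
    return replace(splat, position=position, velocity=velocity)


splat = PhysicalSplat(
    position=(0.0, 1.0, 0.0),
    scale=(0.1, 0.1, 0.1),
    rotation=(1.0, 0.0, 0.0, 0.0),
    opacity=0.9,
    color=(0.8, 0.2, 0.2),
)
moved = step(splat, dt=0.1)
```

The rendering fields are left untouched by `step`, so the same record could be handed back to a renderer after each simulation tick. A real system would also need collisions and constraints between splats, which this sketch ignores.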
36:20
Regenerating entire scenes allows more general interaction but is computationally expensive.
40:32
Spatial intelligence involves the ability to reason, understand, move, and interact in space.
54:17
Transformers are natively models of sets, not sequences, with permutation-equivariant operations.
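The permutation-equivariance claim can be checked numerically. The sketch below is a toy, single-head self-attention layer written in plain Python with no positional encoding; the helper names, sizes, and random weights are assumptions for illustration and do not come from the episode. It shuffles the input tokens and compares the result with the shuffled output of the original input.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
n_tokens, dim = 5, 4
x = random_matrix(rng, n_tokens, dim)
wq, wk, wv = (random_matrix(rng, dim, dim) for _ in range(3))

perm = [3, 0, 4, 1, 2]  # an arbitrary reordering of the five tokens
out = self_attention(x, wq, wk, wv)
out_of_permuted = self_attention([x[i] for i in perm], wq, wk, wv)
permuted_out = [out[i] for i in perm]

# Equivariance: attention(permute(x)) matches permute(attention(x))
# up to floating-point rounding.
equivariance_error = max_abs_diff(out_of_permuted, permuted_out)
print(f"max difference: {equivariance_error:.2e}")
```

The difference should be at the level of floating-point rounding, since reordering the tokens only reorders the rows of the output. Order information enters a transformer only through additions such as positional encodings or attention masks, which this sketch deliberately leaves out.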
57:04
Intellectual fearlessness is essential for pioneers in AI research.