World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI
Latent Space: The AI Engineer Podcast
2025/12/06
In this deep dive, we explore the evolution of world models in AI through the journey of Pim DeWitte, whose gaming platform Medal became the foundation for General Intuition—a frontier lab betting on spatial-temporal intelligence as the next leap beyond large language models. With a unique dataset built from real human gameplay, the team is redefining how agents learn from visual input and action, moving far beyond passive video prediction into active, embodied simulation.
General Intuition leverages 3.8 billion action-labeled game clips from Medal to train vision-based agents that perceive frames and output actions in real time, achieving human-level and sometimes superhuman performance in dynamic environments. These world models go beyond video generation by incorporating interactivity, memory, and partial observability, learning from smoke, occlusion, and camera shake as meaningful signals. The team has distilled large policies into compact, real-time models deployable on consumer hardware, enabling transfer from arcade games to realistic simulations and real-world video.

After rejecting a $500M offer from OpenAI, the founders chose independence to build proprietary capabilities, using their data moat to pioneer 'frames in, actions out' APIs for gaming, robotics, and simulation. They argue that world models and LLMs are complementary: LLMs excel at orchestration, while world models provide the spatial reasoning essential for real-world interaction. By treating game highlights as the 'episodic memory of simulation,' they enable reinforcement learning at scale.

With a $134M seed from Khosla, the vision is for spatial-temporal foundation models to power 80% of atoms-to-atoms interactions by 2030, primarily through scalable simulation rather than physical deployment.
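To make the 'frames in, actions out' idea concrete, here is a minimal toy sketch of what such an interface could look like. This is purely illustrative: none of the class, method, or action names below come from General Intuition's actual API, and the decision rule is a placeholder where a real system would run a learned policy over the frame stack.

```python
import numpy as np

class FramesInActionsOut:
    """Toy agent: consumes raw video frames, emits discrete actions.

    Hypothetical interface for illustration only; a real 'frames in,
    actions out' model would replace the brightness heuristic with a
    learned vision policy.
    """

    ACTIONS = ["noop", "left", "right", "jump"]

    def __init__(self, history_len: int = 4):
        # Keep a short rolling window of frames as a stand-in for
        # the model's memory over partially observable input.
        self.history_len = history_len
        self.frames: list[np.ndarray] = []

    def observe(self, frame: np.ndarray) -> None:
        self.frames.append(frame)
        self.frames = self.frames[-self.history_len:]

    def act(self) -> str:
        # Placeholder decision rule based on mean brightness of the
        # latest frame; a learned policy would go here.
        brightness = float(np.mean(self.frames[-1]))
        return self.ACTIONS[int(brightness) % len(self.ACTIONS)]

agent = FramesInActionsOut()
agent.observe(np.zeros((64, 64, 3)))  # an all-black 64x64 RGB frame
print(agent.act())  # "noop" for a zero-brightness frame
```

The point of the shape, not the heuristic: the agent never sees game state or an API, only pixels in and an action out each tick, which is what lets the same interface span games, simulation, and real-world video.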
04:57
The model learns from highlight clips to achieve superhuman performance.
08:41
The agent's policy is demonstrated as a general recipe that scales to any environment.
11:32
Game data is better than YouTube for spatial reasoning because players actively simulate optical dynamics with their hands.
13:42
Reverse engineering has been key to my problem-solving approach.
23:53
The DIAMOND world model ran on a consumer GPU with minimal data.
33:04
An investor asks founders to draw a 2030 picture of their company and defend it from first principles.
35:07
As model capabilities grow, less ground truth data is needed.
38:42
World models are a new space, open to fresh contributions and ideas.
40:30
World models understand all possibilities and outcomes from the current state, rather than just predicting video frames.
46:09
LLMs are useful as orchestrators but lack spatial context for real-world generalization.
57:03
Medal is the episodic memory of humanity in simulation.
59:00
Reward models can be trained based on performance in gameplay clips.
1:01:30
We're open to sharing data for educational research and building real-world impact models.
1:02:09
GI's models aim to handle 80% of AI-driven atoms-to-atoms interactions by 2030.