World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Shownote

From building Medal into a 12M-user game clipping platform with 3.8B highlight moments to turning down a reported $500M offer from OpenAI (https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data) and raising a $134M se...

Highlights

In this deep dive, we explore the evolution of world models in AI through the journey of Pim de Witte, whose gaming platform Medal became the foundation for General Intuition, a frontier lab betting on spatial-temporal intelligence as the next leap beyond large language models. With a unique dataset built from real human gameplay, the team is redefining how agents learn from visual input and action, moving beyond passive video prediction into active, embodied simulation.
04:57
The model learns from highlight clips to achieve superhuman performance.
08:41
The agent's policy is demonstrated and described as a general recipe that scales to any environment.
11:32
Game data is better than YouTube for spatial reasoning because it involves simulating optical dynamics with hands.
13:42
Reverse engineering has been key to my problem-solving approach.
23:53
The DIAMOND world model ran on a consumer GPU with minimal data.
33:04
An investor asks founders to draw a 2030 picture of their company and defend it from first principles.
35:07
As model capabilities grow, less ground truth data is needed.
38:42
World models are a new space, which leaves room for fresh contributions and ideas.
40:30
World models understand all possibilities and outcomes from the current state, rather than just predicting video frames.
46:09
LLMs are useful as orchestrators but lack spatial context for real-world generalization.
57:03
Medal is the episodic memory of humanity in simulation.
59:00
Reward models can be trained based on performance in gameplay clips.
1:01:30
We're open to sharing data for educational research and building real-world impact models.
1:02:09
GI models to handle 80% of AI-driven atoms-to-atoms interactions by 2030

Chapters

Introduction and Medal's Gaming Data Advantage
00:00
Exclusive Demo: Vision-Based Gaming Agents
02:08
Action Prediction and Real-World Video Transfer
06:17
World Models: Interactive Video Generation
08:41
From RuneScape to AI: Pim's Founder Journey
13:42
The Research Foundations: DIAMOND, Genie, and SIMA
16:45
Vinod Khosla's Largest Seed Bet Since OpenAI
33:03
Data Moats and Why GI Stayed Independent
35:04
Self-Teaching AI Fundamentals: The François Fleuret Course
38:42
Defining World Models vs Video Generation
40:28
Why Simulation Complexity Favors World Models
41:52
World Labs, Yann LeCun, and the Spatial Intelligence Race
43:30
Business Model: APIs, Agents, and Game Developer Partnerships
50:08
From Imitation Learning to RL: Making Clips Playable
58:57
Open Research, Academic Partnerships, and Hiring
1:00:15
2030 Vision: 80 Percent of Atoms-to-Atoms AI Interactions
1:02:09

Transcript

swyx: Hi, listeners. As you may know, I recently wrapped up the AIE Code Conference in New York, and while I'm traveling, I do like to visit top AI startups in person. To bring you interviews that you don't find on any other podcast that just does a Zoom c...