
Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI

Show Notes

Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall

Timestamps

00:00 Intro – Diplomacy, Cicero & World Championship
02:00 Reverse Centaur: How AI Improved Noam’s Human Play
05:00 Turing Test Failures in Chat: Hallucinations & Steerability
07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
14:00 The Deep Research Existence Proof for Unverifiable Domains
17:30 Harnesses, Tool Use, and Fragility in AI Agents
21:00 The Case Against Over-Reliance on Scaffolds and Routers
24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
44:30 Implicit World Models and Theory of Mind Through Scaling
48:00 Why Self-Play Breaks Down Beyond Go and Chess
54:00 Designing Better Benchmarks for Fuzzy Tasks
57:30 The Real Limits of Test-Time Compute: Cost vs. Time
1:00:30 Data Efficiency Gaps Between Humans and LLMs
1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
1:10:00 Closing Thoughts – Five-Year View and Open Research Directions

Highlights

This podcast delves into the intersection of AI and strategic games, focusing on advancements in Diplomacy AI, reasoning models, and multi-agent systems. The discussion highlights how AI has influenced human gameplay strategies, particularly through insights from the development of Cicero, which contributed to a world championship win. It also examines the challenges and opportunities in creating AI capable of passing the Turing test and achieving success in complex environments.
00:39
Cicero, an AI for Diplomacy built in 2022, also improved the speaker's own play.
03:51
Cicero's controllability was valued by the AI safety community.
07:57
Models can perform well in subjective domains with success measures.
11:28
GPT-4.5 can play tic-tac-toe reasonably well but may need System 2 for perfect play.
16:27
Model routers may become unnecessary as models advance.
20:36
Reinforcement fine-tuning specializes models for specific data.
24:07
OpenAI's success came from recognizing the potential of reasoning models and investing in scaling up.
32:04
OpenAI's success stemmed from betting on the scaling paradigm early.
40:19
Well-aligned AI models could outperform humans in virtual assistant roles.
44:20
The multi-agent field has been misguided by heuristic approaches.
49:55
Current poker AIs mainly follow pre-computed GTO strategies and lack effective adaptation to opponents.
54:52
AlphaGo's success stems from large-scale pre-training and self-play.
58:53
Generative media attracts more public attention and drives subscriptions.
1:00:46
Robotics progress is slower due to difficulty iterating on physical hardware.
1:06:51
Current benchmarks for AI models focus on easily measurable problems.
1:16:51
General reasoning techniques are more valuable than poker-specific approaches.

Chapters

Intro & Guest Welcome
00:00
Diplomacy AI & Cicero Insights
00:33
AI Safety, Language Models, and Steerability
03:49
O Series Models: Progress and Benchmarks
05:23
Reasoning Paradigm: Thinking Fast and Slow in AI
08:53
Design Questions: Harnesses, Tools, and Test Time Compute
14:02
Reinforcement Fine-tuning & Model Specialization
20:32
The Rise of Reasoning Models at OpenAI
21:52
Data Efficiency in Machine Learning
29:33
Coding & AI: Codex, Workflows, and Developer Experience
33:21
Multi-Agent AI: Collaboration, Competition, and Civilization
41:38
Poker, Diplomacy & Exploitative vs. Optimal AI Strategy
45:14
World Models, Multi-Agent Learning, and Self-Play
52:11
Generative Media: Image & Video Models
58:50
Robotics: Humanoids, Iteration Speed, and Embodiment
1:00:44
Rapid Fire: Research Practices, Benchmarks, and AI Progress
1:04:25
Games, Imperfect Information, and AI Research Directions
1:14:19

Transcript

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host, swyx, founder of Smol AI.

swyx: Hello, hello. And we're here recording on a holiday Monday with Noam Brown from OpenAI....