scripod.com

Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI

This podcast delves into the intersection of AI and strategic games, focusing on advancements in Diplomacy AI, reasoning models, and multi-agent systems. The discussion highlights how AI has influenced human gameplay strategies, particularly through insights from the development of Cicero, which contributed to a world championship win. It also examines the challenges and opportunities in creating AI capable of passing the Turing test and achieving success in complex environments.
The podcast explores the impact of AI on strategic games, emphasizing Diplomacy and poker. Insights from Cicero's development improved human gameplay, showcasing AI's potential beyond direct competition. The conversation addresses AI safety, controllability, and the evolution of the o-series models, which challenge misconceptions about whether reasoning models apply in subjective domains. The System 1/System 2 paradigm is applied to AI, highlighting how strong intuitive capabilities underpin deliberate reasoning.

Test-time compute and reinforcement fine-tuning are discussed as methods for improving model adaptability and specialization. OpenAI's journey with reasoning models reflects an early bet on scaling paradigms, improving data efficiency and, in places, surpassing human capabilities.

Multi-agent research focuses on cooperative and competitive AI systems, with the speaker advocating principled approaches over heuristics. In poker, game-theory-optimal (GTO) strategies are contrasted with exploitative ones, revealing how hard it is to model opponents effectively. World modeling and self-play face limitations outside two-player zero-sum games, necessitating new frameworks. Finally, advances in generative media and robotics highlight diverse AI applications, while benchmarks and societal impacts underscore ongoing research directions.
00:39
Building Cicero, an AI for Diplomacy, in 2022 improved the speaker's own gameplay.
03:51
The AI safety community valued Cicero's controllability.
07:57
Models can perform well in subjective domains when success can be measured.
11:28
GPT-4.5 can play tic-tac-toe reasonably well but may need System 2 for perfect play.
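One way to make the System 2 point concrete: perfect tic-tac-toe play is attainable with explicit search rather than pattern-matched intuition. A minimal minimax sketch (an illustration of the technique, not code from the episode):

```python
# Minimax for tic-tac-toe: exhaustive "System 2" search that guarantees
# perfect play, in contrast to a model's pattern-matched "System 1" moves.

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, best_move) from `player`'s view: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # board full: draw
    opponent = 'O' if player == 'X' else 'X'
    best_score, best_move = -2, None
    for m in moves:
        board[m] = player
        score, _ = minimax(board, opponent)
        board[m] = ' '
        score = -score  # negamax: opponent's score negated is ours
        if score > best_score:
            best_score, best_move = score, m
    return best_score, best_move

score, move = minimax([' '] * 9, 'X')
print(score)  # 0: tic-tac-toe is a draw under perfect play
```

The search is tiny here (a few hundred thousand nodes), which is why explicit deliberation solves tic-tac-toe exactly while remaining intractable for games like Go without learned intuition to guide it.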
16:27
Model routers may become unnecessary as models advance.
20:36
Reinforcement fine-tuning specializes models for specific data.
24:07
OpenAI's success came from recognizing the potential of reasoning models and investing in scaling up.
32:04
OpenAI's success stemmed from betting on the scaling paradigm early.
40:19
Well-aligned AI models could outperform humans in virtual assistant roles.
44:20
The multi-agent field has been led astray by heuristic approaches.
49:55
Current poker AIs mainly follow pre-computed GTO strategies and lack effective adaptation to opponents.
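Regret matching is the core update behind the counterfactual-regret solvers used to pre-compute poker equilibria; run against a fixed, biased opponent, the same update instead converges to an exploitative best response. A toy rock-paper-scissors sketch (my illustration of the technique; the setup is an assumption, not from the episode):

```python
# Regret matching (Hart & Mas-Colell), the building block of CFR-style poker
# solvers. Against a fixed, rock-heavy opponent it converges to the
# exploitative best response (always paper), not the GTO uniform mixture.

PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1/3, 1/3, 1/3]

def best_response_by_regret_matching(opponent, iterations=10_000):
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        for a in range(3):
            strategy_sum[a] += strat[a]
        # Expected value of each action, and of the current mixed strategy,
        # against the opponent's fixed action distribution.
        values = [sum(PAYOFF[a][o] * opponent[o] for o in range(3)) for a in range(3)]
        played = sum(strat[a] * values[a] for a in range(3))
        for a in range(3):
            regrets[a] += values[a] - played  # regret for not having played a
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # average strategy over all rounds

avg = best_response_by_regret_matching([0.6, 0.2, 0.2])  # rock-heavy opponent
print(avg[1] > 0.99)  # True: nearly all weight on paper
```

With two regret matchers playing each other in this zero-sum game, the same update drives the average strategies toward the uniform GTO mixture — the "pre-computed equilibrium" regime the chapter describes.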
54:52
AlphaGo's success stems from large-scale pre-training and self-play.
58:53
Generative media attracts more public attention and drives subscriptions.
1:00:46
Robotics progress is slower due to difficulty iterating on physical hardware.
1:06:51
Current benchmarks for AI models focus on easily measurable problems.
1:16:51
General reasoning techniques are more valuable than poker-specific approaches.