[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
Latent Space: The AI Engineer Podcast
2025/05/23
Shownote
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer).
Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vagueposting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper, Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previewed his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.
Highlights
This podcast delves into the latest advancements in AI, focusing on Claude 4 and Opus's return. It explores reasoning capabilities, tool use, and safety measures in AI models, alongside insights from Will Brown’s work on verifiers and multi-turn reinforcement learning.
Chapters
00:00 Introduction and Episode Overview
02:01 Discussion on Claude 4 and its Features
04:31 Reasoning and Tool Use in AI Models
07:01 Extended Thinking in Claude and Model Differences
09:31 Speculation on Claude's Extended Thinking
11:01 Challenges and Controversies in AI Model Training
13:31 Technical Highlights and Code Trustworthiness
16:01 Token Costs and Incentives in AI Models
18:31 Thinking Budgets and AI Effort
21:01 Safety and Ethics in AI Model Development
23:31 Anthropic's Approach to AI Safety
26:01 LLM Arena and Evaluation Challenges
28:31 Developing Taste and Direction in AI Research
31:01 Recent Research and Multi-Turn RL
33:31 Tools and Incentives in AI Model Development
36:01 Challenges in Evaluating AI Model Outputs
38:31 Model-Based Rewards and Future Directions
Transcript
swyx: Hello, AI engineers. We're back with a quick reaction pod for Claude 4 with the new reasoning research lead for Prime Intellect, Will Brown. Will Brown's talk at AIEWF and open source work on verifiers have made him one of the most promine...