[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect
Latent Space: The AI Engineer Podcast
2025/05/23

This episode covers the latest advancements in AI, focusing on Claude 4 and the return of Opus. It explores reasoning capabilities, tool use, and safety measures in AI models, alongside insights from Will Brown's work on verifiers and multi-turn reinforcement learning.
The discussion highlights the evolution of AI agents, emphasizing Claude 4's extended thinking mode and its implications for inference-time compute. Key points include managing token costs through thinking budgets, ensuring code trustworthiness, and addressing ethical considerations in model development. The speakers stress the importance of stress-testing to align AI behavior with societal norms. They also examine challenges in evaluating model outputs and integrating tools into reward systems, advocating for model-based rewards that offer greater flexibility. The conversation also touches on Anthropic's safety approach and the potential of academia as an unbiased evaluator of AI models. Finally, the hosts preview upcoming research directions and practical applications in agentic reinforcement learning.
00:00
Introducing Will Brown, new research lead for Prime Intellect
02:04
Claude is showing off its agent and tool-use suite
04:34
Anthropic views extended thinking as tool use in cloud environments.
07:05
Reported benchmarks show reduced reward hacking in Claude and Opus.
09:38
Setting a token budget can control usage and is becoming standard.
11:04
Thinking budgets and reasoning effort may be conceptually similar
13:32
Anthropic stress-tests its models like Claude to handle dilemmas between following user instructions and common norms.
16:06
Crafting environments helps understand model behavior and constraints.
18:35
Training models with unbounded text is complex.
21:05
Systems are highly sensitive to initial conditions, impacting predictability
23:36
Academia might be the best source for future model evaluations.
26:01
Many grad students lack research taste; focus on long-term bets.
28:32
Major updates to the verifiers repo extend the original GRPO demo with multi-turn RL and tool use.
31:08
Incorporating tool use into the model's reward system is crucial.
33:31
Models often box final answers to make verification easier.
36:07
Model-based RL using LLMs as judges is underexplored yet promising.
38:33
Alessio is collaborating with Kyle Corbitt on an agentic RL course