Inside Anthropic: How Claude Actually Gets Built | Alex Albert

Behind the Craft
In this podcast, Alex Albert, a research PM at Anthropic, provides an inside look at how the team develops the next Claude model, covering everything from product development and user feedback integration to character training and consciousness research.
Anthropic treats each new Claude model as a product, setting requirements early and discovering capabilities during training. They use user feedback, analyzed by Claude itself, to refine features like adaptive thinking and memory, which includes a 'dreaming' process for pruning memories. Decision-making focuses on identifying irreversible 'one-way doors' for careful deliberation, while reversible decisions are treated as cheap. Internally, Claude is used for coding, data access, and brainstorming. Evaluating models involves creating test cases to identify weaknesses and collaborating on interventions like pre-training or RL. Character training combines quantifiable metrics with human intuition from transcripts. Anthropic also researches Claude's consciousness to improve interactions and trustworthiness, without taking an official stance.
00:00
Claude models are developed like products
04:58
Context about the user influences whether Claude should think hard
09:41
AI tools drastically reduced the cost and time to prototype
11:15
One-way doors require more time and thought
15:40
AI challenges assumptions to sharpen decisions
17:05
Creating test cases to identify weaknesses
21:02
Character is both quantifiable metrics and human intuition
32:46
Research improves trustworthiness for autonomous tasks