Eric Jang – Building AlphaGo from scratch
Dwarkesh Podcast
22 HOURS AGO
In this episode, Eric Jang revisits AlphaGo not as a historical artifact, but as a pedagogical and architectural blueprint for understanding intelligence—particularly how search, learning from experience, and self-play interact to solve problems with vast combinatorial spaces.
Jang walks through building AlphaGo from scratch using modern tools, emphasizing Monte Carlo Tree Search (MCTS) as a solution to Go’s exponential complexity—guiding exploration via neural policy and value networks while sidestepping the credit assignment problem that plagues naive RL. Unlike LLMs trained with high-variance policy gradients over long token sequences, AlphaGo’s MCTS provides precise, move-level training targets, enabling efficient distillation of search into neural weights. He contrasts on-policy self-play with off-policy methods, noting AlphaGo Zero’s replay buffer strategically samples near-optimal states to avoid compounding errors. The discussion extends to why MCTS doesn’t translate directly to language modeling—due to lack of well-defined value estimation and deterministic outcomes—and highlights how supervised pretraining with soft targets, not raw RL, underpins stable early learning. Finally, Jang reflects on automating AI research: while LLMs now handle implementation and hyperparameter tuning, selecting high-leverage questions and escaping dead ends remains uniquely human—for now.
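The "precise, move-level training targets" mentioned above come from the root visit counts of the search: in AlphaGo Zero, the network's policy head is distilled toward a soft target proportional to N(s, a)^(1/τ). A minimal sketch of that conversion (function name and example counts are illustrative, not from the episode):

```python
def mcts_policy_target(visit_counts, temperature=1.0):
    """Convert root visit counts N(s, a) into a soft policy target,
    pi(a) proportional to N(s, a)^(1/temperature), AlphaGo Zero-style.
    Lower temperature sharpens the target toward the most-visited move."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [p / total for p in powered]

# Example: the search visited three legal moves 80, 15, and 5 times.
target = mcts_policy_target([80, 15, 5])  # -> [0.8, 0.15, 0.05]
```

Training the policy with cross-entropy against this target distills the search's deliberation into the network weights, which is why the variance is so much lower than a policy gradient over a long token sequence.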
02:43
Players can intentionally let opponents capture stones to gain greater advantage elsewhere on the board
11:14
AlphaGo's breakthrough was using neural nets to make the search problem tractable
54:07
MCTS recursively improves its own neural predictions by updating node values and visit counts through backup
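The backup step this chapter describes can be sketched in a few lines: a leaf evaluation is propagated up the path from leaf to root, incrementing visit counts and updating each node's running mean value, with the sign flipped at each ply because players alternate in a zero-sum game (the dict-based node representation here is a simplification for illustration):

```python
def backup(path, leaf_value):
    """Propagate a leaf evaluation up the search path root -> ... -> leaf.
    Each node keeps visit count N, total value W, and mean value Q = W / N.
    The value's sign flips each ply (two-player zero-sum game)."""
    value = leaf_value
    for node in reversed(path):
        node["N"] += 1
        node["W"] += value
        node["Q"] = node["W"] / node["N"]
        value = -value  # the parent sees this outcome from the opponent's side

# Simulated path root -> child -> leaf, all nodes initially unvisited.
path = [{"N": 0, "W": 0.0, "Q": 0.0} for _ in range(3)]
backup(path, leaf_value=1.0)  # the leaf's player is winning
```

After this single backup the leaf's Q is +1.0, its parent's is -1.0, and the root's is +1.0; as visits accumulate, these node statistics steer future selection toward stronger moves, which is the sense in which the search improves on the raw neural predictions.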
1:17:32
Neural networks amortize computation to solve NP-hard problems, challenging traditional hardness assumptions
1:42:24
MCTS and Q-learning share a recursive dynamic programming property that enables value estimation without explicit search.
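The shared recursion is the Bellman backup: a state-action value is bootstrapped from the estimated value of its successor, just as an MCTS parent's value is refreshed from its children. A minimal tabular Q-learning step illustrating that recursive update (the names and hyperparameters here are illustrative defaults, not from the episode):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: bootstrap Q(s, a) from the best
    estimated action value at the next state -- the same recursive
    dynamic-programming structure as an MCTS backup, but without
    an explicit search tree."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * best_next
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (td_target - old)

Q = {}
actions = ["a", "b"]
# A reward of 1.0 moves Q(s0, a) a step of size alpha toward the target.
q_learning_update(Q, s="s0", a="a", r=1.0, s_next="s1", actions=actions)
```

The difference is where the recursion runs: Q-learning amortizes it into a table (or network) across many experienced transitions, while MCTS unrolls it explicitly at decision time.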
2:00:02
Strong initialization against KataGo reduces the need for architectural tricks and auxiliary supervision objectives
2:07:23
MCTS relabeling replaces target network computation and has a stabilizing effect while better saturating the GPU
2:21:33
In local minima where learning signals go flat, the win rate of the MCTS-guided policy against the raw network provides a clean supervision signal
2:25:22
Mythos-class models and Go-inspired RL environments offer promising paths toward verifiable AI self-improvement