Reiner Pope – The math behind how LLMs are trained and served

Dwarkesh Podcast

Shownotes

Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served. It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and so...

Highlights

In this episode, Reiner Pope delivers an insightful blackboard-style lecture unpacking the hardware-aware realities of training and serving frontier large language models—using first-principles reasoning, public pricing data, and fundamental constraints of modern GPU architecture.
15:16
The FLOPS-to-memory-bandwidth ratio (~300) is a stable hardware invariant that directly determines minimum effective batch size.
32:09
The MoE layer uses a router to dynamically assign tokens to sparse MLP experts, with expert parallelism distributing them across GPUs.
1:00:14
Pipeline parallelism reduces memory per rack but offers diminishing returns with modern hardware.
1:10:08
Pipeline parallelism reduces weight memory but not activation memory; KV savings are offset by in-flight sequences.
1:28:37
Current pre-training token count is about 100 times larger than the Chinchilla-optimal count.
1:47:30
Cache hits are 10x cheaper than cache writes.
2:10:41
Reversible transformer layers save memory via activation rematerialization.
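The first highlight's claim can be sketched with back-of-the-envelope arithmetic. This is an illustrative calculation, not taken from the episode: the chip specs below are hypothetical round numbers, and the argument assumes weight-bound, bf16 decoding with one multiply-add per weight per token.

```python
# Arithmetic-intensity sketch: why the FLOPS-to-bandwidth ratio sets a
# minimum batch size for LLM decoding (hypothetical chip numbers).
#
# During decoding, each generated token streams all model weights from HBM,
# doing ~2 FLOPs per weight (one multiply, one add). If the chip can do R
# times more FLOPs per second than it can load bytes per second, you need
# roughly R tokens in flight (the batch) to keep compute busy rather than
# waiting on memory.

flops_per_s = 1.0e15       # hypothetical accelerator: 1000 TFLOP/s
hbm_bytes_per_s = 3.3e12   # hypothetical HBM bandwidth: 3.3 TB/s

# FLOPS-to-bandwidth ratio -- the ~300 "hardware invariant" from the talk.
ratio = flops_per_s / hbm_bytes_per_s
print(f"ratio ~ {ratio:.0f}")

# With bf16 weights (2 bytes each) and 2 FLOPs per weight per token, the
# minimum batch size at which compute and memory time break even is:
bytes_per_weight = 2
flops_per_weight_per_token = 2
min_batch = ratio * bytes_per_weight / flops_per_weight_per_token
print(f"minimum effective batch size ~ {min_batch:.0f}")
```

Below this batch size, token cost per request stays roughly constant (memory-bound, so batching is free); above it, serving becomes compute-bound.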

Chapters

How batch size affects token cost and speed
00:00
How MoE models are laid out across GPU racks
32:09
How pipeline parallelism spreads model layers across racks
47:12
Why Ilya said, “As we now know, pipelining is not wise.”
1:03:37
Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:18:59
Deducing long context memory costs from API pricing
1:33:02
Convergent evolution between neural nets and cryptography
2:04:02

Transcript

Dwarkesh Patel: Today, I'm interviewing Reiner Pope, who is CEO of MatX, which is a new chip startup. Previously, he was doing TPU architecture and many other things at Google. This is a very different format from my usual interviews. This is going to be a...