#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Lex Fridman Podcast

Show notes

Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware. Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Intercon...

Highlights

This podcast features a deep dive into the world of AI and semiconductors, led by Dylan Patel, founder of SemiAnalysis, and Nathan Lambert, a research scientist at the Allen Institute for AI. They explore the latest advancements in AI models, particularly DeepSeek's R1 and V3, and discuss the implications of export controls on GPUs to China. The conversation also touches on geopolitical tensions, AGI timelines, and the future of AI infrastructure.
00:00
DeepSeek's models outperform OpenAI's o3-mini in certain tasks.
30:39
DeepSeek R1 breaks down problems into detailed reasoning steps.
46:10
An auxiliary loss ensures all experts are utilized during training, balancing expert usage more effectively.
1:01:20
DeepSeek has used High-Flyer's 10,000 A100 GPUs since 2021.
1:17:08
Some consider language models a form of AGI.
1:24:23
Implementing powerful models like GPT-3 or AGI in everyday applications is currently impractical due to high computational costs.
1:28:49
Export controls on advanced chips to China may inadvertently benefit China in the long term.
1:36:31
U.S. export restrictions on GPUs have heightened concerns about China's potential military actions.
1:59:14
Geopolitical implications of U.S.-China relations in semiconductors
2:09:36
The attention mechanism has a quadratic memory cost relative to context length.
2:19:30
DeepSeek's R1 model reduces memory pressure by 80-90%, making it cheaper and more efficient.
2:35:05
Backdoors in open-source models could be influenced by governmental requirements or malicious intent.
2:47:07
Llama 2 gave overly cautious responses because its RLHF prioritized safety.
2:57:58
Reinforcement learning can achieve breakthroughs akin to Move 37 in Go.
3:20:29
Inference cost for GPT-3 dropped from $60-$70 to 5 cents per million tokens.
3:24:25
DeepSeek R1 caused a significant drop in NVIDIA's stock.
3:33:55
DeepSeek struggles with insufficient GPUs, leading to limited app functionality and slow response times.
3:37:55
Companies face scrutiny for training models on internet text without permission.
3:45:59
AI megaclusters could consume 10% of U.S. power by 2028-2030.
4:26:19
Chat applications have limited monetization potential and will likely be supported by ads as costs decrease.
4:31:36
True agents should be open-ended and capable of solving tasks independently.
4:42:49
AI can enhance business efficiency by automating workflows and improving outdated tools.
4:47:48
Tulu incorporates fully open code and data for post-training.
4:56:55
Trump's executive actions streamline permitting for data centers on federal land.
5:09:24
Openness and inclusivity are crucial for shaping AI development.
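
As a rough illustration of the 2:09:36 highlight on attention memory, the sketch below compares how two quantities grow with context length: the full score matrix that a naive attention implementation materializes (quadratic growth) and the key-value cache (linear growth). The function names, head count, layer count, head dimension, and 2-byte element size are illustrative assumptions, not figures from the episode or from any particular model.

```python
def attention_score_bytes(context_len: int,
                          num_heads: int = 32,
                          bytes_per_elem: int = 2) -> int:
    """Bytes to hold one layer's full attention score matrix.

    A naive implementation stores one context_len x context_len matrix
    per head, so the size grows with the square of the context length.
    All default values are hypothetical.
    """
    return num_heads * context_len * context_len * bytes_per_elem


def kv_cache_bytes(context_len: int,
                   num_layers: int = 32,
                   num_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes to hold keys and values for every layer.

    The leading factor of 2 covers keys plus values. The size grows
    linearly with the context length. All default values are hypothetical.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    for n in (2_048, 4_096, 8_192):
        print(n, attention_score_bytes(n), kv_cache_bytes(n))
```

Under these assumptions, doubling the context length quadruples the score-matrix size but only doubles the key-value cache. Implementations that avoid materializing the full score matrix change the memory picture, so this is a sketch of the naive case only.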

Chapters

Introduction
00:00
DeepSeek-R1 and DeepSeek-V3
13:28
Low cost of training
35:02
DeepSeek compute cluster
1:01:19
Export controls on GPUs to China
1:08:52
AGI timeline
1:19:10
China's manufacturing capacity
1:28:35
Cold war with China
1:36:30
TSMC and Taiwan
1:41:00
Best GPUs for AI
2:04:38
Why DeepSeek is so cheap
2:19:30
Espionage
2:32:49
Censorship
2:41:52
Andrej Karpathy and magic of RL
2:54:46
OpenAI o3-mini vs DeepSeek r1
3:05:17
NVIDIA
3:24:25
GPU smuggling
3:28:53
DeepSeek training on OpenAI data
3:35:30
AI megaclusters
3:45:59
Who wins the race to AGI?
4:21:21
AI agents
4:31:34
Programming and AI
4:40:16
Open source
4:47:43
Stargate
4:56:55
Future of AI
5:04:24

Transcript

Lex Fridman: The following is a conversation with Dylan Patel and Nathan Lambert. Dylan runs SemiAnalysis, a well-respected research and analysis company that specializes in semiconductors, GPUs, CPUs, and AI hardware in general. Nathan is a research scien...