#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Lex Fridman Podcast
2025/02/03
This podcast features a deep dive into the world of AI and semiconductors, led by Dylan Patel, founder of SemiAnalysis, and Nathan Lambert, a research scientist at the Allen Institute for AI. They explore the latest advancements in AI models, particularly DeepSeek's R1 and V3, and discuss the implications of export controls on GPUs to China. The conversation also touches on geopolitical tensions, AGI timelines, and the future of AI infrastructure.
Dylan Patel and Nathan Lambert cover both the technical and the geopolitical sides of AI development. DeepSeek's V3 and R1 models perform strongly on instruction-following and reasoning tasks, respectively, and rely on efficient training methods such as reinforcement learning from human feedback. The company's mixture-of-experts architecture cuts computational cost: the model holds over 600 billion parameters but activates only about 37 billion at a time. DeepSeek reportedly operates a compute cluster of around 50,000 GPUs and is aiming for AGI. The guests examine U.S. export controls on GPUs to China, particularly those covering NVIDIA's H100 and H800, weighing their effect on AI development against the strategic aim of maintaining a technological gap. The discussion also covers China's manufacturing capacity, the potential military use of AI, and TSMC's role in semiconductor production. The obstacles to AGI, including economic constraints and geopolitical risk, receive attention as well. The episode closes with reflections on the future of AI, stressing both its potential to amplify human capabilities and the risks that come with rapid progress.
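To make the sparse-activation figures above concrete, here is a minimal sketch of the arithmetic. It uses the summary's round numbers (600 billion as a lower bound on total parameters, 37 billion active) and assumes the common rule of thumb of roughly 2 forward-pass FLOPs per active parameter per token. The function names are illustrative and do not come from DeepSeek.

```python
# Rough arithmetic for a sparse mixture-of-experts model.
# Figures are the round numbers quoted in the summary; the
# 2-FLOPs-per-parameter rule is a common approximation (assumption).

TOTAL_PARAMS = 600e9   # "over 600 billion": treated as a lower bound
ACTIVE_PARAMS = 37e9   # parameters activated at a time


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass FLOPs per token (about 2 per active parameter)."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active share per token: at most {frac:.1%}")
    dense = forward_flops_per_token(TOTAL_PARAMS)
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Compute saving vs. an equally sized dense model: at least {dense / sparse:.0f}x")
```

Because the true parameter count is above 600 billion, the printed share is an upper bound and the printed saving is a lower bound.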
00:00
DeepSeek's models outperform OpenAI's o3-mini on certain tasks.
30:39
DeepSeek R1 breaks down problems into detailed reasoning steps.
46:10
An auxiliary loss ensures all experts are utilized during training, balancing expert usage more effectively.
1:01:20
DeepSeek has drawn on High-Flyer's 10,000 A100 GPUs since 2021.
1:17:08
Language models are considered a form of AGI by some.
1:24:23
Implementing powerful models like GPT-3 or AGI in everyday applications is currently impractical due to high computational costs.
1:28:49
Export controls on advanced chips to China may inadvertently benefit China in the long term.
1:36:31
U.S. export restrictions on GPUs have heightened concerns about China's potential military actions.
1:59:14
Geopolitical implications of U.S.-China relations in semiconductors
2:09:36
The attention mechanism has a quadratic memory cost relative to context length.
2:19:30
DeepSeek's R1 model reduces memory pressure by 80-90%, making it cheaper and more efficient.
2:35:05
Backdoors in open-source models could be influenced by governmental requirements or malicious intent.
2:47:07
Llama 2 gave overly cautious responses because its RLHF training prioritized safety.
2:57:58
Reinforcement learning can achieve breakthroughs akin to Move 37 in Go.
3:20:29
Inference cost for GPT-3 dropped from $60-$70 to 5 cents per million tokens.
3:24:25
DeepSeek R1 caused a significant drop in NVIDIA's stock.
3:33:55
DeepSeek struggles with insufficient GPUs, leading to limited app functionality and slow response times.
3:37:55
Companies face scrutiny for using internet text without permission while training models.
3:45:59
AI megaclusters could consume 10% of U.S. power by 2028-2030.
4:26:19
Chat applications have limited monetization potential and will likely be supported by ads as costs decrease.
4:31:36
True agents should be open-ended and capable of solving tasks independently.
4:42:49
AI can enhance business efficiency by automating workflows and improving outdated tools.
4:47:48
Tulu incorporates fully open code and data for post-training.
4:56:55
Trump's executive actions streamline permitting for data centers on federal land.
5:09:24
Openness and inclusivity are crucial for shaping AI development.
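
The 2:09:36 chapter says attention has a quadratic memory cost relative to context length. A minimal sketch of why, assuming a naive implementation that materializes the full score matrix for every head; the head count and the 2-byte element size are illustrative assumptions, and production systems often avoid storing this matrix in full:

```python
# Naive attention builds one (context_len x context_len) score matrix per head,
# so score-matrix memory grows with the square of the context length.

def attention_score_bytes(context_len: int, num_heads: int = 32, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold all attention scores for one sequence in one layer."""
    return context_len * context_len * num_heads * bytes_per_elem


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        gib = attention_score_bytes(n) / 2**30
        print(f"context={n:>5}  scores={gib:.2f} GiB")
```

Doubling the context length quadruples the result, which is the quadratic growth the chapter title refers to.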