
The Shape of Compute (Chris Lattner of Modular)

Show Notes

Chris Lattner of Modular (https://modular.com) joined us (again!) to talk about how Modular is breaking the CUDA monopoly, what it took to match NVIDIA performance on AMD GPUs, and how they are building a company of "elite nerds". X: https://x.com/latentspacepod Substack: https://latent.space

Highlights

In this episode, Chris Lattner of Modular discusses the company's efforts to revolutionize GPU programming and democratize AI. The conversation explores how Modular is breaking CUDA's monopoly, achieving high performance on AMD GPUs, and fostering a community of "elite nerds" through open-source contributions and innovative technologies.
02:34
Focused on proving what was thought impossible without NVIDIA and CUDA.
06:55
Modular's three-year journey started with proving compilation philosophy on CPUs.
11:15
MAX is open-source, efficient, and avoids CUDA dependencies.
12:52
Mojo aims to expose the full power of hardware, be portable across vendors, and offer usability.
18:25
MAX focuses on inference and integrates with Mojo for automatic kernel fusion.
29:16
A Flash Attention implementation built in Mojo beats the reference implementations.
32:25
Modular focuses on AI and Gen AI, inspired by the inaccessibility of large labs' resources.
46:17
Mojo had a soft start at version 0.1, focusing on real customer needs.
53:17
DeepSeek's open publication advanced AI progress significantly.
1:00:02
Inference is now part of training due to reasoning models.
1:02:31
Losing team members feels more personal at a startup.
1:10:33
Mojo allows expressing all hardware capabilities with readable code.
1:13:24
Use Slack, Reddit, RSS, and arXiv to track research papers.
1:15:30
Using a bandsaw is safe and great for teaching kids about woodworking.
1:17:05
More people should program GPUs, as it's a huge industry opportunity.

Chapters

Introductions
00:00
Overview of Modular and the Shape of Compute
00:12
Modular’s R&D Phase
02:27
From CPU Optimization to GPU Support
06:55
MAX: Modular’s Inference Framework
11:14
Mojo Programming Language
12:52
MAX Architecture: From Mojo to Cluster-Scale Inference
18:25
Open Source Contributions and Community Involvement
29:16
Modular's Differentiation from vLLM and SGLang
32:25
Modular’s Business Model and Monetization Strategy
41:37
DeepSeek’s Impact and Low-Level GPU Programming
53:17
Inference Time Compute and Reasoning Models
1:00:00
Personal Reflections on Leading Modular
1:02:31
Daily Routine and Time Management as a Founder
1:08:27
Using AI Coding Tools and Staying Current with Research
1:13:24
Personal Projects and Work-Life Balance
1:14:47
Hiring, Open Source, and Community Engagement
1:17:05

Transcript

Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host. This week, we're so excited to be back in the studio with Chris Lattner, founder of Modular. Welcome ba...