scripod.com

A Technical History of Generative Media

Shownote

Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest-growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, which models really changed the trajectory of image generation, and the future of AI video. Enjoy!

Highlights

In this episode, we hear from Gorkem and Batuhan of Fal.ai, a leading generative media inference platform that has rapidly scaled to serve 2 million developers, host 350 models, and achieve $100M ARR—recently backed by a $125M Series C. The conversation centers on their technical evolution, strategic pivots, and vision for the future of AI-generated images and video.
02:40
Model release days happen weekly and are the best part of the platform
04:58
Veo 3 created a usable text-to-video component
07:06
Chose not to compete in language models to avoid head-to-head rivalry with Google, OpenAI, and Anthropic
10:47
Optimizing Stable Diffusion 1.5 reduced inference time from 10 to 2 seconds
12:54
On average, a model on FAL runs 10x faster than self-hosting.
15:01
Image responses can't be streamed like language model responses
15:52
Latency is critical for generative media user experience
17:57
They package the inference engine so clients can self-serve and get high performance without sharing their code
18:47
Working with four major video companies and one image company that remains undisclosed because the partnership is sensitive for them
19:02
FAL can scale up to thousands of GPUs instantly
20:07
FAL and PlayHT achieved deep collaboration to optimize inference and infrastructure for real-time text-to-speech
21:29
They built their own orchestration layer, distributed file system, and container runtimes to ensure fast cold starts and handle scale
22:30
A team is working with NVIDIA to write custom Blackwell kernels for diffusion transformers to make it cost-effective
23:53
Building ASICs doesn't make sense for NVIDIA due to diverse diffusion workloads and the need for flexibility
25:02
Researchers prefer novel changes over iterative improvements like SDXL Lightning
26:10
A two-stage process—consistency models for drafting and real models for upscaling—improves image generation quality and control
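The two-stage draft-then-refine flow can be sketched generically. The "models" below are stand-in numpy functions that show only the control flow — generate cheap drafts, let the user pick one, and spend the full model's compute on that pick — not real diffusion calls:

```python
import numpy as np

def draft(prompt_seed, size=64):
    """Stand-in for a distilled/consistency model: fast, low-res, few steps."""
    rng = np.random.default_rng(prompt_seed)
    return rng.random((size, size, 3))

def refine(image, scale=4):
    """Stand-in for the full model upscaling/refining the chosen draft."""
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

# Generate several cheap drafts, pick one, refine only that one.
drafts = [draft(seed) for seed in range(4)]
chosen = drafts[2]                # user selection in a real UI
final = refine(chosen)
print(final.shape)               # (256, 256, 3)
```

The point of the split is that the expensive model runs once per accepted draft instead of once per attempt.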
27:40
Creators generate many videos at once and need to wait and iterate, so faster speeds are important
28:19
Anthropic's lack of an image generation model is due to its own priorities, not competitive disadvantage
29:50
Google mentioned 'generative media' in its last announcement, which is a win
30:16
Best-case scenarios for controllable video models from world models offer boundless possibilities in movies and games
33:59
Alibaba's updated video model runs draft mode in under five seconds and full 720p in 20 seconds
34:45
Using single frames instead of multiple frames can yield a good text-to-image model, thanks to video data
35:29
Training video models costs a couple of million dollars but can bring a lot of attention, especially in a competitive landscape
36:44
Whether to make money from open-source models depends on a company's goals
38:04
The usage distribution of models follows a power-law but is not as extreme as expected and changes monthly
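A power-law usage distribution means a handful of head models carry most of the traffic. A toy illustration with a Zipf curve (the 350-model count comes from the intro above; the exponent is made up for illustration, not FAL's measured data):

```python
import numpy as np

n_models = 350                      # roughly the catalog size mentioned above
ranks = np.arange(1, n_models + 1)
s = 1.0                             # illustrative Zipf exponent, not measured
usage = ranks ** -s                 # model at rank r gets traffic ~ 1/r^s
share = usage / usage.sum()

top10 = share[:10].sum()            # traffic share of the 10 head models
print(f"top 10 models: {top10:.0%} of traffic")
```

With a steeper exponent the head dominates more; the highlight's observation is that FAL's real curve is flatter than expected and the head itself rotates month to month.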
39:35
NSFW content is almost negligible, with moderation for illegal content and tracking of non-illegal NSFW content
40:48
Advertising, especially video advertising, is growing, while the claim of revolutionizing Hollywood filmmaking is considered less interesting
42:12
Generative technology is well-suited for advertising as it allows for unlimited ad creation and more personalized ads have greater economic value.
42:49
In 6–12 months, 80–90% of viral video content could be AI-generated
44:07
Only open-source models have a rich LoRA ecosystem
45:41
Training LoRA with 6–20 images for 1000 steps can achieve 99% accuracy
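That recipe works because a LoRA trains only a small low-rank update on top of frozen weights. A minimal numpy sketch of the math (the dimension, rank, and alpha are illustrative, not FAL's actual training config):

```python
import numpy as np

# Frozen base weight of one projection layer, e.g. 768x768.
d = 768
W = np.random.randn(d, d).astype(np.float32)

# LoRA trains only two small factors: A (r x d) and B (d x r).
r, alpha = 8, 16
A = np.random.randn(r, d).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)   # B starts at zero: the update begins as a no-op

# Effective weight at inference: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

# The trainable update is tiny compared to the frozen layer.
full_params = W.size                      # 589824
lora_params = A.size + B.size             # 12288  (~2% of the layer)
print(full_params, lora_params)
```

Because so few parameters move, a handful of images and ~1000 steps is enough to converge without overfitting the frozen base.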
46:57
Many companies may focus on post-training open-source video models in the next six months to a year
47:18
As models improve, ComfyUI workflows for images are getting simpler, while those for video remain complex
49:28
Many startups are reinventing the wheel in AI data collection for image and video models
50:21
FAL could build an image dataset like Together AI did with RedPajama
52:31
State-of-the-art image models are cheap to train mainly due to data engineering, not algorithmic advances
53:34
Veo 3 can generalize and handle scenes, unlike post-trained models, which are good for conversations but lack generalization ability
53:47
Veo 3 has the most accurate lip sync compared to other models
55:11
Waiting for bigger models is a 'bitter lesson'
57:17
Those who can write a sparse attention kernel with BF16 on Blackwell should join FAL
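For readers wondering what that kernel computes: below is a dense-math numpy reference for block-sparse attention. A real Blackwell kernel would run the matmuls in BF16 and skip the masked blocks entirely; this sketch only pins down the semantics, not the implementation:

```python
import numpy as np

def block_sparse_attention(q, k, v, block, keep):
    """Reference block-sparse attention (dense math, FP32).

    q, k, v: (seq, dim) arrays; block: block size; keep: set of
    (query_block, key_block) pairs that are computed; all other
    blocks are masked out before the softmax.
    """
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    mask = np.full((seq, seq), -np.inf)
    for qb, kb in keep:
        mask[qb*block:(qb+1)*block, kb*block:(kb+1)*block] = 0.0
    scores = scores + mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
# Keep only the block-diagonal: each 4-token block attends within itself.
out = block_sparse_attention(q, k, v, block=4, keep={(0, 0), (1, 1)})
print(out.shape)  # (8, 4)
```

The engineering challenge in the actual kernel is exactly what this reference hides: never materializing the masked blocks while keeping the softmax numerically stable in BF16.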
58:23
The team has a high culture bar as the team loves generative media and would do it even if it weren't their job
58:44
Hired an Applied ML engineer with a top Hugging Face space and another specializing in training LoRAs on FAL
59:30
A kernel bench is proposed to evaluate kernel stability and performance

Chapters

Introductions
00:00
History of Major AI Models and Their Impact on Fal.ai
04:58
Pivoting to Generative Media and Strategic Business Decisions
07:06
Technical discussion on CUDA optimization and kernel development
10:46
Inference Engine Architecture and Kernel Reusability
12:42
Performance Gains and Latency Trade-offs
14:59
Discussion of model latency importance and performance optimization
15:50
Importance of Latency and User Engagement
17:56
Impact of Open Source Model Releases and Competitive Advantage
18:46
Partnerships with closed source model developers
19:00
Collaborations with Closed-Source Model Providers
20:06
Serving Audio Models and Infrastructure Scalability
21:28
Serverless GPU infrastructure and technical stack
22:29
GPU Prioritization: H100s and Blackwell Optimization
23:52
Discussion on ASICs vs. General Purpose GPUs
25:00
Architectural Trends: MMDiTs and Model Innovation
26:10
Rise and Decline of Distillation and Consistency Models
27:35
Draft Mode and Streaming in Image Generation Workflows
28:15
Generative Video Models and the Role of Latency
29:46
Auto-Regressive Image Models and Industry Reactions
30:14
Discussion of OpenAI's Sora and competition in video generation
31:35
World Models and Creative Applications in Games and Movies
34:44
Video Models’ Revenue Share and Open-Source Contributions
35:27
Rise of Chinese Labs and Partnerships
36:40
Top Trending Models on Hugging Face and ByteDance's Role
38:03
Monetization Strategies for Open Models
39:29
Usage Distribution and Model Turnover on FAL
40:48
Revenue Share vs. Open Model Usage Optimization
42:11
Moderation and NSFW Content on the Platform
42:47
Advertising as a key use case for generative media
44:03
Generative Video in Startup Marketing and Virality
45:37
LoRA Usage and Fine-Tuning Popularity
46:56
LoRA ecosystem and fine-tuning discussion
47:17
Post-Training of Video Models and Future of Fine-Tuning
49:25
ComfyUI Pipelines and Workflow Complexity
50:21
Requests for startups and future opportunities in the space
52:31
Data Collection and RedPajama-Style Initiatives for Media Models
53:33
RL for Image and Video Models: Unknown Potential
53:46
Requests for Models: Editing and Conversational Video Models
55:11
Veo 3 Capabilities: Lip Sync, TTS, and Timing
57:12
Bitter Lesson and the Future of Model Workflows
58:23
FAL's hiring approach and team structure
58:44
Team Structure and Scaling Applied ML and Performance Teams
59:29
Developer Experience Tools and Low-Code/No-Code Integration
1:01:41
Improving Hiring Process with Public Challenges and Benchmarks
1:03:04
Closing Remarks and Culture at FAL
1:04:02

Transcript

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, founder of Smol AI.

swyx: Hello, hello. Today, we're so excited to be in the studio with Gorkem and Batuhan of Fal.ai. Welcome. ...