scripod.com

A Technical History of Generative Media

Shownote

Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest-growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, which models really changed the trajectory of image generation, and the future of AI video. Enjoy!

Highlights

In this episode, we hear from Gorkem and Batuhan of Fal.ai, a leading generative media inference platform that has rapidly scaled to serve 2 million developers, host 350 models, and achieve $100M ARR—recently backed by a $125M Series C. The conversation centers on their technical evolution, strategic pivots, and vision for the future of AI-generated images and video.
02:40
Model release days happen weekly and are the best part of the platform
04:58
Veo 3 created a usable text-to-video component
07:06
Chose not to compete in language models to avoid head-to-head rivalry with Google, OpenAI, and Anthropic
10:47
Optimizing Stable Diffusion 1.5 reduced inference time from 10 to 2 seconds
12:54
On average, a model on FAL runs 10x faster than self-hosting.
15:01
Image responses can't be streamed like language model responses
15:52
Latency is critical for generative media user experience
17:57
They package the inference engine so clients can self-serve and get high performance without sharing their code
18:47
Working with four major video companies and one image company that remains undisclosed because the partnership is sensitive for them
19:02
FAL can scale up to thousands of GPUs instantly
20:07
FAL and PlayHT achieved deep collaboration to optimize inference and infrastructure for real-time text-to-speech
21:29
They built their own orchestration layer, distributed file system, and container runtimes to ensure fast cold starts and handle scale
22:30
A team is working with NVIDIA to write custom Blackwell kernels for diffusion transformers to make it cost-effective
23:53
Building ASICs doesn't make sense for NVIDIA due to diverse diffusion workloads and the need for flexibility
25:02
Researchers prefer novel changes over iterative improvements like SDXL Lightning
26:10
A two-stage process—consistency models for drafting and real models for upscaling—improves image generation quality and control
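The two-stage draft-then-refine flow can be sketched generically. The "models" below are stand-in numpy functions that show only the control flow — generate cheap drafts, let the user pick one, and spend the full model's compute on that pick — not real diffusion calls:

```python
import numpy as np

def draft(prompt_seed, size=64):
    """Stand-in for a distilled/consistency model: fast, low-res, few steps."""
    rng = np.random.default_rng(prompt_seed)
    return rng.random((size, size, 3))

def refine(image, scale=4):
    """Stand-in for the full model upscaling/refining the chosen draft."""
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

# Generate several cheap drafts, pick one, refine only that one.
drafts = [draft(seed) for seed in range(4)]
chosen = drafts[2]                # user selection in a real UI
final = refine(chosen)
print(final.shape)               # (256, 256, 3)
```

The point of the split is that the expensive model runs once per accepted draft instead of once per attempt.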
27:40
Creators generate many videos at once and need to wait and iterate, so faster speeds are important
28:19
Anthropic's lack of an image generation model is due to its own priorities, not competitive disadvantage
29:50
Google mentioned 'generative media' in its last announcement, which is a win
30:16
Best-case scenarios for controllable video models from world models offer boundless possibilities in movies and games
33:59
Alibaba's updated video model runs draft mode in under five seconds and full 720p in 20 seconds
34:45
Using single frames instead of multiple frames can yield a good text-to-image model, thanks to video data
35:29
Training video models costs a couple of million dollars but can bring a lot of attention, especially in a competitive landscape
36:44
Whether to make money from open-source models depends on a company's goals
38:04
The usage distribution of models follows a power-law but is not as extreme as expected and changes monthly
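A power-law usage distribution means a handful of head models carry most of the traffic. A toy illustration with a Zipf curve (the 350-model count comes from the intro above; the exponent is made up for illustration, not FAL's measured data):

```python
import numpy as np

n_models = 350                      # roughly the catalog size mentioned above
ranks = np.arange(1, n_models + 1)
s = 1.0                             # illustrative Zipf exponent, not measured
usage = ranks ** -s                 # model at rank r gets traffic ~ 1/r^s
share = usage / usage.sum()

top10 = share[:10].sum()            # traffic share of the 10 head models
print(f"top 10 models: {top10:.0%} of traffic")
```

With a steeper exponent the head dominates more; the highlight's observation is that FAL's real curve is flatter than expected and the head itself rotates month to month.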
39:35
NSFW content is almost negligible, with moderation for illegal content and tracking of non-illegal NSFW content
40:48
Advertising, especially video advertising, is growing, while the claim of revolutionizing Hollywood filmmaking is considered less interesting
42:12
Generative technology is well-suited for advertising as it allows for unlimited ad creation and more personalized ads have greater economic value.
42:49
In 6–12 months, 80–90% of viral video content could be AI-generated
44:07
Only open-source models have a rich LoRA ecosystem
45:41
Training LoRA with 6–20 images for 1000 steps can achieve 99% accuracy
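That recipe works because a LoRA trains only a small low-rank update on top of frozen weights. A minimal numpy sketch of the math (the dimension, rank, and alpha are illustrative, not FAL's actual training config):

```python
import numpy as np

# Frozen base weight of one projection layer, e.g. 768x768.
d = 768
W = np.random.randn(d, d).astype(np.float32)

# LoRA trains only two small factors: A (r x d) and B (d x r).
r, alpha = 8, 16
A = np.random.randn(r, d).astype(np.float32) * 0.01
B = np.zeros((d, r), dtype=np.float32)   # B starts at zero: the update begins as a no-op

# Effective weight at inference: W + (alpha / r) * B @ A
W_eff = W + (alpha / r) * (B @ A)

# The trainable update is tiny compared to the frozen layer.
full_params = W.size                      # 589824
lora_params = A.size + B.size             # 12288  (~2% of the layer)
print(full_params, lora_params)
```

Because so few parameters move, a handful of images and ~1000 steps is enough to converge without overfitting the frozen base.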
46:57
Many companies may focus on post-training open-source video models in the next six months to a year
47:18
As models improve, ComfyUI workflows for images are getting simpler, while those for video remain complex
49:28
Many startups are reinventing the wheel in AI data collection for image and video models
50:21
FAL could build an image dataset like Together AI did with RedPajama
52:31
State-of-the-art image models are cheap to train mainly due to data engineering, not algorithmic advances
53:34
Veo 3 can generalize and handle scenes, unlike post-trained models, which are good for conversations but lack generalization ability
53:47
Veo 3 has the most accurate lip sync compared to other models
55:11
Waiting for bigger models is a 'bitter lesson'
57:17
Those who can write a sparse attention kernel with BF16 on Blackwell should join FAL
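For readers wondering what that kernel computes: below is a dense-math numpy reference for block-sparse attention. A real Blackwell kernel would run the matmuls in BF16 and skip the masked blocks entirely; this sketch only pins down the semantics, not the implementation:

```python
import numpy as np

def block_sparse_attention(q, k, v, block, keep):
    """Reference block-sparse attention (dense math, FP32).

    q, k, v: (seq, dim) arrays; block: block size; keep: set of
    (query_block, key_block) pairs that are computed; all other
    blocks are masked out before the softmax.
    """
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    mask = np.full((seq, seq), -np.inf)
    for qb, kb in keep:
        mask[qb*block:(qb+1)*block, kb*block:(kb+1)*block] = 0.0
    scores = scores + mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
# Keep only the block-diagonal: each 4-token block attends within itself.
out = block_sparse_attention(q, k, v, block=4, keep={(0, 0), (1, 1)})
print(out.shape)  # (8, 4)
```

The engineering challenge in the actual kernel is exactly what this reference hides: never materializing the masked blocks while keeping the softmax numerically stable in BF16.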
58:23
The team has a high culture bar as the team loves generative media and would do it even if it weren't their job
58:44
Hired an Applied ML engineer with a top Hugging Face space and another specializing in training LoRAs on FAL
59:30
A kernel bench is proposed to evaluate kernel stability and performance

Chapters

Introductions
00:00
History of Major AI Models and Their Impact on Fal.ai
04:58
Pivoting to Generative Media and Strategic Business Decisions
07:06
Technical discussion on CUDA optimization and kernel development
10:46
Inference Engine Architecture and Kernel Reusability
12:42
Performance Gains and Latency Trade-offs
14:59
Discussion of model latency importance and performance optimization
15:50
Importance of Latency and User Engagement
17:56
Impact of Open Source Model Releases and Competitive Advantage
18:46
Partnerships with closed source model developers
19:00
Collaborations with Closed-Source Model Providers
20:06
Serving Audio Models and Infrastructure Scalability
21:28
Serverless GPU infrastructure and technical stack
22:29
GPU Prioritization: H100s and Blackwell Optimization
23:52
Discussion on ASICs vs. General Purpose GPUs
25:00
Architectural Trends: MMDiTs and Model Innovation
26:10
Rise and Decline of Distillation and Consistency Models
27:35
Draft Mode and Streaming in Image Generation Workflows
28:15
Generative Video Models and the Role of Latency
29:46
Auto-Regressive Image Models and Industry Reactions
30:14
Discussion of OpenAI's Sora and competition in video generation
31:35
World Models and Creative Applications in Games and Movies
34:44
Video Models’ Revenue Share and Open-Source Contributions
35:27
Rise of Chinese Labs and Partnerships
36:40
Top Trending Models on Hugging Face and ByteDance's Role
38:03
Monetization Strategies for Open Models
39:29
Usage Distribution and Model Turnover on FAL
40:48
Revenue Share vs. Open Model Usage Optimization
42:11
Moderation and NSFW Content on the Platform
42:47
Advertising as a key use case for generative media
44:03
Generative Video in Startup Marketing and Virality
45:37
LoRA Usage and Fine-Tuning Popularity
46:56
LoRA ecosystem and fine-tuning discussion
47:17
Post-Training of Video Models and Future of Fine-Tuning
49:25
ComfyUI Pipelines and Workflow Complexity
50:21
Requests for startups and future opportunities in the space
52:31
Data Collection and RedPajama-Style Initiatives for Media Models
53:33
RL for Image and Video Models: Unknown Potential
53:46
Requests for Models: Editing and Conversational Video Models
55:11
Veo 3 Capabilities: Lip Sync, TTS, and Timing
57:12
Bitter Lesson and the Future of Model Workflows
58:23
FAL's hiring approach and team structure
58:44
Team Structure and Scaling Applied ML and Performance Teams
59:29
Developer Experience Tools and Low-Code/No-Code Integration
1:01:41
Improving Hiring Process with Public Challenges and Benchmarks
1:03:04
Closing Remarks and Culture at FAL
1:04:02

Transcript

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, founder of Smol AI.

swyx: Hello, hello. Today, we're so excited to be in the studio with Gorkem and Batuhan of Fal.ai. Welcome. ...