scripod.com

The Future of Audio AI: Insights from Mati Staniszewski of ElevenLabs

In this episode, Guy Podjarny welcomes Mati Staniszewski, CEO and co-founder of ElevenLabs, a company transforming the landscape of AI audio technology. Mati recounts how a simple frustration with poor movie dubbing in his native Poland led to the creation of a powerful platform that now serves millions of users worldwide. The conversation explores the technical evolution of ElevenLabs, the challenges of building emotionally expressive AI voices, and the company’s vision for the future of audio AI.
Mati shares how ElevenLabs began as a response to low-quality dubbing, evolving into a leading AI audio platform with innovations in text-to-speech, voice cloning, and dubbing. The team built custom models using transformers and diffusion techniques, emphasizing modularity for developers. They later expanded into speech-to-text and focused on end-to-end conversational systems. The company serves a wide range of industries through flexible APIs and interfaces, while addressing security and competition concerns. Looking ahead, ElevenLabs is investing in multimodal voice AI and real-time communication, all while maintaining a unique, collaborative company culture that values ideas over hierarchy.
00:00
ElevenLabs raised $180 million and serves millions of users, who have generated a thousand years of audio.
14:44
ElevenLabs introduced text-to-speech and voice cloning, with model-driven voice component decisions.
32:35
Conversational AI is seen as the future of digital interactions, with potential in healthcare, 911 training, and customer support.
35:25
ElevenLabs simultaneously releases interfaces and APIs, serving as a horizontal layer for audio AI while offering end-to-end solutions in conversational AI and media entertainment.
45:52
Real-time cross-language communication is the holy grail for audio AI applications.