Ultra-realistic synthetic human voices are upon us, and Mahmoud Felfel's Play.ht is leading the next generation of text-to-voice models. In this episode we discuss the challenges and opportunities of automating a more human voice, as well as concerns about deepfakes and user safety.
Check out the debut of Erik Torenberg's new podcast Upstream. This coming season features interviews with Marc Andreessen (Episode 1 live now), David Sacks, Ezra Klein, Balaji Srinivasan, Katherine Boyle, and more. Subscribe here: https://www.youtube.com/@UpstreamwithErikTorenberg
Timestamps for E10: Mahmoud Felfel of Play.ht
(0:00) Preview of Mahmoud on this episode
(0:55) Sponsor: Omneky.com
(1:45) Nathan clones his voice using Play.ht
(6:11) Why Mahmoud started Play.ht and the problem they tried to solve
(13:08) The job to be done for Play.ht & how they’re thinking about APIs and models
(24:45) Mahmoud breaks down the architecture of Play.ht
(29:30) How the use cases have evolved
(30:00) New markets and opportunities with creators
(37:00) Are we all about to become prompt engineers/directors?
(44:50) Roadmap to other languages beyond English
(48:00) Managing the compute
(52:00) If AI-generated voices become a commodity, what will happen?
(55:00) Why bigger companies are late adopters of AI tools
(56:30) The long-term moat of Play.ht and other applications
(1:00:00) Controversial voice cloning and the potential for societal abuse
(1:10:32) Commonly abused voices
(1:12:36) Rapid fire questions
*Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.*
Join thousands of subscribers to our Substack: https://cognitiverevolution.substack