"Descript Isn't a Slop Machine": Laura Burkhauser on the AI Tools Creators Love and Hate

Descript CEO Laura Burkhauser discusses how the company evaluates AI models, responds to creator concerns about low-quality slop, and builds tools such as Underlord for video understanding, editing, APIs, and creative workflows.

"Descript Isn't a Slop Machine": Laura Burkhauser on the AI Tools Creators Love and Hate

Watch Episode Here


Listen to Episode Here


Show Notes

Laura Burkhauser, CEO of Descript, explains how the company is navigating the tension between powerful AI tools and creator backlash against “slop.” She shares how Descript chooses which models to use, why reliability and multimodal understanding matter, and how the team balances frontier models with in-house task-specific systems. The conversation also covers Underlord, agentic video editing, API design for coding agents, and what AI means for the future of creative work.

LINKS:

Sponsors:

Sequence:

Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code Cognizant in the source field to save 20% off year one

Claude:

Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

AvePoint:

AvePoint is building the control layer for AI agents so you can securely govern, audit, and recover every action at scale. Design trusted agentic outcomes from day one at https://avpt.co/tcr

CHAPTERS:

(00:00) About the Episode

(03:56) What is slop

(12:24) Creator AI tensions (Part 1)

(20:24) Sponsors: Sequence | Claude

(23:23) Creator AI tensions (Part 2)

(23:23) Selecting generative models

(34:46) Underlord video understanding (Part 1)

(34:53) Sponsor: AvePoint

(36:00) Underlord video understanding (Part 2)

(41:55) Proprietary data advantage

(50:44) Generalized agent harness

(57:31) API and bundling

(01:05:04) Automation and jobs

(01:10:26) Pricing AI work

(01:14:20) Art beyond slop

(01:19:04) Episode Outro

(01:23:02) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Introduction

[00:00] Hello, and welcome back to the Cognitive Revolution!

Today my guest is Laura Burkhauser, CEO of the pioneering video editing platform Descript, which originally burst onto the scene in 2017 with its revolutionary AI-powered, word-processor-like editing paradigm, and as you'll hear, has continued to push the boundaries of what AI can do for creators ever since.

Laura took over for Descript founder Andrew Mason, who was my guest on the show back in August 2024, after serving as VP of Product for several years, and as a longtime Descript customer and early adopter on their new Underlord API, I've been impressed by their customer obsession and product velocity, and was genuinely excited to get Laura's take on product management in the AI era.  

We begin with a remarkable email that Laura recently sent to customers, in which she recognized that Generative AI is "a polarizing topic" among creators, and declared that "Descript isn't a slop machine and we don't want it to be."

For me, this begged the question: what is slop?  

For Laura, who emphasizes that all creators have to start somewhere, and that all new media take time to mature, it's less about the quality of the content, and more about the incentives that drive its creation.  In short, it's the mass production of content, for the purpose of algorithmic attention arbitrage, that she objects to.  

In this, she's in step with Descript's creator customer base, who she says approach AI with a passionate mix of enthusiasm and hostility.

Narrowly scoped, purpose-built and critically – reliable – AI tools such as Descript's Studio Sound, green screen, and audio overdub features are pretty much universally loved.

Underlord, their natural language instructable AI editing assistant, which I personally do find quite useful, everyone wants to love, but many still find frustratingly limited.

And then there are the infamously unruly image and video generation models, which despite – and perhaps in part because of – their soaring popularity, are the object of visceral hatred.

It's a lot to manage, particularly with general-purpose products like Claude Code accelerating to the point where they're starting to be capable of editing video, but Laura's true north is simple: it's her job to make sure that no matter how good frontier models get, you have a better experience using Descript than you would with an AI agent alone.  

To this end, we get Laura's razor-sharp takes on how Descript decides which generative models to include in the product; why they plan to use frontier models to power agentic editing for the foreseeable future, while also training task-specific models in-house where they have unique proprietary data; the critical importance of and challenges associated with multimodal understanding; the critical role that expert aesthetic judgment plays in the process of model evaluation and iteration; the product design principle that says AI assistants should be able to do everything human users can – and vice versa; how Descript is designing the Underlord API to be "hired" by coding agents; and the pricing and design challenges that arise when a single button click or API call can consume multiple dollars worth of credits.

Finally, we take stock of where we are in the big picture.  Laura emphasizes that while economic logic might dictate a future of infinite slop, artists have a long history of adapting to and incorporating new technologies in unpredictable and often defiant ways, and she's betting that our cultural reality will be far more vibrant than our Black Mirror fears. 

With that, I hope you enjoy this extremely insightful conversation about managing both AI products and customer bases, with Descript CEO, Laura Burkhauser.

Main Episode

[03:56] Nathan Labenz: Laura Burkhauser, CEO at Descript. Welcome to the Cognitive Revolution.

[04:01] Laura Burkhauser: Hi, Nathan.

[04:03] Nathan Labenz: Thanks for being here. I'm excited for this conversation. As longtime listeners know, we are Descript customers and use Descript to help produce the podcast. And this is actually the second CEO of Descript episode on the Cognitive Revolution, although the person holding the seat has changed. And so you're new in the role. I look back at a couple of emails that you sent, one that I think you said kind of right after taking over, and you wrote something I think will be a really great jumping off point for us, which was, Descript isn't a slop machine and we don't want it to be. So how do we keep building generative AI features without surrendering to the slop, or is it impossible? That's, I think, going to be a defining question, honestly, of like how people, even in the big picture, like spend their time over the next few years. So I want to just start off with the kind of, at least first question that comes to mind for me. What is slop? How do you know it when you see it?

[05:03] Laura Burkhauser: Yeah, I might, and I might define it differently than other people. So to me, I think about slop as being a form of content arbitrage. So there's like a, it's when you can identify a temporary, almost like inefficiency in the market or opportunity in the market to create content that is likely to give you a return on your investment. And in this case, it's like that you can pump the system with a lot of content that is extremely cheap for you to make, that might not get a ton of engagement or might not get like a ton of subscribers, but gets you enough revenue or engagement from that content, that it ends up being net positive for you. And so there are people that can identify these kinds of slop arbitrage moments and really take advantage of it. But the two key elements to me of slop are the incentive is money in some way, ultimately, and it is happening at scale. I think there's a lot of bad art out there. And I would say like, generally, I'm pro bad art. Like, I think bad art is a really important stage that one must go through to get to good art or good content. I don't know if you can remember the first few things you put on the internet, Nathan, but my guess is that you would cringe if you looked at them now, knowing how sophisticated you've become in your creations. And so I guess like to me, there's a difference between slop, which is like, I'm trying to pump, I'm trying to juice the algorithm. I'm trying to pump YouTube full of a bunch of avatar meditation videos in this moment when it hasn't caught on so that I can get some ad revenue real quick. And like, maybe I should be like a meditation guru on YouTube. I'm gonna create this avatar. And you know, like that's a different thing. And then the result may be the same, but I don't think that's slop. That's just someone's bad idea.

[07:00] Nathan Labenz: Yeah, interesting. So if you're feeding the algorithmic hogs, you are producing slop. I do wonder though, and by the way, if the comment section is to be believed, I'm still in the bad art phase of my own personal development. We'll see if that ever comes in. That art is important. one thing I will say about AI is it is allowing me to create stuff that I don't think is terrible, at least, and that I enjoy the process of creating in ways that I just never would have had any opportunity to do before. And so for now, I'm ignoring the haters in the comments who are almost uniformly opposed to my new AI-generated YouTube preview art, but I just figure, I'm having fun doing it and I kind of like the look of it. So for now, we'll keep going. Maybe it'll evolve and I'll finally click.

[07:53] Laura Burkhauser: That's exactly what I mean though. That's like exactly what I mean. Like, okay, if you were learning how to paint for the first time and you sat in front of a canvas and you painted me a picture, it would probably be really bad. Like you haven't figured out what your voice is, what your aesthetic is, like what feels good to you. You haven't taken any classes, you haven't played with the paint, right? That's your very first thing. This is a new medium. A lot of the, like we're having fun with a new medium is how you get to good stuff, right? Like I'm sure that you're finding that as you play with this stuff more and more, you are starting to have your own opinions. Like, oh, I don't like this. Or, oh, this is working, this prompt is working better. I'm liking what I'm getting more. You're going out into the world, you're seeing other people who are doing stuff and you're like, I like that person's style. Like, how can I show up in that same way that person does? And that is the same way that if you were learning how to paint, you would be developing your painting style, right? So I also am not a hater who thinks anything created with Gen. AI is slop. I think that because this is a new technology, most of us are in our create a lot of bad stuff kind of phase, and also because the technology is still nascent. But I am very bullish that it will be possible and already is to create really awesome stuff with Gen. AI, like stuff that is not slop at all. And I think the only way you're going to ever get there is by, yeah, creating a lot of bad stuff first, because that's how you get good at everything. That's how you get good at art.

[09:33] Nathan Labenz: Yeah, no doubt in my mind that good quality stuff can be produced with the new Gen. AI tools. My creative teammates at Waymark are like just clearly head and shoulders above me in terms of their ability to do that. And it's a little bit hard sometimes to put our finger on exactly why they're so much better, but I think it's pretty undeniable that there is still like a significant skill gradient in terms of what people can do. And they've also put in the time to a much greater extent than I have even still. And you know, I say that as an early adopter and enthusiast of just about everything AI.

[10:08] Laura Burkhauser: Yeah, and they're also fighting their tools right now. Like right now, I think that there are two things that are really blocking us from seeing a lot more good Gen. AI art or content or whatever you want to call it. And the first is that the technology is just not quite there yet. And that's like, it's getting better, but no one would actually choose to make a video by generating 5 to 10 seconds of it at a time and like crossing their fingers that the voice consistency between clip one and clip two is good enough that no one notices or deciding we're just not going to have voice in video. Like right now, there's all kinds of constraints. You have to generate something like 50 times sometimes to get it exactly the way that you want it. And so if you're making good Gen. AI stuff right now, it's because like you're super invested in the medium and you want to fight your tools the entire time. And then I think the other reason we're not seeing a lot of it is because there's a lot of stigma right now for people with the kinds of taste and skills to be able to make good stuff. There's a lot of stigma for them to be using these tools and publishing and owning that they're using them. And so when you go on X or you go online, like wherever it is that you're consuming social media, a lot of the people that you're seeing using these tools are people who are earlier in their journey of knowing what good looks like visually. And so like it is bad. Like I think like the average person who's never studied, whether by study I mean in school or in just experience, if you've never studied film or you've never studied like photographic composition or whatever, like yeah, you're... you're going to need to spend some time in the medium before the stuff you make is actually like good. And you're not going to know why. You're going to be like, I know this isn't good. I don't know why. I don't have the vocabulary to say why. If you stick with it, it might get better. It doesn't for everyone. But those are the two reasons. It's like right now we don't have the tastemakers really using it. And right now, they'd really have to fight the technology for it to feel good, for them to ever get in a flow state, for it to really feel fun for them. So you just have early adopters that are really digging into this stuff right now.

[12:25] Nathan Labenz: That comment on vocabulary, I think, is a very apt one. I just said it's a little hard to put our fingers sometimes on why the creative team is so much better than me, but I think that is a huge part of it. So often, if I send our creative lead something that I'm working on, he'll give me like a few adjectives or an artist's name for inspiration, something like that, or even a, and in some cases it could be like an explicit direction for composition. And that really does usually take it up a pretty clear notch. So I think that's a great detail to highlight. I don't know if it was in the same email or another one, but you described adding generative AI models to the Descript product as a polarizing topic, and I guess a polarizing product move. And in that same email, I think you were recruiting people to join a generative AI advisory committee. I've been broadly very impressed, by the way, with my interactions, because we're on the API, you know, early adopter beta list. I think your team is like very plugged into what are people trying to do and how can we help them do it. So I've been very impressed by my interaction with the people that are building the product. Who signed up for this? What did they want to tell you? What are you hearing from people? And how has your sense of what's making it polarizing, and of how the discourse around these tools is evolving, changed since you formed that committee?

[13:51] Laura Burkhauser: Yeah, it has been fascinating. First of all, there was a huge uptake in that invitation. We got way more people who signed up for it than we could convene in a reasonable way. Although I think I've talked to most folks who signed up for it now. And yeah, so why did I send this email in the 1st place? Because when I became the CEO of Descript, I had been the VP of product for several years before that, so I wasn't like brand new to the discourse, but we had just changed our pricing, which was a tough moment. We are a really popular product with a huge and loyal and vocal set of customers that we love dearly. And I'm glad that came through in the way that we've talked to you. We are obsessed with our customers. But we did need to change our pricing, which is always like a kind of moment. And we got a lot of feedback on it. And there was this one type of dual feedback that I thought was really interesting and I wanted to dig in on. And it came up a bunch in some form of like, I wish you would stop spending time building AI features and use that time to invest in the core quality of the app. And I'm like, when you see feedback like that a number of times as you're raising prices or as you're changing prices, you should get real curious because there's a lot that could be in that. There are like some dual things. So the first is like, do we have a problem with core quality? And what do you mean by core quality? What is core quality to you? Are we talking about performance? Are we talking about reliability? Are we talking about like upload speed? Are we talking about playback speed? And interestingly, and so there were some really good things that came out of that. And I hope folks see that we've been kind of knocking through a bunch of quality things. But then there was also a bunch of core quality being new features that in fact, to my mind, are AI features, right? And so one of the things that became really clear to me is that when we talk about AI features, different people really mean different things. So Descript is like an AI native product. We've been AI native since, like the whole idea of editing a video like a transcript is actually an AI idea. The way that we implement green screen is like using an AI model, like a visual model. The way that we do Studio Sound, like that's an AI model that we created. We do like voice cloning, regenerate and overdub, like so you can change something that you've, oh, I recorded the wrong word. Let's just go back and change that and have it in my voice say the right thing. And now Idlib dubs. And we built that model and it's like, these are all AI products and many of the features they wanted us to improve are actually like AI features.

[17:05] Laura Burkhauser: And so what I came to understand is that there is a hierarchy of like hostility towards different types of AI features and that Descript users at least love, love a lot of AI features, especially when those features are effects, transitions, things that have a button that do something that feels deterministic to the video, even if it's powered by AI, green check mark, everyone loves it. Keep building that into an infinity. Then there's like Underlord, which is our AI co-editor. Underlord is somewhat polarizing. Everyone wants it. People are very excited about speeding up their workflows, their editing workflows, with AI, and they love the idea of an agentic co-editor that helps them do that. But they're mad that it's not as good as they want it to be, some folks, for some of their use cases. And so there, it's, like, it's not like, don't build Underlord, but it's like, why isn't it perfect yet? But I do want this. I do want you to build something that helps me get through the drudgery of editing faster. That's the general sentiment on agentic co-editing. Then, okay, so when I saw the hatred, it is really the generative video that is like the polarizing topic. To some extent, avatars, not so much voice cloning and TTS. People feel pretty good about that, but it's really like the generative videos. And when I dig into it, I think, and we've talked a little bit about this, so I don't want to belabor it, but there were really like two things that people didn't like that made people mad about this. In addition to some of the other kind of general pause AI, stop AI kind of stuff that's like in the air, but like strictly speaking from a creator perspective, I think there's like a, I feel like I'm going insane because everyone is telling me that this stuff is super good and it sucks and I hate working with it, which I think is a very reasonable perspective for the average creator to have. Because like I said, I think right now to have a good experience, you've just got to be really invested in the model because the technology is hard to use. And we don't talk about that enough in all the hype cycle. And so people feel like they're sold this story, that this stuff is amazing and incredible in the future, and they use it, and they're like, what's wrong with the world? Like, I feel like I'm taking crazy pills here. So that's kind of one part. And then I think there is like this idea, like along with the hype cycle, part of this discourse is like, how many times have you seen someone say something like, Seedance just put a gun to Hollywood's head and pulled the trigger. And it's like, yeah, okay, well, if that's how you're going to sell your technology, the people who you just said got a gun put to their head aren't really going to like or be excited to use the technology. And like, that's not how I talk about this stuff. That's not my perspective, that this stuff is going to end the role of traditional film, that it's going to end recorded media, that it's going to like put everyone who works in traditional media out of a job. Like personally, I just don't believe that. I think this is a new tool, like many new creation tools that we have gotten over the past, like film was a new tool. And that this is like generally pretty exciting. And I don't know that it is threatening. 
I don't know that the main story should be that this is like threatening and going to displace a ton of jobs because it's not clear to me it's going to displace a lot of jobs. I think it may also create a ton of jobs. It may shift jobs, but this is simply a creative tool that we ought to be approaching with fun, like a sense of play and curiosity. And instead, because of the discourse around it, is like perceived as being like threatening and overhyped.

[20:24] Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code Cognizant in the source field to save 20% off year one

[21:32] Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Main Episode

[23:24] Nathan Labenz: I love the emphasis on play. That's one of my most common refrains as well. This technology rewards play, and not just the video or visual generative models, but really all of the current frontier AI capabilities really reward play more than any other technology I've ever used. And that really is the right mindset to go into it. And I couldn't agree more with that. I think there's like 6 different follow-up questions that I want to ask based on everything that you just told me. And maybe the first one would be, how do you choose which generative models to put into a product? There are obviously many. They have very different strengths and weaknesses. They have different price points. And there's another question coming about price and how you're thinking about that and managing that. But these things are super hard to benchmark, right? It's not like in a, when we get to the Underlord portion, I think you'll have a much clearer line of sight to like, is this, you know, new model or new prompt or whatever, like doing what we want it to do in a reliable way for a finite set of understood use cases. With the generative stuff, it's tough. Is it just vibes or do you have a better answer for like how you're figuring out what to actually pull the trigger on moving into the product?

[24:38] Laura Burkhauser: Yeah, so there's sort of two stage gates. The first is like, should this be available within Descript? And the second is, should we make this the default model? Because most people are not going to change the default model. They're going to accept whatever you put as the default model, right? That actually might be surprising to you. I feel like that's something that if you're deep in AI, you're sort of like, why aren't you using the model picker? Obviously, Nano Banana 2 Pro is going to be like the best thing for photorealistic like face swaps, but then you should be using Kling for this other usage, right? That's how people who are deep in it think about things, but the average kind of person doesn't have that level of sophistication, doesn't want that level of sophistication. And so like, how do we make decisions about default models? How do we make decisions about what models to improve or to bring in is a little bit vibes. I'm not gonna lie, because it's not like we like eval every single model out there and say like, these are the five best or whatever. So it's often things where it needs to be available via kind of the, we use FAL as our provider. And so if you're not in FAL, you're not going to be in Descript because we don't want to build our own custom connector for your thing unless it's like the best thing ever. But that means we need to like sign a new data license agreement and all this stuff that's like, what a headache. We've already done it with FAL. We're just going to do it there. And so that's why like Seedance is now in Descript: it is finally in FAL. So we're like, great, you can come on in. And then within stuff that's in FAL, we try to pick the stuff that feels like generally the best or in the game. Because what you see are these standard kind of industry benchmarks of these different things. And you'll sort of like see that you have the same labs on the leaderboard kind of month after month. And so we try to make sure that we have some representation from each of those labs because you're always like one week away from that lab coming back to the top and having the best thing available. When it comes to the default, that is where we do. We look at external evals, and then we run some of our own on common customer use cases to find out, like, generally where we think that people are going to have the best experience. Now we have, like, for image generation, Nano Banana Pro, I think is our new default. And what we then do is we'll AB test it against the existing default and make sure that we're seeing kind of good things from the AB test and that the AB test matches kind of like what our internal evals tell us. And if it does, then it's like a definite ship. This is our new default.
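
To make the two-stage gate described above concrete, here is a minimal Python sketch of how a team might decide to promote a candidate default model: internal evals on common customer use cases have to agree with the live A/B test before the default flips. The names and thresholds are illustrative assumptions, not Descript's actual pipeline.

```python
# Illustrative sketch of a two-stage default-model gate.
# Hypothetical names and thresholds, not Descript's actual pipeline.
from dataclasses import dataclass

@dataclass
class EvalResult:
    win_rate_vs_default: float   # share of eval prompts where raters preferred the candidate

@dataclass
class ABResult:
    candidate_retention: float   # e.g. share of users who kept the generated asset
    default_retention: float

def should_promote_to_default(eval_result: EvalResult, ab: ABResult) -> bool:
    """Promote only if internal evals and the live A/B test point the same way."""
    evals_prefer_candidate = eval_result.win_rate_vs_default > 0.55
    ab_prefers_candidate = ab.candidate_retention > ab.default_retention
    return evals_prefer_candidate and ab_prefers_candidate

# Example: evals show a 62% preference and the A/B test shows higher retention,
# so the candidate model would become the new default.
print(should_promote_to_default(EvalResult(0.62), ABResult(0.41, 0.37)))  # True
```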

[27:12] Nathan Labenz: When you do an internal eval, is it a panel of trusted people that are like scoring outputs?

[27:19] Laura Burkhauser: Yeah, it is.

[27:21] Nathan Labenz: Yeah, interesting. Hard thing to automate. We have done some of that stuff. It's been a minute since I last did a version of that. I was also struck, and our original use case was a little bit different. So with Waymark, we have this basically TV commercial maker for small business. And we've, for a long time, had a tool that pulls in all the images that the small business has published to their website or their Facebook page or whatever. And then we make this library. And that was a great convenience factor even five years ago when there was not much we could do with it beyond just be like, here's what we collected for you so you don't have to go collect it for yourself and upload it. But obviously with AI, there's a lot more we can do. But the aesthetic quality of an image was always a really hard thing to evaluate. Like in early days, you could caption it, but did it look good? Like, models had a hard time with that.

[28:10] Laura Burkhauser: I'm not even sure that you should be trying to automate that. I don't know. Like I went to this dinner with the CEO of Midjourney and he's like, the reason why we have the best, like still have aesthetically like the best image generation is because I have my thumb on the scale. And like Google just lets like some kind of democratic panel or automation decide what the best image is. And the best image is always some generic pretty blonde lady or whatever when you ask for something. I thought that was pretty funny. But all of that is to say that, like, this may be an unpopular opinion, so fun for you because it's a little bit controversial, but I don't think you should underestimate the importance of vibes in like aesthetic evals. And just when we first built Studio Sound, which by the way, like it's still our internal model and we still think it's better: we recently evaled it against all the other new Studio Sound providers and we still chose ours, even though in other cases we have thrown ours out and taken another kind of model that's obviously better, but we kept Studio Sound. Studio Sound was originally built by a cellist who just had a really good ear for things, and he did what we might now be calling eval, but he like... Our original eval thing was this guy would listen to different models and be like, this is better, this is better. Now, when he left Descript, then we had to actually write an eval that was like, what are the 37 different things that make one form of background noise removal better than another form of background noise removal? But I don't know. I think that it's reasonable to say in something that is like primarily judgment-based, we're just going to have a human do this and we're going to have a human do this forever. Someone that we know that we've vetted as having good taste is going to make these decisions.

[29:58] Nathan Labenz: Yeah. Okay. That's quite interesting. Could you do just a quick overview of the frontier model landscape as you see it? Like you started to a minute ago when you said Nano Banana is best for face swaps and then Kling for this other thing. Is there a like expanded version of that you would say is kind of, here's how users should generally orient themselves to their options?

[30:22] Laura Burkhauser: You mean specifically for like video and image generation?

[30:26] Nathan Labenz: Yes. Or others if you have a similar account for others. But yeah, that's what I was thinking.

[30:32] Laura Burkhauser: Yeah, I, you know, I'm probably not the, I'm like, do I have a cheat sheet here that kind of tells me that? I'm honestly like, I'm probably not the best person to ask about this. I know that like, But I know that we, that generally we have a perspective about it. I know that our defaults right now are Nano Banana Pro and Veo from Google and that we're considering replacing Veo with Seedance. What's interesting is like, I'm not gonna get into that actually. Yeah, but I would say that, especially when it comes to video generation, okay, so maybe I will, especially when it comes to video generation, I think that it really depends on your, okay, so I don't think that there is going to be a winner take all across all of these generative image and video models. Because I just think the use cases for generative video, for example, are so different that it's very difficult for me to believe that the same model is going to be the winner for something like Oscar-film-worthy special effects and making the cheapest but high-quality-enough video for all of the product pages on amazon.com. I just think like there's gonna be models that are really good for like massive bulk actions that don't require things like consistency across time or sound or voice. That's not gonna be important. It's gonna be like about a bulk play versus something where quality, you know, you'll pay thousands of dollars per generation if the quality is really, really good. And so like most products, I think it really helps to understand who your core customer and hero use cases are. And one of the things that, this kind of came up for me with Seedance, is like, Seedance is an amazing generative video solution. If you want a really opinionated edit, where it's going to like make a lot of artistic choices for you that you didn't necessarily ask for. And so if you're someone who's kind of like wanting to abdicate a lot of that or able to like ahead of the game, exactly describe beat by beat what you want to happen, you'll have like a good time with Seedance. But like a lot of people at Descript are using generative media as B-roll. And so then like Seedance feels sort of like almost too flashy and directed when you do a general prompt into it for it to not be distracting, right? Like B-roll is supposed to be generally like not super distracting and not kind of taking too much of your attention. Your attention should be on like the A-roll. And so I don't know, that's like a, it's a weird example where we're like, should we make Seedance the default? And I'm like, on one hand, yes, it's like the quality is really good. But on the other hand, for the typical use case, which is not using it as A-roll, but actually using it as filler B-roll, I don't know. First of all, Seedance may be overkill. It's like too expensive for that use case. And it's maybe like too much of a scene stealer for that use case. It's kind of like, you told me it was okay to get in the weeds, so yeah, an example of getting in the weeds.

[33:45] Nathan Labenz: I rely on people to help me understand these different frontiers more and more all the time. I mean, I used to. Three years ago, I could try all the new models myself. And now it's just getting to the point where I have to rely on the network to help me understand it. So yeah, please don't shy away from any and all of the nitty gritty detail. Okay, cool. I think that's really interesting stuff on generative models. And I think the perspective also on like, there won't just be one winner makes a lot of sense too.

[34:16] Laura Burkhauser: And I actually think that's why a lot of the orchestrator agents, one of the things that they're going to need to be good at doing is understanding which generative model or which other kind of model to use. They're going to need to orchestrate between all of the different models and sort of understand, like, given the context that I have about the video or the project that this user is working on, this is likely the right model that I should use and this is why, or whatever, to kind of hit their cost, quality, and like use case bullseye.
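
The routing idea described here can be sketched in a few lines: an orchestrator picks a generative model based on the use case and a budget. The model names, prices, and traits below are placeholders for illustration, not actual Descript routing rules.

```python
# Hypothetical model router: pick a video-generation model based on the job.
# Model names, per-clip prices, and strengths are placeholders for illustration.

MODELS = {
    "flashy_hero_model": {"cost_per_clip": 2.00, "strengths": {"hero_shot", "cinematic"}},
    "cheap_bulk_model":  {"cost_per_clip": 0.10, "strengths": {"b_roll", "bulk"}},
    "balanced_model":    {"cost_per_clip": 0.50, "strengths": {"b_roll", "hero_shot"}},
}

def pick_model(use_case: str, budget_per_clip: float) -> str:
    """Prefer the cheapest model that lists the use case as a strength and fits the budget."""
    candidates = [
        (spec["cost_per_clip"], name)
        for name, spec in MODELS.items()
        if use_case in spec["strengths"] and spec["cost_per_clip"] <= budget_per_clip
    ]
    if not candidates:
        raise ValueError(f"no model fits use case {use_case!r} under ${budget_per_clip:.2f}")
    return min(candidates)[1]

# Filler B-roll on a tight budget routes to the bulk model; a hero shot routes elsewhere.
print(pick_model("b_roll", 0.25))     # cheap_bulk_model
print(pick_model("hero_shot", 3.00))  # balanced_model (cheapest model that fits)
```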

[34:46] Nathan Labenz: Yeah, it makes sense. Let's get into Underlord then, because that is obviously the sort of agentic interface for Descript these days. One thing I would love to understand, again, in probably as much detail as you would be willing to share, is how does the AI or AIs, as the case may be, see the video? How does it understand video? Because I've done a bunch of stuff with this over time where I've been like, and even with Gemini, which is video native in some sense, in that I can throw a video at the API and it will accept a video file, I'm not quite sure what's going on under the hood. Like, are they taking frames out of the video and doing some sort of sampling? It doesn't always feel like it has a true, I'm watching the video. You know, it doesn't always feel to me like it's, sometimes I've asked it to critique videos and it sort of says that there are like hard cuts when there weren't hard cuts just because like I moved my head between 2 frames or things like that. So, and obviously video is just like a huge, heavy file in the 1st place, right? So a big part of video software over time has been just managing that. And we've got another generation of that problem now, as we provide video as inputs to models. So like, how does it get processed and how does it get structured so that it can be presented to one or more AIs in the most effective way?

[34:53] AvePoint: AvePoint is building the control layer for AI agents so you can securely govern, audit, and recover every action at scale. Design trusted agentic outcomes from day one at https://avpt.co/tcr

Main Episode

[37:30] Laura Burkhauser: Yeah, right now we translate visuals to text and consume those. So we do, it's called like captioning. And so we do like frame by frame captioning of what is in this frame. And then we use some clever tricks to sort of like fake giving the agent eyes and ears that way. And I think it does okay. I think this is an area of huge opportunity for us and working multimodally is right now the agent quality team's number one priority. So I would stay tuned here to see a major upgrade in the next month or two. But right now, right now we do visual captioning.
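
As a rough illustration of the "translate visuals to text" approach described above, the following sketch samples frames from a video and captions each one so a text-only agent can reason about what is on screen at each moment. Frame extraction uses OpenCV; caption_image stands in for whatever vision-captioning model a team might call, and the sampling rate is an arbitrary assumption.

```python
# Sketch: give a text-only agent "eyes" by captioning sampled frames.
# caption_image() is a placeholder for a vision-captioning model call;
# the 1-frame-per-2-seconds sampling rate is an arbitrary choice for illustration.
import cv2  # pip install opencv-python

def caption_image(frame) -> str:
    """Placeholder: in practice this would call a vision model and return a description."""
    return "a person talking to the camera in a home office"

def caption_video(path: str, seconds_between_frames: float = 2.0) -> list[dict]:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * seconds_between_frames))
    captions, frame_index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % step == 0:
            timestamp = frame_index / fps
            captions.append({"t": round(timestamp, 2), "caption": caption_image(frame)})
        frame_index += 1
    cap.release()
    return captions

# The resulting list of timestamped captions can be serialized into the agent's
# context so it can reason about the visuals alongside the transcript.
```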

[38:09] Nathan Labenz: This connects also to the idea of training your own models versus going out and getting models. It sounds like, if I interpret your previous statement correctly, you're kind of neutral on whether it's your own model or somebody else's model and really just focused on what's going to deliver the best, maybe cost-adjusted user experience. How do you think about, first of all, choosing between build versus buy when it comes to models?

[38:36] Laura Burkhauser: So we have sort of like a strategic bullseye of where we would, where we aspire to have the best models. And I'd say that where Descript aspires to have the best models is when you start with recorded media and you're editing recorded media. We want to have amazing, we want to be like the world's best at that job. So that's like I was telling you before, something we call Regenerate, where we can go into this recording that we're in and a few questions back, I may be like, oh, I really didn't like my answer. Can you actually make me say this instead? And you can change my voice and my lips to say the thing that I wish I had said in the 1st place. That's a great example of 98% recorded media, but oh, we need to update all of our branding to say this, or we need to update the dates, or California just added a new law and we need to change some of the language from the way that Laura explained this concept 2 years ago. So can we just regenerate that without having to re-record the whole video? That's the kind of job Descript wants to be really good at. We're about to launch smoothing jump cuts. So if you do have some crazy jump cuts, you can use Descript, and we'll just make it look like you naturally moved over there and you never made the edit in the 1st place. So those are the kinds of things where it's like, we have Descript models to do that, and we want to own that space. For purely generative stuff, we've sort of said like, we don't want to own that space. It's very expensive to build those models. And I think most of the companies that are spending hundreds of millions of dollars to build those models are still going to lose to Google. So I don't want to set money on fire that way. So I think it's about like kind of deciding where you want to win and then deciding where you can borrow. And so we are very friendly to buying, to borrowing, especially around like pure generation kind of stuff. I think like, I'm really excited about, but I think this is where you kind of get into a blurry line, heavily augmented recorded media. So for what it's worth, you didn't ask, but like, I will say that like, there is a part of me, maybe like a stodgy part of me that just feels possessive about human expression, human face, facial expression, and feels a little bit unsettled by AI clones or something that purports to be me but isn't. At the same time, I'm very sympathetic to the idea of like, I had to turn on a whole bunch of lights in this studio. I had to make sure that like my makeup was like decent before I came on to record with you. And wouldn't it just be nice if mostly we could have this conversation in an authentic and human way that we don't type on a piece of paper and then add my voice to and then add my robot face to? We can mostly just talk as humans, but then in post, we could do all kinds of magic to just make it look like I was wearing makeup and looked amazing and had a great outfit on and the light was perfect and I didn't say anything stupid. And so that's kind of like the, that's the vision, that's the vision that I have for Descript is sort of really making the killer use case that we're better than anyone at augmented human recorded media. So that's where we really like to build.

[41:56] Nathan Labenz: Yeah, cool. That's quite interesting. In the, I guess I'm sort of feeling out the kind of strategic landscape of models. It sounds like some of the models that you're building, maybe there are no offerings on the market. You know, I've not seen one, for example, that does like jump cut smoothing. And I could also imagine, you know, you've got kind of like retake removal now. And that's been there for a while, but I'm also guilty all too often of vocalized pauses, so we can remove ums and uhs and that kind of stuff in a pretty smooth way. And it strikes me that the structured nature of the edits that people make in Descript is an unbelievable data set for some of these use cases that probably just nobody else really has even collected the data on. So I feel like I'm... intuiting kind of probably where the core advantage lies based on all the work that people have done in the product over time. But maybe you could tell us a little bit more about like how you think about the data flywheel. And I guess, you know, if I was going to say like, what do I want Underlord to be better at? It would be some of these like subtle things where I stuttered, I repeated myself, whatever. And then, but which one do I cut? Do I cut the first version that I said or do I cut the second version that I said? It's not always, I think you'd usually kind of think, maybe cut the first one if you felt the need to say it again, then probably the second would be better. It's not always the case. And often also when I highlight a word and hit ignore on it, then I kind of go back and watch that passage again to see like, how did that land? Was it glitchy in any weird way or whatever? And I think if there was a model that could sort of make those marginal decisions well, where you kind of try this edit, try that edit and see which one looks better, that right now is like the bulk of the time that I spend in Descript that I would love to offload is kind of, I just made that edit. How does it look? Let me try the alt version. How does it look? Making good decisions there would be, I think, an amazing upgrade. It doesn't sound like that's something Google's going to solve anytime soon or anybody else. I don't know, maybe there are other candidates out there, but.

[44:15] Laura Burkhauser: That sounds like one.

[44:16] Nathan Labenz: You're probably going to have to do at home.

[44:19] Laura Burkhauser: That's right. That's exactly right. So it's like we don't think, we tend to invest where we think we have great data, where we think we can build something without breaking the bank, and where we think it's unlikely that one of the labs is suddenly going to care a ton about just like removing retakes, because that's much less interesting to Google, I think, than solving voice consistency between 3 second clips or whatever. And so I, yeah, you're intuiting correctly. And if that's where you want us to work, first of all, I'll pass your feedback on to the product team. We're kind of taking things on chunk by chunk. But that's where like the thumbs up and the thumbs down really helps us identify like, where are we not hitting the quality bar in either an AI action, which are like the deterministic-seeming tools, like remove retakes, or an Underlord request itself, right? Where you may say something like, get rid of all of my filler words, unless you can't make a clean cut. If you can't make a clean cut, keep the filler word.

[45:20] Nathan Labenz: I haven't tried something where I sort of give it the like freedom to determine it can't make a clean cut. Is that something I should be doing more of?

[45:28] Laura Burkhauser: Yeah, you should. We have tried to make Underlord truly open world. So it's not like Underlord just has 28 tools that it can use, and if you choose something that's like idea 29, it doesn't know what to do. It ought to be able to handle nearly any request, not all with equal ability. And then we use the thumbs up and thumbs down to understand as, so we have an eval set, which I can tell you about. But then we use user feedback to help us understand like where the pain is, because there are all kinds of things that we're bad at. I'm just going to be real. There are some things we're great at. And we actually have like, in our evals, we have 3 grades that you can get for a user request. And that is like, you didn't break anything. So that's, I'm just like, you just didn't break my video. You didn't do something that made me say, oh my God, you ruined everything. That's like grade one. Grade 2 is like, you did what I asked. You know, I said, remove filler words and you removed filler words. Thanks, buddy. And did it well would be you removed filler words and you didn't have these, you know, really striking jump cuts or changes in tone as a result. And so the way that we do evals is we have like a random selection of real user queries. We run Underlord against those, you know, in version one and version two. And then we have actually a ton of LLM judges go through and grade them, kind of multiple times. And we take the average of those and we say, okay, this is the percent where we didn't break things. And we aspire for that percent to be close to 100%. We never want to break your stuff. Then there's like, did what I asked. And right now, I think we're aiming for that to be like 90% of the time we do the thing that you ask us to do. And then there's like, do it well. Now, right now with do it well, we're like, we're okay. We could be better. I would like that number to be 80% by, I'm going to say, the end of the year, because right now it's like we're not doing it really well. And I think that it's when you really feel like 80% of the time that I ask you to do something, you do it at about the level that I would do it. That's when you like really trust your AI co-editor. But that's across all use cases. There are some nice, cool spots where we're doing it well a lot. And like rough cut is an example of that. So whenever you're asking it to help you with a rough cut and to like help you get a long story into like a shorter form, we tend to do that well. A lot of the visual stuff, we're not doing as well. And that's why multimodal is a real priority for us right now. But user feedback helps us understand like, wait, there's a hotspot here. There's a hotspot around filler words where like people are just not, they are not happy here. Why don't we go spend a couple sprints getting this part of the product really cleaned up?
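
One minimal way to structure the three-grade, LLM-judge eval loop described above is sketched below: several judges grade each request on the "didn't break it / did what I asked / did it well" scale and the scores are averaged. The judge function and rubric wording are placeholders, not Descript's actual eval harness.

```python
# Sketch of a three-grade LLM-judge eval (placeholder judge, not Descript's harness).
# Grade 1: didn't break the project; Grade 2: did what was asked; Grade 3: did it well.
from statistics import mean

RUBRIC = """Grade the edit on three booleans:
1. not_broken: the project still plays and nothing unrelated was destroyed
2. did_the_task: the user's request was carried out
3. did_it_well: the result is close to what a competent human editor would produce"""

def judge(user_request: str, before_state: str, after_state: str) -> dict:
    """Stub judge. In practice this would send RUBRIC plus both project states to an
    LLM and parse its three boolean verdicts."""
    return {"not_broken": True, "did_the_task": True, "did_it_well": False}

def score_request(user_request: str, before: str, after: str, n_judges: int = 5) -> dict:
    grades = [judge(user_request, before, after) for _ in range(n_judges)]
    return {
        "not_broken": mean(g["not_broken"] for g in grades),
        "did_the_task": mean(g["did_the_task"] for g in grades),
        "did_it_well": mean(g["did_it_well"] for g in grades),
    }

# Aggregating these per-request scores over a random sample of real user queries
# yields the "close to 100% / ~90% / aiming for 80%" targets discussed above.
print(score_request("remove filler words", "before-state", "after-state"))
```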

[48:23] Nathan Labenz: So as you try to push the frontier on this, I can kind of imagine a couple different strategic directions that you might go in terms of how to get the best performance out of the available models with the various constraints that they have. And maybe you're even doing like multiple of these. But one angle would be to say, okay, well, Claude or GPT or maybe Gemini is going to be probably the best reasoning and tool use agent. So what we really need to do is set whichever one of those we're using up for success. How do we do that? Well, we need to give it a richer understanding of what it's working with. But since it can't natively detect these like awkward moments, for example, maybe we need like an awkward moment detector that we can run and then kind of feed in to the model to flag when these things are happening so that it knows to reason appropriately about that. But then you can imagine a different version where you're like, I've been hearing very good things about GLM 5.1, and I think the weights are out there for this. Maybe we want to try to do something deeper, you know, where we actually teach the core model to understand some of these inputs. And adding video as a modality to a GLM 5.1 doesn't sound easy at all, but you could do some sort of late fusion, you know, cross-training, what have you. And I guess this sort of decision probably depends a lot on what resources you have. Like, do you feel like you can hire the team to do Frontier work at that level? Or is it just so hard to compete with the Frontier Labs for that kind of talent that that's out of range? And it would also depend on, do we think open source models are going to continue to be competitive? Or do we feel like Claude 5 is going to run away, for compute reasons or, whatever, constitutional reasons, whatever else? If it runs away from the open source bases, then like we can't really keep up even if we do get good at doing more advanced stuff on open source bases. So I guess to bottom line all that, what's the model strategy? How do you think about where you want to, like what trends you want to bet on carrying you forward?

[50:44] Laura Burkhauser: I think the main bet that we've made with our agent is trying to build a very generalized harness and to give the agent access to a bunch of low-level tools, assuming that generalized intelligence is going to get better and better, and that we'll use like probably a handful of whatever models Anthropic or OpenAI or Gemini come out with, so that, you know, when the next Claude model drops, we'll have it evaled within 15 minutes and in the product, and that building for that, at least for the short to medium term, is the right bet to make, rather than investing a lot of time and money and research trying to keep up with the labs. So then it's about like, how do we build an agent harness that's going to be able to instantly take advantage of leaps in general intelligence? How do we not get bitter-lessoned into not being able to immediately take advantage of those leaps? And so that's how we've tried to build our agent. Just give it like a ton of context about the Descript model and also about video editing and how to think about user requests and give it access to our low-level tools. Yeah. And then we do some stuff. We have various experiments that we're doing to sort of make it better through personalization over time.
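
A "generalized harness with low-level tools" can be sketched very simply: the harness owns the tool definitions and the loop, and the frontier model is a swappable parameter, so a newly released model can be evaled and slotted in quickly. The call_model interface and the tool set below are assumptions for illustration, not Descript's implementation.

```python
# Minimal generalized agent harness: low-level tools plus a swappable frontier model.
# call_model() and the tool set are illustrative assumptions, not Descript's code.
import json

TOOLS = {
    "get_transcript": lambda project_id: "...transcript text...",
    "delete_range":   lambda project_id, start, end: f"deleted {start}-{end}",
    "add_broll":      lambda project_id, prompt, at: f"b-roll '{prompt}' added at {at}",
}

def call_model(model_name: str, messages: list[dict], tools: dict) -> dict:
    """Placeholder for whichever frontier model is current; expected to return either
    a {'tool': name, 'args': {...}} action or a {'final': text} answer."""
    raise NotImplementedError("wire this to the current frontier model's API")

def run_agent(model_name: str, user_request: str, project_id: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        action = call_model(model_name, messages, TOOLS)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](project_id, **action["args"])
        messages.append({"role": "tool",
                         "content": json.dumps({"tool": action["tool"], "result": result})})
    return "stopped: step limit reached"

# Because the harness only depends on the tool interface, swapping model_name to a
# newly released model is a configuration change rather than a re-architecture.
```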

[52:09] Nathan Labenz: Okay. I mean, I'm very interested in the personalization. Does that boil down to basically saying, though, that you want Underlord to be essentially in the same position as your human users? I mean, obviously, it can't see as well, and it can't hear in the same native way. But subject to the constraints of like some of these things having to be arm's length tool calls to do the sort of sensing, it sounds like aside from that, you're sort of like building one harness kind of for both humans and for AIs at the same time. Is that a reasonable way to think about it?

[52:47] Laura Burkhauser: Yeah, I have not thought about it that way, but I think that is right. And we do have a design principle that like... Underlord should not be able to do anything in the editor that a human can't do, and vice versa. They should all have access to the same tools. And yeah, that we think about, and that also kind of like aligns with the general design principle we have, that Underlord is like a collaborator with you in the editor, the same way that we've been a collaboration, a video collaboration tool since day one. We've been a tool that teams use, because often video is not a solo job. It's a job that you do with a team, and that Underlord is like a member of that team.

[53:22] Nathan Labenz: Yeah, okay, that's quite interesting. I feel like I'm seeing this still in a fuzzy way, but it seems like increasingly like a sort of universal design pattern, which Descript doesn't quite follow yet, although what you articulate is very consistent with it, is sort of an app that you could use using your own intelligence and click all the buttons and do all the things, and then sort of frame that with an agentic companion that to varying degrees you could say like do this point thing for me or like do everything for me and it would kind of log out for you what it's doing using the same exact tools that you could use and then in sort of creating that log you could also kind of pretty easily go in and like undo the one thing that it did that you didn't want it to do or it didn't work well or what have you. Do you see the form factor? I mean right now there's not this sort of like very persistent Underlord presence that's like the long-running agent where I can see like everything it did. It feels like it's more kind of embedded into the product as opposed to like framing and sitting outside the product. Do you think that there... Is that something you think will change over time? Or do you feel like maybe I'm going the wrong direction with my framing notion there?

[54:39] Laura Burkhauser: No, I think that it will change over time. And I think we need to decide exactly how. But ultimately, Underlord would be more powerful if it had, if it, like right now it lives at the project level. It needs to at least live at the drive level. But I think it would be even more powerful if it could live outside of the drive and be a collaborator that you can bring, like with MCP, I think about that as like, Underlord is now my collaborator that I bring into Claude Cowork. And for example, now one of the ways that I create content is I say like, hey, Claude Cowork, look across everything I've done in Notion and Slack over the last week, and can you come up with like 6 ideas for clips that I can make about my thoughts on the AI space? And this is much better than saying like, hey, can you just like brainstorm 6 thoughts about the AI space that like maybe I believe or maybe I don't? It's like, no, I talk about this **** all the time. You've listened, you've been listening. I know you have. Go find some things that I'm saying that you think are kind of interesting. Suggest them for me. I workshop it in Claude Cowork. And then I'm like, great, go create like scripts in Descript. Go create projects for each of these and put this script in there as like scratch text so that I can then go into each of these projects and record. And then I can say like, okay, I've done all the raw recordings, go turn them into like LinkedIn quotes using the skill that I built that tells you like what my LinkedIn clips look like, right? Or we have a user, I'm obsessed, actually I should send you this Claude skill, but he's been a big user of the MCP and he has like a podcast, an edit podcast skill that he's created and he just like runs it. It's like triggered whenever he finishes a Zoom recording and it just like goes through the skill, creates the project in Descript and he goes and he looks at it. So all of this is to say that I think we need to bring Underlord not only out of the project and into the drive, but also like out of Descript and into like the team of agents and like into the world where you're already doing a lot of app connections and video, your video team, Underlord, needs to be in there with the rest of your teams, right? Working on creating your content.

[56:49] Nathan Labenz: I would love to see the edit podcast skill. I've created my own. I wouldn't say it's very advanced by any means, but it's kind of, it's a working V1 that, for example.

[56:58] Laura Burkhauser: Oh, that's awesome.

[56:59] Nathan Labenz: It does like 6 different things. One is, I typically open every episode by addressing the guest and then saying, welcome to the Cognitive Revolution, and usually at the end I've got an outro. So: trim everything before the welcome, trim everything after the thank you. And there's like five other steps.

[57:15] Laura Burkhauser: Yeah. He does a pull quote and then he has the podcast theme come in after the pull quote and it like shows up pretty well.

[57:22] Nathan Labenz: Yeah, cool. I mean, I'm sure I could learn something from that, no doubt. And maybe there'd be somebody who could learn something from mine, although again, it's not that advanced.

[57:29] Laura Burkhauser: I'm sure they could.

[57:32] Nathan Labenz: I think the main thing they could learn is just that you could do this at all, not necessarily the quality with which I've done it so far. Okay, so I think this is really interesting, and it does get at, in some ways, some of the biggest questions about the future of software and even just the future of knowledge, or how all of this is going to work. As an early user of the Descript API, and the Underlord API is really the core of that, I think it's a really interesting pattern that you've gone with so far, where the tool is smart, right? With the Descript API, I am not even afforded the ability to do very specific, fully deterministic edits to a project. Instead, I'm prompting Underlord and it's doing the thing. I can prompt it very specifically, but there's always this kind of translation layer. It works pretty well in what I've experienced so far; I'm new to it like everyone else. And in one way, this feels like the only way for software companies to have any sort of defensibility, because you've got to have some smarts inside your periphery, right? That seems like it's got to be a core principle. At the same time, there are times where I just want to make sure I'm doing exactly what I want to do. And then there's also personalization: my Claude Code universe is ever growing and has tremendous amounts of context and access to every podcast I've done, and also ones where I've guested, which aren't even in my Descript account. It just has a broader view of me. So how do you see that? I'm also thinking of Fin. I'm sure you've been following this pretty closely: Intercom has opened up their customer service model so other companies can build their own competitors to Fin using the same intelligence. As Benedict Evans says, all software revolutions are either bundling or unbundling. How do you think this is all going to get bundled or unbundled or rebundled? Where are the lines going to get drawn? Where should personalization live? Where is all this going? Please de-confuse me as much as possible.

[59:44] Laura Burkhauser: Yeah, I can't. And I think anyone who tells you that they can is lying to you, probably for self-satisfying reasons. But what I would say is: I think that Descript currently, and my job is to make sure for the foreseeable future, can give you a better experience if you are using Underlord and all of the context that we have about you in Descript. And it's just going to get more and more powerful as we build in personalization and drive-level understanding. So I think that is likely to be the primary way that you do video editing within Descript. Because of the way that the agent coordinates within Descript and understands the capital-D, capital-M Descript Model and how everything in the app is set up, we have important context in the Underlord layer where you're just going to have a better time than if you're asking Claude to coordinate across a bunch of tools. The other thing that gets you: if you're doing everything in Descript, then when you inevitably need to go in and do the last 10% yourself before you're really comfortable hitting publish, which for most people will be true, you're going to be able to go into Descript and we will have access to all of the discrete things that have been done, so that you can undo them, change them. They're not just flat files that have been put in there, where it's like, do you want to edit this thing that's fundamentally uneditable? And you're like, no, this isn't helpful. So I think that's our vision. However, we do think that it is important to break out some of our tools and make them generally accessible. Things like the transcript ought to just be callable in a deterministic way without having to go through Underlord. And there may be additional tools where you don't need a ton of context to be successful with them, where we may want to think about distributing those just as deterministic tools and giving access to Claude. But when it comes to really orchestrating multiple types of media, doing visual edits, setting things up like a layout, things like that, I just generally think you're going to have a better time. That's true today, and because of all of the work that we're doing with the agent, I think it's going to be true long into the future. Yeah.
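
As a rough mental model of the split described here, the sketch below contrasts a deterministic tool call (fetching a transcript) with a prompt-driven Underlord request. The host, endpoint paths, and field names are invented for illustration; they are not the real Descript API.

```python
import requests

BASE = "https://api.descript.example"  # placeholder host, not the real endpoint
HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}


def get_transcript(project_id: str) -> str:
    """Deterministic tool: same input, same output, no agent in the loop.
    (Hypothetical endpoint for illustration.)"""
    resp = requests.get(f"{BASE}/projects/{project_id}/transcript", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["text"]


def ask_underlord(project_id: str, prompt: str) -> dict:
    """Agentic layer: a natural-language instruction that the in-app agent
    translates into edits using its own project context. (Hypothetical endpoint.)"""
    resp = requests.post(
        f"{BASE}/projects/{project_id}/underlord",
        headers=HEADERS,
        json={"prompt": prompt},
    )
    resp.raise_for_status()
    # e.g. a log of discrete edits that remain reviewable and undoable in the app
    return resp.json()
```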

[1:01:52] Nathan Labenz: Yeah. I like the framework of "my job is to make sure you have a better time using our thing than using Claude Code." That's going to be a challenge.

[1:02:01] Laura Burkhauser: I mean using it together. I actually think you should be using Claude Code, but the package we're talking about is: you're telling Claude Code to hire Underlord as your video team, and then to go off and do its job the way that it thinks it ought to be done best.

[1:02:20] Nathan Labenz: Yeah, I think that's a good paradigm. And I do think video is one of the areas where it probably lasts longer than many others.

[1:02:31] Laura Burkhauser: I think that's right. And look, I'm not offended by this. A common question is just, how are you going to defend yourself against, I don't know, X big lab? And I think the answer that any company that's telling the truth will give you is: sure, if Google or Claude or OpenAI decides that the thing my app does is exactly the thing they want to be great at, and they want to spend the time and the money and the years and the sustained effort to make a great product to do that job, what can I do? Probably nothing. But I think we overestimate the number of things they'll do that with. There will be some low-hanging fruit for them with a bunch of very lucrative businesses, and a robust, reliable video editor is actually pretty high up the tree. You've got to climb a lot of branches before you're like, why don't we just build and maintain something like a robust video editor forever? I think that's why a lot of people don't do it.

[1:03:37] Nathan Labenz: Yeah, I think I agree with that analysis as long as we're in something like the normal regime. Beyond this, it's sort of beyond conventional business strategy. It is a crazy world, a crazy reality, but I no longer think it's crazy to think about the possibility that AI coding agents and AI entrepreneur agents can just sustain that effort themselves, and that's where things get really through the looking glass. I don't know if you have a view on what, if anything, can be done if that threshold gets crossed. OpenAI, as I'm sure you're aware, has publicly stated a timeline for when they expect to have an autonomous AI researcher, and that is March 2028. So we're less than two years away from their target for the autonomous AI researcher. It starts to be a weird world where you're like, geez, that thing might be able to create its own specialist models that could handle all these very particular use cases, and create its own kind of sense organs to figure out what's going on in the video. And then I guess that's just the singularity. I don't know if there's any other interpretation of what happens at that point, but maybe it's just too remote to think about, or maybe you do have some thoughts.

[1:05:05] Laura Burkhauser: My general thought is, I get really excited when I think about being able to automate more and more labor. I think that generally leads to an exciting future, if we're willing to do the work to make it one. I am skeptical of that timeline, and I think that when someone tells you something like that, it's really important to get very concrete about what exact bet they're making. Get it on paper, and in the process of getting it on paper, ask: what actually do you mean by an autonomous researcher? What is it able to do without human oversight? What is it not able to do without human oversight? You often get to a more reasonable picture, one that still implies a different future, rather than understanding things at the topic-sentence level, which I know you are deeper than. But that's just a general tip that I give people when they're trying to understand claims like "in six months, there will be no more white collar jobs." Is that what was actually said? Let's slow down and look at the claim that's being made. But generally, I am very bullish about the direction that labor automation is going in, and I think it's generally good news. We can't predict exactly what the timelines will look like or how it'll play out across different industries. And that's why I think the companies that will win are going to be companies that are able to make decisions quickly and well, and who generally embrace change and are not resistant to it. So when I ask myself, can I right now tell you exactly what the labor market is going to look like in five years and where Descript will play within it? No. But do I think that we have built the company rituals that allow us to nimbly shift strategy, quickly take advantage of leaps in labor automation, and quickly build tactics to deal with the competitive situation as it evolves over the next several years? That's where I have a lot of confidence. But I tend to think that we are overstating a lot of the near-term changes in labor and society that will come from AI, and probably understating a lot of the longer-term changes in society and culture and labor that will come from AI. I don't know.

[1:07:17] Nathan Labenz: Do you think podcast editor is a job in two, three years? Or maybe people still want to delegate: "I don't want to watch it, you watch it and do a little quality control." It's not even editing, it's just sort of giving feedback to the AI. That's one version of it I can imagine.

[1:07:35] Laura Burkhauser: I don't know if podcast editor is going to be a job. Will people be employed to tell stories? Yes. What kind of stories, using what media, and who will they be employed by? That's all subject to change. But if the thing that you're really good at is telling stories for brands, or interviewing other people and finding out what's interesting and getting that out, doing a media job, that will still exist. That will still be a job. And especially if you embrace new media and embrace new distribution channels, you may still have that job in the future. Does that make sense?

[1:08:18] Nathan Labenz: Yeah. I do wonder, I think there's a huge question around whether people will make these transitions. Because a lot of times when I have this conversation, I'm like, jeez, it seems like this work is going to be pretty highly automated. And then people point to a different, sort of adjacent kind of work. But a lot of times I'm like, yeah, but are the people doing the first thing going to switch to the other thing in any effective way? That is where I see disruption being pretty meaningful. I feel like the winners and losers are, in many cases, different people. And that there will be winners doesn't necessarily mean that the losers will be able to pivot into a winning position. It might just be a major redistribution of who's winning and losing. And that could be a huge challenge for society, even if all kinds of new things pop up that do create new kinds of winners and new kinds of opportunities.

[1:09:19] Laura Burkhauser: To some extent, that's just always true, right? Economies are always changing, sectors are always growing and shrinking, and there's labor displacement. This may happen in a sped-up way, and whenever there's tremendous labor displacement in a short period of time, which there may be in this circumstance, then you need to make sure that systems are in place to take care of people through those moments of extreme disruption. Where I have a lot of faith is that, long term, there will be big shifts in labor, but I am not a subscriber to there being permanent, irrevocable losers, unless that is the society that we choose to build. There may be a very difficult moment where there's an accelerated kind of labor displacement, in which case we need to be ready to meet that moment as a society. But I think it's a moment that we've met before, and changing labor landscapes is not a new human problem.

[1:10:26] Nathan Labenz: Yeah, that might actually tie back into the original slop question in an interesting way. But maybe one more beat on some practical product stuff, because I did want to ask about something that I think is increasingly common and is quite prominent in the Descript experience today, which is that there are individual button clicks, and certainly individual prompts that I can give to Underlord, that will spend a few dollars' worth of credits for me in one go. And that's a weird new world for software, right? It used to be just click and do whatever you want, and now you're a few clicks in and you might be through your monthly token budget. How do you guys think about designing for that new cost paradigm?

[1:11:15] Laura Burkhauser: I think it's a temporary cost paradigm myself. So first of all, what I'd say is, personally, as a consumer, I hate the concept of feeling like pressing a button is going to cost me a dollar, even though, actually, if you do a good job on creating my clips, that's a pretty damn good deal, because that used to take me a lot of time. Sure, I'd love to pay a dollar to get someone to do my clips. But I'll just say, as a consumer, I don't feel great about that experience. At the same time, AI costs Descript money, right? This stuff costs us money, so it can't be free and unlimited. So the way that we try to create pricing for Descript is: we are going to make it so that hobbyists can make one really good thing a month, creators can make one really good thing a week, and businesses can have teams of people making multiple good things a week. And then it's like, well, what kind of things? Those things really are very different for different people. The reason it has to be a shared pool of AI credits is that we used to say, oh, every creator gets this much AI speech and this much filler word removal and this many clip things. But if I'm a podcaster, I may never need AI speech, and I need clips every single week, multiple times a week, so that's not a good deal for me. So we wanted to have a budget you can spend across any kind of AI job. We went and looked across all of our main use cases and asked, is this enough credits for someone who's making one podcast a month, or one long-form YouTube video a month, or something like that, for a hobbyist? And for a creator, one thing a week. Now you're making, how many did you say a week, two a week?

[1:12:54] Nathan Labenz: Usually 2 a week.

[1:12:56] Laura Burkhauser: All right, you need a double license. But in any case, that's how we tried to price it out. And then we have the idea that you can add on more credits or more media hours if you are in a special circumstance. So that's how we design pricing. But I think this is a temporary moment in pricing. Everyone's doing this, and we all know it. The consensus around where pricing is heading is more outcome pricing, where what you're charged on is maybe something like exports. You're not going to get charged unless you get to the outcome that is getting you the value you need, and then we'll charge you for that value. I think we all kind of want to live in that world, but because of the state that the models are in right now, and because of how expensive AI is right now, we live in this world that feels kind of uncomfortable for everyone. So what Descript tried to do is set up a pricing situation, again using those general design principles, with the idea that less than 5% is our gate: less than 5% of active users have to buy some kind of top-up every month. And that feels okay to me. If less than 5% are hitting their limits and needing to buy extra stuff, I'm like, okay, this is feeling all right. If it were something like 50%, I'm like, wow, this is not a fun amusement park to be at. Everything costs so much damn money.
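
The "less than 5% hit their limits" gate is a simple metric to operationalize. Here is a minimal sketch of what such a health check could look like; the field names and numbers are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    user_id: str
    credits_included: int  # credits bundled with the user's plan
    credits_spent: int     # credits actually consumed this month


def topup_rate(active_users: list[MonthlyUsage]) -> float:
    """Fraction of active users who exhausted their included credits and would
    need to buy a top-up."""
    if not active_users:
        return 0.0
    over = sum(1 for u in active_users if u.credits_spent >= u.credits_included)
    return over / len(active_users)


# Illustrative check against the gate described above (numbers are invented).
usage = [
    MonthlyUsage("hobbyist", credits_included=100, credits_spent=40),
    MonthlyUsage("creator", credits_included=400, credits_spent=410),
    MonthlyUsage("business", credits_included=2000, credits_spent=900),
]
rate = topup_rate(usage)
print(f"{rate:.0%} of active users hit their limit; the stated target is under 5%")
```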

[1:14:21] Nathan Labenz: Yeah, that's, I think, again, a very interesting and useful frame. This has been great. I guess my last question, to tie it back to the beginning and also try to zoom out a little bit: is it going to be possible to avoid a future of infinite slop? It seems like right now one barrier to infinite slop is that the models are kind of expensive, so you've got to have some reach, or some reason to believe that you're going to get paid back, in order to spend all the credits. Maybe in the future we have some sort of universal basic income or new social contract that reduces the need for people to push slop to try to earn whatever pennies per view, or whatever the case may be. But if pricing is dropping over time, and that new social contract may or may not arrive in a timely fashion, what is the future of content? Is it going to be infinite generation? And what is it going to look like on the consumption side? Are we going to be lost in slop? Or do you have a vision where, even if it's infinite generation, on the consumption side we can somehow rise above that reality? I'd love to hear how you think the future content equilibrium shakes out, in an aspirational way.

[1:15:40] Laura Burkhauser: Yeah. Whenever people ask me this question, they tend to be people in tech or they tend to be economists. And with a ton of respect to people in tech and people who are economists, what we're missing about content is that content is a little bit businessy, and certainly a lot of the people who use Descript are using it in businessy ways, but it's a little bit art too, right? It's a little bit artistic expression, creative expression, and creative storytelling. And whenever you're playing in that field, it isn't as clearly driven by free-market nihilism as other areas of the world. So I'm not talking about Descript now; I'm playfully thinking about video or film as a medium. When something's about artistic expression, art tends to have a way of reacting to the technology of the day, to the culture of the day, in ways that surprise us. I think about the invention of the camera and how that changed the medium of painting forever. People were just not interested in photorealistic painting after the invention of the camera. I bring that up to say that art always reacts to technological advances in ways that surprise us. And then you might be like, yeah, art, but we're talking about content. But artistic-expression kinds of content will change first. There will be people who do very creative things in this moment that are unexpected and surprise us and raise the quality bar. And then there will be businesses that see that and say, I want a little hunk of that. It's like the Miranda Priestly speech in The Devil Wears Prada, where she's like, this designer over here decided on this cerulean blue for their spring collection, and then you buy it in a bargain bin at Marshall's. That's the way content works too. You'll have people who are interesting, exciting, truly creative, who have an aesthetic eye, who are going to do interesting and exciting things both with this technology and in response and defiance to this technology and to this moment in culture. And that will inspire all of the marketing people to steal from that aesthetic and that response. It is easy to look into the future and see our nightmares. People pitch me on things like, there won't even be human creators anymore; what's going to happen is you're going to stare into your phone, there will be a seed idea of a video, and then based on where your eyeballs go, it'll generate more and more video in a way that makes you never want to look away. And I'm like, oh, in Infinite Jest, David Foster Wallace basically told us this would all happen, back in the '90s. And maybe that'll happen. It's easy to look into the future and see our nightmares, especially in a world as skeptical of technology as the one we're in right now. But I'm actually really excited to see what artists and creative people do with this technology and in response to it. I think there will be really exciting things that come out of it, and those are the things that will win in the marketplace. Yeah, even in a world of slop.

[1:18:52] Nathan Labenz: I love an optimistic vision for the future, so I think that's a great note to end on. Laura Burkhauser, CEO of Descript, thank you so much for being part of the Cognitive Revolution.

[1:19:01] Laura Burkhauser: Thank you, Nathan.

Episode Outro

[1:19:08] One job, one job.

[1:19:14] I need the revolution.

[1:19:26] Them say every pixel a nightmare, every frame a ghost.

[1:19:32] Said the algorithm made the canvas Said the art is toast But bad art is the road, yeah That's how you grow First stroke always crooked for the river Start to flow, oh Soul I gone to Hollywood pressed against the head But the painter keep on painting Working with the thread, oh Slop, nah slop, cause it ugly or it new Slop is when the money pull the strings right through At scale, at scale Them feeding the machine, but we putting our thumb, yeah A thump on the scale Thump on the scale Thump on the scale Cognitive revolution, we all set the tale Hierarchy of hostility, them drive about the line.

[1:20:30] Love the button at the avatar, fear the next design.

[1:20:35] Taking crazy pills, they're telling me it's good. But it's up, and I hate it, am I losing where I stood?

[1:20:45] Whoa, now the tool is young, my hand's still learning how.

[1:20:52] Spend time in other medium, find your voice right now.

[1:20:58] Slop, nah, slop, cause it ugly or it new, slop is when the money pull the strings right through, at scale, at scale, them feeding the machine, but we putting the thumb, yeah, a thump on the scale, thump on the scale, thump on the scale, cognitive revolution, we all set the tale. Camera, I come to painter, change the game Never paint the same, never stay the same Art always answer, art always talk back To the tool at the moment, that a creative attack Easy if you see nightmares, easy if you see the fall But I'm excited to see why the artist do it all Tastes are sellers listening, air to the room Tastes her thumb on the scale that cut through the glue Not some penalty robots picking pretty blonde lady Woman hand on the wheel keeping it all steady From the bargain pen and marshalls to the spring collection sheet The artists move first, marketers just follow the scene Slop, not slop, 'cause it ugly or it new Slop is when the money pull the strings right through At scale, at scale, them feeding the machine But we put in a thumb, yeah A thump on the scale Thump on the scale Thump on the scale Cognitive revolution, we have set the tale Play and curiosity, that's the only way.

[1:22:42] Bad art today, good art someday Don't punish care, don't punish care Cognitive revolution don't ever fail

Outro

[1:23:02] If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

