AI is Supercharging Writers with Sudowrite's Founder, James Yu

Show Notes

In this episode, Nathan is joined by James Yu, Founder of Sudowrite, an AI writing tool. They discuss how James started Sudowrite after using GPT-3 for his own fiction writing, how Sudowrite is able to be a supportive writing partner, and Yu's experience developing its tech stack and working through the backlash he's received. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive

Subscribe to the show Substack to join future conversations and submit questions to upcoming guests! https://cognitiverevolution.substack.com/

SPONSORS: NetSuite | Omneky

NetSuite has spent 25 years providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform, head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

LINKS:
Sudowrite: https://www.sudowrite.com/

X:
@jamesjyu (James)
@labenz (Nathan)
@eriktorenberg
@CogRev_Podcast

TIMESTAMPS:
(00:00:00) - Episode Preview
(00:04:28) - How James started using GPT-3 for fiction writing in early beta
(00:06:24) - Working within the constraints of the small context window in early GPT models
(00:08:47) - Using few-shot learning to get good results from early GPT-3
(00:10:25) - Focusing messaging on being a supportive AI writing partner
(00:12:08) - Helping fiction writers feel less lonely through pseudo-conversations with AI
(00:16:01) - Sponsors: NetSuite | Omneky
(00:17:22) - Evolving from paragraph-level assistance to full story understanding
(00:20:19) - Building the product for centaurs: half human, half AI
(00:21:50) - Trying to create an "IDE" for fiction writing to reduce repetitive work
(00:32:15) - Using multiple models together to "kitbash" the best outputs
(00:39:21) - Iterating on prompts over time to get a customized "garden"
(00:45:18) - Keeping temperature high for creative writing, but avoiding cliches
(00:54:21) - Separate paths for professional writers vs. casual creators
(00:55:06) - The future of mass-customized and personalized media
(00:57:30) - The continued role of human-centric art and culture
(01:00:10) - The potential for automation to replace repetitive jobs
(01:07:48) - Yu's experience launching an AI writing tool and the backlash he received
(01:16:24) - Advice for future AI product launches to avoid or minimize backlash
(01:17:17) - Sudowrite's funding and team
(01:18:25) - Focusing on delivering value to customers during controversy
(01:22:00) - Monetization challenges for AI startups and the role of churn
(01:28:44) - The idea of an "AI bundle" to reduce churn and customer acquisition costs
(01:42:06) - Recommendations for underrated AI apps: Claude, Pika, Midjourney

The Cognitive Revolution is brought to you by the Turpentine Media network.
Producer: Vivian Meng
Executive Producers: Amelia Salyers and Erik Torenberg
Editor: Graham Bessellieu
For inquiries about guests or sponsoring the podcast, please email vivian@turpentine.co

Music license:
F48RXAQQS8BHOJ3G



Full Transcript

James Yu: (0:00) Sudowrite is serving customers who are capital W Writers. I think in this sort of medium term, these LLMs do open up the possibility that, hey, if you've never even considered writing a story or a novel, tools like Sudowrite can help you achieve that. We can let you give us both the 50,000 foot view all the way down to the leaves of the scenes themselves and who's in the scenes. We're not out here to replace writers. We're out here to create tools. We have writers who are giving us feedback all the time about how this helps their lives and lets them be more creative. We work hand in hand with novelists, some of whom literally cried tears of joy when they actually got this in their hands.

Nathan Labenz: (0:42) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host, Erik Torenberg.

Hello, and welcome back to the Cognitive Revolution. James Yu is a co-founder of Sudowrite, an AI writing assistant built for fiction writers. I first encountered James and Sudowrite on Twitter back in May when he introduced a number of new product features with a video. And for some reason, presumably stemming from growing fears of AI replacement in general and the ongoing Writers Guild strike in particular, he ended up getting an absolute wave of hate and vitriol in response. The video, which has now reached more than 8 million people, has one of the crazier ratios you'll see, especially if you take the time to watch the video and try the product as I did and see just how thoughtfully James and team have built Sudowrite.

Fascinatingly to me, their homepage headline speaks not to AI capabilities, but to human needs. They describe the product as the non-judgmental, always available, willing to read 30 drafts AI writing partner you always wanted. It's clear from this conversation that James is someone who loves both AI and writing, and he's building something that helps him write better. So we focused mostly on how they've built the product over the last three years, starting with the original GPT-3, which they do still use in some cases. And then we also cover various language model tips and tricks that may generalize to other applications, my idea for an AI commercial bundle, his experience of anti-AI online dogpiling, and more.

As it happened, just as we were recording, news broke that a deal had been reached between the Writers Guild and the Hollywood studios. And James has since publicly endorsed the AI portion of the agreement, which seems to protect writers' traditional contractual rights and prevents studios from forcing AI tools on them, while also preserving writers' freedom to use AI as part of their creative process going forward. I'm sure we'll talk a lot more about this precedent-setting agreement in the future. But for now, if you're enjoying the show, please do send it to a friend. We do a super wide range of episodes, and my hope is that each one can serve as a niche entry point into the broader AI discussion. But for that to happen, I do need some distribution help from the core listeners. So please send this one to a fiction writer in your life. And with that, I hope you enjoy this conversation with James Yu of Sudowrite.

James Yu, welcome to the Cognitive Revolution.

James Yu: (3:27) Thanks for having me.

Nathan Labenz: (3:28) I'm excited to have you. You have built a cool product and had a bit of a dramatic summer since you launched the most recent version of it, and I'm excited to get into all of that. I guess I should say right out of the gate, you are a founder of the company Sudowrite, and this is a company that helps people create long form content. And you've been doing it for a couple of years already. Just in checking you out, I saw that some of the quotes on the website were from a launch back in 2021, when you got some really nice coverage, including mentions in The New York Times and The New Yorker and so on. So I guess, for starters, I'd love to hear what you saw early on in the GPT-3 era and how that translated so quickly into a product. I think you're very much ahead of the curve relative to most who are getting into it a little bit later now.

James Yu: (4:22) Yeah. I think we've actually been around about three years, and it was mostly in private beta at that point. But yeah, as soon as GPT-3 came out, I was at that time heavily writing fiction, focused on science fiction. And when it came out, I was seeing all these people just create memes and all these things with GPT-3. This is the first model, right? DaVinci version one. But even back then, I sort of asked myself and my co-founder, Amit, can this actually be helpful for creatives? And that's really the heart of what we've been working on.

And I started using it in my own writing, and I thought it was helpful. So then we started building Sudowrite really as a toy back in late 2020. And even in that nascent state of GPT-3, the hallucinations and all those things actually were a great feature, not necessarily a bug for fiction writing. So I think that's actually part of why it was working for us. And yeah, we also were very involved with fiction communities and just really started seeding Sudowrite in this way of writing with writing groups and grew just very organically from there.

Nathan Labenz: (5:36) Yeah. Boy, it takes us back into the history of language models, a progression that folks who've listened to this show will know. And if anybody wants a recap, our Riley Goodside episode is probably the best history of models and how they were fine-tuned and how that plays out in terms of how users can best interact with them.

But I'm struck, going back again to that DaVinci 1, as you said, if I recall correctly, the context window was just 2,000 tokens in that original version. And also, you didn't really have any of the instruct training. So you really had to set something up in this sort of highly suggestive, autocomplete sort of structure. And that doesn't give you a ton to work with, right, if you're limited to that narrow context and trying to write long form fiction. So I'd love to hear how you bootstrapped your way into getting that to work for you and then how you started to turn that into a product where, obviously, it does a lot more now than it did then.

James Yu: (6:41) I think a lot of it just came down to user expectations. So actually, our first feature that we had was called Wormhole. It wasn't called Write or all these other things. We called it Wormhole because, hey, this is what five other versions of you in other multiverses could possibly write for the next paragraphs. And for us, that really set the tone of, okay, it might be a little wild. It may not follow your tone exactly, but it might inspire you, right? So we set the expectation that, hey, this is not always going to be precisely what you will write next. So then our target really was when you were just absolutely stuck, right, when you're writing something, to just get you unstuck. And maybe it's just the wrong thing. And we actually put that on a pedestal to say, hey, that may actually be good, because if it writes the wrong thing, it maybe gives you an understanding of how you want to write the right thing, how you want to actually approach this.

So yeah, actually, with that expectation in mind, the constraint of the smaller context window wasn't really a problem. It was more that you're at this cursor, it reads whatever, about 1,000 words or so, just gives you the next 500 words. But we don't insert it straight into the document, and that's the other thing too, and we still do that today. We give you cards so you consider them. You should read them over. You shouldn't just immediately insert it.

So yeah, I would say it comes down to those constraints. And obviously, that's very localized, right? That's how it was working with DaVinci; it really worked very locally. And as the models got better, the instruct models, and obviously today with GPT-4, we've been able to widen that lens so that we can help you with a larger part of your story, not just the next paragraph.

Nathan Labenz: (8:34) Yeah. Very interesting. Do you remember, just out of pure curiosity, any kind of clever prompt tricks that you used in those early days to get the right kind of output back?

James Yu: (8:46) Yeah, it was a lot of few-shot things, right, just giving examples. Funny enough, we'd give examples using, for example, we had a Describe tool that would give good descriptions based on something that you highlighted in your text. And that's something that fiction authors need to do to give very rich descriptions. And what I would do is I'd give a few shots from my own stories and how I describe things. So I would actually say the first iterations of that tool was probably more like how James would describe this kind of thing in his...

Nathan Labenz: (9:18) You broadcasted yourself in your own product.

James Yu: (9:21) Right, exactly. But because we also took the context of what the user is giving in their manuscript, it didn't matter too much. It was like giving the AI a bit of a flavor of, okay, you want me to go into this literary latent space? And it was really just kicking into there. So there's a lot of tricks like that, of just giving few shots, doing a lot of, I would say, prompt engineering is one thing, but I think prompt evaluation was a big thing. We did so many panels between me and my co-founder and our writing groups that were very invested in helping shape this product, getting their feedback on the wide diverse set of fiction and fiction writing was very key in those early days. And today as well. We still do that today.

Nathan Labenz: (10:07) Yeah. Interesting. Okay. Well, I want to get into a little bit more of the details of the product and how all this is continuing to evolve. But for a little bit more context, I just noticed the headline right at the top of the homepage, which is very colorful: "The non-judgmental, always available, willing to read 30 drafts, AI writing partner you always wanted." And I guess what jumped out to me about that was how sort of non-utilitarian it is. You know? I mean, you could imagine another version of it that's like, "This thing writes the best copy. You buy it now." Or some maybe more artful but still clearly purpose-oriented version of that. This is much more about the need of the author, really. It speaks to the emotional needs of somebody who's maybe, as you said, stuck in the writing process.

I'd love to hear a little bit more about how... was that something that you started with originally? Is that something users have led you to? Who are... I mean, you mentioned a little bit about your early users being committed writers, but I wonder how that's evolved over time. But that just really caught my eye as a super fascinating way to position the product.

James Yu: (11:24) Yeah. For sure. Yeah. And this has remained consistent. For example, it compares to an enterprise writing tool that is trying to help you write emails. Right? In that world, you're not really thinking about maybe the emotional connection, but more about productivity. Right? And so that wasn't really the reason why I started using GPT-3. It wasn't necessarily so that I get more words per minute. It was really about getting me into the right headspace.

So based on a lot of our early interviews, it's funny. We're talking to a lot of fiction writers and early beta customers. Some of them would say, "Hey, I love your tool. It helps me get unstuck. But actually, the other part that I found is it makes me less lonely." Because as a fiction writer, most fiction writers are lone wolves. They're writing on spec. They don't have a writer's room. They're not in the top 2 percent like Hollywood writers, so they don't have that luxury. And maybe they have a writing group, but even then, it's very asynchronous, and most of the time that you're spending is yourself, your keyboard, your brain, and that's basically it. Right?

So they began to really anthropomorphize the AI as in, "Hey, this is actually reading my stuff and giving me relevant suggestions. It feels like they are reading. I feel less lonely." And a lot of them, some of them are saying, "Hey, even my partner, my life partner won't even read my manuscripts anymore, or I'm too embarrassed to show my first draft to my writing group, but I'm not embarrassed to show it to an AI," right?

And so when we started hearing this over and over, I think that's where we really... this seed of, this is not merely, "Give me the next few words of my manuscript" kind of thing. I think it goes beyond that. It's really a different way of you working with your ideas and a way for you to connect with your own work, right, through the lens. I always think about these large language models as a cultural lens into maybe adjacent texts in the latent space that you may or may not read, or maybe, for example, you go into a library and you read an author that really speaks to you. These language models can fill that space as well. So, yeah, we've always spoken to that part of the messaging.

Nathan Labenz: (13:51) Yeah. That's really, really interesting. It reminds me a lot of an episode we did with Eugenia, the CEO of Replika. Because they also started, not just early in the GPT era, but actually before GPT models existed, with their virtual friend product. And I'll always remember when she said to me, "We recognized early on that we might not be able to create an AI that could talk, but we could create one that would listen." And I was like, wow. That just so profoundly speaks to the needs that people in fact have, which most technology products perhaps neglect. And it sounds like you have a similar relationship to your users. So I think that's really super fascinating.

So okay. So most of these folks are long tail independent, which makes a lot of sense. Right? I mean, most fiction is probably written by folks that have never actually sold a book but are doing it for the love of it. And I can certainly imagine how that would be a bit lonely at times. How has the product evolved to today? I mean, when I checked it out, I think one of the elements that was most striking about it is that you've built up both a pretty powerful hierarchical system for cascading, starting with a kernel of a story idea and then really developing that across a number of different dimensions and even cascading down into chapters and then paragraphs and ultimately, obviously, it cashes out into word by word.

I'd love to hear how you describe that. And then I also want to get into a little bit more of the different angles that the product presents because there are sort of character-based lenses on the work. And you also described this rich description based angle. And yeah, I think it's all super interesting. I'm not really a fiction writer myself, so I'm kind of learning about the craft also through studying this product. But yeah, take us through how the product has evolved and how people get the most value from it today.

Hey. We'll continue our interview in a moment after a word from our sponsors.

James Yu: (16:04) Yeah. Sort of going back to that DaVinci 1 world where it was very localized. The big unlock that's happened with GPT-4, all the newest models, is that now we're at a point where the AI can really understand a story from the 50,000 foot view. So I see it as we are evolving from that local paragraph to maybe multiple paragraphs to scene by scene understanding and now to the whole story arc understanding.

And so what we're trying to build up, and we're still, I think, in beginning phases of this, as you mentioned, this hierarchical thing. The way we structure it is that if you use Story Engine, which is our new hierarchical sort of way of writing with Sudowrite, how do we get an author to mind meld with the AI in the right structure, in the right format, so that the AI has the right context at all times. Because even with the newest context windows that are out there, I know Claude also has a 100k context window, but the pricing doesn't really work out in many cases. Do you want to always be using your whole novel as part of the inference? Probably not, right?

So even in that extended case, it's better to give tools, at least to today's state of the art, for the author to give the right context to the AI at the right time. And so, yeah, when we're working in this very localized space, when it's just, "Give me the next paragraph," we had little fine-grained tools to be like, "Oh, give us the sense of what is the tone of this scene? Is it ominous? Is it happy? Maybe give us some idea of who's in it." Very small amounts. But now with Story Engine, in this new hierarchical way, we can let you give us both the 50,000 foot view all the way down to the leaves of the scenes themselves and who's in the scenes.

And so this allows the AI to pull out the relevant information so that it can do whatever the task is that you want it to do. For example, is it ideating the next plot point? Is it just, "Yeah, give me an idea of the next paragraph or so?" But we can do that now in a way that feels less like a slot machine, because in the early days, DaVinci 1 was a full-on slot machine: maybe one out of 50 times, it's like, "Wow, this is a really big gem that I'd love to add to my manuscript." So we're trying to battle this slot machine problem, which is, can we get it so that every pull, or one out of two pulls, you get something where you're like, "Oh, wow, this solves a problem that I have in my storytelling," where it unlocks something.

So I see it as humans and the AI sort of meeting in the middle, and not trying to get the AI to do everything, right? Because I think that's, especially a lot of the Twitter demos that are out there, they're like, "Oh yeah, the AI can just do everything." Well, that's not the point, really. I mean, we are really building this product for centaurs. So half human, half AI, or whatever percentage you want, but with the human at the head of the centaur, not the butt, because I think that's when we lose the plot on why are you even writing fiction? That's what you're doing, right?

So the human's at the head and we give the human as much control at whatever fidelity they want, right? So this is the other big challenge that you mentioned, how do fiction writers work? Well, they work in as many ways as there are fiction writers, because every fiction writer has their own way of writing. Now, this is in contrast to, for example, going back to writing corporate emails. There are probably fewer ways to write a corporate email than there are to write a novel. That's the challenge of building this type of tool, because we want to make it flexible enough so that even if an author doesn't adopt Story Engine, doesn't adopt parts of our system, it can still be useful. And that, I would say, is our primary UX challenge, delivering this. How do we work in the myriad ways that fiction writers do and at least be good enough for a good baseline of how people are writing, whether they're free writing or they're outlining first and they like to get everything in place first? We want our system to work in both of those ways.

And on the programming side, I'm a programmer and also a writer. I think about Sudowrite, what we're aiming for, as almost like an IDE for storytelling. You couldn't really make that happen before. Writers have used lots of different kinds of templates and Excel spreadsheets, but it needed a language model to actually pierce the veil of what is happening in these scenes, and now it's possible. And that's our north star internally. Can we make it so? Programmers have had IDEs and all these things that help them structure their programs for ages. Fiction writers and authors have basically nothing. They have to do a lot of that slog themselves. So how can we reduce that slog so they can focus on the part they love most, the storytelling part?

Nathan Labenz: (21:14) The IDE thing made me think about the Copilot thing from a few months ago where somebody sort of deconstructed the prompt that Copilot uses and figured out, how does it go around your file structure and what does it grab and what does it stuff into place? And so I'm going to try to riff a little bit. You tell me where I'm going right or wrong in terms of how yours might be working now that we're getting to where this is becoming possible.

I guess in the two modes of writing, right, there are some people, as you said, who start with an idea, and then they maybe flesh out an outline and whatever, and you can help them with that. And then I imagine as you get down to the leaves, as you put it, of the actual individual pages and the paragraph by paragraph, what's happening to help generate ideas in that context. I imagine that you would have some sort of summarized view of the book as a whole. Right? Like, we're writing a book that... and interestingly, some of this stuff might not even be generated by the author. I'd be really interested to know if there's any sort of summarization or characterization that may be going on in the background that is then fed into a downstream prompt.

But again, if I was trying to build something like this, I'd be like, we're writing a certain kind of book. If that's not clear, I'd maybe use a summarization or something to try to get some sort of headline statement. The book consists of this many chapters. Maybe here's an outline of the chapter titles or something. We're currently working on a chapter 22 scene, whatever. Here's a little bit of the context immediately before, and maybe if it exists, here's a little bit of context after. Your job is to fill in the blank. Is that roughly what you're doing? Or how would you complicate my base analysis there?
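
As a concrete illustration of the prompt shape Nathan is sketching, here is one way such a fill-in-the-blank template might look. Every field name below is hypothetical; nothing here is taken from Sudowrite's actual prompts:

```python
# Hypothetical template; none of these fields come from Sudowrite.
FILL_IN_TEMPLATE = """You are helping write a {genre} novel.

Premise: {premise}

Chapter outline:
{outline}

We are currently in chapter {chapter}. Text immediately before the gap:
{before}

Text immediately after the gap (may be empty in a first draft):
{after}

Write the missing passage so it flows naturally between the two."""

prompt = FILL_IN_TEMPLATE.format(
    genre="science fiction",
    premise="A lighthouse keeper discovers the fog is sentient.",
    outline="1. Arrival\n2. First signal\n3. ...\n22. The bargain",
    chapter=22,
    before="Mara set the lamp down and listened.",
    after="By morning, the fog had names for both of them.",
)
print(prompt)
```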

James Yu: (23:07) The main challenge is context stuffing these models to give the right information at the right time for the right function that you're doing. For example, if you want a good description of this character in a scene, you might need to know the fears of this character, right? Or maybe what this character has done in the past. Maybe not, right? But you need to be able to give those pieces to the AI so the AI can make a decision on, "Okay, is this what I need? I can just throw this stuff away." But it's very challenging, right? Because, for example, there may be very nuanced things that happen.

Well, here's an easier problem. It's like, this character died in chapter 6. Probably should not reintroduce this character in chapter 8, right? So those are the kinds of problems that we are working on right now, which we have parts of it, but in the ultimate case, I think what we want to be able to do is give the AI an understanding of the entire timeline. And this is more detailed than an outline, like a timeline, where the characters are, the state of the world. And then given the states of the world, what are the global contexts that you bring in that are static? For example, this character is afraid of fire or something like that. Or what is the premise of the novel? What's the genre? What's the style? And really, all this state is being fed into some decider that figures out, for the AI, which of these things are the most relevant. And some of that is heuristic.

Something that we're also starting to do now is more doing a semantic search or vector search across these things to give the right context to the right LLM. Because this is also trying to understand the different characteristics of LLMs. What are some things good at and what are some things not good at? For example, GPT-4 is great at being consistent and following rules, but man, it sounds like C-3PO. Sometimes it's not really the best creative prose writer, right? But then Claude, from Anthropic, is very good at creative writing, but maybe less good at following rules. So I would say there's a big context piece, but then there's also a piece of understanding the pen, if you will. I consider these different LLMs as different styles of pens. And mixing and matching them, some authors, man, they really love using Claude functions. Some authors, they have very distinct opinions about this, which is very interesting.

But for us to be able to provide a good baseline, we need to really understand the characteristics of these models at the prose level. So I would say it's a combination of those two things. But that first piece, there's a long road. I think that ultimately, we're still building to that system that can really do it. How do we break down the novel writing process into these constituent atoms, these components of how I'm approaching a scene as a human? We sort of use that as a guide for, okay, how would we get the AI to also approach this scene like a human would? How do we give it the right context? Because a human can also... we can't fit the whole novel in our head as well, so we also have a context window limit, right? So we're doing the same sorts of processes. So we're taking an almost anthropomorphic view behind the scenes in our algorithms.
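
The vector-search piece James mentions can be sketched in a few lines. This assumes an OpenAI-style embeddings API as a stand-in; the story facts and model choice are invented for illustration:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Invented story "state" entries; a real system would track many more.
STORY_FACTS = [
    "Elena died in chapter 6, in the fire at the archive.",
    "Marcus is afraid of fire.",
    "The novel is a gothic mystery set in 1920s Lisbon.",
    "Elena and Marcus were secretly married in chapter 3.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def most_relevant_facts(task, k=2):
    """Return the k facts closest in embedding space to the task at hand."""
    fact_vecs = embed(STORY_FACTS)
    task_vec = embed([task])[0]
    # These embeddings are unit length, so a dot product is cosine similarity.
    scores = fact_vecs @ task_vec
    return [STORY_FACTS[i] for i in np.argsort(scores)[::-1][:k]]

print(most_relevant_facts("Write a scene where Marcus revisits the burned archive"))
```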

Nathan Labenz: (26:46) Let's talk about these leading models because you touched on that. I think that's a super fascinating thing. Right? We've seen this progression from GPT-3, world's biggest autocomplete, this sort of savant-ish thing that could pick up anywhere, and all of a sudden you're in the middle of an internet forum or whatever. And now we've got way more dialed in RLHF'd, RLAIF'd, much more consistent.

I was kind of expecting that you might say that both GPT-4 and Claude 2 kind of had some problems in this respect because I do find them to be... I don't get a lot of variety. If I run Claude 2 over and over again, I feel like I get almost the same thing back every time no matter what, which doesn't seem like maybe what you would want. Maybe it is a little bit... I do think it's, in my experience, a little bit better at imitating a user's style compared to GPT-4. But I kind of had expected that you were maybe going to say, actually, we use 3.5... we maybe do some background processing or whatever with the frontier models, but then we actually get maybe a little better results from the one step down, like the 3.5s or the Instants because they're a little less RLHF'd to death, so to speak, or maybe even open source models, because access to the base model can really open up, obviously, the window of possibility.

But it sounds like... give me more on what models you're using and the mix. And in particular, if you think that there's something I can be doing to get more diverse results out of the frontier models, I would love to understand that for my own purposes too.

James Yu: (28:26) So to be clear, we are still using a lot of DaVinci models. I know they will be sunsetted soon, but I actually still like them because for certain use cases, RLHF has this characteristic, as you're saying. It smooths out all the rough edges and just makes things a little bit more predictable in some ways. I like to say the C-3PO is like, "First, we must consider this and blah blah blah." You can sort of smell that from a mile away. So we still mix it up. We still use a variety of DaVinci models behind the scenes in tandem with the big ones, the GPT-4s of the world. There are also ways, a kind of jailbreaking, of trying to get GPT-4 into a particular latent space. It's less effective than with DaVinci, I think, but it is possible. And there are ways to describe certain types of styles which you want in a literary sense, which can get you to have more varied styles.

So one of the things that we have on our site is something called Match My Style, where you can essentially upload a whole scene of yours, and then we will try to distill it into a prompt that will kick GPT-4 into a latent space that's at least closer to your style without having to fine-tune. It works because GPT and all these models have read a lot of literary reviews, so they understand the pairing between how a literary reviewer would review a particular piece of prose and the prose itself, and that takes you closer. It doesn't get you 100%, and that's actually a concern. And that's why we are now exploring more open source models and other things, because what I'm seeing in the market is that, especially for enterprise use cases, which makes a ton of sense for OpenAI, they do want to smooth out these rough edges and solve enterprise problems, not necessarily solve creative professionals' problems. We might go down the path of either fine-tuning or getting our own open source models where we can have more control. That's on our books this year, to do more of that.
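
One plausible reading of that two-step approach (this is a reconstruction from James's description, not Sudowrite's code): first ask a model to characterize the prose the way a literary reviewer would, then reuse that characterization as a style instruction:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative stand-in for the GPT-4 James mentions

def describe_style(sample: str) -> str:
    """Step 1: characterize the prose the way a literary reviewer would."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   "As a literary reviewer, describe the style of this prose in "
                   "three sentences (diction, rhythm, imagery):\n\n" + sample}],
    )
    return resp.choices[0].message.content

def write_in_style(style: str, instruction: str) -> str:
    """Step 2: reuse the critique as a style instruction for new prose."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a fiction writer. "
                                          "Write in this style: " + style},
            {"role": "user", "content": instruction},
        ],
    )
    return resp.choices[0].message.content

style = describe_style(open("my_scene.txt").read())  # hypothetical sample file
print(write_in_style(style, "Write the opening paragraph of chapter two."))
```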

That being said, I think a lot of these people who are using it more wholesale to generate full scenes, for example, they're using that as a starting point. They're not using it as a final draft. This is the first draft. A lot of authors have been able to come up with tricks to get more varied style. One example is called kitbashing, where you generate using two different models, and then you have a center column where you kitbash and just copy-paste this paragraph from here, this paragraph from there, almost like a pastiche. So you can combine the best of, for example, Claude versus DaVinci versus GPT-4. It came from how artists do this too, like rotoscoping, where they basically have a photograph and they just paint over it. So there are some new techniques now being discovered by authors using multiple models together, and a lot of the more sophisticated ones are also using open source models in tandem with Sudowrite, where really they're just taking multitudes of outputs and mixing and remixing them all together, which is really fascinating. I don't personally do that, but a lot of authors do.
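
Mechanically, kitbashing is simple; the craft is in the editorial choice. A sketch, using two OpenAI-hosted models as stand-ins for the Claude/DaVinci/GPT-4 mix authors actually use:

```python
from openai import OpenAI

client = OpenAI()

def draft_scene(model: str, brief: str) -> list:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": "Write a three-paragraph scene: " + brief}],
        temperature=1.0,
    )
    return resp.choices[0].message.content.split("\n\n")

brief = "Mara confronts the fog on the lighthouse gallery at midnight."
draft_a = draft_scene("gpt-4o", brief)       # one "pen"
draft_b = draft_scene("gpt-4o-mini", brief)  # a second "pen," as a stand-in

# Alternate paragraphs from each draft. In practice the author picks by eye;
# this index-based pastiche just shows the mechanics.
kitbashed = [a if i % 2 == 0 else b
             for i, (a, b) in enumerate(zip(draft_a, draft_b))]
print("\n\n".join(kitbashed))
```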

Nathan Labenz: (32:01) Yeah. Really interesting. Okay. So just for super specific techniques, because I think people are always looking for little nuggets that they can apply in their own projects. When I try to get something to write in my style, and the main thing that I've done that for actually is just the intro essays that I write for these podcasts and then sometimes the Twitter thread that I'll post to promote it when we release it. I've gone back and just grabbed five intros from earlier podcasts that I feel pretty good about. Just put those in, writing samples one through five. Then I give the transcript of the current podcast. And obviously, this kind of surrounding instructions are like, here are five examples of the writing style. Here's the current discussion. Please write.

I would say I use Claude 2 for that because the context window basically requires it; the length of the podcast is typically too long for anything else. I find that even an hour and a half is about as much as it can take. And if it goes much beyond that, I have to start to break the transcript into chunks or it somehow loses the thread, even though it does technically fit into the 100K context window. And then basically what I get out of that is, okay. I'm interested, too, in where you think the number of successes needs to be before people find it useful. You said one in 50 back in the earliest days. And now, maybe, you alluded to one in two, or one in a small number anyway. For me, more often than not, I end up doing a full rewrite. It still sometimes helps me to get some kind of boilerplate down and something to react to. I don't think we're quite to drinking game status on this yet, but I often refer to this Simpsons scene where Mr. Burns is at an art unveiling, and when the art is unveiled, he says, "I'm no art critic, but I know what I hate." And I do feel like I behave that way sometimes where Claude 2 gives me something back. And I'm like, that's not it, but it somehow gets me writing faster even if it's not it.

So that's kind of how I approach it with Claude 2. And sometimes I mess around with things like having it assess my writing style. Sounds like you're doing some of that. Also, kind of give a critique or an assessment or just a description of the style before then trying to go into that mode. I guess going over on the GPT side, the OpenAI side, I'm finding these days dumping all the kind of instructions into the system message seems to be the thing to do. Any coaching? How can I get my little micro app that writes these intros to do a better job for me?

James Yu: (34:52) So, yeah, in the GPT line, definitely don't use ChatGPT. Use the playground. The system message does matter, so set the tone of what kind of writer you would like it to be by describing your style there. But the other thing to think about is, if you're giving a very long prompt, and there have been studies on this recently, LLMs kind of just ignore the middle of a really long prompt, much like a human would, which kind of makes sense with RLHF. So the most important bits of that prompt should be at the beginning and the end, and especially the end. For example, how we prompt for style is that the style goes last, because you really need that to be the tip of the pen. So if you're putting in any style information, reemphasize it again right at the end. Be like, and write this in the style of blah blah blah. Describe your style in detail. And that will yield better results in getting it to follow the tone and writing that you want.
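
Applied to Nathan's intro-writing micro app, that ordering advice might look like this; the style description and placeholders are illustrative:

```python
from openai import OpenAI

client = OpenAI()

STYLE = ("conversational but precise, short declarative sentences, "
         "first person, occasional dry humor")
samples = "..."     # five previous intros, pasted in full
transcript = "..."  # the current episode transcript

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # System message sets who the writer is and the style up front.
        {"role": "system",
         "content": f"You are a podcast host writing episode intros. Style: {STYLE}."},
        # Long context sits in the middle, where models attend least, so the
        # style instruction is restated at the very end: "the tip of the pen."
        {"role": "user",
         "content": "Five sample intros:\n" + samples
                    + "\n\nEpisode transcript:\n" + transcript
                    + f"\n\nNow write this episode's intro in this style: {STYLE}."},
    ],
)
print(resp.choices[0].message.content)
```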

Now, if you care less about the tone and more about the content, then you sort of ignore that part and be like, okay, I'll just rewrite it anyway. That depends on your workflow. But I'll say a lot of authors, yeah, they are definitely rewriting a lot of the things that are coming out. So here's the thing. I'm not sure if you're doing this, but if you're asking it to not do something, sometimes that's not great, because then it will do the thing. I call this the pink elephant syndrome. Don't think about a pink elephant. You're going to think about a pink elephant. LLMs have this syndrome as well. If you say, don't be overly cliche, it actually might be more cliche. So it might be better to just give statements or even random quotes that are not cliche at all so that it gets into the latent space of, okay, you want me to be in this really creative space where I don't produce cliches. Yeah, but it's a tough wrangling thing. And the evaluation part is tough, right, for text versus images. With images, you can see them at a glance and firmly evaluate them. But if you're generating paragraphs and paragraphs of this thing, you don't want to be pulling that lever too many times.

A baseline that we have through Sudowrite is about one in two: an author will use something that comes out 50% of the time. That's the gold standard that we've found so far. But on an individual basis, that number is just going to be much lower, because you might be writing the prompt bespoke each time. You might not be dialed into a particular set of instructions that work. And I think that's okay because you're interacting with something in a chat interface. But I always say, if you want to get more repeatable results, keep a spreadsheet of the prompts that work well for you and start iterating and making notes: okay, this wording makes it worse; this one works. Just keep iterating that prompt over time until you have a prompt that works really well for you, Nathan. The Nathan prompt, right? And so, yeah, we're seeing a lot of authors who use ChatGPT in tandem. They also have a whole set of prompts that they keep for themselves that works for the way that they work. And that may actually be a new practice for writing in general in the future: you are not only gardening your craft, but you have a whole garden of prompts that you're constantly evolving that work really well with the LLMs that you love. That may just become a common practice.
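
The "garden of prompts" habit needs no special tooling; even a minimal log makes the iteration James describes concrete. A sketch with invented fields:

```python
import csv
import datetime

def log_prompt(path, name, prompt, usable_rate, notes):
    """Append one prompt variant with notes on what changed and how it did."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), name, prompt, usable_rate, notes])

log_prompt(
    "prompt_garden.csv",
    "intro-v3",
    "Write the intro... (style restated at the end)",
    0.5,  # James's gold standard: usable one pull in two
    "Restating style at the end reduced the C-3PO tone.",
)
```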

Nathan Labenz: (38:37) Yeah. That's a great transition into, I think, the second half of the conversation, and I want to kind of widen the view from the app and all the techniques. But before I do that, I just want to ask you a couple more things on the techniques. Can you tell me a little bit more about kind of the background information processing that you are performing on the user's behalf? Like, if I show up—let's say I have, you know, maybe a draft already in a Google Doc or whatever, and I'm like, alright, I'm going to come try this tool. And then the first thing I do is just paste in, you know, 20 pages of something I've already got. It sounds like you're doing a number of things to kind of figure out, okay, who is this person? How do they write? What is this? What are the characters that are in here? And I haven't actually done that exact workflow, so I haven't experienced how all that plays out. But it sounds like you're kind of coming in pretty hot and processing that information to kind of synthesize a bunch of things. Love to hear more about that because that too, I think, is something that may spark some creativity in others, well beyond the creative fiction realm.

James Yu: (39:49) Yeah. So if you start with Story Engine, we do have a brain dump field, which we've always made free-form text, so you can put anything in there, whether it's pages from your manuscript or Wikipedia articles. We felt very strongly that it needs to be unstructured, because as a writer, you're going to have all these random notes coming in, right? But that's just the first step. From there, if you were to try to replicate this with your own project, the AI is really good at structuring these notes into a synopsis or an outline. So just ask it for that and then keep putting the result back into the brain dump, and it will get a better and better understanding of the 50,000-foot view in a way that LLMs understand.

So I think that's the major challenge. I mean, I think our minds work very associatively, but when you're writing, you have to linearize it.

Nathan Labenz: (40:46) Right.

James Yu: (40:47) And when you're writing with an LLM, you not only have to linearize it, but you have to hierarchically linearize it. So I would say if you, for your own purposes, you may want to keep your own silos of, okay, this is the 50,000-foot view of what you're writing. This is now a closer up view. So each time, closer all the way down to the leaves. Because once you have all that sort of filled out, you can then pick which fidelity to give to the LLM at the right time. And that's kind of what we're doing in our back end. And I would say it's still very—I would say it's still v1. We're always improving that process. This is really tough to pick the right fidelity to get the AI to understand for this particular task.

But I think that's useful for writing practice in general, right? So you should be able to know kind of, keep your thesis in mind for this essay that you're writing, but also know in this particular section, how is it built into that thesis, but how is it also introducing new questions or new information that is going to get the reader to want to read more, continue reading. There's always a tension between those two things. No silver bullet there. I think the interesting part is that you can get LLMs to help you create these different fidelity documents, which will actually help you. Because I think that you reading that and parsing that yourself will give you a better understanding of what you're trying to achieve with the piece, right? Because I think thinking hierarchically will unlock, oh, wow, maybe this is not even the right thesis. It's better that you know that earlier than later. So I actually saw a lot of our users, when they go on Story Engine and they're like, oh, wow, now I understand, maybe my novel is not what I think it's really about. I need to question myself.
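
The "fidelity ladder" James describes can be sketched as repeated condensation. The prompts, model, and brain dump below are illustrative; the point is that each level is a tighter summary the author can hand to the LLM at the right zoom:

```python
from openai import OpenAI

client = OpenAI()

def condense(text: str, instruction: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": instruction + "\n\n" + text}],
    )
    return resp.choices[0].message.content

# An unstructured brain dump: manuscript pages, notes, research, anything.
brain_dump = """Mara keeps the last lighthouse on a drowned coast.
The fog is sentient. Themes: grief, a missing brother, bargains."""

# Each rung of the ladder is a tighter view of the one below it.
synopsis = condense(brain_dump, "Structure these notes into a one-page synopsis:")
outline = condense(synopsis, "Condense this synopsis into a chapter outline:")
logline = condense(outline, "Condense this outline into a one-sentence logline:")
print(logline)
```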

Nathan Labenz: (42:39) Yeah. That's interesting. I imagine too, you know, one thing that the language models are so good at is adopting all these different perspectives. I almost wonder if you could create a focus group style feedback where you could provide a whole bunch of different perspectives. I haven't done this on creative stuff so much, but done a little bit of experimenting with, if I tell the language model that it's a person and it has a certain belief, is it able to kind of come up with correlated beliefs that a person likely would have depending on what we kind of generally know goes together in people. And it is pretty good at that. So I kind of suspect that you could kind of call forth a bunch of different perspectives on a piece of work and give people a sort of diversified feedback that they otherwise might really struggle to get. And, again, that's probably just an interesting technique in general that extends well beyond creative writing.
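
Nathan's focus-group idea is easy to prototype: one persona per system message, one reaction each. The personas here are invented for illustration:

```python
from openai import OpenAI

client = OpenAI()

# Invented reader personas; swap in whatever perspectives matter to you.
PERSONAS = [
    "a literary fiction editor who prizes prose rhythm",
    "a genre reader who wants fast pacing above all",
    "a first-time reader unfamiliar with the author",
]

def focus_group(excerpt: str) -> dict:
    """Collect one short reaction per persona: cheap, diversified feedback."""
    reactions = {}
    for persona in PERSONAS:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"You are {persona}."},
                {"role": "user", "content":
                 "Give a three-sentence reaction to this excerpt:\n\n" + excerpt},
            ],
        )
        reactions[persona] = resp.choices[0].message.content
    return reactions
```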

Are there any settings? This is, again, very technical, but many of the people who listen to this are building stuff. Obviously, different language models are big deals. Sounds like you're not yet into the open source models and fine-tuning those, but that's likely coming. Are there any settings or other techniques or tricks that you find to be particularly relevant? Like, I assume you have temperature up. I wonder if there's a different temperature setting. Does OpenAI versus Anthropic demand different temperature settings? Top-p is also one I used to use, mostly to eliminate crazy long-tail stuff. I'm guessing you don't want to drop your top-p at all. But anything in that vein you think is kind of interesting or might show the way to others?

James Yu: (44:23) Yes. For temperature, we do keep it relatively high, but we also have a knob for the user to dial that in. Yeah, top-p and temperature do help shape these things, but they also have confounding effects on each other. So that's the other thing. It's so multivariate at that point. For example, if you put temperature really low, then you probably need to start fussing with your frequency penalties and your presence penalties, because it can create problems like repetition, or it can introduce other problems. So if there's any advice, I would say, if you're fussing with one thing, be aware that it may degrade performance for another thing that you're not even aware of, right?

So even for us, we're very cautious about changing temperatures and changing these settings once we have them dialed in, but we do a lot of testing of these things to get statistical significance on whether things are good or not, for some definition of good, whether it's user actions or various other things. But it's tough because, for creative writing, you really do want off-the-wall ideas. Hallucinations are great in our world, right? So we keep temperature relatively high. And really, the big challenge that I see for creative writing is cliches. Getting the LLM to not be cliche is a harder task, especially with RLHF. Temperature or top-p is not one hammer that will fix that. And fine-tuning is another option too, but that has its downsides as well, where it can get too dialed into a particular world, and the richness of having the full gamut is lost there. So I would say I don't have a dialed-in silver bullet there either. You really need to experiment a lot, and take notes. It's not a trivial thing to make it repeatable and good for your use case.

Nathan Labenz: (46:39) So are you still paying $0.06 per thousand on the original DaVinci? They haven't changed the price on that still, have they?

James Yu: (46:46) Yeah. I don't think they have. But most of our workload now has moved to either GPT-4 or Turbo. But there are definitely some things, yeah, that are still—yes, I think it makes sense. Yeah, they haven't changed that pricing. Although I think with the new—I think they're introducing a new model, right? Turbo Instruct, which may be a little cheaper as well. But yeah, that is the other challenge, the economic challenge. How do we balance context windows and model choices while keeping our prices sort of affordable for the average writer? That is a whole other—probably a whole other podcast of how do you even tune these things for pricing?

Nathan Labenz: (47:29) Yeah. There's certainly a lot that goes into it. I mean, most folks, I think, still kind of have the luxury of being like, just use the best model and then figure it out later. And, you know, we've seen enough of the price drops and can probably assume that yet more price drops are on the way. Although, you know, if you're attached to an earlier model, and I've been there too, it's like, yeah, I have a fine-tuned thing that's working really well, and even though the base price has dropped, it's not six cents anymore, it's down to two, the fine-tune is still 12. So, yeah, excuse my mistake. Even the base DaVinci, I believe, is still 10 times more than the latest Turbo. So it's a nontrivial difference. But, yeah, they did a two-thirds drop before the 90% drop. The cost thing, I think, is definitely super interesting. The context does get long. You know, the Claude one at 100,000 tokens, it's like, yeah, that sounds great. You can fit the whole Great Gatsby into it. But every time you click that button, you're looking at potentially a dollar per generation if you really stuff the whole thing in there.

James Yu: (48:43) Sledgehammer to write one paragraph.

Nathan Labenz: (48:45) So, yeah, that can be uneconomical pretty quickly. Your comments about the frequency penalty and whatnot, I think, raise the question: is that mostly an old model thing? How I would summarize my experience today is, I guess, for starters, what is temperature? What is top-p? If folks don't know, temperature is the parameter that governs how the final token is chosen based on the distribution that is generated by the model. Because the model at every step is not just generating a token. Most people probably will know this, but instead it's generating a distribution over all possible tokens, which could be all the probability on one token and zero on everything else, but generally speaking is much more long-tail than that. And so then you have the question, well, do I just take the most likely token every time? That's temperature zero. Or do I sample from this distribution proportional to the relative numbers? I believe that's temperature one. Although this stuff is not super well documented.

James Yu: (49:52) Yeah. I think you can go above temperature one as well.

Nathan Labenz: (49:54) The way I understand that is that you're then sampling even more from the tail than its weights would naively suggest. And then top-p is basically saying, okay. I may want to eliminate super long-tail, super unlikely things from being selected. Maybe I do want to have some diversity in my choices. I don't just always want to take the most likely token. But if there's something that's under 1% likely, then I just don't want to consider that at all. If you want that kind of behavior, then you can bring the top-p down to say 0.99. Then by definition, anything that is smaller than that delta is just not considered at all. Typically, today, that takes me far enough with the frontier models. And I would say only in kind of the earlier models have I needed to really mess with the frequency penalty and those other kind of things. Does that jive with your experience too?

James Yu: (50:50) Yeah. Mostly. I think presence penalty is powerful, because if you want to write things that are not already present in the prompt, it is helpful. But overall, I would say the prompt matters much more than tuning these parameters to micro settings. The input is the higher-order influence on the quality of the output.
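
For reference, here are all the sampling parameters from this exchange in one call. The values are illustrative, not recommendations from either speaker:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Continue this scene: Mara listened."}],
    temperature=1.1,        # above 1 samples the tail more heavily than the weights suggest
    top_p=0.99,             # drop only the extreme long tail of unlikely tokens
    frequency_penalty=0.3,  # discourage verbatim repetition
    presence_penalty=0.4,   # nudge toward tokens not already present in the text
)
print(resp.choices[0].message.content)
```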

Nathan Labenz: (51:20) Use cases are everything. You know, most of the stuff I do is about trying to get the job done, right. And in your case, the job is writing, and it definitely opens up a lot of different considerations. So, okay, thank you for indulging me in some of this detailed stuff. I think it can be a little bit tedious to wade through, but hopefully, people take away from it some tips and tricks that they can use as they're building their own projects, even if those are just things that they're often building for themselves or a few teammates.

Turning then to the big picture. So I guess maybe for starters, I'd love to hear your vision of kind of the future of creativity. You alluded a little bit to it with the centaur, maybe also the future of entertainment. One thing that really struck me from the website is like, hey, with Sudowrite, you can get a full novel written in a week or maybe even just hours. It's striking that, in that case, the AI is doing a significant amount of the actual banging out line by line. People may have all sorts of feelings about that, and the quality may vary widely. But notably, that opens up the possibility that content can get generated faster than we can consume it. And so there's this potential inversion where, instead of creative content always being one-to-many, I'm really curious about the possibility that it becomes one-to-one or even choose your own adventure. But I'm putting too many words in your mouth. So forget what I said and just tell me what your vision for kind of the future of creativity and entertainment is.

James Yu: (52:57) Today, we're serving customers who are capital W writers. I mean, these are professional writers, or they're self-published, but they consider themselves writers. I think in the medium term, these LLMs do open up the possibility that if you've never even considered writing a story or a novel, tools like Sudowrite can help you achieve that writing goal that maybe was out of reach. Maybe you didn't have all this craft knowledge that you needed before. What we're hoping is that through using tools like ours, you will upgrade your craft along the way, because you'll see a lot of storytelling tricks that are happening and foundations there.

Now, if you go even further in the future, I think there will probably be a possibility of this one-to-one kind of thing happening. I have a 7-year-old, and we use GPT sometimes to do storytelling and just play around with it and tell bedtime stories. One thing to consider is that when you're a 7-year-old, the things in the world are the default. So he's coming into a world where these intelligent agents are the default. They're not new. He's just like, "Yeah, of course, dad. AIs can storytell. Of course, they can paint too. But I can do it too." He loves drawing. He also loves telling stories, but he doesn't have this adversarial relationship with it. He's like, "Oh, maybe the computer can help me, or maybe I can flesh out some ideas." And AI can do that at scale, personalized.

So I think there are a few things that will play out. One thing is this personal kind of craft, not scaled kind of media production. For example, I recently had a birthday party for my son with a Zelda theme. And I created a whole AI video where I incorporated his face on Link, and it was a scavenger hunt with AI voice where Zelda was talking. The kids were like, "Oh my god, how does Zelda know my name?" I'm not going to share this publicly. This is really only just for me and the other kids that came to the party. But this is something I would not have been able to do before. I wouldn't have hired voice actors to play Zelda for this, but with AI, all this is possible.

So I think for this kind of micro media, where it's not about sharing it to the world, it's possible now. And that's pretty cool. I think that won't necessarily be fully monetized in a scalable way, but I think it would just be akin to making your own website. Very easy to do.

Now, at the other end of the spectrum, will there possibly be the rise of a parallel track of novels or films which are fully AI generated, or at least AI generated but with a human producer in the middle of that? To your point about generating content faster than you can read, we're sort of in that space already. You can't read everything. You can't watch every film. But I think there will probably even be the emergence of a film created just for you, Nathan, that has your interests in mind. That is a possibility. I'm not fully certain that will happen at scale, though. I think it will be a little niche to do that, just because I think there is a water cooler effect. In that world, we will still seek out things created and directed and produced by humans, not just fully automated. I think there is something to that, where we will still look for that. So I think it will be a parallel track.

Beyond that, I think it's a little harder to make predictions. I wouldn't say it's all going to be AI generated or all human; it's probably somewhere in the middle. And maybe there will be a rise of a different kind of media that is only possible via AI, whether that's an actual adventure or a fully immersive game generated on the fly with particular story arcs, where it's just impossible to have humans creating that on the fly. But I think there will still be books. There will still be films. There will still be people writing books the quote unquote old-fashioned way. The old-fashioned way, using Sudowrite. But in that world, it's still stringing together words.

New technologies have been predicted to kill books all the time, but reading has actually gone up. If you look at the past few years, book sales have gone up, even in a world where TikTok and all these kinds of things exist. I think books will still be a cornerstone.

Nathan Labenz: (57:56) I always say "everything everywhere all at once" for all of this stuff. So I totally agree that there's not one outcome that everything coalesces into, but rather even more diversity and choice in media than we already have, which is already pretty extreme.

Your comment about the blurring of the line between quote unquote productive, economically valued work and similar, maybe even identical, activity that is leisure: I expect that to be a pretty interesting development, and I feel like you're right at the center of it, inasmuch as a lot of jobs that people don't want to do can become largely automated. Not that many people want to get up and go work at a call center all day, taking 30 largely repetitive calls an hour. I see a lot of that kind of stuff going away, and then there's this fear that we just consume, consume, consume, and become extremely passive and low-agency.

And it feels like there is a positive vision in here somewhere: being creative to the degree that you want, being exploratory, having stuff that's interactive, that generates things for you that are engaging and interesting, but that also gives you the opportunity to shape it, to craft it directly, to choose options and explore in a way that's still meaningful. I'm really interested in that space.

It's a big assumption, obviously, that a lot of jobs are going to go away. But I do think a lot of jobs are going to go away. That's not to say employment necessarily drops, because we may classify a lot of new things as jobs in some sense, like creating these bespoke birthday parties for the neighborhood kids. But I do think we need to develop a vision for a positive mode of interaction, for how we're going to spend our time in relation to all these new AI tools, that isn't purely about eliminating work or being more productive, but is a new, fun way to be and explore.

So I really love that you're creating that and also doing it in the context of writing. It's kind of meta in that respect. Are there things that you would recommend that we check out? I always lament that positive visions of an AI future are pretty few and far between. But you strike me as one of the people that I should be asking. What should I read or engage with, whether it's a novel or otherwise, to inspire myself about what the future might look like?

James Yu: (1:01:14) That's a good question. I love sci-fi, because sci-fi is this meta thing: a lot of the people creating technology are reading it, and it comes out in the products, which are designed from worlds that were actually written 40, 50 years ago. Sure, you can go on Twitter and see all these demos of new AI things, but I think that's hard, because there's so much noise there.

So I actually like reading older books to see what the visions of the future were then, because it's easier to extrapolate 30 years out if you look at what people thought the future was going to be, whether that's reading Gibson or different sci-fi like The Three-Body Problem, exposing yourself to some very different points of view, culturally as well. I think that can widen what you see as the possibilities of AI.

To your point about the elimination of jobs and skills: it's kind of funny, because I do think novel writing will still be there. So funny enough, maybe in one version of this future, we can just indulge our creative pursuits, or at least give more time to them. I think that's a positive outcome here.

And I think the other point about a lot of these AI systems is controllability. Especially in the earlier versions of all these things, whether you're talking about Midjourney, Sudowrite, or ChatGPT, we're still in a pre-iPhone moment. Maybe ChatGPT is sort of an iPhone moment, but I still think there will be another one, where an AI totally understands your brain state, understands your intent, and lets you control it without feeling like a slot machine. Once we get to that point of melding, I think it solves a lot of problems with agency. You can fully express what you want to create, and, as a person on the job, you can fully express the jobs that need to be done. But you've put humans in control of that.

Now, I'm not sure how that squares with the AI just doing all the programming for us and taking the human totally out of the loop. I do have some worries about that, probably not to the same degree as the doomerism out there, but I think at least some humans should be in that loop directing things, and there will probably be fewer folks there. So that is a problem we will need to contend with.

There's always this joke that a lot of artists are up in arms about Midjourney, while all the programmers are like, "Finally, I don't have to write a unit test anymore. I can just have AI do that." There's a little bit of irony in that too. I think we need to be able to understand these systems, maybe not at the leaf level, but with some level of fidelity in that chain. I think that will be important going forward.

Nathan Labenz: (1:04:40) There's a notion, presented in some cases as an AI safety agenda, called cyborgism, which is along the lines of what you're discussing here and might be of interest for folks to go check out. It's also what Elon Musk said was the motivation for starting Neuralink: the idea that we need to be able to communicate with machine systems at a higher bandwidth than we currently do, and potentially with higher fidelity to our internal mental states than we can capture with language. So the future could get very weird in those regards.

I saved this for the end because I don't think it defines your story by any means, but I do want to touch on the experience you had when I first reached out to you, maybe 2 to 3 months ago now. You launched a big update to Sudowrite and put out a video on Twitter, 9 minutes or so, showing some of the new features and how they work. If I had seen that in total isolation, I would have been like, "Cool product. One of lots of things coming out that are using AI," and certainly a thoughtful implementation of how to apply these tools to the storytelling task. And that went pretty sideways for you, at least for a minute. You tell me why you think it happened, but timing is maybe an issue. All of a sudden, people were like, "This guy is public enemy number one of writers." So I'd love to hear that story from your perspective and see if there's anything we can learn from it. I don't know if people are getting canceled anymore by the woke mobs that some folks used to be concerned about, but now there's maybe an anti-AI contingent out there looking to cancel people. So tell me how that all went down.

James Yu: (1:06:53) Well, it's my first time really getting canceled. I definitely got dogpiled there for a while. I still stand by that demo, and I stand by what I said. If I were to update it with a little more nuance, I would say this is great for the first draft; it's not about writing the entire novel.

So this is the other thing. There is a division here that is more complex than just anti-AI versus pro-AI. There are shades to it. We have a lot of indie writers using us, and a lot of their audiences actually know they're using AI assistance. And they're like, "This is great. You can write 20 books a year instead of 10? Give me more of this." So those writers are really seeing that feedback, readers saying, "Hey, it's really capturing your voice well."

And there were a lot of folks out there saying, "Oh, how could you even find any novelist to work with you?" We literally worked with hundreds of novelists who were begging for this product. That was the other thing I wanted to say that was lost in this tweet: like everything else, this product was not born out of my mind in isolation. We worked hand in hand with novelists, some of whom cried tears of joy when they actually got it into their hands. So it's a stark contrast to some of the reactions out there.

Now, that being said, I totally understand the third rail that's happening right now. I know writers are not getting paid and compensated well enough; for example, look at what's happening in Hollywood right now, where it looks like some deal is going through. I think that's part of the timing, and maybe the nuance in that video was lost. Writers should be compensated. They should be paid fairly for the work they're making. But I think that question should be separated from the tools they use. Sudowrite is a tool. We're not out here to replace writers. We're out here to create tools for writers, who are giving us feedback all the time about how this helps their lives and lets them be more creative.

And that was the thing: we got such an influx of new users from that, just from the awareness of, "Wow, this kind of tool actually exists." Even then, a lot of authors are not using us in that way. Some of them are using us in a very AI-maximalist way, writing whole scenes, rewriting them, doing lots of things with it. Some are just using it for brainstorming, like, "Okay, what do you think of this scene?" without using any of the words that come out of it. So I would say it's very diverse, as diverse as people using Photoshop back in the day. We saw that sort of backlash too. Is digital art really art? No one questions that today.

Nathan Labenz: (1:09:55) You can find a few.

James Yu: (1:09:57) And obviously, that's happening with Midjourney and art generation, but I like to shine a light on the creatives who are pairing this with their craft. How can we build a tool that is mindful of that, and that also points them in directions that increase their craft? For example, we run weekly classes with writers to talk about how to use Sudowrite for writing, but also just about how to become a better creative writer. So we're really invested in this space. We haven't pivoted. Amit and I are both fiction writers, and we built this. We've kept building from the same core thesis for 3 years, and we're going to keep building, because we're really seeing that creative writers will come to see these kinds of tools as, maybe not a default, but as normal as any other writing tool, like grammar check.

But it was definitely an experience. Getting dogpiled also renewed my awareness of the folks on the other side of that point, writers asking, "Whoa, can I not talk publicly about using AI for my writing?" That kind of stuff is really concerning. There are witch hunts happening in the author world, where people are trying to out authors as using AI. I think that will be a transitional state, but that sort of thing is happening out there in these groups and enclaves.

Nathan Labenz: (1:11:28) The timing, obviously; I thought this was the earliest days of the writers' strike, but at least it was underway at the time you put the tweet out. Seems like that's probably the biggest variable. And then I'm trying to understand what was really happening here: who was responding to you? To win a little trust from you and get you comfortable doing this in the first place, I sent you an episode we did with 3 members of the WGA, where we talked about their AI-related concerns and how they're using AI in their own practice. And the 3 of them, a possibly biased sample of those who were up for coming on an AI podcast, were not at all hostile to the technology. It seemed like their thing was really just, "We don't want AI to start getting assigned certain contractually defined roles that have payment rights attached to them." If they could get that kind of agreement, they seemed pretty content to continue experimenting with all sorts of AI tools.

So, do you have a sense of the degree to which the reaction you got was driven by people who are actually writers, versus just general Internet vitriol? And aside from timing, you mentioned adding a bit more nuance; maybe the move would be to emphasize the "start with why" part up top: why you're doing this, who you are, your own passion for writing and storytelling. And second, what would you say to future AI product builders when it comes time to put things out in public? What can they learn from your experience to minimize the risk of that kind of blowback coming their way?

James Yu: (1:13:45) I just think the high-order bit is Twitter, or X. I was looking at the analytics on people watching the video: basically 0%. 99% watched only the first 5 seconds, which had basically no content. So it's mostly the mechanism of retweeting and quote tweeting that got it to this place. And it's interesting, because it probably tipped on a few people retweeting with the right verbiage, which got the dunkers to come in. If they hadn't tweeted it, maybe it would have just been a semi, not-even-viral tweet. So I think the mechanisms and structure of social media caused this issue.

I think a lot of them were writers, but I'd say it was mixed: writers, plus just general Internet dunking. I became the poster child of, "Oh, this is the AI bro coming to take all our writing jobs away." There was nothing I could have done at that point. There's nothing you can do with that kind of deluge. You cannot reply to every tweet.

In terms of advice, maybe Twitter is not always the right conduit for these kinds of announcements. I would also say: put customers first. If I were to redo it, I might have one of our customers talk about how they use this in their workflow, to humanize it a bit more, like, "Okay, this is an actual use case. I'm not just contriving it out of thin air." I patterned the video after Twitter demos for tech people, where we don't take the demo at face value; it's just a demo of how something could be done. If you can actually showcase the real problem, ideally with a customer, I think that would be wiser.

Nathan Labenz: (1:15:56) Have you raised capital? Are you hiring a team? Are you guys just bootstrapping this as your own passion project? What's the story there?

James Yu: (1:16:04) We raised a small round in 2021. And we have a small team, 7 folks including me, 4 of whom are developers. We've been profitable since March. So we're in this place where I want to build the team mindfully and find folks who really fit; 3 of the developers have English degrees, they come from that world, and they're also writers. We're looking to grow mindfully and stay focused on this particular problem, and now we have the right team and structure in place to do that.

Nathan Labenz: (1:16:43) When a storm like this hits, your experience, and what you have to do, varies a lot depending on whether it's just you or you've got a team that you're representing, who's looking to you for answers. What did you do? Did you just keep your head down and let the storm pass, or was there anything else that was useful?

James Yu: (1:17:01) It's tough. When you're dogpiled like that, the one thing you want to do is reply to everyone. But we basically kept our heads down and really focused on our customers, because they're the ones who were coming in. We got so many feature requests and things like that. So really, it was just building, focusing on the writers who use us, and getting our team to focus on that too.

Obviously, it's a little hard when I'm like, "Wow, this little rectangle is now screaming at me all the time." But I definitely became a little more resilient after that experience. I would say to anyone else in that position: it's the Internet. It's structural. It's how the incentive mechanisms in these social media apps are made. It also made me mindful of not being a dunker myself. You see all these opportunities to dunk; just don't do that. And there were folks who had more nuanced discussions on Twitter, which I appreciated, and who I actually had conversations with that were not a screaming fest. I appreciated those as well.

Nathan Labenz: (1:18:13) First of all, congratulations on the company being profitable; that's awesome. I imagine your actual paying customers barely noticed this or cared, right? It didn't have a big impact on the business?

James Yu: (1:18:29) No. I think someone posted about it in our community Slack, and a lot of people were like, "Wait. What? What's happening?" The impact on the business is that we got a lot of exposure from that tweet, and it actually put us in front of a lot of writers. In any of this Internet vitriol, there's always a big middle population who aren't saying anything. They're like, "Let me check this service out and make up my own mind about whether I want to use it." But they're not tweeting. They're not retweeting. So that's the other thing to keep in mind as a founder: you're not hearing from a lot of people. You're hearing mostly from people who are on their phone at that particular time and have a particular agenda.

Nathan Labenz: (1:19:23) Another question for you, because of your success monetizing the app, and because of this moment where you got a surge of visibility, traffic, and probably new trial users. Some became paying customers, I'm sure, but probably many were just like, "Let me hit the free trial button, generate something, and see what it's all about." I think that's a challenge for a lot of AI app developers right now. Most of the ones I talk to are like, "Yeah, there's a lot of interest. We've maybe had a viral moment or two, or some minor ones, so we're getting people who are interested in us." That's my company, Waymark. We've had, I'd say, 2 viral moments this year, not as big as yours, but definitely big by any previous standard; just tweeting about the product went a lot further than it had in the past.

Yet what we've seen, and what has happened to a lot of others too, is that you get a lot of people who just come through and try the free trial. If you're trying to deliver a premium experience today, you're probably using a model with a nontrivial cost for each user that comes in. Then you have this question of, okay, where do I draw the line? When do I insist that people pay, so this is economically sustainable for the business? And then you also have a lot of churn problems, or at least most of the apps I talk to do. Yours may be an exception, because I can imagine writing is more of a lifestyle, where people are really into it on a long-term basis.

With Waymark, we have small businesses that create videos, and a lot of times they're like, "I need one now. I don't know when I'm going to need one again. So I'll subscribe so I can do the thing, then immediately cancel, because I got what I needed for now, and I can always reactivate later." To some degree, these are timeless SaaS business problems. But I do think it's a little different in the world of AI, for two reasons. One, we're having these viral spikes everywhere right now. And two, you have this nontrivial token cost, which makes the question of how, when, and how aggressively to monetize even higher stakes. So I'd love to get your thoughts on that too, because it sounds like you've dealt with all of it.

James Yu: (1:21:48) What you touched on is, yeah, a SaaS problem. AI companies are still startups. A lot of it still just depends on distribution and the age-old thing of reducing your churn. It's not necessarily about model choice and all the things we talk about on Twitter all the time, which are important, but there are also foundational things: are you pricing your service properly? Are you capturing enough value? Are you providing enough value to those users? Are those users telling other users about your product? The whole panoply of startup and SaaS advice still applies to AI. AI is not a panacea. You don't just sprinkle on the sparkle emoji and suddenly your company takes off. That's the big divide between what a lot of people see with these Twitter demos, "Oh, so cool, users are going to eat that up," and reality: the gulf between that demo and a profitable business is immense. It still comes down to: do you know your customer well? I think that's the biggest thing. Do you know the domain well, or can you get advice or become an expert in it, so you can build an indispensable tool that is used often? That's what you're getting at with the churn problem, because a lot of these AI ideas are not viable businesses. Maybe they're great as demos, but if you don't have repeat customers with a hair-on-fire problem, you still run into trouble. And as you say, it can be even worse, because with the consumption model of these AI services, the margins are worse. Imagine if your database cost 30 cents for every row you save, and you paid it every single time, with every update. That's what we're talking about. So that's the other unique challenge of AI: how you tune your context windows, how many free inferences you give trial users. You have to pay far more attention to that than a regular SaaS would pay to its database costs, because it scales per customer. We do a lot of that tuning. But one thing we have stuck to is a more Apple-like approach: hey, we're going to give you the best models, the latest state of the art, but you're going to have to pay a little more for that. So that's one philosophy you need to figure out: are you going to use state-of-the-art models, or are you going to offer a lot of volume, or bring-your-own-key, or other things like that?

I think you need to be very crisp about that, because otherwise you just do random things here and there. You should figure out that philosophy top-down, because it will determine your audience, and you want to be mindful about choosing the audience. Are you going after prosumers and professionals who are willing to pay, or more consumer-type experiences, where, man, you're going to need a lot, a lot of consumers to make it work? For us, we're a little in the middle, but we skew toward prosumer professionals, working writers who use our tool day in, day out. Personally, I find that an easier space to be in, because you can actually charge for the value you're providing. It's still not easy, even with AI. And it's funny, because sometimes I'm working on some cool new futuristic AI feature, and other times I'm like, man, I just need to make our pricing flow better. This is not rocket science, but you do need to work at those normal startup things.
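
To make the free-inference tuning James describes concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder chosen for illustration, not Sudowrite's actual pricing or model costs:

```python
# A minimal sketch of free-trial inference economics. All numbers are
# hypothetical placeholders, not any real company's figures.

PRICE_PER_1K_TOKENS = 0.03    # assumed cost of a state-of-the-art model, USD
TOKENS_PER_INFERENCE = 2_000  # prompt + completion; what context tuning controls
FREE_INFERENCES = 30          # free generations granted to each trial user
MONTHLY_PRICE = 20.00         # hypothetical subscription price, USD

cost_per_inference = PRICE_PER_1K_TOKENS * TOKENS_PER_INFERENCE / 1_000
cost_per_trial_user = cost_per_inference * FREE_INFERENCES

# How many free-trial users does one paying subscriber's first month cover?
subsidized_trials = MONTHLY_PRICE / cost_per_trial_user

print(f"cost per inference:  ${cost_per_inference:.3f}")   # $0.060
print(f"cost per trial user: ${cost_per_trial_user:.2f}")  # $1.80
print(f"one subscription covers ~{subsidized_trials:.0f} free trials")  # ~11
```

Shrinking the context window, lowering the free-inference cap, or routing trial users to a cheaper model all push that last number up, which is exactly the kind of tuning James says AI apps have to do that ordinary SaaS mostly doesn't.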

Nathan Labenz: (1:26:05) Here's a speculative question. I've been starting to circulate this among AI app developers because of this high token cost. For us at Waymark, somebody comes through, and the first thing they do is set up a small business profile. Once they have that, they can generate videos that use the profile plus their runtime instructions. Setting up the profile has a certain fixed AI cost, if you will, a lot of it on the image understanding side. (We should also ask you about possible multimodal extensions of your product in the future.) Then, as they generate each video, there's some AI model cost with that too. We figure it's something like 15 cents per random user who comes in and does the free trial. So at our retail price, whatever, if it's $30, we need one in 200 to subscribe to pay for the usage of the other 199. Obviously, everything's connected by springs to each other, it seems. But I do think that sucks.

So I've been dreaming of this idea of the AI bundle, which is another angle on it. One big challenge, of course, is most people don't have an API token. They've used ChatGPT, but you don't get a token from ChatGPT; you have to be on the platform for that. So right off the bat, if you're asking people to bring their own token, most writers and most small business owners are just going to be like, "Don't have one. What's that?" There are a couple of ways you could orchestrate something like this, but I'm thinking of something inspired by the cable bundle: the average consumer subscribes to tier A, tier B, or tier C, and maybe for $100 a month gets some baseline access to 1,000 apps. Then you, as an app developer participating in the bundle, and again this could obviously get complicated, but as a simple starter, you get 10 cents as one of the 1,000 apps that constitute the $100 bundle. In return, you give all the bundle users your base plan, maybe a 30,000-word starter plan; all of this could be adjusted.

But I wonder, would you be interested in participating in something like that, if somebody could sell a million people on the bundle? Ten cents a month across a million subscribers translates into more than a million dollars a year in revenue for each participating company. You could still upsell them to your premium plan once they get beyond the bundle limits, but you would serve whoever comes in as a bundle subscriber and get more consistent revenue. That way, you can welcome those people with more open arms and more access, so they can get more value without hitting super-early limits. You worry less about churn from these pop-in, pop-out users, and probably less about managing these viral moments, which can be a weird mix, where it should be all good, but in some ways it's not necessarily always all good. Do you think you'd be interested in participating in that kind of bundle as an app developer?
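
For what it's worth, here is the arithmetic of that hypothetical bundle worked through in Python, using the illustrative figures from the conversation: Waymark's rough trial cost, and the made-up $100, 1,000-app bundle. None of this describes a real product:

```python
# Today's free-trial math, using the rough Waymark figures mentioned above.
cost_per_trial_user = 0.15            # USD of model cost per free-trial user
retail_price = 30.00                  # USD monthly subscription
trials_covered = retail_price / cost_per_trial_user
print(f"one subscriber must cover ~{trials_covered:.0f} trial users")  # ~200

# The hypothetical bundle alternative.
bundle_price = 100.00                 # USD per bundle subscriber per month
apps_in_bundle = 1_000
subscribers = 1_000_000
per_app_share = bundle_price / apps_in_bundle        # $0.10 per subscriber/month
annual_revenue = per_app_share * subscribers * 12    # per participating app
print(f"each participating app earns ${annual_revenue:,.0f} per year")  # $1,200,000
```

At these made-up numbers, each app's share works out to about $1.2 million a year of baseline revenue, before any upsells.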

James Yu: (1:29:52) That sounds super interesting. Obviously, it depends on the details: what the share of payouts is and how that works. And maybe this is actually just a cable network, a cable network for inference.

Nathan Labenz: (1:30:06) Yeah, because people love their cable bundle so much, you know, I'm trying to—

James Yu: (1:30:11) No more streaming, right? Go back to cable. Because the situation in AI today is kind of like the streaming model. Yeah, I think that's interesting. I would have questions, like, it might be more diffuse, right? The people using it may not be fiction writers, obviously. But I guess you need to get to a certain size to make that work out. And how would you actually implement it? Are you man-in-the-middle-ing tokens that the bundle manages? Maybe you negotiate deals with OpenAI; that could also be interesting, and maybe that's how you transfer those discounts over to customers. It'd be interesting to chat about that. The other thing I've been thinking about is, what if OpenAI and Claude made their own login systems? "Log in with OpenAI." Or, oh, you like Claude? "Log in with Claude" to the AI application. Suddenly, your favorite AI goes with you anywhere. There's a coordination cost, and I'm not sure why OpenAI would be incentivized to do it; I don't think they really care about making things cross-compatible, but it would be better for the consumer. I would love to have that: in any app, I just log in with my inference provider, and that's the entry point to the intelligence. Maybe it even has all my prompts and settings in there. I think that's also an interesting idea, but I'm not sure how you overcome the coordination costs there.

Nathan Labenz: (1:31:54) It is tough, and it's unclear who would be able to pull it off. The most likely candidate is probably OpenAI. And I want the "log in with OpenAI" or "log in with Anthropic," or Claude, or whatever, as well. That's part of how this could work: you log in to validate. That's how the cable bundles work; when you're online and going to use some ESPN app, you log in with your cable bundle login. So something like that would probably make a lot of sense here. The idea of bringing personalization with you, I'm with you on that too. I think I'm going to do a whole show on this and flesh out the vision more broadly. But they have ChatGPT Enterprise, which connects to your stuff, and they're beginning to power vector database search, where you just connect your Google Drive or your Dropbox or whatever, and they ingest all that stuff: they chunk it, embed it, and index it, and now they also do all the querying. So this whole vertically integrated experience is coming to retrieval augmentation. That could be extended to the individual level and bring a lot of value to this login experience: if you could log in to bring your identity and your access, you could bring your system prompts, your prompt library, even your whole vector database profile, with you anywhere. I think that is definitely super interesting. The vision for it is a little foggy, which is why I'm probably not the one to do it. But maybe I'm the one to talk about it until somebody at one of the bigger players actually decides to implement it.
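
For readers who want the chunk, embed, index, query loop spelled out, here is a minimal, self-contained sketch. The toy embed() below just counts words; a real system would call an embedding model and store a dense vector. Everything here is illustrative, not any vendor's actual pipeline:

```python
# A toy sketch of retrieval augmentation: chunk documents, "embed" each
# chunk, index the results, then retrieve the closest chunks for a query.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Placeholder "embedding": lowercase word counts. A real system would
    # call an embedding model here and get back a dense vector instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk(document: str, size: int = 50) -> list[str]:
    # Split a document into fixed-size word windows.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# "Ingest": chunk every connected document, index (chunk, embedding) pairs.
docs = [
    "meeting notes: we agreed to revisit pricing tiers next quarter",
    "product spec: the editor should support outlines and scene cards",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

# Query time: embed the question, rank chunks by similarity, and hand the
# top ones to the model as context (the retrieval-augmentation step).
query_vec = embed("what did we decide about pricing?")
top = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)[:2]
context = "\n".join(c for c, _ in top)
print(context)
```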

James Yu: (1:33:44) It feels like the hope of OpenAI plugins was sort of that, but it also feels like they've pumped the brakes; I don't hear much about plugins being talked about anymore. But they have said they can do vector search on your company's or your app's private data and bring that into ChatGPT.

Nathan Labenz: (1:34:03) The big quote from Sam Altman was that a lot of people thought they wanted their app in ChatGPT, but what they really wanted was ChatGPT in their app. And in my experience developing stuff, the plugin paradigm and the function-calling capability of the newest models are basically two sides of the same coin. When you create a plugin, you're declaring: here are the calls you can make to me. With function calling, you're saying: here are the functions you have available. They present slightly differently, and as of now you have to coerce one format into the other. But broadly, it seems to be converging toward a world where, regardless of the interface, the model can be made aware that it has these tools, functions, affordances is a great word too, available, and it's getting good at knowing how to take advantage of them. So that distinction probably starts to fade away. We'll see; ChatGPT could get really good. But at least for now, there's enough value in all these other apps. I don't want a paragraph summary of the flight options available; I want to skip the advanced search-and-sort thing on Kayak, just say what I want, but still see the results in the richer format, with all the controls if I do want to dive in. So bringing the AI to the app does seem like more of a thing. But again, that speaks to the need to grease those interactions, because right now folks like you and me are in this spot of, well, how many tokens can we afford to give you before this becomes unsustainable? If they could take that problem off the table, so you don't have to worry about it, you could have everybody use your app, not necessarily for free, but freely; people could fall in love with it and then pay for your higher-tier plans. They could do a tremendous amount of good for the ecosystem that surrounds the core technology.
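
To illustrate the function-calling side of that coin, here is a rough sketch using the pre-1.0 openai Python client as it existed around the time of this conversation. The search_flights function and its schema are hypothetical, invented to mirror the Kayak example; only the general request shape reflects the real interface:

```python
# A sketch of function calling: the app declares the functions available,
# and the model decides when to call one. search_flights is hypothetical.
import json
import openai  # assumes openai.api_key is set; pre-1.0 client interface

functions = [{
    "name": "search_flights",
    "description": "Search for flights between two cities on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["origin", "destination", "date"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Find me a flight from SFO to JFK on 2023-10-20"}],
    functions=functions,
    function_call="auto",  # let the model decide whether to call a function
)

message = response.choices[0].message
if message.get("function_call"):
    # The model returns a function name plus JSON arguments. The app runs
    # the real search and renders results in its own rich UI, which is
    # Nathan's point about keeping the app's interface in the loop.
    args = json.loads(message["function_call"]["arguments"])
    print("model requested:", message["function_call"]["name"], args)
```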

James Yu: (1:36:27) Yeah. As a consumer, the first few inferences are not going to be representative of your experience. You do need more time, I think, with an AI app.

Nathan Labenz: (1:36:36) Yeah. Totally. It sucks to hit that barrier after two clicks, because the one-in-200 conversion I mentioned earlier is probably tough to achieve in anyone's first two. At Waymark, we have this problem too. Our thing is very well-defined, and I think we have one of the highest success rates of any app I've seen out there. But it still kind of sucks, because we very quickly put people in a position where either you have to buy if you want to really do this, or you've hit the end of the experience. I just wish we could open that up a little more without having to eat all the cost and make it a tough thing to sustain. Anyway, thank you for coming down this brief detour rabbit hole with me. Anything else you want to talk about today that we didn't cover? Oh, multimodal was my one other thing. That's obviously a super hot topic: we've got DALL-E 3 out, we've got image understanding coming. You obviously have an interest in creating visual assets, given what we talked about with the birthday party, but is multimodal something you think Sudowrite will do as well?

James Yu: (1:37:46) We probably won't do it ourselves. We do have a small visualize button, which just uses DALL-E on the back end, but we really remain focused on the text: the story outline, editing, and story understanding. However, one place we could possibly expand is that maybe Story Engine becomes a back end to other apps that want to create legible stories. If a lot of folks already have their stories in our particular format, then with, say, a Midjourney front end, you could create storyboards. But that's really far down the road. There are a lot of thorny problems even just with text, perhaps thornier in some ways than images. So we remain focused on that. It's interesting to see OpenAI, though. It feels like they're moving into a world of everything-to-everything, which I think is smart, and I wonder about the implications. For accessibility, that's really cool. Imagine a 360 camera in the middle of a meeting room, and the AI just recognizes the sticky notes you're putting on the board during a brainstorm. There are interesting applications for storytelling too: an AI writer in the writers' room, an Alexa-type device that can see everything, hear everything, and make suggestions highly relevant to the context at hand. So there are opportunities that will unlock, and I'm excited to hear what OpenAI announces at their developer conference; maybe that will unlock some things. Another tangential idea: a lot of people dictate their stories now. What if you could just walk around San Francisco with your earbuds in and talk for two hours, and when you come home, you have a whole outline pre-assembled from your rambles? Those are the kinds of multimodal things we're thinking about, in the service of getting great stories out of people.

Nathan Labenz: (1:40:00) Any apps you would recommend from your many travels that you think people are maybe sleeping on? Or, you know, I used to ask this question to everybody, and then everybody would just say ChatGPT, and so I kind of—

James Yu: (1:40:13) What's wrong with that? ChatGPT is 80% of it; it's good for everything. But okay, and this is probably not unknown: Cursor. As a programmer, I've been using it nonstop, just because being able to highlight and rewrite functions is so powerful. That goes beyond what Copilot can do. It's interesting that Copilot had a lead, a little headway, but it feels like Copilot will catch up as well; I think Copilot X is the thing they're building. But I've been loving Cursor. That's my main other AI product.

Nathan Labenz: (1:40:49) On the image side, do you have any favorites? What did you do anything special or non-obvious for the Zelda birthday party?

James Yu: (1:40:56) Oh, yeah. I used a combination of Midjourney and Pika Labs, Pika for adding some motion, which I found pretty effective. I also used Claude to flesh out some prompts along the way. Oh, and ChatGPT is really great at making scavenger hunts. Themed scavenger hunts relevant to a normal house: man, it can generate 100 different Zelda-themed clues, like, "Now go to the fairy fountain," referring to something in the bathroom. Wow, it's so good. These are bespoke things, right? It's really the combination of these tools. For the AI voice, I just found a random off-the-shelf website; it had to be a Zelda voice. I don't remember the name of it. It was fun to explore those tools.

Nathan Labenz: (1:41:54) Thank you very much, James Yu, founder of Sudowrite, for being part of the Cognitive Revolution.

James Yu: (1:42:00) Thanks, Nathan.

Nathan Labenz: (1:42:01) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
