OpenAI Fine-Tuning Update, Acceleration Debate, and Bundling AI

Nathan and Erik discuss OpenAI's GPT-3.5 fine-tuning update, using GPT-4 outputs to fine-tune 3.5, the AI acceleration debate, the idea of an AI app bundle, and the benefits of NetSuite's ERP platform.



Video Description

Nathan and Erik chat about OpenAI's GPT-3.5 fine-tuning updates, using GPT-4 outputs to fine-tune 3.5, when to accelerate, and AI bundles. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive

SPONSORS: NetSuite | Omneky

NetSuite has been providing financial software for all your business needs for 25 years. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform, head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

X/Social:
@labenz (Nathan)
@eriktorenberg (Erik)
@CogRev_podcast

TIMESTAMPS:
(00:00) - Intro and GPT-3.5 fine tuning update
(05:50) - Using GPT-4 to generate training data for GPT-3.5
(09:54) - Potential for training models on synthetic/cleaned data
(10:24) - Using chain of thought prompts to improve model performance
(15:02) - Sponsors: NetSuite | Omneky
(19:39) - Accelerating applications vs new AI models
(20:40) - What to accelerate vs what to slow down
(26:07) - AI as a co-pilot vs fully automating tasks
(30:12) - When to delegate to AI
(36:40) - Displacement of human roles by AI systems
(40:39) - Does training on synthetic data solve a problem?
(45:10) - The idea of an AI bundle/subscription
(50:28) - Bundling in other industries like cable and SaaS
(54:27) - Churn and retention challenges for AI apps
(01:02:54) - Low retention for easy-to-use AI apps
(01:03:57) - Incentives for AI companies to join a bundle
(01:04:39) - Potential for collaboration between AI companies
(01:12:01) - Leading AI firms creating separate bundles
(01:16:43) - Outro

#openai



Full Transcript

Transcript

Erik Torenberg: 0:00 Somebody, a friend of a friend, reached out to me with a legal question. I'm like, I'm really not qualified to answer this, but I am qualified to put it into ChatGPT.

Nathan Labenz: 0:11 If they were one company, they could have avoided this whole mess. And I spoke to Emil Michael, the chief business officer at Uber, and he was like, yeah, I tried to get us to acquire them, to merge, because they just didn't make sense. But Travis said we got to kill them. And Lyft was like, we hate Uber. So businesses just are irrational at the end of the day. And the fact that they're funded by VCs who all have take over the world visions, it's challenging for people to be in a spot of like, hey, you play this niche, or hey, you collaborate with this competitor.

Nathan Labenz: 0:45 Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Let's kick it off with a GPT-3.5 Turbo fine tuning update.

Nathan Labenz: 1:14 Let's do it.

Erik Torenberg: 1:15 It's been just a couple weeks since the last episode where we covered it in the immediate wake of the release. And it is interesting. I had, as I mentioned then, had explore fine tuning Llama 2 on my to do list for a little while, and then kind of said, yeah, I'd probably scratch that off and do the 3.5 fine tuning instead. And they do make it super easy. I probably at this point in time wouldn't have still gotten around to doing the Llama 2 fine tuning, not because it wouldn't have been interesting, even just if you do it and you get it to work, then you still have the inference problems that we talked about, where it's like, okay, how am I going to host this? And what kind of load can it handle? And do I auto scale up or down? Who's got a good solution for that? There are solutions coming, but they're not mature. In contrast, OpenAI just makes it so easy. If you've used the previous version of their fine tuning, it's very similar. This time around, they have both the prompt and completion format, which is the old format, and now the primary one is the chat format. So you get to set up your system message, who you want the AI to be, what its job is. And then from there, it's a back and forth between user and assistant. You can set up a couple of examples of the task that you want it to do. And next thing you know, you're off to the races fine tuning. What has been really interesting though, is using GPT-4 to create the dataset for 3.5 fine tuning. And we came to that in a couple of steps. But again, I'm doing all this for Waymark, right? So my goal here is we already have a product in market. Works really, relative to anything in the not too distant past, amazingly well. You can just tell it what kind of video you want. It will make you that video. Next thing you know, you're watching that video, complete with a script, images from your business loaded in to complement the narrative, and even a voiceover layered on top of that. And that all happens in 20 to 30 seconds. So pretty cool. But obviously, there is still room for improvement in all of these things. Our scripts are often not quite as good as we'd like them to be. And sometimes we see wonky images chosen where we're like, oh, God, we really rather wish you had picked something else. And sometimes the voiceover, it's usually well written and apt, but sometimes it doesn't quite sync up just right with the timing of the scenes. And so there's obvious opportunity for improvement. So it's like, okay, we got a new model. And per what OpenAI said, as soon as you fine tune it, it's available immediately for use and you have the same rate limits as you have with the normal models, which are high, meaning I don't have to worry about scalability at all. I don't have to worry about any hosting complexity, assuming that's true. Spoiler, seems to be pretty true. So I'm like, yeah. Now, if I can just make the model better, I don't even have to deal with any of these other problems. I can just hand something right over to our development team that could be a drop in replacement. So what you had before, just switch this model out for this new model ID, everything gets better. And that could happen with almost no work from the rest of the development team. So that's a huge attractor to, okay, let's do this now instead of having Llama 2 on our to do list and maybe getting to it when we have time. So first round we did was the same thing we've done before. We just took inputs and outputs. Here's the setup. 
And our setup is typically here's the structure of the video. Here's a profile of the business that also can start to include what are the images available. We represent those as text just as image captions. Starting to use some aesthetic evaluation as well so we can bracket like these are the super high quality images. These are the medium ones. These are the low ones. Sometimes all they have are low ones. So we got to use what we can. And then here's the user's runtime instruction. Typically, these are small businesses, so they might be saying, I'm opening a new location or I've got a sale this weekend or I'm hiring for a particular role or whatever. They've got all sorts of different things. And it's a long tail, as you might imagine, with all these different idiosyncratic local businesses. So we tried the prompts and completions and not that great. Not that great. Why is it not that great? Just not as good. Didn't seem quite as good. I don't know. It just wasn't quite there. So I took a walk and I was thinking, all right, GPT-4, by the way, does this task pretty well with a couple of examples. Do we need just more data? We already have a decent amount of data. I mean, I don't know. Our dataset is reasonably small. We've used anywhere between 100 and 1,000 scripts for most of our fine tuning processes. So that's not big data, but that's always worked for the last however many rounds. So GPT-4 can do it. It seems like, okay, we could get more data from GPT-4, but is that really going to move the needle all that much? And then what I landed on, just to get away from the keyboard a little bit and really think, okay, what would really help here? And what have I seen in the research that people seem to do that was pretty effective? Came up with the idea of let's train 3.5, not just on GPT-4's output, but actually its reasoning that leads up to that output. So we moved from instead of just asking GPT-4 to do the task, which by the way, it can do well just straight away, with a couple of examples anyway, it can just do it. We then moved to asking it to first analyze the task, explain its reasoning, classic step by step chain of thought. We're still in the process of refining that. Exactly how do we want it to reason about this? But immediately, even though GPT-4 didn't necessarily obviously do any better with the chain of thought, so in some sense, it kind of seems like a waste, we were able to generate a still modest sized synthetic dataset using the chain of thought approach on GPT-4, then take that output, go over to 3.5, and run the fine tuning now where the dataset is not just inputs and result, but inputs and then analysis, reasoning, breakdown, coming up with a strategy, and then the result. And that works a lot better. And I think this really suggests that this loop is going to become super common, where basically if you have, and we can talk about the cost and time savings on this too because both are substantial, but if you have a prompt that works with GPT-4, even if it's a lot of tokens, in our case, we're starting to get close to maxing out the 8,000 token context window with just two examples. So we got boilerplate instructions that adds up to like 1,000 words by the time you say everything you want to say. Then a couple of examples, and those are each, let's say, 2,000 tokens, and we'd leave 3,000 tokens at the end. And you're starting to max that out. That is cost wise, like 30, 35 cents per. So it starts to be a little bit prohibitive for production. And it's also slow. 
I mean, typically over a minute, sometimes over 90 seconds, typically in that 60 to 90 second range. So you're waiting a long time and you're spending a lot on it, but the result is good. If you are in that position, what this really suggests is that you can add that chain of thought element to the GPT-4 approach, even if it's not obviously adding anything in terms of GPT-4's quality of output. And then when you go to fine tune on the 3.5, notably, we're also cutting out those examples. So our token count drops from the, you know, up to 8,000. The limit on the 3.5 is 4,000, although they do have 16,000 coming. But the limit is 4,000 there. So we cut the two examples. Now we just have the instructions, the inputs, and the outputs are just the one case that we're actually concerned with. And it basically learns to do that reasoning. Maybe not quite as well as GPT-4, but very well. I think this implies a lot about where things are going. We've already seen a huge trend toward using synthetic data in training for all sorts of reasons. Our last episode with Zico Kolter and Andy Zhao talked about that a decent amount where it's like, the whole problem of jailbreaks stems from the fact that the model is trained on all this crazy content that includes all sorts of toxic and hateful and anything you might imagine. And so we have to try to paper over that. What he suggested was, well, why don't we start trying to train our models without all that shit, or at least with a lot less of it? Then maybe that problem would be much less severe. We wouldn't have to paper over as much stuff if we could just not have the models be exposed to all that internet sludge in the first place. People are starting to work on that. And this just goes to show how easy that is starting to be, at least for narrow use cases. Now I'm still building, of course, on all the pre-training and all the, we're very much on the shoulders of giants here. But I can take GPT-4 with just two good examples, have it generate 100 examples with chain of thought. Now I can go move that to 3.5, get it to work with half or fewer of the tokens. The cost of that is also like a third. So you're in the ballpark, probably not quite, but in the ballpark of a 90% cost reduction. Latency is also much better. It's often under 10 seconds. Sometimes it seems like it varies by just probably the load on OpenAI systems. But under 10 seconds, often up to into the twenties sometimes if it's more slow. But basically, the slowest ones with 3.5 are about at the same level as the very fastest ones from the GPT-4. And this whole thing can be run pretty quickly. Another really interesting data point on this, just from our experience over the last couple of weeks of really getting into it, is we have these datasets that we've created of what good videos look like. But here, we didn't have the chain of thought because we never needed it. Right? There was never any time or place where anybody was like, I'm gonna sit down and write three paragraphs about how I'm making this video. Just, you know, just do it. Right? So that all just is internal to the humans' heads who made those videos. So we didn't have that, but I wanted to run the fine tuning on it. So I had to have GPT-4 create it. Now I could have taken the existing videos and generated the chain of thought for that, and we might still do that. 
But we found that just asking GPT-4 to just do the task with chain of thought straight away was to at least, you know, first approximation roughly as good as the human work that we were doing anyway. So instead of replumbing everything and generating just the chain of thought, I just said, well, GPT-4, just do the chain of thought and do the video. Just do it all. And then we'll just train on that. And I don't think that's probably where we'll end up because we do have a dataset that we probably trust a little bit more, and I think we can refine still further from there. But if you don't have a dataset, if you just have a prompt that works, then this cycle of use that prompt to generate a dataset with how you're going about it, not just what you do, but how you're planning and thinking about it, and then the what you do. Port that over to 3.5, run the fine tune. The fine tune typically takes minutes. The model is indeed available to use typically in seconds after. I've seen a few errors where the model gets done and I go ping it immediately and it's like, sorry, not ready yet. Typically, the next time I ping it, it is ready and it's ready to go. And then you've got just the full scalability and responsiveness of 3.5. It is, I think, going to be a hit product for them if it's not already and definitely suggests there's so much opportunity. Now that we have language models that are good enough to generate a lot of this data, training on synthetic data is just going to be such an obvious win in so many cases. Data provenance issues, too, is another one. OpenAI probably is not going to get away with saying, well, we just use synthetic data from GPT-4, because then people are going to be, well, how'd you train GPT-4? So there's that regress that they may have to defend. But in another context, at Athena, which we've talked about many times, there's this question of like, well, geez, what are we going to train on? Forget about what OpenAI is going to train on. We're using the API. They're not going to train on our data. But what if we want to train on our data? What if we have clients who have somewhat sensitive data? What if their executive assistants are using somewhat sensitive data or just personal information in prompts, and now that becomes part of our dataset. Would we train on that?
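For readers who want to try the workflow described above, here is a minimal sketch of the GPT-4-to-GPT-3.5 chain-of-thought distillation loop, assuming the 2023-era `openai` Python SDK. The system message, prompt wording, `load_inputs()` helper, and file names are illustrative stand-ins, not Waymark's actual prompts or code.

```python
# Minimal sketch, assuming the 2023-era `openai` Python SDK (v0.x).
# Prompts, the load_inputs() helper, and file names are hypothetical.
import json
import openai

SYSTEM = "You are a video script writer for small businesses."

def generate_training_example(business_profile: str, instruction: str) -> dict:
    """Ask GPT-4 to reason step by step, then produce the final script,
    and package the exchange in the chat fine-tuning format."""
    user_msg = (
        f"Business profile:\n{business_profile}\n\n"
        f"Request: {instruction}\n\n"
        "First analyze the task and explain your reasoning step by step. "
        "Then write the final script under a 'SCRIPT:' heading."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": user_msg}],
    )
    assistant_msg = resp["choices"][0]["message"]["content"]
    # The training target keeps the reasoning AND the result, not just the result.
    return {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]}

# 1) Build a modest synthetic dataset with GPT-4 (a few hundred rows is plenty here).
examples = [generate_training_example(profile, request)
            for profile, request in load_inputs()]  # load_inputs() is a placeholder

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# 2) Upload the file and start a GPT-3.5 Turbo fine-tuning job.
training_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTuningJob.create(training_file=training_file["id"], model="gpt-3.5-turbo")

# 3) When the job finishes, the fine-tuned model id (e.g. "ft:gpt-3.5-turbo:...")
#    is used with ChatCompletion.create like any other model, at 3.5 prices and latency.
```

The key detail is that the assistant message in each training example contains the reasoning as well as the final script, so the fine-tuned 3.5 model learns to reproduce that reasoning without needing the in-context examples at inference time.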

Nathan Labenz: 15:01 Hey, we'll continue our interview in a moment after a word from our sponsors.

Erik Torenberg: 15:05 You probably want to take some steps to make sure that information is cleaned up or anonymized somehow. Sometimes they use the term de-identified. But are we really experts in that? No. And we don't really want to get caught making a huge mistake. So based on this experience, I'm kind of like, maybe we can just make the policy now that we just won't train on client data. And instead we'll just train on synthetic data that maybe looks like client data or is inspired by client data, but never actually having to use the client data in the training process at all. So I think those are my big updates. Let me know what questions you have. But bottom line, 3.5 fine-tuning is a great product experience. It's really easy to use. It runs quickly. The models are very scalable once you have them. Biggest insight from the process is training on chain of thought can get you really good performance, much better than you get just by training on the work itself. And the synthetic data approach, generating from GPT-4 or whatever, and then feeding that into the smaller model really works. Give a shout-out too to former guest and friend of the show, Human Loop, because I'm using that for this process. And I think they've done a really nice job of anticipating what this loop is going to look like and building the tools to make it pretty easy to do. As of 2 weeks ago, I was thinking, alright, how am I going to code myself a loop here to do this? And then I got back into Human Loop, which I've been using throughout, but I hadn't used every last feature until recently. And as I looked into that more, I was like, yeah, these guys have definitely done a really nice job of anticipating what kinds of loops people are going to want to create. You can go in there, track every little aspect of everything you've done, make corrections. If you have something that's like, okay, this was wrong, I want to fix that or modify how it's reasoning through this, and then use that in the next batch of training. You can have what the model did, your correction on top of it, and then just by layering on those little corrections, again, just improve performance so much. So I think all these apps are going to start to get pretty good. I would say ours at Waymark is already among the best in terms of just reliably doing a pretty good job and giving you something that you would have a decent chance of actually wanting to use. A lot of room for improvement, and I think we will see a lot of that improvement come online over the next few weeks to a month as we do another 10 iterations probably on the fine-tuned model that powers it. And it's exciting. This is a great instance too of just what do we want to accelerate and what do we want to slow down? I'm all for accelerating this kind of stuff, making 3.5 work well and scalably and responsibly for users and just make all these apps in our daily lives so much better. That is to be accelerated in my mind. And notably, I don't think we really even necessarily need GPT-5 to do our task. You want to do medical work, you want to get into science, you want to think about advanced cybersecurity type stuff, GPT-4, you can hit its limits. But for the kind of script writing we're doing, I'm not even sure that there's that much more improvement left beyond what we can get from the GPT-4 based system. So very cool. Definitely a fun learning experience, and that's what I've got to report.
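As an illustration of the "train on synthetic or de-identified data instead of client data" policy discussed above, here is one possible sketch. It is not Athena's or HumanLoop's actual pipeline; the regexes are a deliberately naive first pass, and the rewrite prompt is a hypothetical example.

```python
# Illustrative only: a naive PII scrub plus a model-generated fictional stand-in,
# so real client text never enters the training set. Not a complete
# de-identification solution.
import re
import openai

PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def scrub_obvious_pii(text: str) -> str:
    """Cheap first pass: mask phone numbers and email addresses."""
    return EMAIL.sub("[EMAIL]", PHONE.sub("[PHONE]", text))

def synthesize_stand_in(client_example: str) -> str:
    """Ask a model for a structurally similar but fully fictional example,
    'inspired by' the client data rather than containing it."""
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": (
            "Rewrite the following example with entirely fictional names, companies, "
            "numbers, and details, preserving its structure and task type:\n\n"
            + scrub_obvious_pii(client_example)
        )}],
    )
    return resp["choices"][0]["message"]["content"]
```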

Nathan Labenz: 19:16 There's a few questions that stem from that, but just to close the last loop. So in your dream world, what we would accelerate is the applications that we've just been talking about, but what we would slow down is new versions, GPT-5 basically. I guess, how would you delineate? What's your policy for what you want to slow down?

Erik Torenberg: 19:32 The leading model developers have, I think, put a pretty good position out, which is that they think systems more powerful than GPT-4 deserve special scrutiny. And they've committed to a few different ways of doing that, including having third party independent auditors or red teams look at them and try to assess the risks. And the talk of licensing also, I think, often willfully or not, but often it seems sometimes willfully mischaracterized as like, oh, you're not going to be able to do anything on your laptop anymore because Big Brother's going to be watching you. And they have been very clear that they want that to apply to systems more powerful than GPT-4 as of now. And it's presumably a sliding scale. If we get to GPT-5 and that's safe, then maybe that gets edited to be systems more powerful than GPT-5. But I do think some of this stuff with the synthetic data, I just think we need a little bit of time to develop some better techniques before just more raw scaling. And it's possible that within some of the leading labs, they have developed some of these techniques and we just don't necessarily know about them yet. But cleaning the data is just one obvious thing that we can now do pretty well and pretty efficiently with the language models that we do have to try to create datasets that are just not so problematic in the first place. The biggest risk that people tend to identify is pandemic, that a language model and Anthropic had a Senate testimony thing about this. They've worked with some leading biosecurity people and they've said, yeah, it's not quite there yet. And they don't even like to talk about the details of the work, but they have some extremely credible people like MIT professors involved with it. And they basically say, there's a number of things that you would need to do to create a new pathogen. And language model can't do all those things, but it can do some of them and it can help you figure out what those steps might be. The problem right now is that information is in there because it's just been trained on everything. And you try to RLHF it to not happen, but then you got things like the universal jailbreak that show, especially if it's open sourced, but even if not, but definitely if it's open sourced, you can get that out. And you can have a cascade of things where somebody might just be like a free speech absolutist and say, well, I'm going to figure out a way to take off this refusal behavior and somebody else comes along. It's not just like one person's going to do all these steps, but you do see this proliferation, loss of control. And if the model is capable enough and the information is in there, if you are open sourcing something, it's out there. So I think up to around GPT-4, we can be reasonably confident it's not going to design a pathogen. GPT-5 very well might have pathogen designing capabilities. And I would much rather we get to a place where we know what the dataset is and know that it doesn't have exposure to certain kinds of super sensitive stuff before we build it. And I think it is increasingly you can see a path to that. Are we going to take that path or not? It does seem like OpenAI seems to want to. Anthropic is definitely all about that. Google, DeepMind, I don't really know. But they've been leaders in science with things like AlphaFold and many others. So clearly they have awareness. 
Meta's probably the biggest wildcard there right now where the scuttlebutt is that they are training a GPT-4, going to open source it, don't care what anybody thinks. We'll learn a lot if they do that about, did they take any extra precautions? Did they filter the dataset? If they are going to go down that path, I would love to see some additional foundational thought being put into this. It's not just like, hey, maybe we can paper it over, but maybe in fact, we can find a way to not have this be part of the model's capabilities in the first place. And that would not compromise anybody's utility really at all. I mean, there's maybe a few biologists that would get a little less value from something that didn't have access to certain information. And maybe you could have, if we're operating in a world where there are going to be rules, maybe there could be rules around bio-aware language models versus non bio-aware language models and maybe similar for some other domains too. But we just need some time to figure that stuff out. So yeah, that's what I would like to see us take a breath on. Meanwhile, we can have all the cool apps we want and increasingly affordably and responsibly with the 3.5 fine-tuning loop.

Nathan Labenz: 24:48 Well said. Another thing that came up for me when you were talking is this idea, this question going around now of, like, where is it going to be copilot for X versus where is it going to be just doing X directly, replacing X, taking the human out of the loop? Obviously, in regulated industries, you're going to need the human in the loop, but what are the spaces where it's copilot for X versus X directly and on what timeframe?

Erik Torenberg: 25:14 Yeah. Great question. I currently divide the modes of using AI into 2, and I think there's a third one coming. One being Copilot. And I describe that as you as the human are the primary agent going about your business and the AI is there to assist you, answer your questions, do the menial tasks, increasingly can do pretty significant chunks of work. I've been coding quite a bit over the last couple of weeks as I've been getting into this whole fine tuning loop. And I basically don't write any code by hand anymore, almost at all. I pretty much always take some existing code, whether that's something that I previously wrote or something from some documentation somewhere, go to GPT-4 with it and say, here's what I have and here's what I want. And I just try to force myself and it helps force you to do it because it rewards it. But to really just try to think through what is it that I really want? But that's still copilot mode because I'm basically doing that one bit of code at a time. It's able to be more and more helpful, but I'm still going to it with, okay, next I want to do this class and I want to base it on this one and I want to borrow the caching pattern from over here. And your job is to come up with a new thing. It's remarkably good at that. I would say, last time we talked about coding, I said that's where I use it the most. But just over these last couple of weeks, getting into it more intensively again for a sprint, I would say it is safely a multiple X speed up on my ability to get projects done. Just much less likely to get stuck on something I don't know, much less likely to have some really stupid typo or whatever, kind of unobvious mistake confuse me for a while. Because the AI just doesn't make those kinds of mistakes very much. Not to maybe say never, but not very much. So I'd say it's a multiple speed up, but we're still in copilot mode there. The other mode that I talk about is delegation mode. I borrowed that word from Athena because their whole thing is about delegation and the transformative power of delegation. And in delegation, I kind of think the core difference is you're trying to get the AI to do the work at a level where you have enough trust in it that you at least don't have to review every single thing it does. You're probably still going to have some sort or form of review in most cases. Whereas I'm coding in Copilot mode, watching it write the code. I then typically go run the code immediately and I may find an issue. I may come back with a bug. If I do come back with a bug, I literally just copy and paste the bug message right in and say, hey, I hit this issue. Often it can help. Often it helps pretty much immediately, not always, but it takes, again, just tons and tons of time out. But I'm driving. Whereas in delegation mode, it's like, okay, I've used this example a lot of times, like good news, bad news, good news. We got 1,000 applications for this new job. Bad news, nobody has time to read them all. Good news, I can create a prompt that I can validate on probably the first 10 that takes me about the time it might take me to read through 50 to 100. And then I can achieve 90% time savings, definitely not to make the hiring decisions, but at least to separate the bottom half from the top half or maybe the top 20% from the bottom 80%. 
And the real key there where I would say you're getting into delegation mode is where you're confident enough that that bottom 80% or whatever is something that you can, in fact, be comfortable letting go of without actually reading those. Then you might just look at the top 20% or the top 10 or whatever. So you're still going to have typically a human in the loop if it's anything important, but you can save a ton of time. And, again, my key distinction for delegation mode is you aren't planning to review all the work. So you need to get to some standard, and satisfy yourself that the work is good enough that you don't have to review every bit of work. Rachel Woods has a good 3 tier hierarchy for this. She's got these tiers where it's work that just needs to get done, good enough work, and then great work. And she's like, work that just needs to get done, that's the first thing you want to delegate to the AI and put into some sort of automation. Work that's just good enough could be borderline, but often you can get there. Great work, insights, strategy. You're not going to delegate that stuff. So at most you're going to be in copilot mode. Maybe you get some ideas from AI along the way, but you're going to be the one that's judging whether those ideas are good or not. What is the standard? Do we need we've talked about this in the past too, right? Do we need breakthrough insights or do we need reliable, consistent execution of a given standard or a given rubric or a given protocol against some fairly predictable data. Those are the 2 modes as they exist today. And what's coming soon perhaps is the rise of agents, which kind of sit in between. I think of those as the bridge between the copilot mode where you're driving and the delegation mode. And the dream is you can delegate something in real time and it will be reliable enough to do it for you without you having to supervise every step. But you also hopefully don't have to spend a ton of time creating scaffolding or creating a rubric or validating in the way that you do pretty much have to do today if you want to get something set up for a delegation style automation. So all that was in your question of, okay, Copilot for what versus doing what? It's an incomplete account too, because when I think about myself, I'm like, it's also a huge question of who the user is. I've had instances in the last couple of weeks where somebody, a friend of a friend reached out to me with a legal question. And I'm like, I'm really not qualified to answer this, but I am qualified to put it into ChatGPT. So it's a weird in between case in that one, right? Where I'm like, I never called the lawyer that I might have otherwise called because I have a pretty good sense for what GPT-4 can and can't do. So pretty sure it would be solid on this one. And I got good enough information that for my purposes, I was able to move forward and basically tell this friend of a friend that I think they need a new lawyer. What is that? That's not Copilot for lawyers, that's Copilot for me, but it is perhaps replacing a call to a lawyer or, the more maybe optimistic take and possibly true take, I don't know, is maybe I just never would have done that. I would have just told this person, sorry. I can't help you at all. And they would have had to go figure something else out. So maybe it's just purely expanding the pie and no lawyers were harmed in this use of ChatGPT, but definitely there are plenty of situations. 
I've done a number of contract reviews recently to just get whatever, somebody sends me an independent contractor agreement. Hey, Claude, GPT-4, does this look standard? Anything of concern here? If both of them say there's nothing of concern, I'll sign it. Because I've seen enough to know that they'll flag stuff. And usually they flag something. They'll flag something anyway. So the things that they flag, if they're reasonably consistent and they seem fairly normal and they don't seem to be of particular concern. Like, again, I don't need to probably in the past would have read it myself. So again, it's more Copilot for me than copilot for a lawyer, but it does sort of substitute. I don't know, man, it's everything everywhere all at once. To deny displacement at this point is definitely head in the sand. It is clear that there are calls to experts, lawyers, doctors, whatever, that would be made and would incur billable time that are just not made because you can get what you need directly from the AI. That's not enough yet to say that those jobs don't need to exist, certainly far from it for some of the most sensitive jobs that are out there. But to say that there's not displacement happening is, I think, just denial really at this point.
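As a concrete version of the delegation-mode screening example described above (the 1,000 job applications case), here is a rough sketch. The rubric text, threshold, and model choice are illustrative assumptions, and a human still reviews the slice that survives the filter.

```python
# Rough sketch of delegation-mode screening: score each application against a
# fixed rubric and keep only the top slice for human review. Rubric and
# threshold are hypothetical.
import openai

RUBRIC = (
    "Score this application from 1 to 10 for fit with the role described below. "
    "Respond with only the number.\n\nRole: ..."  # role description intentionally elided
)

def score_application(text: str) -> int:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": RUBRIC + "\n\nApplication:\n" + text}],
        temperature=0,
    )
    try:
        return int(resp["choices"][0]["message"]["content"].strip())
    except ValueError:
        return 0  # unparseable answers sink to the bottom for manual checking

def top_slice(applications: list[str], keep_fraction: float = 0.2) -> list[str]:
    """Return roughly the top 20% by model score; only these get human review."""
    ranked = sorted(applications, key=score_application, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]
```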

Nathan Labenz: 34:50 It's interesting. I just had Sam Lessin and Seth Rosenberg on the VC podcast, which we'll actually publish here too. They were debating the whole time whether value will go to incumbents or to startups, etcetera. But at the end, they both agreed that costs going down is going to radically increase demand, and thus they were very bullish on employment. And I wanted to bring up the tweet that you had around, tell that to the farmers, etcetera, like, employment for whom. And, yeah, there's a big question around to what degree the displacement will happen and on what timeline, and the new jobs that are enabled, who will be able to do those, and will enough people be able to transition into whatever these new jobs demand? And I think these are all big questions.

Erik Torenberg: 35:47 Yeah. It seems to me like we are definitely headed for some significant disruption.

Nathan Labenz: 35:52 How concrete would you be willing to get in sort of a prediction?

Erik Torenberg: 35:55 I want to be thoughtful about what I would want to get concrete about. One data point where I was a little hesitant to get concrete a year ago, this was in the GPT-4 red teaming time. I was like, holy shit, this is going to be huge. Because it was just immediately so useful to me. For this stuff like medical too, I mean, just you name it, right? I'm just using it for everything. So I'm like, okay, this is going to be huge. And I was talking to a friend about it and they're like, well, how big do you think the market size is going to be? And I was like, Well, I don't know. That's tough because first of all, they just keep dropping the prices. So when you see 98% price cuts or whatever from one year to the next, that is kind of tough from a market growth standpoint. So I kind of hedged on that. Was like, I don't know that I would bet on the revenue directly attributable to LLMs as the metric that I would want to bet on. And so we ended up not making a bet at that time. Now I would say safely that OpenAI's revenue growth has significantly exceeded my expectations, even having seen GPT-4 at that time. I think they did something like high twenties million revenue last year in all of 2022. And now they're at 1,000,000,000 annual run rate, which is to say, eighties million per month. So they've scaled from end of last year, maybe whatever they could have been at 4 or 5, they're at like 15 to 20x in 9 months. And think too about how many tokens that is. $80,000,000 a month when the retail price of GPT 3.5 turbo tokens is $2 per million. So obviously they're serving different models. That's the lower price, although they're also serving a lot of free tokens too. So maybe just to take a totally naive, okay, $80,000,000, $2 per, 40,000,000 times 1,000,000, 40,000,000,000,000 tokens per month they're generating. 5,000 tokens per month for every human on earth is kind of roughly what that backs out to. That's definitely grown faster than I thought it would. And I was pretty sure it was going to grow quite a bit. So, yeah, I don't know. Maybe we should think a little bit more and do this on another episode, and I will definitely be happy to put some guesses on. But the metrics are tricky. One of the things that people are speculating they might deliver at their upcoming developer conference is another price drop. So they're not maximizing in a conventional sense, I don't think. And it is a little tough to figure out exactly what you would want to predict as a result. As hosts, I have to make a prediction, so I'll punt on it, but I won't dodge it indefinitely.
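A quick back-of-envelope check of the token math above, using only the rough figures quoted in the conversation:

```python
# All inputs are the approximate numbers quoted above, not reported data.
monthly_revenue = 80_000_000        # ~$80M/month, i.e. a ~$1B annual run rate
price_per_million_tokens = 2.0      # retail GPT-3.5 Turbo price used as a floor
tokens_per_month = monthly_revenue / price_per_million_tokens * 1_000_000
world_population = 8_000_000_000
print(f"{tokens_per_month:.0e}")               # ~4e+13, i.e. ~40 trillion tokens/month
print(tokens_per_month / world_population)     # ~5,000 tokens/month per person on Earth
```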

Nathan Labenz: 39:23 Your synthetic data based on client data or inspired by client data, as opposed to client data itself, does that solve the problem for the client, you think? Is that something that people would be happy with, or is that kind of a loophole or workaround?

Erik Torenberg: 39:34 I think it probably is contextual. Nobody wants their phone number coming out of a language model, right? So that type of very explicit, this is obviously your data and it shouldn't be here, is pretty easy to avoid. You could just change up all the phone numbers, change up the names, the old Dragnet names have been changed to protect the innocent. I think you can definitely go clean up that kind of stuff pretty simply. Conceptual stuff is a little bit trickier. I need to do a little bit more reading on this, but there has been some recent research that showed that it only took very few examples, maybe as few as just one example in some cases, for a model to essentially memorize a certain text. Exactly does memorization imply understanding or whatever? But if you had conceptual IP and it's not the kind of thing that's so obvious as a phone number or an address or whatever, or a price of an item or your code. I put code into ChatGPT all the time, right? So I don't really want my SQL schema to be in the training set, but they could fudge that around pretty easily, I would think. But more conceptual stuff that's like, how is this done? Do we have a certain secret sauce that kind of constitutes a sort of intangible IP? And might that still get through some of these things into a model's conceptual understanding such that other people interested in the field could kind of indirectly get access to your hard won specialized knowledge. Tough. It's really hard to say. It seems unlikely that that would happen, but I certainly wouldn't say you have nothing to worry about. I'm trying to think of a good example of this, right? There's just process chemistry. I studied chemistry in college and there's like the reactions and there's like the molecule and you could sort of say, okay, if we don't tell them what the molecule is, they don't know what it is, they can't reproduce the drug. There's also a lot of stuff that is just kind of known and developed along the way of, well, how exactly is this done? How fast do you heat it up? How gradually do you add this other thing in? And a lot of this stuff is just kind of learned by trial and error and is kind of, you might look at somebody's notebook and say, well, there's not really anything super special here. They're just kind of recording how they did a certain thing and it seems fairly innocuous in and of itself. But if you're like a big pharma company and you've got all this kind of process knowledge that's kind of represented in all these notebooks, I wouldn't just hand that over to OpenAI for training, that's for sure. I'd be very interested in what that might do for me to train my own model on that kind of specialized knowledge. But I would be reluctant to have anybody else get their hands on it. In some ways, it becomes more valuable perhaps in the language model than it is even in the notebooks. From what I've seen, there's not a lot of time spent one scientist reading another scientist's notebook. But if the AI could read all the notebooks, then it might just have some insights that are not obvious to perhaps anyone. So yeah, I think a lot more needs to be kind of figured out there. So far, it's been pretty superficial, but more conceptual stuff is harder to say. And so that's why OpenAI has kind of had this very consistent message recently of like, we don't train on your data. We don't train on your data. We don't train on your data. 
And I don't even think the synthetic thing would get around it for some of this fuzzier kind of delocalized information.

Nathan Labenz: 43:54 Let's get into some AI bundle talk.

Erik Torenberg: 43:57 Yeah. So, okay, this is possibly a good idea, possibly not a good idea, but I wanted to bounce it off of you. And for context, it stems from the fact that as a customer of AI services all over the place, I am constantly running into, A, just an increasing number of services that I think are awesome and potentially worth paying for. And then also an even longer tail of services that are potentially worth paying for, or kind of something I want to pay for once maybe, but not something I'm going to be a power user of. This is coming to the fore very much at Athena. Again, 1,000 plus executive assistants serving 1,000 executive clients all over the place, a lot of different needs, a lot of different contexts, a lot of different tools that would be helpful in that circumstance. So one of the skills that I'm trying to teach folks is product scouting. Just how do you go out there and figure out, is this good? Is it bad? Does it add anything to kind of base GPT-4? What's worth paying for? And it's tough. And it's also tough on the app developer side. So putting on my Waymark hat, then I'm like, man, we get a lot of traffic from a lot of people who don't really, they're not like power users of our app. We help small businesses make marketing videos. We sell to big companies that do a lot of that. So like cable companies, TV companies, Spectrum, Fox, Gray TV, et cetera. They have dedicated sales teams that go out and sell video advertising all the time. And so they need a lot of video creative. And so they're a natural kind of long term customer for us. But then the small businesses themselves, even today, are not like making a video every day. No matter how easy we make it, they're just not going to make that many videos. And we offer, just like so many retail SaaS businesses, we offer a monthly subscription, cancel anytime. And it's kind of putting us in this weird position where we want to show the product off, we want to give that demo without requiring payment, but we figured we're paying about 15 cents for all the different AI services that we're using to serve one random new user, whether or not they pay us anything. And then our lowest price point, well, we've varied it over time, but you see a range of lowest price points out there, but typically they're like at least $10, often 20, often 30, sometimes more. And so you're kind of in this weird spot where you're like, okay, I need to get one person to sign up for a $30 a month subscription to cover 200 people just trying the product. And then I can kind of break even on that. But it doesn't really feel like super fair or awesome for anyone. And I think Waymark has really good customer retention metrics at those high end enterprise customers. But like many AI apps, we see a lot of people try the product. A lot of people just bounce, obviously, they're not ready to pay the $30 or whatever. Then a lot of other people do pay the $30 because they're like, I love that. I want that. And we don't want them to download the thing that they created until they pay us something. But a lot of those people will just quickly cancel because they're like, I only wanted the one thing. I don't want another subscription. This sucks. So I've got pain on both sides, right? As a customer of AI products, I have a proliferating set of things that I find value in and the bill is adding up and it's ridiculous to try to imagine buying all the things that I might want to use. 
Just to go down my power rankings a little bit, ChatGPT, $20 a month retail, $60 a month if we're going to buy the recently announced Enterprise Edition. $60 a month is the highest AI platform price that I've seen so far. Windows Copilot, $30 a month. Google Duet, $30 a month. Perplexity Pro, $20 a month. Claude Pro, $20 a month. GitHub Copilot, $10 if you buy it for yourself, $19 if your company buys it for you. Replit Ghostwriter, $10 per seat. And those are just like the literal, the top, top tier of things that I use repeatedly every week. So you add those up and even leaving the enterprise price off of ChatGPT, and just figuring I'm only going to buy one of Windows or Google Suite, I'm still at $110 for kind of the other four of seven of those things or whatever. That's adding up to quite a bit. And Waymark notably doesn't see a cent. So this got me thinking, is there an opportunity here to create some sort of cable bundle like bundle? And who exactly should sell it and how it should be governed? I mean, there's, I think, a lot of little nuances to how this would work inevitably, right? Certainly cable bundles are not without their drama, as we've seen lately, too. But if I'm paying $100 a month, it seems like what I really want and maybe could get would be like a little bit of all the different AI apps that are out there. So instead of having to kind of decide, do I want to subsidize the next 200 free users for this given app by paying the $30, even if I don't really intend to be a power user of this app, maybe there could be some other way where if I'm subscribed to a bundle, I can kind of get access to all these things, at least in some limited fashion, right? So I could kind of pop around the web and like make a Waymark video and then maybe make like a Gamma slide deck and maybe make another one in a couple of weeks. But I'm never going to become a power user and I don't want the $30 per month level. How could that look? And it does seem like there's something there that really could make sense. If I'm a customer and you said, hey, your bill is adding up for $100, we'll curate for you 100 going on 1,000 apps and you'll get at least baseline access to all of them. Presumably, there would be higher tiers for many of these things beyond kind of what would be included in the bundle. But those would be for the real power users that are not going to just make one off things here or there. I think that would be very compelling to me as an individual customer. And then I think about it from the Waymark side, and I'm like, I don't know what our share of that bundle would be. We're more like the ESPN Classic than the ESPN in the cable bundle hierarchy. But we've got a cool product and it's something that a lot of people do need every so often. A lot of people do have a very good experience of it. And then they're just like, I just can't get over that $30 hump right now. And we can't really lower that price that much because that person has to cover the other 200 people. So what I would definitely take is like some small share of this bundle just to be a part of it. And you can imagine complicating that in any number of ways. But even if I were just to say, okay, let's say that bundle is $100. Let's say Waymark gets one one-thousandth of that bundle. We get 10 cents per user that they sell. If they go sell a million users, then that is $100,000 a month in revenue for Waymark. That's a million dollars a year. 
So every million they sell of those bundles is a million dollars to Waymark and potentially hundreds of others of long tail kind of episodic, yeah, it'd be cool to use this, but I don't really want to subscribe to it right now sort of apps. Then we could serve an audience that we just never really could otherwise serve. That million people, when they need Waymark, they could have it. And we would feel like we're not getting cheated. We would also feel like we don't have to pressure every user to sign up that we're pressuring, but we're gating, we're like putting calls to action, we're trying to do all the things that SaaS apps do to get people to subscribe. It feels like we could really be more like, hey, just have at it, have fun. And then if you're going to make more than five this month or whatever, then you can maybe subscribe to a higher tier plan. And I don't even think that would cannibalize much of our business. And from what I hear from other app developers, this problem is very widespread. And I think you've probably heard some of this in your VC talk as well, right? All of the app developers that I talk to, at least, have a lot of people interested, a lot of traffic as people just kind of explore new stuff all the time. Typically are doing pretty well in new customer signups too, but the retention sucks. And so that's like, man, you get into this high churn environment and that's just not a great dynamic to be in. I don't know. I feel like I like it on both sides. There would obviously need to be some sort of market maker in the middle. Who would do that? Could it be OpenAI? Maybe. Could it be a more neutral kind of editorial type of body? A Wirecutter type of thing where we don't, as Wirecutter, we don't make any of these products, but we just have an authoritative, credible voice of what is in and what is good and what's not good. I could also see that. I don't think OpenAI is going to be super keen in the short term to take on all this editorial. But I could imagine them creating a more sort of rules based system where you might imagine something that's like, you get in based on an initial review and then you stay in based on some amount of usage or also payments could be sort of, what they really don't want to do presumably is do all the negotiation with 1,000 different apps as everybody's like, whatever. But they could easily do like a take it or leave it sort of deal where it's like, you're at this scale, you have this many users, that puts you in this tier, you get this percent and that's that, right? And we have room for however many companies and maybe as the bundle subscriber base grows. I mean, we've got, what, probably 50 million cable subscriptions still in the United States. So like a million cable bundles is small. At a more mature state, if it became 10 million AI bundle subscribers, I don't necessarily think Waymark's share would even necessarily grow proportionally. Instead, it might be like, now we go from 1,000 apps to 2,000 apps and you guys all get more, but not as much more as the whole thing is growing because we're going to expand all the different things that we can include. Obviously, a lot more things are coming on all the time as well, which would make that a natural progression. I don't know. Why doesn't this work? You're the VC. What are the holes that you see in this theory?
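For reference, the unit economics sketched in this turn, where every number is a hypothetical from the conversation rather than actual Waymark pricing or costs:

```python
# Illustrative numbers only, taken from the discussion above.
free_user_cost = 0.15            # ~$0.15 of AI services consumed per free trial user
subscription_price = 30.00       # lowest paid tier
print(subscription_price / free_user_cost)    # ~200 free users covered by one subscriber

bundle_price = 100.00            # hypothetical monthly AI bundle
waymark_share = 1 / 1000         # one one-thousandth of the bundle, i.e. $0.10 per subscriber
bundle_subscribers = 1_000_000
monthly_revenue = bundle_price * waymark_share * bundle_subscribers
print(monthly_revenue, monthly_revenue * 12)  # $100,000/month, roughly the "million dollars a year" ballpark
```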

Nathan Labenz: 55:34 I totally get how it makes sense from the developer perspective, the user perspective. And my natural inclination is to look at other industries and think about where things like it have happened or where things like it haven't. We alluded to cable and the strength or the durability of that bundle despite people's expectations otherwise. And Ben Thompson writes a lot about how people underappreciate the economics of the bundle and why it makes sense for so many people. I go to SaaS. Companies have a whole suite of SaaS products that they pay X amount a month. And wouldn't it be easier if they just paid a bundle in terms of access to all of them? And maybe in some cases that exists. But I don't see that super widespread. I think it's just the cost of coordination such that on the producer side, it would make sense for every individual company instead of the individual companies trying to own the customer relationship directly and then use that as a wedge to expand into multiproduct themselves. I wonder if that's what might hold that back here is just companies not wanting to be intermediated between them and the customer and then also thinking that they will not just occupy their spot, but that they will occupy the other spots within that bundle, that they'll become competitive.

Erik Torenberg: 57:05 But you can still have a lot of competition within the bundle, right? I mean, when I think about my cable bundle, there certainly is plenty of competition within that. It seems to be pretty fierce. And this is very different technology, right? So in the case of cable, historically, you basically just had no other choice to do it if you're a producer of content or very few other choices, bordering on no other choices. And so you might have wished that you could go direct to the consumer, but you just really couldn't. Now with content being delivered over the internet, obviously lots of people are trying that in various ways. And we've got everything plus. The content industry decided to add plus to everything, and the AI industry is adding Pro to everything. So they're doing that. It seems like it's kind of working for some of them, but probably, again, kind of a Power Law type of thing where it's obviously going to work for Disney because they have the gravity to command that type of, yeah, I'm going to buy it. And it is what it is. Few have that. I've not subscribed to Discovery Plus or whatever other pluses there might be. CNN Plus obviously was not super well received. So in any event, in this case, you kind of would much more easily be able to own the customer relationship or at least have it not totally, you maybe would have it partially mediated by the bundle. You'd have to have some login with or connect with. Again, have that in cable too, right? I go to espn.com, I want to watch something on the website, you connect with your cable provider or what have you. So some version of that is maybe still needed to just manage the logistics. But then once you're logged in, presumably the app would know who you are, be able to see what usage you're doing. If you're Waymark, you're going to be creating a business profile and we're going to have a pretty good sense of who you are if you're using the app at all, kind of regardless of who owned the billing relationship. And I honestly think for us, we want to own the billing relationship with our best customers, but we really don't necessarily want to own it with our worst customers. And it's not to say that those customers are bad people, but just that they're not long term subscribers to whatever, 30 dollars a month video creator, especially when we don't lock them into anything. So you just see this so often, this is like, yep, create, download and cancel. That wasn't something where people were disappointed with the service. That was their plan the entire time. They're not coming back later and saying, I don't like this anymore. So I don't know. I still feel like I'm still bullish on the bundle.

Nathan Labenz: 1:00:07 Say more about SaaS. Why isn't this more prevalent at a company level for SaaS products or SaaS tools, whether via some sort of Wirecutter-type thing, the companies coordinating among themselves, or one company coordinating it?

Erik Torenberg: 1:00:22 Maybe churn just isn't so high in general. I think SaaS historically has more stickiness to it. The magic of AI from a user standpoint kind of cuts the other way from a retention standpoint. I can use Waymark with no learning curve, no time spent, and I'm not really forming a sense of, hey, I've invested in this, I've learned how to use it, I don't want to go have to learn another thing. We've taken the learning part totally out of it. So that could be one reason. If people are happy with their retention, if you simply don't have a lot of people trying your product that you'd be happy to take 10 cents a month from because they're maybe only going to show up once every 24 months anyway, then maybe a bundle just isn't appealing to you. And so much SaaS is that way, where you're onboarded and you're trained and all that kind of stuff. That's the first thing that comes to mind for sure: just the ease of flipping into and out of all these products. I can get value super quick, but I also know I can go get value anywhere else super quick, too. So I just don't have time to form an attachment to these products the way I did a couple of years ago, when I really had to sit down and learn to use something and therefore maybe stuck around a little longer.

Nathan Labenz: 1:02:04 Yeah. To go back to the cable thing for a second: from a user perspective, at least, it's like, hey, I really just want to watch ESPN or whatever this one channel is. I don't need this entire collection of things, and it might be cheaper for me to just get the one. That's when people want things unbundled. They want things bundled when they think it would be cheaper to get the aggregate, when they either want the aggregate or think they might want it, and there's some variability in there. So I understand here why users would want the aggregate, the collection, the bundle. The question is whether it's in the companies' incentives to be part of this bundle. And more than that, whose incentive is it to form the bundle in the first place? To form it, maintain it, keep it up, be accountable for it. In theory, whoever does would own the customer relationship, so it seems like it would be very strategic for someone to do it. But would they let other companies be in the bundle and also let them own the customer relationship, as opposed to just trying to build competitors? Like you said, maybe it's OpenAI or a new company like Wirecutter, but I don't know how many companies would be incentivized to do such a thing and include their competitors in a bundle.

Erik Torenberg: 1:03:28 Yeah, so I don't know how realistic this is either, but I was thinking about the really interesting moves the leading AI companies are making. They're competing with each other, but honestly it's not very hard competition right now relative to what it probably could be. There are nascent collaborations with things like the Frontier Model Forum. People were just generally saying nice things about each other up until pretty recently. And even still, to some degree, there's a lot of knowledge sharing in the form of just publishing research. Maybe above all, most tangibly and most relevantly, there's the part of the OpenAI charter where they say that if another credible organization is getting close to AGI, they will work with that organization as opposed to racing it, because they're really afraid of these race dynamics. We're in a much more mundane moment right now. For how long? Who knows? Certainly at the moment neither Claude 2 nor GPT-4 is sufficiently close to AGI to have triggered that clause. But given all the movement they're making to set up these collaboration forums, and seemingly to agree on some regulation framework as well, I do wonder if a commercial agreement like this could be appealing in a way that it wouldn't be if it were just two content owners duking it out. Because here they do seem to have a real, principled worldview: the last thing we want is to get into a cutthroat race against each other to the point where multiple parties are incentivized to cut corners on safety. They've certainly paid a ton of lip service to that, and I tend to think it's pretty sincere. So if that is the case, you could almost imagine the Frontier Model Forum doing it. I think this would actually end up being a separate thing, because people are already cynical enough, and there would be no shortage of cynicism in response to the literal Frontier Model Forum doing it. But something like the Frontier Model Forum, the same companies, could say: hey, we've all got some strengths and weaknesses, and the pie is growing extremely quickly. Let's not compete for every subscriber, because people are probably not going to want to subscribe to everything unless we make it really compelling and easy for them to subscribe to everything. And if we could agree on some rules-based way to divide up that revenue, it could be good for everyone, right? It takes pressure off of us. We don't have to worry that they've got a slight edge or that they're releasing their thing before we did. I mean, the moment earlier this year where Microsoft is racing to get their Bing thing out, and then Google's trying to get ahead of them, and they get ahead of them by like one day and have some terrible announcement. That is a pattern we really don't want to get into in the GPT-5-plus era. So anything you could do to say, hey, you launch this week, we'll launch next week or next month, whatever, it'll all be fine, because all these launches make the bundle more compelling, more people sign up, and we all get revenue. Maybe we compete on the margins for users, but not necessarily at that more discrete payment decision moment. I could see that being pretty cool. And there's a long list.
Let me give you more than just the anchor tenants in my bundle: here are things I would like to have in a bundle that I either have paid for and canceled, was at least tempted to, or maybe am still paying for without realizing it. So: DALL-E, which I get with OpenAI. Midjourney is its own subscription. Playground AI has a super generous free tier, as we may recall from our very first episode, but obviously also has an upsell. Lexica, same deal. Stable Diffusion from Stability has its own pricing for usage. That's just five image creators, and there are a ton of niche ones too. I follow this guy levels.io, a prolific solopreneur who's put out multiple pretty cool AI apps, also at this kind of price point. He does well for himself, and he's been among the most successful, but I think a bundle could even make sense for somebody like him. He does these very niche ones, like the put-yourself-in-this type of app, and he was doing that before other people were. So you could have a bunch more just in the image category. Agent-type stuff is coming up all over the place; we've had three different agent companies on the show. Text-to-speech is another one, where I've got live accounts with Well Said Labs, PlayHT, a former guest, and 11 Labs. That's just in voice. Then you've got all these creative tools: Waymark, Gamma, Tome, Jasper, etc. Every single one of those has a 20-dollars-plus-per-month tier. And they're all super useful, but they all have this dual problem of the friction of paying plus the fact that I just don't use them that often. Gamma is an awesome way to create a great visual presentation, but I don't create that many. I created one for the AI scouting report and another for an AI task automation primer. Not that many. Khan Academy, too, for tutoring: I signed up for that thing and have since canceled it, because I'm not looking for algebra tutoring on a daily basis. So at some point you've just got to keep the credit card clean. I feel like there are easily 500 dollars' worth of apps where, if I subscribed to everything I occasionally use, it would come out to a ton, and 90% of them are probably not worth it on their own. But in the same way as the cable bundle, I can imagine subscribing to this thing and saying, yeah, okay, whatever that price is. If I just went and bought ChatGPT Pro by itself and Copilot by itself and Perplexity Pro by itself, I'm basically there, so I might as well just buy the bundle and get a hundred other things too. It does feel super compelling. You want to be an equity investor in my new AI bundling company, aibundle.co?

Nathan Labenz: 1:10:33 I think it's going to be hard to coordinate. Think of something like ride sharing, like Lyft and Uber. Those companies should have merged. They were offering the same product, basically, and they were fueled by venture dollars to inefficiently offer the same product. They were both running each other out of business, more or less, and venture subsidized rides. So they diluted, I don't know, did they both dilute like 80%, 90%? I have no idea, but they both diluted a crazy amount because they took so much money. And if they were one company, they could have avoided this whole mess. And I spoke to Emil Michael, the chief business officer at Uber, and he was like, yeah, I tried to get us to acquire them, to merge, because they just didn't make sense. But Travis said we got to kill them. And Lyft was like, we hate Uber. So businesses just are irrational at the end of the day. The fact that they're funded by VCs who all have take-over-the-world visions makes it challenging for people to be in a spot of like, hey, you play this niche, or hey, you collaborate with this competitor. When everyone's trying to dominate market share, it's hard to get them to agree to a bundle long term when they would rather keep growing their own piece of that pie. Now you do see some major players collaborate, but there are a lot of times where they don't, even though it would be much better not only for the user, the customer, but also for themselves; it would cement their position. I see the example of levels, etcetera. There are obvious examples where it makes sense. Even in situations where it makes sense for all parties, sometimes pride, or just the narrative that they told or believed themselves, prevents them from doing it. So color me a bit dubious, though I would like it to happen, and I think it's a creative idea.

Erik Torenberg: 1:12:23 Yeah, and in the Lyft and Uber example, and also if you visit the Sling TV pricing page, you can see an alternative vision of this, which I think is maybe more likely for the reasons you're describing. What they have on the Sling TV pricing page is the Orange pack and the Blue pack, which are basically two different content alliances that have clustered around some core pieces. You can imagine a version of that here, like the OpenAI bundle and the Anthropic bundle, or maybe the OpenAI-Microsoft bundle on one axis and the Anthropic-Google bundle on the other, and each one could have a bunch of smaller players in its ecosystem. You can imagine a rule like: to be in our ecosystem, you have to use our models. That sounds a lot less awesome to me for most people involved, but I can see how the folks at the leading companies right now do feel like they're, if not taking over the world, certainly in a very privileged position where they're going to get to call a lot of shots. And if they don't want to cooperate, it's going to be very hard to make them. The other thing that could be pretty interesting about this is that I think you could get a lot of people to buy the bundle, particularly if a lot of the app developers required it. Maybe that would end up being something you have to do; then again, you don't want to end up negotiating too much, because that's going to be way too much friction. But I can also see something where it's like, hey, you know what, at Waymark we don't give away the AI experience anymore; you've got to subscribe to the bundle. If you did that, then all these apps that are currently out there free-ish for a little while each become another reason to get the bundle. That would be a very different dynamic from, say, cable, where I don't get to haphazardly see what's on Discovery Channel and then decide to buy it, except I guess maybe out of home or whatever. With cable I don't have that quick taste of one random thing I want that then leads me into the bundle; here, in a more distributed, high-surface-area, more-touchpoints way, you might have a thousand different ways into the bundle. So I'll maybe remain hopeful. It's funny, this is the classic "the market can remain irrational longer than you can remain solvent" problem: companies don't necessarily always act rationally, which is certainly an undeniable fact. But if that's the biggest reason it won't happen, then at least it feels like there's some chance that it could.

Nathan Labenz: 1:15:08 That's a good note to wrap.

Erik Torenberg: 1:15:10 Cool. Well, let's see if we can go talk anybody into it.

Nathan Labenz: 1:15:13 Erik, always a pleasure. Until next time.

Erik Torenberg: 1:15:15 It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
