AI Engineers, Pendants, and Competition Between OpenAI and Developers with Swyx of Latent Space
Watch Episode Here
Listen to Episode Here
Show Notes
In this episode, Nathan sits down with Swyx of Latent Space to chat about AI engineers and tools to check out, competitive dynamics between OpenAI, other foundation model providers, and developers, and if Swyx would wear an AI Horcrux. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive
Subscribe to the show Substack to join future conversations and submit questions to upcoming guests! https://cognitiverevolution.substack.com/
SPONSORS: NetSuite | Omneky
NetSuite has 25 years of experience providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
LINKS:
TCR Episode with HumanLoop: https://www.youtube.com/watch?v=EKKJrRWOU30
TCR Episode with Guardrails' Shreya Rajpal: https://www.cognitiverevolution.ai/videos/e28-keeping-the-ai-revolution-on-the-rails-with-shreya-rajpal-of-guardrails-ai/
AI Engineer Summit: https://www.ai.engineer/
Latent Space: https://www.latent.space/podcast
X/SOCIAL:
@swyx (Swyx)
@labenz (Nathan)
@aiDotEngineer (AI Engineer Summit)
@CogRev_Podcast
TIMESTAMPS:
(00:00) Episode Preview
(00:00:49) AI Nathan’s intro
(00:03:14) What is an AI engineer?
(00:05:56) What backgrounds do AI engineers typically have?
(00:15:51) Sponsors: NetSuite | Omneky
(00:17:13) Swyx’s Discord AI project
(00:20:41) Key tools for AI engineers
(00:23:42) HumanLoop, Guardrails, Langchain
(00:27:01) Criteria for identifying capable AI engineers when hiring
(00:30:59) Skepticism around AI being a fad and doubts about contributing to AI
(00:34:03) AI Engineer Conference speaker lineup
(00:41:14) AI agents and two years to AGI
(00:46:04) Expectations and disagreement around what AI agent capabilities will work soon
(00:50:12) Swyx’s OpenAI thesis
(00:53:03) AI safety considerations and the role of AI engineers
(00:56:24) Disagreement on whether AI will soon be able to generate code pull requests
(01:01:07) AI helping non-technical people to code
(01:01:49) Multimodal ChatGPT and the future implications
(01:03:33) Nathan living in the same dorm as Mark Zuckerberg
(01:04:44) Competitive dynamics between OpenAI and other AI model developers
(01:05:39) Play.ht vs ElevenLabs
(01:09:20) The tension between platforms and developers building on top of them
(01:11:40) The best thing startups can do to compete with foundation model providers
(01:16:26) User identity/authentication services like Login with OpenAI
(01:19:20) Google vs the other live players
(01:20:46) AI Horcruxes / Pendants
(01:22:05) The concept of an AI app bundle for consumers and developers
The Cognitive Revolution is brought to you by the Turpentine Media network.
Producer: Vivian Meng
Executive Producers: Amelia Salyers, and Erik Torenberg
Editor: Graham Bessellieu
For inquiries about guests or sponsoring the podcast, please email vivian@turpentine.co
Music license:
YDFN97TZTKQFPQFT
Full Transcript
Swyx: (0:00) The AI horcrux, or AI pendants. I would be willing to wear one of those. So I have an Aura on me right now. I'd love to log all my conversations and then be able to talk with it. These things are coming. And yes, they will be multimodal. All of us having an effective digital twin that we can talk to and use as at least just a note-taking thing, if not exposed to the wider world, I'm very excited by, and it will be built by the AI engineer.
Nathan Labenz: (0:24) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost Erik Torenberg.
Nathan Labenz: (0:47) Hello and welcome back to the Cognitive Revolution. This is Nathan's AI voice cloned with ElevenLabs. The real Nathan is traveling for a wedding today, but we are both very excited to share this conversation with Swyx. Swyx is a renaissance man in the AI space. He is a builder, an educator, a connector, and an all around very nice, encouraging, and collaborative person. We cross-posted one of his Latent Space podcast episodes with Linus Lee of Notion AI a couple months back, and I was excited to finally have a proper one-on-one conversation. We talked about Swyx's thought leadership, which has helped define the emerging role of the AI engineer, and also the AI Engineer Summit that he has organized for October. Tickets are sold out, but you can still join online to hear from an impressive lineup of speakers, including a number of past Cognitive Revolution guests. We also covered the state of AI agents, whether code PR generators will soon begin to work, the future of the app layer, the possibility that OpenAI and other leading model developers may come to compete with app developers as they add more tools to their platforms and modalities to their APIs, and the possibility of an AI application bundle. As always, if you're finding value in the show, I invite you to leave us a review on the platform of your choice. Now I hope you enjoy this conversation with Swyx.
Nathan Labenz: (2:09) Swyx, welcome to the Cognitive Revolution.
Swyx: (2:12) Thanks. I've been a longtime listener and very excited to be a first time caller.
Nathan Labenz: (2:17) Well, thank you. Glad to have you here, and I'm also a big fan of your work with the Latent Space podcast, the newsletter, and also looking forward to what you guys are putting together with the AI Engineer Summit, which is coming up in just a couple days. So I'm excited to get into all of that with you.
Swyx: (2:35) Yeah, happy to dive into that. We did a cross-post, I think, a few months ago, and I really liked your deep dive into the Tiny Stories stuff, and that's the one that we featured on our feed. And so I feel like you have the room to go much more in-depth than us. So I really appreciate the work that you're doing, these two-hour things with researchers. It's really impressive.
Nathan Labenz: (2:56) Thank you very much. I really appreciate it. I guess for starters, I thought we'd organize this by taking a little bit of a broad view survey of your work over the last year or so. As far as I know, you've coined this term AI engineer. And so I guess I wanted to start off by just asking you, what is an AI engineer? I'm fascinated in general by these new AI jobs. We've got the prompt engineer and a few different things have been put forward. It seems like the AI engineer, though, might have more staying power than the prompt engineer. So how do you think about that new emerging role?
Swyx: (3:31) Yeah, I definitely think of prompt engineering as the 2022 thing and AI engineering as the 2023 thing. And I feel like this is a controversial take a little bit because everyone should be able to use AI. There is no restriction on who does and does not use AI, but I do think that people who are choosing to specialize in the AI stack probably deserve a full-time role that describes what they do. And out of all the possible names that people have proposed, like Cognitive Engineer, LLM Engineer, probably the one that is going to win is AI Engineer. And so I'm not so much coining it as observing that this is a trend that's happening and putting all my chips on red, as they say. So the AI engineer is a software engineer specializing in AI and the emerging AI stack. It's not an ML engineer and not an ML researcher, both of which are much more established roles and much more on the research-oriented and MLOps side of the fence. It is everything to do with what happens after you have a model in production, and maybe with a little bit of fine-tuning, which is, as everybody knows, just extra training on top of the vast amount of pre-training that has already been done. And I think it's basically an emerging category for a few reasons. It's more or less just demand and supply. And I started life as a finance and economics guy. I was a trader at a hedge fund for quite a few years before I was a developer. And I just think it's pure demand and supply. There are maybe 5,000 good LLM researchers in the world, and you cannot hire them as the average company, average startup, whatever. There's just no way you'll ever actually be able to build this talent in-house. For better or worse, these models are now available as APIs or as open source models. There will be a corresponding rise in demand for engineers who are capable of putting them to use, even though they don't necessarily train them on a foundation model research basis. I do think that there will be a rise in this category of the AI engineer, and there's definitely a rise in startups serving that category. And so I've pivoted Latent Space, the newsletter, the podcast, and now the conference, all towards serving this persona, which I identified to be one of the most important.
Nathan Labenz: (5:45) Yeah. I think I would count myself among the AI engineers as well. I guess I probably come to it from a somewhat nonstandard background, but maybe you can tell me, where do you see the AI engineers coming from? Are they mostly software engineers who've just taken an interest in AI and gone down the rabbit hole, so to speak, or are there other backgrounds that you see as well?
Swyx: (6:08) Yeah. They're going to be mostly software engineers taking an interest and getting deeper in AI. In the post that I wrote on Latent Space, the key visual to have in mind is this left-to-right spectrum of research-constrained work versus product and customer-constrained work, or basically, how close are you to machine learning research, or how close are you to applications of that research? And so on the far left, it's the research scientists and the ML researchers. A little bit further along is the ML engineers, data scientists as well. I would class them within that category. After you've put those models into production, then you get the AI engineers, which are the emerging category, and then finally, the full-stack generalist engineers who are working on the last mile of UI and UX to deliver AI products to people. And so I think just like this emerging category of AI engineers, it's much more people on the right side of the spectrum, the software engineers moving left into AI. So they're learning this. They're suddenly learning what tokenizers are and what embeddings are and what vector databases are, why prompting chain-of-thought makes sense rather than zero-shot generation of stuff. And it is much more likely to be people from the right moving left than people from the left moving right, meaning the ML engineers and data scientists moving right into applications, even though you do get those. So for example, Raza Habib, my most recent guest from Humanloop, has a PhD in probabilistic programming with Bayesian networks. But he's working on an LLM ops solution just because he sees that a lot more people can be served that way. So there are people with all sorts of backgrounds, but I do think it's primarily software engineers.
Nathan Labenz: (7:47) We've had somewhat of a parallel trajectory in the podcasting game over the last few months. Raza was also an early guest because I've been a customer of Humanloop since early this year and definitely have got a lot of value from the platform. Now starting to use it also to, I think, really get to the part of the vision that they probably had in mind early on, which is the really accessible fine-tuning loop that they've enabled. And we just did an episode going down that rabbit hole a little bit talking about, I think my number one insight there was using GPT-4 reasoning as part of the fine-tuning dataset for 3.5 as a way to really, in my experience, dramatically improve the results. And that was something that Humanloop made much, much easier than it otherwise would have been. So funny because I had already been a customer for months, but I hadn't used all the latest stuff. And then I was thinking to myself, how am I going to code up my own loop to do this? And then I thought, well, I should check Humanloop and see what all the latest features are. And sure enough, they had done a really nice job of anticipating the need and really streamlining that process. So what do you think are the key skills for the AI engineer? If I were to come to you and say, Okay, I've got a background in software development. I've played around with ChatGPT, and almost everybody's used Copilot or something like that at this point. Where do I go from there to make myself employable as an AI engineer? What do employers, what do the app developers most need?
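For readers who want to try the loop Nathan describes, here is a minimal sketch of distilling GPT-4's step-by-step reasoning into a fine-tuning dataset for GPT-3.5. It assumes the openai Python SDK (v1.x) and an API key in the environment; the prompts, file name, and system message are illustrative, and Humanloop's own logging and export features, which Nathan actually used, are not shown.

```python
# Sketch: distill GPT-4 reasoning into a GPT-3.5 fine-tuning dataset.
# Assumes the openai Python SDK (v1.x) with OPENAI_API_KEY set; the system
# prompt and file names are illustrative, not Humanloop's API.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = "You are a helpful assistant. Reason step by step, then give a final answer."

def gpt4_with_reasoning(user_prompt: str) -> str:
    """Ask GPT-4 to show its reasoning; the full text becomes the training target."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_prompt},
        ],
    )
    return resp.choices[0].message.content

def build_dataset(prompts: list[str], out_path: str = "ft_dataset.jsonl") -> None:
    """Write chat-format examples in the shape gpt-3.5-turbo fine-tuning accepts."""
    with open(out_path, "w") as f:
        for p in prompts:
            example = {
                "messages": [
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": p},
                    {"role": "assistant", "content": gpt4_with_reasoning(p)},
                ]
            }
            f.write(json.dumps(example) + "\n")

if __name__ == "__main__":
    build_dataset(["Summarize this support ticket: ..."])
    # Then upload and fine-tune, roughly:
    # client.files.create(file=open("ft_dataset.jsonl", "rb"), purpose="fine-tune")
    # client.fine_tuning.jobs.create(training_file="<file id>", model="gpt-3.5-turbo")
```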
Swyx: (9:25) Yeah. Totally. I think that's where most people start, which is use the off-the-shelf models that have the most adoption, and that is going to be Copilot and ChatGPT. I do think I have been mapping this out. So if you go to the About page on Latent Space, we also have an emerging email course that we call Latent Space University to trigger the Louisiana State University fans. And it's mapping out the curriculum of what I expect to be the baseline competencies of an AI engineer. Like, just what the job title implies. Like, what would you expect people to do? And I think I want to ground this in a very practical manner because it's a demand and supply issue. Companies are trying to hire these people, and people are trying to learn to be useful to companies so that they can be part of the AI movement, but then also apply their engineering knowledge towards building useful and interesting things. And so I'll just list through some of them, but we can go into more details as needed. The high-level themes are AI UX, AI coding tools, LLM tooling, AI infra, inference hardware, and that includes fine-tuning as well. And then finally, the most speculative would be AI agents, which everyone has been talking about but doesn't have that much practical use. So I think there's definitely a trade-off between what people actually use at work versus what people just like to talk about on Twitter and star on GitHub, and promise AGI without actually doing anything. So maybe that might be a hot take there. In my course, so we have a seven-day email course, and that's meant to be, I'm never going to make money from that. That's just meant to be, start here if you want to learn AI engineering or you're a software engineer who's never dealt with any of these APIs. So Day 1 would be just try the GPT API for the first time, and just a lot of software engineers just haven't bothered. And I think you'll be surprised at how much control, how many options there are available, thinking about running into the context limit for the first time, which ChatGPT and Copilot can hide away for you. And just get familiar with all those basics. I think it's very important. Day 2 will be prompt tooling and memory, and that's where you get into the vector databases and the LangChains and all that. And those are just frameworks that will hopefully help with the development. And I do encourage people to build some abstractions of these themselves before using something off-the-shelf, so you get a better understanding. Day 3 is code generation. Day 4 is image generation. Day 5 is speech-to-text. Day 6, fine-tuning and running open source models. And then Day 7 is building an agent. And I think once you finish that little sampler course, or tour, you have the base capabilities that I expect every AI engineer should have such that whenever any PM or any CEO comes to you with, hey, I have an AI product that I would like to build. Is this possible? You can figure out if you can build it, or you can just tell them, hey, it's not possible for X, Y, and Z reasons. That way, you'll be useful as an AI engineer.
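As a concrete version of the Day 1 exercise Swyx describes, here is a minimal first raw call to the chat completions API, assuming the openai Python SDK (v1.x); the model, prompts, and parameter values are placeholders.

```python
# Sketch of "Day 1": a first raw call to the chat completions API, outside
# ChatGPT/Copilot. Assumes the openai v1 SDK; model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what a tokenizer does in two sentences."},
    ],
    temperature=0.2,   # one of the knobs ChatGPT hides from you
    max_tokens=200,    # responses count against the model's context window
)

print(resp.choices[0].message.content)
print(resp.usage)  # prompt/completion token counts -- how you first notice context limits
```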
Nathan Labenz: (12:16) So that's really interesting because this is the old XKCD, somebody's wrong on the internet. Right? I almost have my own version of that with people posting GPT-4 can't do X things when in fact it definitely can. And they're either, in many cases, using the wrong model or prompting it wrong or doing a bunch of things that are wrong. So I actually find that people often come to wrong conclusions about what AI can and can't do today. Obviously, it's a fast-moving frontier. So value of knowledge decays pretty quickly. I mean, seven days definitely seems like enough, especially if you really go for it, and this is a remarkable thing, right? Part of what I love about the AI space in general is just that you can jump right into the frontier. I think that's just extremely cool and super fun. And I would think most technologists would agree with that. So I do think a seven-day intensive is a pretty reasonable amount of time to get mostly up to speed. I guess if there was one thing I would wonder about, it would be, would folks after that kind of intensive have a well-developed sense for what really is or isn't possible? And how would you coach people maybe on continuing to refine their sense of what is and isn't possible? Because I see way too many people just giving up too soon.
Swyx: (13:40) Yeah, this is interesting. So first of all, I actually wouldn't describe the course as intensive. It's just an email course. It's meant to take an hour a day. And then we leave breadcrumbs to go explore with a lot more details and suggestions for side projects and stuff. I think that if you want to be up to speed, there's a vast gulf between, hey, cover the fundamentals and be cutting edge. Because the people who are cutting edge are the people who are hiding in Discords and on Twitter and talking about very niche-y, jargon-y stuff that you won't even see for a few months out. So I don't really know if that's a realistic goal for most people. I think most people want to make sure they cover the fundamentals and then be able to build most projects that you've seen out there that make money. And ultimately, I think that's what most people want. To be cutting edge means you have to go down a lot of rabbit holes, and I'm not exactly sure if I would recommend that for most people because that is a full-time job in and of itself, which is why, by the way, I call it an AI engineer because I do think that this deserves a separate category or a job title because this is a full-time job keeping up on things. But I do have a recommendation, which is I have a list of Twitter people to follow. I have a list of Discord communities that I watch and obviously a list of podcasts and newsletters, which I recommend, which you're definitely on. So all of those are on my GitHub. I have an AI Notes GitHub where I link to all these lists of Twitter, YouTube, podcasts, newsletter people. By the way, we're also running a survey of who people listen to, so a little bit of a market share competition going on. And if you want to Google State of AI Engineering Survey, you can get a sense of who people are listening to. And I think that's important to understand, what drives people's attention towards individual projects. That is something that I should keep note of as well.
Nathan Labenz: (15:34) Yeah. Interesting. I'll have to definitely subscribe to your Twitter list, which I did not know that you had out there. But I do actually get most of my stuff, I think, first from Twitter still, and then definitely have a ton of Discords that I've joined over time as well. Hey, we'll continue our interview in a moment after a word from our sponsors.
Swyx: (15:52) I might check in with you on this. So I have this little project where it basically scans Discords on a daily basis and then summarizes them into an email. And I'm wondering if I should just release that as a thing that people can subscribe to. Because I think it'll be popular but also very noisy. Because people discuss all sorts of things in Discord, and it might not actually make sense.
Nathan Labenz: (16:16) Well, I mean, that's the AI challenge in general. Right? It's like, how reliable can we make these things? I definitely think if you could get it to work well, it would be valuable. I've got to be in, I don't know, 50 different Discords at this point. And it is super noisy in there. And honestly, I join them, I'll often scan around, see what's happening, see what discussion is there, and then often I don't really come back very much. So good God, the number of Discord notifications has become unmanageable. So that just gets tuned out. So I actually wonder if there would be a way to intercept the notifications or use that as a particular signal as opposed to having to grab and summarize everything. I do think if it worked well and could create an Auto-GPT-style digest of the news from all these different projects of interest, I definitely think that would be really interesting.
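A rough sketch of the Discord-digest idea discussed here: summarize a day of messages per channel into one short update. It assumes the messages have already been collected by a bot or export, and the message shape, model, and prompts are illustrative rather than anyone's actual pipeline.

```python
# Sketch of a Discord digest: summarize a day of messages per channel.
# Assumes messages were already collected elsewhere; the input shape,
# model name, and prompts are assumptions for illustration only.
from collections import defaultdict
from openai import OpenAI

client = OpenAI()

def summarize_channel(name: str, messages: list[str]) -> str:
    transcript = "\n".join(messages[-300:])  # crude cap to stay under the context limit
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Summarize this Discord channel's day in 3 bullet points. "
                        "Skip small talk; keep links and project names."},
            {"role": "user", "content": f"Channel #{name}:\n{transcript}"},
        ],
    )
    return f"#{name}\n{resp.choices[0].message.content}"

def build_digest(raw: list[dict]) -> str:
    """raw items look like {'channel': 'general', 'content': '...'} -- an assumed shape."""
    by_channel = defaultdict(list)
    for m in raw:
        by_channel[m["channel"]].append(m["content"])
    return "\n\n".join(summarize_channel(c, msgs) for c, msgs in by_channel.items())
```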
Swyx: (17:16) I might release that as a side project. I don't know how popular it would be, but I think it definitely solves a pain point for myself. It's funny because one of our unreleased episodes that we just recorded is with Jeremy Howard from Fast.ai. And he had a post recently on single-shot learning, or sudden drops in the loss curves, trying to figure out the origins of that. And he mentioned the Alignment Lab Discord. And I always just find these Discords of very elite LLM people just emerge from nowhere. Like, EleutherAI Research is another one. I'll just throw these names out there. For people who want to find it, they're all on my list, so you can go check it out. But how do you find them as they're emerging? Because most people found out about EleutherAI after Eleuther was forming and then the core community was already established. And now people have departed Eleuther into what I've been calling this sort of Eleuther Mafia recently. Eleuther is still going, obviously. And I just think, you want to join these communities when they're early and everyone's still trying to figure it out and building something interesting. And Jeremy said most of this stuff is also happening in private channels, not public channels. So it's just an extra walled garden inside of a walled garden. And just joining Discord isn't enough. You have to invest enough to get access into the private channels is what I'm saying.
Nathan Labenz: (18:35) I need to get myself invited to some more private channels.
Swyx: (18:38) For what it's worth, the bar for the Latent Space Discord is you show up, you introduce yourself, you're invited in.
Nathan Labenz: (18:44) Well, I appreciate your inclusive approach. A few more just general state of the field questions, then I want to shift toward the event that you have coming up soon and talk a little bit about where things are going as well. The tooling for AI engineers, if I understood you correctly earlier, it seems like the pattern is mostly identifying the best available tools and then figuring out how to make those work together. As opposed to, obviously, you distinguish this from deep ML research, but even from self-hosting, it seems like most of these things are services that people are figuring out how to put in concert and not too often spinning up their own services at this point. Is that accurate?
Swyx: (19:34) Sure. Even though they absolutely can. Right? This is the power of an engineer that you could decide to build versus buy at every point in time. And that's always a fun topic of conversation whenever you need to build versus buy.
Nathan Labenz: (19:45) When you think about that build versus buy, that's such a common debate, obviously. And it's a tricky one for a lot of folks because the tension that I experience is, I want to be early to market, if not first to market. Right? At my company, Waymark, it's like, I want to be the, we have this video maker. I want to be the first video maker that does A, B, C, X, Y, and Z for you. And to do that, you have to build more than you would have to build if you just waited a little while because, if you have a little patience, then sure enough, the market provides solutions for a lot of these things that you need. So I think it's obviously super contextual, but I wonder if you have any sort of framework or general high-level view of how people should be thinking about build versus buy, especially given how quickly things are coming online for us to be able to buy?
Swyx: (20:41) I generally have a pretty vendor-friendly version of this because I have been a developer relations person for five years. But also I'm very sympathetic to the choose boring technology, try to limit your own dependencies as much as possible mindset. And so basically, what I often say is buy first, and then the moment you understand your domain well enough, be in a position to be able to build yourself. And this way, you get the benefits of people setting you on the right path with best practices that they've learned from everyone else. But you're ready at this point to understand that this field is so immature, you may have to rewrite, you may have to rip out something that you started not liking. So this was the topic of a lot of discussion this summer with LangChain, where a lot of people had this exact journey. Right? They started out building an app with LangChain, then they ran into a lot of problems that maybe LangChain wasn't well designed for, and then they ripped out LangChain, and they said LangChain is crap. And I think that's kind of unfair to LangChain just because they're evolving as well. I mean, the company is less than a year old. So I think it's fine. If you chose to be in this field, you are choosing to be on the cutting edge, and sometimes the cutting edge cuts you. And that's what they do. That's the nature of these things. I would say, basically, buy off-the-shelf first and understand what people are building out there because you plug into existing communities of people who have encountered these problems for far longer and have dealt with them and thought about them much deeper than you have. And then anything that you don't understand or don't appreciate after working with them for a few weeks, then you can rip them out and build your own. Right? But I think at this point, we have on our show notes, Humanloop, Guardrails, and LangChain. These are fairly well-established frameworks among the community that you should at least understand as table stakes for what AI engineering is today just because there's been a few hundred thousand people ahead of you mapping all these things out.
Nathan Labenz: (22:39) What other tools would you put on that list? I mean, we talked a little bit about Humanloop as, just to set the level there, at least the way I think about it, first as a playground in which to develop prompts. So my workflow is I go there, and I workshop a prompt until I get it working reasonably well. Then the reason I do it there as opposed to anywhere else is because I can hit save, and then immediately, I have an API that I can call that basically puts all the prompt management stuff on the Humanloop platform so that all I have to do as a developer from outside is just provide a couple variables, one or more, whatever I set up, and that makes it super easy to develop against. Then I can also update my prompt without having to do code changes, which is quite nice. And they log everything for me and allow me to come in later and post-process it, evaluate it, export certain subsets of data, which I might use to power fine-tuning, based on the most successful data points, whatever. That's really cool. We've had Shreya from Guardrails on the show a while back as well. You want to give your take on Guardrails and how that's used?
Swyx: (23:52) Yeah. Guardrails is, I would say, on the output validation side of the fence. And as the name implies, it puts sort of safety rails on what generative AI does, specifically generative text. And I would say it is competing somewhat with the other orchestration frameworks because there's only one slot for the LLM interface layer, and everyone's fighting for it. Humanloop wants to be in that interface so they can track everything. And so do all the other LLM ops and prompt ops companies, as they'd be called. In my podcast with Raza I called them the foundation model ops companies, but they all effectively have the same thing. Right? We'll track and version manage your prompts, and then we'll eval them, and then we'll track your cost too and check your latencies and whatever. They all have the same roadmap because it's fundamentally an ops product. Guardrails and LangChain are slightly different. They're much more on the application framework side of the fence, open source first and monetization second. And Guardrails would do things like validate that the SQL generated is correct if you're trying to use AI to generate SQL. They used to have output validation. LangChain also has this, by the way: output validation for valid JSON if you're outputting JSON. And that went away when, obviously, OpenAI came out with its own version of that. And so I think the roadmaps with these sort of orchestration or application LLM-specific frameworks will evolve, but Guardrails is much more focused on safety and production readiness, let's say, and LangChain much more on the orchestration, even though they collide on some features because all of them are in each other's feature space.
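To make the output-validation pattern Swyx describes concrete, here is a hedged sketch of validate-then-re-ask: check the model's JSON against a required schema and feed failures back for a retry. This illustrates the general pattern only, not Guardrails' or LangChain's actual APIs; the schema and retry budget are arbitrary.

```python
# Sketch of the output-validation pattern: validate, then re-ask on failure.
# Illustrates the idea, not Guardrails' or LangChain's real APIs.
import json
from openai import OpenAI

client = OpenAI()
REQUIRED_KEYS = {"title", "sql"}  # arbitrary example schema

def generate_validated(prompt: str, retries: int = 3) -> dict:
    messages = [
        {"role": "system", "content": "Reply with a JSON object containing 'title' and 'sql'."},
        {"role": "user", "content": prompt},
    ]
    for _ in range(retries):
        text = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages
        ).choices[0].message.content
        try:
            obj = json.loads(text)
            if REQUIRED_KEYS <= obj.keys():
                return obj
            error = f"Missing keys: {REQUIRED_KEYS - obj.keys()}"
        except json.JSONDecodeError as e:
            error = f"Invalid JSON: {e}"
        # Feed the validation error back so the model can correct itself
        messages += [{"role": "assistant", "content": text},
                     {"role": "user", "content": f"That failed validation ({error}). Try again."}]
    raise ValueError("Could not get valid output after retries")
```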
Nathan Labenz: (25:33) Yeah. That's super interesting. Let's come back to that in a second and get a little bit more into the competitive dynamics. But before we do, and this starts to give you an opportunity to talk about the upcoming AI Engineer Summit as well, how would you suggest that people go about looking for AI engineers today? Whatever kind of company you might be, right? A software application company or, honestly, any company that has a lot of operational overhead could probably make great use of an AI engineer, even if they're not putting a customer-facing application out there necessarily. What do you look for if you're hiring that skill set? It's so new. People are lost. People are just looking for somebody to tell them what to do, but obviously, that creates a lot of opportunities for them to hire somebody that doesn't really know what to do. So if you don't have the skillset yourself, where do you go to look for it? How do you evaluate it? And of course, the upcoming summit could be one of those places.

Swyx: (26:39) Honestly, I really should set up a job board. I've been chatting with a couple of companies like, hey, I need someone to just set up a job board. I don't even have to make money on these things. I just want to make sure that people who are looking for others, whether you're hiring or you're wanting to be hired, have a common place to match. Right now, it's on Twitter. Anyone actively talking and shipping projects is fair game. Within Discord, in the Latent Space Discord, we have a hiring channel, and people post jobs there, and I know people have gotten hired off of there as well. I think just regular channels and communities, posting in the monthly who-is-hiring job posting threads on Hacker News, that's typically the way I would recommend these things. And, obviously, coming to AI engineering conferences and talking with people who are also attending, that tends to be a very high signal filter for who's very engaged. And I would say, especially if you yourself are not an AI engineer and you want to hire someone who's an AI engineer, the core competencies that we listed out earlier in this podcast, that we have on the About page, that we have on the Latent Space University curriculum, are what I would expect as the bare minimum for people to understand for working as an AI engineer. And I don't think it stops there, because I do think that a core requirement that you cannot really test for, you just have to sort of observe in an AI engineer, is that they are entrepreneurial. They're comfortable with things that are not that well defined, because prompt engineering is not that well defined, because the model landscape shifts every single month with the release of a new model or whatever. They have to be able to be on the ball and proactive and not just, hey, I'm a data scientist, and I do LLMs now, so now I'm an AI engineer. And I don't think that is the kind of person that will put your business at the cutting edge of what you can do with generative AI. They have to be a little bit entrepreneurial. They have to proactively come to you with ideas instead of saying, hey, we'll throw an LLM layer on top of the existing app, which is totally fine. But I do think that, especially for me and especially the high-level people hiring that I talk to, they want someone who's a little bit entrepreneurial. So people who can just ship their own projects and get attention for them or use new techniques and put them in practice. And I do have some role models that I highlight in the Rise of the AI Engineer blog post. I think the entire Vercel team has been doing an awesome job of showing what it is like to be entrepreneurial. Right? Even though they're not a foundation model team, they can put these things to good use in a way that is appealing to enough people that they get millions of users on their free projects. And I think that is something that every company should have, and that is very much the ethos of the kind of AI engineer that I want to encourage.
Nathan Labenz: (29:40) Why are not all software engineers jumping at the chance to be AI engineers? It seems like there's something weird going on there. If somebody says to me today, oh, these AIs, they'll never be really useful, they're not that useful, there's no good use cases, whatever, the first thing I would say is code. You can question anything, but you really cannot question the utility of a Copilot, and you definitely cannot question the utility of coding with GPT-4. For me, I would say it is safely a multiple speedup. And I maybe am not the very best software developer in the absence of GPT-4, but I've got plenty of experience and definitely can make stuff work. Given how powerful it is for developers in their existing workflows and the fact that it's been integrated into their tools before most other tool sets, what am I missing? Why is it not the case that everybody is gravitating this way?
Swyx: (30:44) Yeah, I do think there's some baseline skepticism about whether or not this is a fad, and a lot of people have been burned by crypto. It's a very fair thing to have some skepticism about anything new. It's a very fair thing to have some self-doubt over your credentials. Like, do you need a PhD to make progress in this field? And that's very much why I'm trying to promote the AI engineer: to encourage the idea that, no, you actually don't need credentials to make progress in this field, because everyone is effectively uncredentialed when transformers themselves are 6 years old, when GPT-3 itself is 3 years old, and many of the techniques and the companies that we talk about in this space are all less than 1 year old and all making tremendous progress in AI. And by the way, one of the biggest promoters of the AI engineer concept is Andrej Karpathy, who very kindly supported the concept with the idea that there should be more AI engineers than ML engineers, and you'll be very successful without needing anything. So I think there's some mix of skepticism about AI being a fad, and then there's some mix of self-skepticism about whether they can contribute in AI. And then there's always fundamental misalignments or some fundamental doubts about the stochastic parrots argument. Right? Whether or not just multiplying a bunch of matrices can actually approach anything regarding simulated intelligence, which we can always talk about in those kind of dorm room hallway style conversations: what is consciousness? What is intelligence? But I mean, you and I know, especially with coding with AI, these things are actual productivity enhancers, and they've gone from single line autocomplete to function autocomplete to entire code base generation. They can help you as tools of thought with a human in the loop, or they can function without humans in the loop, all of which we're going to see as part of the conference. Right? And the more you have this sort of AI in the driving seat, the more that leads you towards autonomous agents. There's a wide spectrum from AI as sort of autocomplete all the way to AI as autonomous agents. And I do think you just have to pick your line of where you think usefulness currently is and understand that that will probably move over time. We have about 6 orders of magnitude more improvement in terms of scaling, according to Nat Friedman, until the end of the decade. So take whatever we have today and project forward.
Nathan Labenz: (33:00) Yeah. It's going to get wild. Pretty hard to imagine what pops out the other end of 1,000,000-fold more compute than went into GPT-4. So tell me about the speaker lineup. I went through, there's 28 confirmed speakers shown on the website. I was proud to have had 7 on the podcast as former guests, which was pretty cool. And it seems like you're basically creating a lineup that covers all the inputs that an AI engineer would need, right, from foundation models to these frameworks to the quality control measures. And then there's a number of folks from agent startups as well. Run that down, and particularly on agents, I'd love to hear your perspective. I wonder, is that something that you see as being the next thing that's going to come on for the AI engineer to tap into? Or is that a different sort of thing where it's more the AI engineers that are building the coolest new stuff on those lower levels?
Swyx: (34:00) I basically put together from my network all the top speakers that I thought were building interesting things for AI engineers, and specifically for a more technical audience, because there are other conferences happening all this fall in San Francisco, but they're all very high-level in general, and they spend a lot of time talking about policy, safety, regulation, copyright, and all those things. But for builders, I think there wasn't a builder-specific conference until mine came along. And then, obviously, OpenAI had to top it with Dev Day, which we can also talk about, which I think you and I are also going to, which I'm very excited about. We should do a live podcast on Dev Day, by the way. Yeah. Auto-GPT is our presenting sponsor, and we have OpenAI, Microsoft, Notion, Amazon, and GitHub speaking. All these are top names, I think, in the AI engineering field, but I also wanted to balance it out with names that you've never heard. Nathan Honsur from the Rust LLM community is speaking. And we also have one of the world's first demos of Adept and one of the world's first demos of New Computer, and we're just trying to get a mix of projects that you've never seen before. So Lindy is a very, very, very hyped agent project that they've never done a public talk for. And so I think we're trying to be the stage where people launch these new features, new products, new projects for the first time to provoke some thought, and then balance it out with people who talk more about sort of active production issues. So I think one perception of AI is it's all greenfield, it's all 20-year-olds building toy projects, it's not actually in serious work. And so I want to balance it out with Eugene from Amazon, who is running Amazon Books with language models, and Chip Huyen, who just finished the O'Reilly book on LLMOps in production, people like that who are actually implementing AI in large-scale production systems. Hex and Perplexity are also speaking with us on how to pivot an existing non-AI company into an AI company, and I think both of them have done a fantastic job of that. So I'm curating this list. I do have very large gaps that I'm very conscious of. Right? So I don't have image generation people, and I don't have that much infrastructure, the Basetens and Replicates of the world, and I would like to feature them next year. I just had a limited schedule of a 2-day single-track conference. The underlying thing behind this conference is I'm trying to basically create the ICML for engineers. So ICML, the International Conference on Machine Learning, is the ultimate, if you go to one conference a year, that's the conference you go to if you're a machine learning researcher. There's no equivalent for the engineer. And so what we're trying to do is offer that. And so this thing's going to grow over time. I kind of see this as a 10-year commitment towards building the ultimate survey of the field at any given point in time for engineers.
Nathan Labenz: (36:53) I'm jealous that you have a demo from Flo and Lindy among a bunch of other cool stuff as well. I had him on as an early guest and continue to follow their progress from little hints they give out to the public, but I still have not been able to get into that thing even as an alpha user. So I'd be very, very curious to see what they're about to show off.
Swyx: (37:14) I'm very excited too. And, honestly, I think you're doing a great job with your reports as well, and I'd love to have you as a speaker in the next iteration. But, yeah, maybe I'll spend a bit of time talking about agents, right, which is a very, very hot topic. So I will say, maybe the surprise now is that Auto-GPT is obviously now a company, being one of the fastest-growing open source projects ever. I think there's a ton of interest in what they're doing, and Toran will be speaking on our stage, I think, for his first time ever as a conference speaker. So that'll be super exciting. I do think that there's a range of agent-type projects. Right? So the open source agents definitely tend to be sort of single-use, I would say. The goal is really trying to optimize for what's the one most impressive thing that you can do in a single, you know, short video recording or Twitter screenshot? And I do think that really is pushing the boundaries in terms of what is possible. And then, obviously, being open source, people can actually go through the source code and copy ideas off of that. And I think a lot of people have done that with both Auto-GPT and BabyAGI. And I think the closed-source agent-type companies like the Adepts, like the Lindys, like the New Computers and the others that people are working on here, they tend to work on very much more mundane but daily-use type of use cases because they're trying to work towards what you would subscribe for: a $20 a month, $100 a month subscription. I think both approaches are valid, and I do think that you need to have representation from both. The amount of human intervention is something that people are trying to get a grip on. Because the way that Auto-GPT does it is they basically ask you for a confirmation step before they do anything, as sketched below. And that's fine, but that's not autonomy. That's just assisted prompting or whatever. What you want is to fire and forget. That is where ultimately these things have to go. Whereas in San Francisco, Cruises and Waymos are now common modes of transportation. And, literally, I just get in a car, I never talk to anyone, and it just brings me to a destination. That's what I want. And we might be, for a while at least, 5 to 10 years away from full self-driving in agents, which is where self-driving was 10 years ago. You know? So I do think this is one of the most speculative areas. But I have seen some of these demos live, because I'm friends with most of the speakers that we're featuring. And I will say they are useful today. So they may look trivial now, but there's active research going on towards making them more and more substantial.
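A small sketch of the confirmation step Swyx contrasts with fire-and-forget autonomy: the agent proposes one action at a time and a human approves or skips it. The tool names, planner prompt, and loop structure are invented for illustration, not Auto-GPT's actual implementation.

```python
# Sketch of a human-in-the-loop confirmation gate for an agent loop.
# Tool names and the planner prompt are assumptions for illustration.
from openai import OpenAI

client = OpenAI()
TOOLS = {"send_email": lambda to, body: print(f"(pretend) emailed {to}"),
         "create_event": lambda when, title: print(f"(pretend) scheduled {title}")}

def next_action(goal: str, history: list[str]) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Propose exactly one next step as 'tool_name: arguments', "
                        f"choosing from {list(TOOLS)} or 'done'."},
            {"role": "user", "content": f"Goal: {goal}\nSo far: {history}"},
        ],
    )
    return resp.choices[0].message.content.strip()

def run(goal: str, autonomous: bool = False, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        step = next_action(goal, history)
        if step == "done":
            break
        # The confirmation gate: skip it only in fire-and-forget mode
        if not autonomous and input(f"Run '{step}'? [y/N] ").lower() != "y":
            history.append(f"SKIPPED: {step}")
            continue
        history.append(f"DID: {step}")  # a real agent would parse the step and call the tool here
    return history
```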
Nathan Labenz: (39:52) I'd love to hear more about specifically what use you have been able to get out of any of the agents. I've played around with everything I've been able to get my hands on as well. And recently I'd still describe them, broadly as a class, as just for fun. Although, I don't think it will stay that way. In fact, if anything, I would say maybe I would put my timeline to AI agents crossing whatever chasm they need to cross, if I understood you correctly, probably shorter than multiple years. I think I'd probably put my money more on, I guess, the Anthropic timeline, if you will. Dario has said in a couple interviews that they're going to...
Swyx: (40:37) 2 years to AGI, come on.

Nathan Labenz: (40:40) Yeah, something along those lines. I mean, I can't confidently rule anything out at this point. I'm not confident that that will happen. It does seem like probably the biggest question in the whole space right now is, does the current paradigm get to the ability to do the sort of insightful work that currently humans are the only things able to do? You could come short of that, though, and still have very useful agents, right, that can decompose somewhat complicated tasks and reliably execute on them. There's a lot of different places on this spectrum of AI progress we could zoom in on, but I'd love to hear your take on both: why you think AGI is farther off than that, and then coming in a little bit toward AI agents, or taking whatever you want. But also on the AI agent side, I would be thinking a year from now, they probably will be working well and reasonably commonplace, but it sounds like you're maybe not expecting it to happen that fast.
Swyx: (41:43) I think this will happen in gradations, and this is maybe one difference between our 2 podcasts. I tend not to discuss AGI. I tend not to discuss timelines, even though, obviously, there's some implicit assumption of them in everything that we do. Just because it's so hard to predict the future and it's not falsifiable in any way. So it's just a fun dinner topic conversation. I will say there are some categories which I'm more interested in than others. Right? So the original founding prompt of Auto-GPT was, I want to increase my net worth. That kind of category of agents, not interested. That's too general for me. That's too AGI-bro for my taste. But there are very limited-scope agents which I think are useful today. I have a piece on agents, which is relatively popular, that I wrote in April. And I said, actually, the most useful agent that I use today is one that has no LLMs in it at all, and that's the SavvyCal or Calendly agent. Right? Because I send you a link, and then you schedule a meeting with me at your own convenience. And if you need to reschedule or cancel, it just happens on your side of the fence, and it happens autonomously. And that is the experience I just want for all my agents. And the fact that we don't have that with LLM-enabled agents is a problem, and we're going to slowly emerge to get there. The second form of agents, which I think have already proven themselves relatively successful, is Code Interpreter, or Advanced Data Analysis as it is now known, because it can generate code, run that code, and then use the output of the code to decide whether or not it needs to fix that code or to stop. And that is the beginnings of a loop that is basically required for agents to have some level of autonomy in decision-making. And if you break that down even further, you need some ability for planning and prioritization, you need a broader set of tools, you need memory to go with it, and then you need ways to interact with the outside world, whether it's generating text or manipulating some kind of UI. Those are the main threads of agents research. For those who are interested, definitely read Lilian Weng from OpenAI. Her blog post on agents, I think, is one of the most comprehensive survey overviews of the research in agents to date. So I think some categories of those will emerge. Right? And so one of the people that I forgot to mention, Itamar from CodiumAI, that's an Israeli company that raised an $11 million seed for building sort of coding agents. They only focus on test generation as an agent. To have an agent independently running around generating tests in your code base and then for you to accept or reject them, I think that's a relatively scoped problem, and I'm going to be perfectly happy to agree with you: it will probably be useful and commonplace in a year. But things which require a lot more self-driving and have a lot more degrees of freedom to fail, those things I think we'll be waiting on for quite a while. My last observation is I think you'll be surprised what Lindy and Notion have to show. I can't say more than that.
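Here is a minimal sketch of the generate-run-fix loop Swyx points to as the core of Code Interpreter-style agents: write code, execute it, and feed any traceback back to the model until it runs or the retry budget is spent. Running exec() on model output is unsafe outside a sandbox; this is purely illustrative, with an assumed model name and retry budget.

```python
# Sketch of a generate-run-fix loop (Code Interpreter style).
# WARNING: exec() on model output is unsafe outside a sandbox.
import contextlib
import io
import traceback
from openai import OpenAI

client = OpenAI()

def write_and_run(task: str, max_attempts: int = 4) -> str:
    messages = [{"role": "user",
                 "content": f"Write plain Python (no markdown fences) that {task}. Print the result."}]
    for _ in range(max_attempts):
        code = client.chat.completions.create(
            model="gpt-4", messages=messages
        ).choices[0].message.content
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, {})                     # run it in a throwaway namespace
            return buf.getvalue()                  # success: stop the loop
        except Exception:
            # Feed the traceback back so the model can decide how to fix its own code
            messages += [{"role": "assistant", "content": code},
                         {"role": "user",
                          "content": f"That raised:\n{traceback.format_exc()}\nFix it."}]
    raise RuntimeError("No working code after retries")
```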
Nathan Labenz: (44:45) Yeah. I can't wait. I feel like you have more insider information than I do, and yet I want to take the bull side of a bet perhaps. We need to maybe think it through to really refine the decision criteria, but I'm trying to zero in on what you think would fail that I think would work at a certain point in the future.

Swyx: (45:07) Yeah. What do you think I would think would fail? And then maybe we can discuss there. Because I'm positive. I'm just much more interested in the engineering challenges than timelines.

Nathan Labenz: (45:17) I do think people go well past the point of usefulness on their timeline discussions and their P(doom)s, if you will. Whenever anybody asks me about my P(doom), I always say somewhere between 5 and 95%. And my quick add-on to that is I'm not sure it's that much worth trying to narrow it down further, because 5 percent is enough to be very concerned about in my mind. If it is 95 percent, then the 5 percent chance of surviving is worth fighting for. So anything in there to me is likely enough to be a problem.

Swyx: (45:55) 5 percent makes you a doomer, basically.

Nathan Labenz: (45:58) Well, the real doomers would go much higher. I don't know that they would have me in their camp as a bonafide doomer with a 5 percent number.

Swyx: (46:06) Because you're multiplying by infinity, right? So 5 percent rounds to 1.
Nathan Labenz: (46:12) Well, it certainly does make it, for me, the issue of our time. Again, I don't try to zero in on a specific number much more than that. And I'd say the same thing for timelines. It seems like things could get really crazy in the next 2 to 3 years. It also seems still pretty plausible that we just hit some sort of top-out where it's like, hey, this paradigm closes in on expert performance on a lot of things, but never really achieves that sort of eureka insight capability, and we stay there for a while until, you know, something else happens. I definitely think both of those are plausibly true. Going back to the question of where our expectations might differ, I guess the stuff that Lindy has shown would be the kind of thing that I do expect to work in the not-too-distant future. To describe those a little bit, text-to-automation is a lot of what the demos are. Flo will post something where he'll have a pretty simple prompt that's like, hey, every time somebody emails me from this domain, check this other thing, draft me a response, put that in a calendar invite, send it here, whatever. And he just says all this stuff, and then the platform, just judging from the screenshot that he posts, interprets that and sets up this automation workflow. And in theory, then it's set up, ready to go, almost like you have a Zapier Zap that you've just conjured with 2 sentences. Some of them that he's shown have been reasonably complicated. And I assume that it doesn't always work or he'd probably have launched it by now. But I guess I think that that probably will start to work pretty well in the not-that-distant future. And I guess maybe to refine it a little bit more, it seems like once you can fine-tune GPT-4, you probably should be able to make a lot of those things happen, if you can just document the reasoning, the breakdown, the planning. GPT-4 is currently doing everything; if you could really just zero it in on the decomposition and scaffolding of mundane, even if somewhat complicated, tasks, it feels like that would be enough to get it to work pretty well. I don't know. What do you think about that?
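As a rough illustration of the text-to-automation pattern Nathan describes, here is a sketch that asks a model to compile a plain-English instruction into a structured trigger/action spec that a Zapier-like runner could execute. The schema, trigger, and action names are invented; a real product would validate against its own catalog of integrations.

```python
# Sketch: compile a natural-language instruction into a trigger/action spec.
# The schema and integration names are made up for illustration.
import json
from openai import OpenAI

client = OpenAI()

SCHEMA_HINT = """Return only JSON like:
{"trigger": {"type": "email_received", "filter": {"from_domain": "acme.com"}},
 "actions": [{"type": "draft_reply", "template": "..."},
             {"type": "create_calendar_event", "title": "...", "when": "..."}]}"""

def compile_workflow(instruction: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You compile instructions into automation specs. " + SCHEMA_HINT},
            {"role": "user", "content": instruction},
        ],
    )
    # json.loads will fail if the model adds prose; a real system would validate and retry
    return json.loads(resp.choices[0].message.content)

spec = compile_workflow(
    "Every time somebody emails me from acme.com, draft a reply and put a follow-up call on my calendar."
)
print(json.dumps(spec, indent=2))
```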
Swyx: (48:40) I think it could work well. We just don't really know; we don't really have benchmarks for planning and prioritization yet. And I've had debates with OpenAI people who are researching this actively. Part of my GPT-4.5 thesis is that long inference is the next frontier for OpenAI. Like, what can you do if you gave it a year to do inference instead of a few nanoseconds or milliseconds? Anyway, so I would say that you're exactly right in your mental model of what Lindy does and how that interacts or overlaps with Zapier. And I do think that that is very useful. It's still not that impressive, I guess, maybe, but to everyone who's ever had an executive assistant or wished for an executive assistant, this is going to approach what you would want as a sort of always-on personal assistant in your life. And I do think that that will unlock a level of productivity and quality of life, to be honest, that we've just never seen before. And the fact that we can pay, you know, dollars for this, I think is useful. For what it's worth, I think what he's working out is not just reliability, but also economics and security as well. And I do think that one of the pitches of why AI engineering is a thing is because we, the programmers, are going to be the people who push Shoggoth into a box, is what I say. Right? Us fine-tuning things down to smaller models, to domain-specific models, is the safe approach. Being able to compose them into usable software systems to put humans first instead of creating sort of potentially ruinous AGI, I do think that that is a movement that I can stand strongly behind, and nobody's against it, effectively. We all want this to happen. It cannot happen quickly enough. And when it happens, there are no safety concerns, except to the point where you let agents loose on the Internet with no permissioning. And so far, I think everyone, including Flo, who recently had a nice turn toward safety discussion, I think everyone's being very responsible about it. If you interviewed Connor Leahy, Connor is one of the most safety-minded people. He split from EleutherAI because they weren't safe enough. And he has this very strong criticism of the Auto-GPT team. And I think he's just never met them. All of them are very concerned about safety. And I think it is the engineers who will push Shoggoth into a box. That's my bottom line. And understanding how to wield these tools for human benefit and not human ruin, I think the responsibility lies very heavily on the engineers.
Nathan Labenz: (51:25) I think I agree with everything you're saying there. I often describe myself as an adoption accelerationist and a hyperscaling pauser. In other words, let's get the benefits that we can from our current systems. I think we're going to go a little bit farther than current, and that's probably okay. But I'm very nervous about what happens if you put 1,000,000 times more compute into a model than was put into GPT-4, because I just feel like we have zero ability to know. It reminds me a little bit, too, of Eric Drexler's Comprehensive AI Services vision, just the idea that if you have narrow, perhaps superhuman, but still fundamentally narrow systems doing all kinds of jobs, that can become a buffer for the world against more generalist systems. And as such, it seems really good to try to create these narrow systems that hopefully work really well. Going back to the what-do-we-expect-to-work-and-not-work question: what do you expect to not work? Is there something that you would say, like, yeah, 2, 3 years from now, I still think we're definitely not going to have an AI that you can get to do X?
Swyx: (52:38) Oh god. That's so... I have a tough time answering this one.

Swyx: (52:43) Yeah. Putting upper bounds on these things is very, very hard. Well, one of the reasons I pivoted very, very hard into this thing is I have to sort of update the algorithm. Whenever I've been wrong, I ask what else I could be wrong about, and then I progressively update. It's kind of my sort of Adam optimizer version of momentum-based learning. But effectively, I think I'm very skeptical about code agents that do PRs. So there's SweepDev, and then there's a bunch of other agent-type companies that will promise, you know, file an issue, we'll turn it into a PR. Right? And it always works on very, very simple sort of copy-change demos, and it never works on more complicated things. And I do think, basically, the problem of software architecture and translating user requirements into code, I would be happy saying that that's a 5-years-out thing, not an immediate-solve thing. I mean, I still think it's a very worthy problem, because this is AI as the engineer instead of AI engineer as a job title. I do think that's a very useful goal and a worthwhile goal, because engineers are expensive. And if you can pay an AI bot that's a hundred times cheaper than a regular human engineer, but it's maybe 10% of the capabilities, then you'll probably use it for quite a few situations. I just think the whole promise of issue-to-PR is probably going to be overhyped for a long time just because of the complexity of what software is. You know, I'm very happy to eat my words, because there are quite a few friends pursuing this goal, and more power to them. Right? All human progress is based on unreasonable people. I think MetaGPT, I don't know if you've covered that paper recently; I think Microsoft also had a similar paper come out, where effectively your AIs play different roles in a sort of agile scrum process in working towards a common outcome in software. It has shown better HumanEval results than GPT-4. But, again, you're fundamentally not describing the full complexity of what software systems want to optimize for, and I don't think we have nearly enough training data to do that. It's very much in the world of intelligent design rather than evolution.
Nathan Labenz: (54:58) I think we can maybe refine this a little bit more offline, but I think I would take the other side of that bet. I mean, there's the question of how big a pull request and how big a code base. Obviously, for it to be interesting, you'd have to have enough tokens that it would be beyond what a Claude 2 can currently handle. But even so, it does feel to me like, in two or three years, I would expect us to get there, and still on the assumption that maybe there are no eureka moments. I had an episode not long ago with the Med-PaLM 2 authors from Google, where it's like they're really closing in on expert-level medical question answering: the ability to take an x-ray or another scan and provide a radiology report at very close to a human level. 40% of the time, the Med-PaLM 2 x-ray report was preferred to the actual radiologist's. So it's still under 50%, but it's like, damn, that's getting really close. You know? So if we can get that far already on problems that hard, I do feel like in a two-year time frame, some pretty complicated prompt-to-PR type things should also be possible. So we can maybe firm that up and put ourselves on record for a little friendly wager.
Swyx: (56:23) Look, there's a bit of an undefined thing here, which is how much do you have to specify in order for a decent PR to come out? And will that prompting effectively become a new programming language? English is the hottest new programming language. But if you have to spell things out in enough detail for the LLMs to get it, are you basically just programming, but at a different, higher level of instruction that is kind of a domain-specific language for LLMs? So, yeah, that is open for debate. And I do think some of the approaches, I think I've seen GPT Engineer kind of approach that with their sort of pipeline YAML thing. Does that count? I don't know. Right? Is that kind of pseudo-prompting that's really pseudo-programming already? In my mind, I want a nontechnical person to be able to manage a software project by themselves, right, just by filing issues. I will say that's my bar for issue-to-PR being solved: a nontechnical person. But if you need a technical person to come up with some interesting YAML file and prompt in specific ways, then it doesn't really count; your hand is too much on the scale at that point.
Nathan Labenz: (57:28) Yeah. Okay. I think that makes sense.
Swyx: (57:30) I have one more thing to discuss, one more topic, which is probably going to be one of the most talked-about talks coming up: OpenAI just confirmed their talk with me, and we're gonna have the first public DALL-E 3 and GPT-4 Vision talk. I'm pretty excited about that, because that is the signal that, finally, we're moving into multimodality for everything. It's gonna be a wild world, and I don't really know how to handle it, because most of the people I have on are all in the text domain, and we're gonna be very off balance when new modalities come up.
Nathan Labenz: (58:04) Yeah. I got a little glimpse of that last week in trying to add a feature to a React app, and it was a fascinating experience. I had never, prior to this session last week, programmed in React at all. I've done JavaScript before, but it's been a while since I've really been into it. And so my first question was, can you explain to me the general structure of a React app? And it did. And then there were a couple of file types in my repo that I asked it about, and it explained to me what those are and said, oh, well, that means you've got these libraries in place, and here's their purpose. And then I asked it if it could give me a command I could use to print out the tree structure of all my files, so I could paste that back in and it could see the structure of my project, and it helped me do that. And then I basically just worked my way through to a working module. Along the way, I didn't really have to do this, but I took a couple of screenshots and fed those in as well. And I was like, man, this is really amazing. It's kind of understanding what the defect is in the screenshot and modifying the code based, in part, on the screenshot. It definitely is still at the point where, if you looked through this transcript, you'd be like, Nathan, you qualify as a technical person here; this is not at the level that you're describing. But you can kind of squint at it and see your way there with these screenshots, where I want it to be doing something else, and here's what it is doing. And if it can start to deduce or infer the right fixes based just on the visuals of the app not working, then it does seem like you're not too far away from a nontechnical person, maybe not as efficiently, but still kind of being able to get something along the lines of what they want, which today is just unthinkable.
Swyx: (1:00:01) For those interested, I think McKay Wrigley has been making a lot of us jealous with his GPT-4 Vision access, with some examples of what you can do going from vision to code. And it's not just vision, right? It's also voice. Part of the update that I did for September, and this is something Ben Thompson of Stratechery, I think he's now talking about it on the Sharp Tech podcast, he had a private demo of both modalities, and I've seen both in person as well, and I kind of agree with his point, which is that vision is more impressive individually, but voice is the thing you're gonna use every day, all the time. It's kind of an interesting inversion of expectations, because I would count vision as my most important sense out of the five senses. But voice, the ability to be hands-free, always on, and understood, I think that's the thing I see a lot of people here in San Francisco building with. And so my perception has changed. I used to dismiss voice. I was like, oh, Auto-GPT has ElevenLabs, that's just for fun; who wants an agent talking back when I can just read the text by myself way faster? But the fact that it's hands-free, the fact that it's just always on, and the fact that it can imitate human expression, I think it's gonna be remarkable. And, I don't know if you've seen, have you seen the Russian language teaching demo of GPT-4 voice mode, or ChatGPT voice mode?
Nathan Labenz: (1:01:55) No, I haven't.
Swyx: (1:01:56) So Greg Brockman tweeted this, I mean, maybe two or three days ago. Duolingo should not exist anymore. This thing is gonna teach you languages on a personalized basis much better than any other app in existence, or any human teacher: it's just way more patient and way more forgiving, and it can probably understand multiple languages way better than any one teacher can. So I think it's gonna be a very, very interesting time for understanding all these new modalities, because I'm sure they're working on more that they haven't told us about.
Nathan Labenz: (1:02:26) Yeah. So your "Duolingo shouldn't exist anymore" take is kind of what I wanted to take your temperature on next, because I feel like, in some sense, I've seen a version of this movie before, going back to early Facebook days, for example. I just happened to be in the same dorm as Zuckerberg in college, and I actually thought it seemed pretty dumb at first. I was like, I'm going to go online and put up a picture of myself? So I was actually a late adopter of Facebook on campus.
Swyx: (1:02:57) Fun fact: Mattan Griffel from One Month, he used to be Zuckerberg's CS50 professor. I don't know if he's ever told you that.
Nathan Labenz: (1:03:06) That did not come up, actually. That's fascinating.
Swyx: (1:03:09) So in The Social Network, the class that Jesse Eisenberg walks in on, solves the question, and then walks out of, that was him. That was Mark Zuckerberg's CS class.
Nathan Labenz: (1:03:21) That's really funny. You know, Facebook initially had this come-one, come-all, build-on-our-platform posture: you get access to everybody's friends, everything's better with friends, it's all going to be social. And then it became clear over time that actually they were going to build the highest-value use cases natively on the platform, and the app developers could get a little bit of stuff around the edges. But today, basically, you can log in with Facebook and that's really it; there's not a lot of utility left in the Facebook platform. I am wondering if we're starting to hit a moment where OpenAI is going to go in a similar direction and start to eat these adjacent use cases that are currently other companies. So, companies we've talked about, like HumanLoop, I love the product, we're both fans, but it sure seems like, from an OpenAI standpoint, they just released the fine-tuning jobs page.
Swyx: (1:04:11) I was like, well, what are you doing there?
Nathan Labenz: (1:04:13) That's starting to look a little like HumanLoop. Right? So there's a ton of stuff that they could do there, and it makes total sense. And it's not that they'd be doing it because they're hostile to anybody else in the ecosystem, but just because they're asking their users what they need and executing on that. And these are things that people need. And I think the same thing might start to happen with, for example, text-to-speech. We've got some really good companies that have created amazing stuff. I've cloned my voice with both Play.ht and ElevenLabs. They both sound awesome.
Swyx: (1:04:45) Do you have a favorite? I don't know if you're willing to go on the record.
Nathan Labenz: (1:04:49) I think it does depend on the use case. I would say ElevenLabs is probably the favorite for ease of use right now, but if I was doing something more artistic or kind of creative, or...
Swyx: (1:05:02) ...wanted more emotion...
Nathan Labenz: (1:05:03) Yeah, Play.ht has a lot of diversity.
Swyx: (1:05:05) Yeah, the creative direction that you can give it is a little bit more open.
Nathan Labenz: (1:05:09) So if you're analyzing this from a who-is-OpenAI-posing-the-biggest-problem-for standpoint, it would probably be ElevenLabs, because OpenAI is presumably going to be more reluctant to create the voice that feeds into your horror movie or your shooter video game or whatever, where you want a wide range of emotions, at least in the short term. But it seems quite clear where they're headed. I would imagine, and tell me if you think this is off, but when I think about this Dev Day coming up in a month, I'm like, well, geez, they just launched voice in the app. They just launched vision in the app. They just launched this preview of fine-tuning. They've got a vector database built natively into ChatGPT Enterprise. It seems like all these things are coming to the API.
Swyx: (1:05:53) Sorry, I haven't heard about the vector database. What are the details there?
Nathan Labenz: (1:05:57) I don't have a lot of details, but they have said publicly that you're going to be able to connect ChatGPT to your information as an enterprise customer. Basically, what I understand that translates into is you can connect your Google Drive or your Notion or your Dropbox or whatever to their system. I understand they're going to support a number of them out of the box with their own connectors, and then they presumably will have some way for people to connect their own random data sources into the system as well. And as a ChatGPT Enterprise user, you can then query all your company's Google Drive assets as part of the ChatGPT Enterprise experience. I don't know exactly where that is in terms of whether it's launched or not, but they've sort of said it's coming, and that much, I think, is not super secret. Beyond that, I don't have a lot of details. But I guess I just feel like all this stuff is coming to the API. Right? And it seems like they're gonna do this to serve the AI engineer, because the AI engineer doesn't really want to have to piece five different services together. Today, if I want to create a voice assistant, I've got to transcribe, send something into the language model, get text back, and convert that to audio. What would be a lot better is if I could just send audio into OpenAI and get audio streaming back at me. Right? And similarly, today, and I've done this and you've done it too, right, we've got to go select a vector database provider and figure out how we want to chunk our stuff and how it gets loaded in there. And then what's our query strategy? And, you know, that is its own art: figuring out how to generate the query for the vector database so you get the right stuff back, because often the question the user asks and the actual contents that you're trying to find similarity with don't line up so well. So there are all these different tricks and techniques. And again, what I really want is one place where I can say: here's my docs, here's my question, I'm putting it in as an image and an audio file, and by the way, I want my response back as streaming audio, and just have OpenAI handle everything. Does that seem right? It seems like they're gonna just eat all these things around them, and the AI engineer benefits. But a lot of these companies I've had on, or a lot of the speakers that you have, as much as they have led, how do they end up not having just been the community R&D department for OpenAI?
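To make the "piece five different services together" point concrete, here is a minimal sketch of the glue code an AI engineer typically writes today for a voice assistant with retrieval. It assumes the 0.x-era OpenAI Python SDK (with OPENAI_API_KEY set in the environment); the vector-store and text-to-speech helpers are hypothetical stand-ins for whichever providers you pick, not real packages.

```python
import openai  # 0.x-era SDK; reads OPENAI_API_KEY from the environment

# Hypothetical stand-ins for whichever vendors you choose; not real packages.
from my_vector_store import search_similar_chunks   # e.g. a thin wrapper over your vector DB
from my_tts_provider import synthesize_speech       # e.g. a thin wrapper over your TTS vendor


def answer_voice_question(audio_path: str) -> bytes:
    # 1. Speech-to-text: transcribe the user's audio with Whisper.
    with open(audio_path, "rb") as f:
        question = openai.Audio.transcribe("whisper-1", f)["text"]

    # 2. Retrieval: embed the question and query your own vector database.
    #    Chunking, query rewriting, and top-k tuning are all on you today.
    embedding = openai.Embedding.create(
        model="text-embedding-ada-002", input=[question]
    )["data"][0]["embedding"]
    context_chunks = search_similar_chunks(embedding, top_k=5)

    # 3. Generation: stuff the retrieved context into a chat completion.
    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using the provided context.\n\n" + "\n".join(context_chunks)},
            {"role": "user", "content": question},
        ],
    )
    answer = completion["choices"][0]["message"]["content"]

    # 4. Text-to-speech: hand the answer to a separate TTS vendor and return audio bytes.
    return synthesize_speech(answer)
```

Most of this is undifferentiated plumbing: if OpenAI accepted documents and audio directly and streamed audio back, steps 1, 2, and 4 would collapse into a single call, which is exactly the simplification being described here.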
Swyx: (1:08:31) Look, every large enough platform eventually comes into this tension with its developers, and OpenAI is just coming into that right now. Apple has this long history of Sherlocking, which is an actual term, I think, where some very successful community app gets killed, effectively, by an official Apple version. And I do think, you know, OpenAI has promised not to compete with its own developers. They said that, funnily enough, in the deleted transcript of the interview that Sam did with Raza, which is still publicly available. They promised not to compete with developers, but I don't think that promise is ironclad, and I don't think they want to stand behind it.
Nathan Labenz: (1:09:17) That reminds me of the Dumb and Dumber scene, not to call anybody dumb, but the, you know, "these are just as good as money; these are IOUs." And it's like, yeah, that was easy to say at that roundtable, but they're at a thousand people now and they've got product managers.
Swyx: (1:09:35) 600.
Nathan Labenz: (1:09:36) 600? Oh, okay.
Swyx: (1:09:38) Yeah. I mean, I happen to have very live stats, so I know. I would say that they always want to focus their people on the bigger things, right? You know, there was this recent discussion about the OpenAI phone, which is what I've been calling it, with Jony Ive. Working on a phone: that's big and ambitious. Working on A/B testing and fine-tuning infrastructure for the enterprise: not that ambitious. Leave that to the lesser engineers in the community to pursue and, you know, go for the big things, and obviously try to solve problems that everybody has. So I would say, yes, OpenAI does exist in tension with the developers that build on top of it. Probably some of the companies will be Sherlocked. I don't think ElevenLabs would be under threat, though. The best thing that startups building around the foundation model lab ecosystem can do, and this is not just OpenAI, it's also Anthropic and all the others, is just prove that you're a really good engineering team. Because if you're good enough, they'll buy you and pull you into their orbit. Money is free to these people; it doesn't actually matter. Are you choosing to work on interesting problems, and do you ship fast enough that you're making an impact? And I think ElevenLabs has definitely shown that they can do that. And, hopefully, enough of the teams that I'm seeing can do that, but not everyone will make it. Right? There's definitely gonna be collateral damage. OpenAI has such a huge footprint that it cannot help but realign people. A lot of what LangChain and Guardrails used to do was JSON output validation. Right? And now that's completely useless because of the API, and they just have to roll with it. And I think everyone's completely fine; we all understand we're building on top of shaky foundations, and it keeps moving, but that's what you get for being first. You know, in terms of the multimodality stuff that you talked about, I do think that will be a focus for Dev Day. I don't know if they'll have the APIs ready by then, but we'll be able to see what you can build with it in the talk that Simon and Logan will be giving. But this comes back to Roon's post on text being the ultimate universal interface. I do think text is sort of the king modality. Yes, whatever speech synthesis is being done, whatever text-to-image or image-to-text stuff they're doing, everything just kind of passes through text. And with text being the central pipeline, I do think it has some legs for the time being. Very happy to eat my words. Ultimately, everything is just tokens within a token space, and text has no particular privileged place in there, except that we have a much better understanding of what high-quality data looks like in text than in everything else.
Nathan Labenz: (1:12:13) Yeah. It does seem like, if I had to guess, things are gonna get a lot easier for the AI engineer over the next couple of months, with some really integrated, high-quality offerings, such that by the end of this year, you could probably feed in multiple modalities, you could have them manage the custom dataset powering your retrieval, hosted by them, and then they can stream back to you text, yes, of course, but audio on top of that, perhaps even images integrated into the responses. And that is going to be quite a difference and quite a simplification. We've got GPT-4 fine-tuning as well, which is another huge one, because to the degree that you have a use case that you can't get to work well yet with any current model, this could really be the thing that allows it to become possible. Then there are things that OpenAI might launch that wouldn't compete with anybody in the ecosystem but might actually just be a win for everyone. One concept that I had there is Login with OpenAI. And I think this is potentially really good for a lot of developers if you are struggling with this tension between "I want to show off how sweet my product is," but to do that, I'm taking a nontrivial hit in aggregate in terms of token cost. At Waymark, for example, my company, we estimate it's about $0.15 per user that tries the free product in hard cost to us. Some of that is OpenAI language models. Some of it is image understanding. We have some self-hosted stuff on Amazon. We try to process all the images of theirs that we possibly can, to create the best profile we can of the user, et cetera, et cetera. I'd rather that the user could cover that free-trial cost instead of us having to pay it, but obviously, they would need some mechanism for that. Do you like that idea? How would you riff on it? Do you see that as a plausible thing that they might introduce?
Swyx: (1:14:28) Absolutely. I think it makes sense. I just don't know if it meets their bar for interesting work. I think Login with Google is effectively the equivalent of a Login with OpenAI. But also, implementing an SSO provider, I think, is relatively commoditized to begin with, and then it gets very complex over time as you start to manage nontrivial amounts of user data, which maybe OpenAI doesn't want to get into the business of. It's more just a question of choice, or alignment with the mission, and I don't know if that helps them build AGI. I think it would be a nice quality-of-life thing for them. Maybe they can get more information as people use a sort of universal login with OpenAI. I do think that if they build this, then they have effectively decided to become the AI cloud. I think they maybe own ai.com; don't quote me on that, it might be them or Elon. This would make them the fourth major public cloud, which is very exciting, but also maybe not quite a research organization anymore. And again, I still don't know if they want to do that. I do think it's a choice of path, and they're fortunate enough that they have that luxury and capability. There are many things that they could point their engineers at, and I don't know if building SSO for AI is one of them. Could you just offload it to Okta or whatever? Someone else could do that as well. I will say one thing I do like as an idea, which is sort of context that lives with you, right? I think you maybe mentioned this a little bit. That does seem helpful, but it's probably actually Character.ai that does that best; they have by far the longest history of working on this. So maybe I'm not so much of an OpenAI maxi in this regard. I tend to describe myself as quite an OpenAI maxi with regard to LLM progress, but with regard to products, I feel like Character has earned its stripes in trying to be a chatbot much more than OpenAI has, even though they haven't had, you know, state-of-the-art new models to ship.
Nathan Labenz: (1:16:28) I don't necessarily want to be an OpenAI maxi, but it's hard not to be.
Swyx: (1:16:35) I write a monthly recap, right? And my OpenAI news section takes two screens, and then my section for the other frontier models, the other frontier model updates, is five bullet points. It's not a competition. OpenAI is just running away with it so hard. Yeah, I mean, just being objective makes you an OpenAI maxi.
Nathan Labenz: (1:16:55) The competition right now is just not that deep. As we said just before we started recording, we've got to give love to Claude 2. I would say Anthropic has definitely shown that they can create really high-quality models that are not qualitatively outclassed by OpenAI, even though they do fall a little bit short on the MMLU benchmark and whatnot. But it's definitely a very good experience to use and something that I do in fact use, especially when I need long context or whatever. They kind of have a strategy of not leading the market in product releases, as far as I know. And there really aren't that many other competitors out there right now that bring a similarly robust and useful model forward. I mean, Google is not there. They are maybe gonna get there in the not-too-distant future, but as of now, the models are just not as good. And, you know, who else is there? Right?
Swyx: (1:17:55) I haven't spent enough time on PaLM 2 to definitively make that statement myself. I want to believe in Google. I've met people there; they're very smart, and they have all the resources in the world. They just don't have the institutional capacity to ship something interesting for people, unfortunately, so far.
Nathan Labenz: (1:18:15) I think the research is not the limiting factor there, if I had to guess right now, and it's certainly not the compute either. And then there's the fact that they didn't participate in the last Anthropic fundraise and allowed Amazon to kind of set that...
Swyx: (1:18:30) Amazon just bid way higher than anyone else was willing to bid. Anthropic's probably worth $10 billion now or something, and Google, I don't think they probably want to invest in-house. I mean, I've talked with people involved in the previous rounds, and they said it was much more hands-on. So Google in no way owns that relationship as strongly as Microsoft owns its relationship with OpenAI.
Nathan Labenz: (1:18:50) My biggest update from that was just that Gemini must be on a path to work pretty well, because if it weren't, then they would be bidding, you know, as much or more than Amazon. Right? It seemed like they must have had some inside track based on their early relationship, even if it was fairly friendly, arm's-length, whatever. If they didn't have confidence in their upcoming releases, it's hard to imagine how they'd let that one go.
Swyx: (1:19:17) Yeah. Who knows? Battle of the Titans at stake. Hopefully someone will write the tell-all books in 10 years; maybe Walter Isaacson is still around.
Nathan Labenz: (1:19:26) People are wearing their Rewind pendants, so they're capturing everything.
Swyx: (1:19:30) One of the themes that I featured in my monthly recap yesterday was what I've been calling the AI Horcrux, or AI pendants. I saw a demo from Avi Schiffmann of Tab yesterday, and I've also talked with a number of people working on similar projects. I would be willing to wear one of those. I have an Oura on me right now, and it's just logging my heartbeat and whatever, but I'd love to log all my conversations and be able to talk with it. These things are coming. And, yes, they will be multimodal. But also, all of us having an effective digital twin that we can talk to and use as at least a note-taking thing, if not something exposed to the wider world, is something I'm very excited by, and it will be built by the AI engineer. So, yeah, I will probably feature an entire stage on these things the next time I do my summit, next quarter. Because I think in terms of the B2C frontier, if there's a form factor beyond the phone, it's going to be the Humane clip or the AI pendants or the watch or whatever it is that's just always on you.
Nathan Labenz: (1:20:35) Yeah. That definitely seems like it's coming before too long as well. So, the last thing I wanted to ask you about, and I've been asking everybody about this. Listeners to the show will have heard me pitch it before, so I'll try to do it super briefly. The concept is the AI bundle, and I'll give you brief inspiration from the consumer side and from the app developer side. Consumer side, pretty obvious: I've got ChatGPT Pro, I've got Claude Pro, I've got Copilot, I've got Replit Ghostwriter, I've got Perplexity Pro, I've got other stuff, Rewind and all these things. My monthly AI bill is getting to the point where it's at or north of what my cable bill used to be. And it notably doesn't cover hundreds of other apps that I might like to use, which are all another $20 a month or whatever. And I'm like, man, how many of these $20-a-month things do I want to buy? So that's the consumer side. On the application developer side, just representing myself as Waymark: we have this easy-to-use video creator. It used to be DIY; now it's done for you with AI. You tell us what kind of video you want, tell us who you are, and we make the video. You just get to sit back and watch it. It's awesome. But we do have this weird tension where, as I said, it costs us 15 cents per user. We don't want to put it behind a paywall, because we want to show it off; people are most likely to buy if they can try it. But the flip side is we have the classic, whatever, $30-a-month plan, and we need one person to buy for every 200 free trials just to break even on our tokens, before we cover any other cost or make any money whatsoever. So that's a tough situation, because it puts us in this weird thing where we're like, when do...
Swyx: (1:22:21) ...we gate it? How do we...
Nathan Labenz: (1:22:22) ...gate it? And then we also see, of course, the behavior that SaaS app developers hate to see, which is where people do the thing they want to do and then immediately cancel. And there's been increasingly a lot of talk about this: churn is pretty high. Everybody's getting a lot of new customers, interest is really high, but churn is also pretty high. On both ends of this, it seems to me like there could be real value in some sort of bundle. And what I want, I think, as a consumer and potentially also as an application developer, is something like: hey, pay $100 a month and get 1,000 AI apps. Those would be your flagship apps, like maybe your ChatGPT Pro, and then a bunch of long-tail apps like a Waymark. What I figure is, if I'm Waymark, which I am, and I'm one of 1,000 apps, I'm like, okay, look, I know that there's a lot going on. Most people don't need us that often. We've got some power users, but we've also got a lot of people who just want a one-off here and there. And they buy and they immediately cancel, and it's not because they didn't like the service; it's just, "I don't want to pay for this every month." So if it was a $100 bundle and we were one of 1,000 apps, and let's just say we got the average, which is 10 cents a month per app subscriber, in the same way that ESPN Classic gets a very small fraction of my cable bundle, I think we would do that. And I think we could still upsell the power users into higher tiers or whatever. But if that was the economics, every million subscribers that the bundle had would represent on the order of a million dollars in annual revenue to an app like Waymark, even just getting 10 cents per month per user. Can you see that happening? On both the supply and the demand side, it seems like there would be a lot of utility there. Obviously, it takes some doing to pull that trick off. But do you see that being something that could make sense? Or, if not, where would it break down?
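As a back-of-the-envelope check on that pitch, here is the arithmetic using only the figures quoted in the conversation (the bundle itself is hypothetical):

```python
# Waymark-style free-trial economics, using the figures quoted above.
cost_per_free_trial = 0.15                    # dollars of hard cost per free user
plan_price = 30.0                             # dollars per month for the paid plan
trials_to_break_even = plan_price / cost_per_free_trial
print(trials_to_break_even)                   # 200 free trials per paying user, just to cover tokens

# Hypothetical AI bundle economics.
bundle_price = 100.0                          # dollars per month for the whole bundle
apps_in_bundle = 1_000
avg_share_per_app = bundle_price / apps_in_bundle   # $0.10 per app, per subscriber, per month
subscribers = 1_000_000
annual_revenue_per_app = avg_share_per_app * subscribers * 12
print(annual_revenue_per_app)                 # about $1.2M per year per million subscribers
```

That works out to roughly the million dollars a year per million bundle subscribers described above, before any upsell of power users into higher tiers.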
Swyx: (1:24:28) The closest equivalent would be Setapp. Are you familiar with Setapp?
Nathan Labenz: (1:24:32) No, I don't think so.
Swyx: (1:24:33) So there are existing software-bundle-type things out there, and they do bundle subscriptions. And I do think they provide some value for a portion of the population, but I don't know how big that is. People are pursuing the bundle-economics play in traditional SaaS, and maybe you can have an AI-bundle SaaS play. I would say one of the core questions would be: do you get paid regardless of whether or not people use you? I think that would very much affect whether or not that extra million comes to you. I don't know if you have a quick answer to that one.
Nathan Labenz: (1:25:07) Yeah. I mean, I think it could be managed in any number of ways. Obviously, whoever is managing the bundle would want some rules or framework in place to keep it somewhat sane. Just using the cable bundle as the jumping-off point: there are certain channels I never watch, and other people don't watch the channels that I watch, and my understanding is they all get a per-subscriber share regardless of whether or not I tune in. But then, when it comes up for renegotiation, there are different levels, of course, depending on how popular you are. I imagine something like that, perhaps managed even purely algorithmically, could work. I think you'd have to have some kind of baseline, where if you're in, you're going to get something from this, to make it appealing to the app developers. And I definitely would see the actual money flows shifting around over time with shifting usage patterns.
Swyx: (1:26:02) Yeah. Exactly. So, I would say there are difficulties there. There are difficulties with regard to the differences between what a bundle of newsletters or a bundle of content looks like and what a bundle of software looks like. I would say the people who have probably thought about this the most are Nathan Baschez, who has now started Lex as an independent startup, and Ben Thompson of Stratechery, who has written a lot about the media and cable bundle. So they will be much clearer thinkers on this than I am. And I do like the magic of bundle math. Right? You get more choice, you pay one fee, and the suppliers get more predictable income and a share of and access to that income, plus pooled distribution, in terms of getting people to pay and then getting people to understand what the apps are and try them out. I will say, for SaaS, the incremental difference between SaaS and content businesses is that SaaS people typically really want to own their customer base. When I want to change pricing, do I have to get your approval as Mr. Bundle Guy? And I'm not going to be happy if you push back on something. So I think there's a lot of this sort of internal politics that happens when it comes to B2B. And then pile on top of that: I'm very immersed in the B2B and enterprise infrastructure environment in San Francisco, that's where my background in developer tools comes from, and all of those companies are not going after the $10, $20-a-month subscription. Right? They want to go after the $10K, $100K, million-dollar-a-year contracts that they can get from other businesses. So effectively, this is a rounding error to them.
Nathan Labenz: (1:27:47) Yeah. I would not expect your HumanLoop kind of infrastructure players to get into something like this. It would have to be the Waymarks.
Swyx: (1:27:57) Consumer, B2C. Yeah.
Nathan Labenz: (1:27:59) Yeah, and there are millions of these. I recently had one of the founders of Gamma on the show, and it's like, I don't make that many slides, but when I do, it's cool, right? But I feel bad if I'm just using their free version and not paying them anything, and I'm also like, I don't necessarily have hundreds of dollars to drop on every app that I'm interested in trying on an annual basis. So I either sign up and cancel, or I just don't sign up. It's probably not going to happen, but something about this feels like it should happen to me, so I can't let go of it just yet.
Swyx: (1:28:32) Well, the real question is, what is the minimum viable version of this that you can test to find out? Right? That's something one of my mentors has always given me as life and career advice: whenever you have an idea like this, or whenever you're stuck at a fork in the road, what is the minimum viable step to gain more information so you can be sure? Because you could sit there going, it could work, it could not work, I don't know. The only way to find out is to take steps. And I don't know how to size this down, because there is a certain appeal to the scale of these bundles, but maybe get five things together and then see if that works.
Nathan Labenz: (1:29:11) Yeah. I think one of the big challenges with this is that it does feel to me like you would need anchor products in the bundle to make it compelling. If somebody said, hey, for $10 even, you can have 300 apps you've never heard of, I'd be like, I'm not sure I need that. But for $100, if I could have ChatGPT Pro and Claude Pro and Perplexity Pro and a bunch of other things that I've never heard of, then I'm like, yeah, I might go for that, because those three I already know I want, and whatever else, I'll discover along the way. Anything else you want to talk about today before we break?
Swyx: (1:29:52) This has been a real pleasure. I have just been enjoying the podcast. I would say just keep up what you're doing. There's always room for these sorts of long-form deep dives into AI, and I love the passion you bring to this space. I'd love to have you on at my next conference.
Nathan Labenz: (1:30:08) Well, thank you very much. Obviously, the feeling is mutual. Why don't you give us the details of the summit and what people can do if they still want to get a ticket and get there?
Swyx: (1:30:19) Yeah. I mean, in-person tickets are completely sold out, but you can get an online ticket, which is basically the livestream plus the Slack community discussion, at ai.engineer. I really love these short domains, by the way. .engineer is a TLD, so we bought ai.engineer, and that's where the AI Engineer Summit, newsletter, and job board live. The podcast is latent.space.
Nathan Labenz: (1:30:39) Cool. Well, this has been a lot of fun. Swyx, thank you for being part of the Cognitive Revolution.
Swyx: (1:30:44) Thanks for having me.
Nathan Labenz: (1:30:46) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.