Shortwave Rides the Tidal Wave: Inbox Agents, Hyper-Growth & Hiring AI Managers, with CEO Andrew Lee




In this episode of the Cognitive Revolution podcast, Andrew Lee, founder and CEO of Shortwave, returns to discuss the rapid advancements in AI over the past year and how they have significantly improved Shortwave, an AI email assistant. Andrew shares insights into the exponential growth of Shortwave's revenue and the enhanced capabilities of their AI, which now functions more like a virtual assistant. They delve into various use cases, the technical evolution of their platform, the impact of new AI models, and their strategic decision to shift from being an AI-enhanced email client to offering a broader AI-driven communication solution. Andrew also talks about the shift in company culture towards an AI-forward approach, the importance of speed and agility in the AI space, and the increased productivity achieved through leveraging AI. The conversation also touches on the future of the software industry, the potential of AI to automate routine tasks, and the company's hiring strategy focused on people who are passionate and forward-thinking about AI.

PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(04:00) Introduction and Welcome Back
(04:07) Shortwave's Evolution and Revenue Growth
(05:09) AI Email Assistant: Then vs. Now
(06:44) Exciting Use Cases of Shortwave
(14:11) Technical Deep Dive: Database and Search Stack
(23:44) Agent Behavior and Iterative Approach
(34:04) Model Selection and Cost Optimization
(39:53) Future of AI and Convergence of Providers
(42:54) Exploring the New Cursor Agent Mode
(44:07) Understanding AI Filters and Their Functionality
(45:38) Challenges and Solutions in AI Email Management
(49:24) Evaluating AI Models and Their Performance
(55:53) The Role of AI in Enhancing Email Communication
(57:05) The Future of AI in Communication Tools
(01:12:35) Building an AI-Forward Culture at Shortwave
(01:17:14) Leveraging AI for Content Creation
(01:17:57) The Shift in Team Dynamics
(01:19:36) Adapting to Industry Changes
(01:20:39) Optimizing for Speed
(01:22:45) Monetization and Pricing Strategies
(01:25:42) The Future of AI Integration
(01:36:36) Hiring and Team Structure
(01:43:28) The Future of Software Development
(01:47:18) Closing Thoughts and Future Outlook
(01:50:03) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


Full Transcript

Nathan Labenz: (0:00) Hello, and welcome back to the Cognitive Revolution. Today, I'm excited to welcome back Andrew Lee, founder and CEO of Shortwave, for a conversation about the incredible speed of AI progress, how Shortwave is maximizing agent performance with today's frontier models and fundamentally reimagining digital communications, the ongoing transformation of the software industry at large, and company building for the AI era. The impetus for this episode was a beautifully exponential revenue growth curve that Andrew recently posted on Twitter. The sort that you can really only achieve with genuinely word-of-mouth worthy practical value. Over the next 2 hours, Andrew takes us on a tour of everything that he and the Shortwave team have done over the last year to transform what was a useful, but perhaps not quite transformative email assistant then to an email agent that is now routinely surprising and delighting both users and Andrew himself with the increasingly complicated projects it can tackle. So much so that Shortwave is now expanding beyond email and reconceiving the product as an AI agent to help manage communication across all major channels. Having concluded that AI makes software so much easier and faster to create, such that speed is really the only moat going forward, Andrew does not hold back on lessons learned, and the technical insights here are outstanding. Andrew breaks down how they've completely rebuilt their infrastructure at every level by constantly testing and swapping in new models, moving to Pinecone's serverless offering for their vector database, and adopting a hybrid structured plus vector search paradigm that delivers better results at lower cost. Perhaps most fascinating is Andrew's perspective on agent architecture. While many companies are pursuing multi agent approaches with specialized sub agents, Shortwave has found better results with a simpler approach that makes careful use of Anthropic's caching features to support long running tasks with lots of context while also maintaining positive margin unit economics for the business, but otherwise largely trusts Claude to act effectively as an agent, both by calling the right tools and determining for itself when it's found what it's looking for. Personally, I've had a number of wow moments as a user. I was honestly a little nervous to allow it to organize my inbox for the first time, but now I'm making regular use of the conceptual to do lists that it's created for me. And it also saved me a cool 30 minutes the other day when it collected all receipts from a recent trip and compiled them into a tidy expense report. In the last third of the conversation, we turn from the product itself to the question of how to structure a company for success in the AI era. Having recently closed another round of venture capital, Shortwave is hiring for a number of roles, which Andrew describes not as traditional individual contributors, but as AI agent managers across software development, marketing content creation, and more. He plans to keep the team quite small, targeting just 15 or so employees for the foreseeable future and prioritizing talent density and speed of execution above all else. With that in mind, he's offering a $10,000 referral bonus, including to listeners of this podcast. Finally, before getting started, I wanna note that this episode is brought to you ad free by Shortwave. 
I've mentioned in the past that we are experimenting with sponsored episodes that allow companies with a timely story to cut to the front of the line. Of course, my commitment to you, the audience, is that our bar for interesting content and my preparation process will remain the same as always. Andrew and Shortwave were really a perfect fit for this opportunity. Their product is clicking, their business is booming, and he was eager to get his hiring message out sooner rather than later. As always, if you're finding value in the show, please take a moment to share it with friends or colleagues who might be interested. Leave us a review on Apple Podcasts or Spotify, and I always welcome your feedback either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. Now let's dive into this fascinating conversation with Andrew Lee about Shortwave's AI powered transformation, not just of email, but now all digital communications. Andrew Lee, founder and CEO at Shortwave. Welcome back to the cognitive revolution.

Andrew Lee: (4:06) Thanks for having me. It's good to see you again.

Nathan Labenz: (4:08) Yeah. It's been, boy, a lot gets packed into just a year in the AI space. And I was looking back, it's been just about a year since your first appearance on the pod. And indeed a lot has changed. You know, at the time I called Shortwave the AI email assistant that I had been waiting for. And then as we were kind of catching up in preparation for a second conversation, you said at that time it only kind of worked. And I was like, yeah, you know, I guess that's true. I kind of looked back at, you know, these things that seemed so mind-blowing to me at the time and obviously, you know, we've way surpassed them. But what caught my eye and got me to reach out again was you had posted a graph of Shortwave revenue on Twitter. And, you know, it's basically the canonical exponential curve where it looks like it's on the verge of going totally vertical. So to kick things off today, what's new and what is working now that is leading to such incredible growth that was only maybe sort of working a year ago?

Andrew Lee: (5:10) Yeah, it's really been an evolution of that AI assistant that we talked about last year. I think the thing that you played with a year ago, you could chat with it, it could answer your questions, it did an okay job of searching, it did an okay job of writing your email, but it wasn't really that smart and it wasn't really that trustworthy. And if you asked a question like, hey, where is the receipt for this? Like maybe you'd find it but you couldn't really trust that it found it. And it also couldn't do a lot of the normal stuff that you want to do in an email client. Like if you go in there and you say, you know, hey, what are the most important emails I have? Or, you know, archive all the cold sales emails or something that I got. Like it couldn't do any of that stuff. It couldn't manage your to-dos. It had no idea who your contacts were or what your labels were. It was just sort of like a cool search and a writing thing and like maybe like a general purpose AI thing, but it wasn't quite like having an employee sitting next to you, which is the pitch that we're trying to make. And we have iterated our way to a thing that actually really delivers on that. Like you use the thing and it kind of does what like a virtual assistant would do for you and it actually works and it can do almost everything that you as a human can do. So yeah, I think it's just sort of reached a tipping point where people are like, holy crap, I can do my email not by like doing my email, but by talking to a thing that then does my email for me. And that lets me think about things at like a much higher level and be like way more productive.

Nathan Labenz: (6:35) Yeah. I've experienced quite a bit of that and can definitely testify that there are some pretty wow moments that I've experienced. I mean, I don't know if you wanna start with use cases or sort of how it works under the hood. Well, let me start with use cases. 1 that I tried, which I thought was really interesting, was to just ask it, take a look at my last 100 emails sent and give me whatever advice you have. And, you know, you learn that what you've got in the outbox says a lot about you, and I honestly thought it gave me, like, pretty interesting advice, which was both apt in some ways and also made me think like, maybe I'm not exactly spending my time 100% the way I aspire to be spending it, just because of the sort of balance of things that it was seeing. What are some of the other exciting use cases that you have gotten the most value from, or that, you know, maybe your customers have surprised you with?

Andrew Lee: (7:30) Yeah. Email is a crazy valuable corpus of stuff about everything involved in your business or your personal life. So obviously it has all your human correspondence, but it's got all your SaaS notifications and it's got all of the attachments, all the files, the PDFs and things that come along with that, and all your calendar invites. We just know a ton of stuff about you. And I think if you had asked a human like, hey, go through my email and give me some advice, and they were a reasonably smart human and they had the time to do that, they would also give you some good insights. You just never do that because it's a lot of work. But the information's there. I think the prompt you said you ran the other day, I think that's super fun, super cool. I actually just this morning had a prompt where, just yesterday we rolled out a big UI change. And anytime we make big changes in our product, it's controversial and we get a lot of feedback and it's always scary. And I wanted to know, hey, how are people responding to this? And we have this new sharing feature, so like all the support threads that come in get shared across the whole team and are available to the AI. And I asked the AI, how many people have emailed us and complained about the new layout in the last 24 hours? And I got a report, it was 19. There were 19 users who reported it, and it gave me a summary of here are the top 5 reasons, and like it was super useful and super good. And like for me to consolidate that myself would have taken me quite a bit of time, and I got a snapshot right away of like, okay, what's the reaction? How are people feeling about this? What are the top things we maybe need to address? So that was a big use case for me personally. We've seen a lot of other fun ones in the wild. I think 1 of the most common ones is people sort of start their day with a complicated prompt. So you're in sales and you have 8 customer demos coming up and you have an inbox full of emails from folks, and you're just like, hey, help me figure out like what are the tasks I need to do? What order do I need to think about things in? What should I remember for each sales call? And they'll have this big prompt. A lot of times users will share these prompts with us of like, this is the thing I use to start my day. So that's a pretty common 1. I think another really common 1 is people doing attachment analysis along with their email. So there's a lot of like real estate agents and general contractors and architects and stuff like that where they email back and forth all these PDFs all the time and they just need to answer 1 question from the PDF. Like what were the specific payment terms on this contract or something? And they can just ask the assistant and it can read the whole PDF for them and give them the 1 answer they need to write the email. There was another example that got shared on LinkedIn the other day with 1 of our users who was selling his house and he needed an inventory of like all the furniture that was in his house. And all that stuff was in his email somewhere, right? Either in like emails with his wife or receipts that were sent or whatever. And he was able to get a full inventory of like all the furniture they had purchased over the years for their house in a nice consolidated accurate report with just like a few keystrokes. So lots of interesting use cases. They're all sort of big prompts spanning like a whole wide spectrum of things.

Nathan Labenz: (10:25) I did a small version of that inventory thing after a recent trip where I had to collect and submit receipts for an expense report, and that was another... I mean, it's a very mundane task, right? But in a way it's like the perfect job for AI to, like, handle these things that I, you know, otherwise kind of dread. And it was cool to be like, find me all the receipts from my trip. You know, there should be Ubers, Lyfts, a couple DoorDashes in there, and just have the whole thing pop out with values and everything sort of itemized. And I was like, man, it's another 1 of these moments where I was like, this AI thing could really catch on. I mean, I'm sure there's more. Actually, I do think that 1 big barrier to practical value in AI is just kind of lack of imagination sometimes on users' part. I certainly feel like I'm guilty of that all too often when I see something else somebody, you know, has done. I'm like, why didn't I think of that sooner? Any others that you wanna share, you know, at the top here that are just like things that should get people's wheels turning about how they might, if they're just a little bit more, you know, intentional or creative, get more value?

Andrew Lee: (11:30) I'll share 1 of the most creative ones that I ran into. So this was another 1 that a user sent to me. And the thing is like, I personally am still figuring out how to use the thing. And often the insights are like, a user sends me a thing of like, does this work? Or, check out how this thing works. And I'm like, I never thought of that. So here's what I never thought of. There was a user who was using us in combination with another SaaS tool. I think it was Linear. And they wanted to be able to like extract action items from their inbox and like add them as tasks to Linear. And we don't have a Linear integration, maybe we should, but we don't. And it turns out that the LLMs actually know the structure of the URLs needed to create tasks in these other SaaS products. And so if you go and ask the LLM, create a link that when I click it will go and do this task in this other product, it actually works. And so he had a bunch of custom prompts where he's like, extract action items from this email thread and then give me some links that I can click, and each 1 should create a thing. And like, here's my base URL for my project and this other thing. And then the LLM spits out a bunch of links, he clicks all the links, boom, he has tasks in this other product. And you can build integrations with other things without us doing anything and without any code being written. It's just a prompt and the LLM's knowledge of like how to construct URLs.

Nathan Labenz: (12:47) Yeah. That's really very creative. I like it. And also kind of an interesting window into you know, just a future in which AIs are potentially more and more likely gonna be sort of solving problems in unexpected and, you know, maybe at some point even hard to interpret ways, although that one's pretty simple to interpret. But the the creativity there is on both the user and the models part is pretty impressive.

Andrew Lee: (13:12) I'll give you 1 other 1 that I thought was pretty fun. There was a user who wanted to do a mail merge, but they wanted to have a mail merge where like every email really looked custom, where like the AI was going off and searching their email history and finding an interesting fact about them. And they really wanted each recipient to feel like they were paying attention to them. And so they took a text file, they put a whole bunch of email addresses in the text file, they uploaded that text file into the Shortwave AI assistant and they said like, loop through the emails in this text file. For each email, go and search, find emails exchanged with this person, figure out a nice greeting for this person, write an email. And so it took the text file, looped through it, and wrote a very custom email for every single person. Each 1 they got to review, click approve, click send, and they could send 20 emails that all felt very custom in like a really organized, fast way with AI.

Nathan Labenz: (14:03) Yeah. That's cool. I was actually gonna ask you about supporting loops, but I hadn't tried it myself. So it's cool that that is already something you're seeing

Andrew Lee: (14:10) It works. In the wild.

Nathan Labenz: (14:12) Yeah. So what have been the... you know, last time we got like quite deep into the guts of how it worked, you have a background as a database whisperer. And, you know, there was a lot of talk about kind of how the indexing happens and, you know, you take every email out of somebody's inbox and store it in your own system and have your own indexing sort of thing, and then the model can kind of build on top of that. If we work from, like, the ground up, how much change has there been at that foundational database layer and sort of retrieval layer versus how much has come at the higher layers of the model and then the sort of patterns of behavior that you're getting the models to actually exhibit?

Andrew Lee: (14:54) Yeah. So honestly between the time I was on the pod last year and now, I think basically every part of that stack has been completely rewritten. We are using a different embedding model, we are using a different vector database, the search stack has been completely rewritten, the API to that search stack has been completely changed, the models we're using for the agent are different. The agent code has been completely rewritten. So yeah, top to bottom. And I think driven by changes in the capabilities of both the models and also our understanding of how to apply those models. And every time the model can do X better, we're like, actually, if we rework the system like this, we'd get this extra unlock. And so the thing is just evolving at a truly phenomenal pace.

Nathan Labenz: (15:38) Yeah. That's undeniable. And although some do try to deny it. Maybe let's go a little bit deeper on each of those levels. I'd love to just hear kind of what you've learned and maybe also, you know, to the degree that you think it's applicable for other AI builders, you know, what sort of upshot or advice comes out of some of these different changes. At the database layer, do you have a favorite vector database that you would recommend to others at this point? Or like, what have you learned that caused a change?

Andrew Lee: (16:10) Yeah. So we've made a bunch of changes here, and for a little bit of context, 1 of the big unlocks for us here was having a model and an agent on the front end of our app that was able to reason a lot better about how to use search. And most importantly, it was able to reason about how to run multiple searches. And this sort of simplified the requirements for us on the search stack, because it used to be the model would run 1 search, it would give you like a semantic component in the search, you'd run that search, and the search stack had to find the right email in that 1 search, because you got 1 shot at this. And now we have an agent that can run multiple searches in parallel, or also can run them in sequence, so it can try a search, and if it doesn't find anything, it can try something else, or it can like try a search for 1 thing, find some information, and then adapt and try another search. It's actually allowed us to simplify the backend implementation and kind of focus it on a much more narrow task and then be much, much better at that. So the evolution of our search stack in the backend is driven in some ways by the simplified requirements. And we focused on how do we make it really good and really fast and really reliable and really cheap to do this kind of more narrowly focused task. So we use the Pinecone serverless offering as our database. We used to use their pods, but the serverless offering is a much more cost effective solution because it separates storage and compute. And so you can kind of tune for your use case. We use a lot of storage because you have a lot of email. We use an embedding model now called BGE, and we use a bigger embedding model now than we used to, because the serverless offering kind of unlocked the ability for us to just use these bigger vectors without burning too much money. And I think 1 of the big changes is we started to use hybrid search on the back end. We used to use this pretty complicated pipeline that combined some like smaller model LLM calls with some like very particular types of like feature extraction and search. If you go listen to the podcast from last time, you can hear how we did all this. And there was like a re-ranking step that we did. The whole thing was like very slow and complicated and very brittle, because there was a lot of sort of custom stuff in there. And what we've done instead is we've allowed the search API to specify a semantic component and some constraints. And the constraints are sort of like normal email query constraints. You can say, hey, I want only emails that are in this date range or with this contact or with this label. And the semantic component, we run an embedding search with Pinecone. We also run a keyword search, and there's an algorithm to sort of combine the keyword search, like the full text search results, with the semantic component and get a score and kind of rank based on that score. And then we filter based on these other components. And so you get this API where you can say, hey, I want emails about this topic with these constraints. And we can very quickly give you a very accurate list of those in a way that's like cost effective to scale across everybody's email. I think this has been huge for us. It's a lot cheaper to run, it's a lot faster, it's a lot more reliable, and it is also producing just way better results when you combine that with an agent that can reason about like running multiple different queries.

Nathan Labenz: (19:31) There's multiple things there that I think are quite interesting. First of all, just the fact that the whole stack kinda has to obviously work together, but also an improvement in 1 layer of the stack allows for simplification in another layer of the stack. Like, that's definitely been a huge theme for me over time with the work at Waymark too. Like, the shenanigans and, you know, hoops that we had to jump through to choose an image for a user once upon a time were just like comical almost in their complexity, and also that didn't work that well. And, you know, in a similar way now it's just like feed them into a model and it usually picks pretty well. And it's like, you know, we've cleared out so much old cruft that we developed to kind of get that first version working. And that was just like a simpler and better solution driven by model progress. Did I understand correctly though that when you do this search, it's running the full... like, the filter comes after? That's kind of interesting that you're doing like the full vector search through the entire corpus, if I understood correctly, and then filtering only after getting results?

Andrew Lee: (20:40) It's a bit more complicated than that. I'll give you the somewhat simplified version, which is essentially that we run 2 types of searches. 1 of them is a full text search that is constrained based on like the keyword and the metadata components of the filter. The other 1 is a semantic search that we run in Pinecone. And we combine the results of those 2 queries and then we score between them. And that gets post-filtered. And so it means there are situations where, if you had a lot of good semantic results for something, you could potentially miss the best ones. But I think in the vast majority of like real use cases, the way that we are combining that full text, metadata constrained search and the semantic portion and post-filtering is generally finding the best results. Truly scoring every email across everything and applying these filters, if you really wanted to have that be 100% accurate all the time, would be sort of an intractable problem, but we can approximate it very closely with our solution.
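
To make the hybrid approach concrete, here is a minimal sketch of how a semantic search and a keyword search can be fused and then post-filtered on structured constraints. The helper functions, the reciprocal rank fusion scoring, and the field names are illustrative assumptions, not Shortwave's actual implementation.

```python
# Illustrative sketch of hybrid email search: fuse a semantic (vector) ranking with a
# keyword (full-text) ranking, then post-filter on structured constraints.

from dataclasses import dataclass

@dataclass
class EmailHit:
    email_id: str
    date: str      # ISO date string, used for post-filtering
    contact: str
    label: str

def semantic_search(query: str, top_k: int) -> list[EmailHit]:
    return []  # placeholder: a real version would query the vector database (e.g. Pinecone)

def keyword_search(query: str, top_k: int) -> list[EmailHit]:
    return []  # placeholder: a real version would query a full-text index

def hybrid_search(query: str, constraints: dict, k: int = 10) -> list[EmailHit]:
    sem_hits = semantic_search(query, top_k=50)
    kw_hits = keyword_search(query, top_k=50)

    # Reciprocal rank fusion: reward emails that rank well in either list.
    scores: dict[str, float] = {}
    by_id: dict[str, EmailHit] = {}
    for ranked in (sem_hits, kw_hits):
        for rank, hit in enumerate(ranked):
            scores[hit.email_id] = scores.get(hit.email_id, 0.0) + 1.0 / (60 + rank)
            by_id[hit.email_id] = hit

    # Post-filter on structured constraints (date range, contact, label).
    def passes(hit: EmailHit) -> bool:
        if "after" in constraints and hit.date < constraints["after"]:
            return False
        if "contact" in constraints and hit.contact != constraints["contact"]:
            return False
        if "label" in constraints and hit.label != constraints["label"]:
            return False
        return True

    fused = sorted(by_id.values(), key=lambda h: scores[h.email_id], reverse=True)
    return [h for h in fused if passes(h)][:k]
```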

Nathan Labenz: (21:54) Yeah, interesting. Okay. It seems like model progress has driven a lot of value for you, but also the scaffolding and actually, you know, shaping the behavior of the thing is also super important. 1 thing I've definitely seen repeatedly in my usage, both when I accept the invitation for the AI to organize my inbox and also in some kind of random idiosyncratic things that I've done. Like this 1 really it was like, man, the AI and I are are like really on the same wavelength. I have the total inability for whatever reason to remember which of my contacts is the stronger and which is the weaker prescription. And every time I go to change my contacts, I have to search in my email for this particular thread where I know that that information is contained. And so I'm always doing this search and every time I've always been searching the traditional way, I end up running multiple searches. I'm like, okay, I think the word prescription was in there. And then there was, I know the word corrected is in there and, know, but what is it? Right? And so eventually it takes me usually like 2 to 3 searches to get there. So I asked the Shortwave AI assistant to do that. And it was interesting to see it basically go through the same process that I went through where it was like search, you know, get some results. I'm not catching the thing that I wanna catch. And then, you know, search again. And I think it took like 3 or 4 rounds before it finally, landed on the thing and answered my question. More often, I think people will engage in probably higher level things. The organizer inbox is a great example where it sort of just kind of goes off and I mean, it comes back and gives you suggestions and opportunity to confirm and, you know, make the movements that it suggests that you move. But when you do accept it, like, it just keeps working. So maybe, you know, take us through how you've thought about creating the agent behavior. You know, what has been the big unlocks at the model level that have made this possible? And then, you know, what have you had to do to wrangle it into something that is actually valuable for people on a day to day basis?

Andrew Lee: (24:04) Yeah. This has been the biggest shift in our product since we last talked. I mentioned like every part of the stack has been rebuilt, but the basic functioning of the assistant has gone from a single LLM call that produces the final output, and sort of a complicated Rube Goldberg machine to like get the prompt right for that 1 thing. And we had all these complicated heuristics and rules and whatever to do this and like smaller LLM calls. And we've thrown all that out and said, what if we just run the big LLM a whole bunch of times repeatedly until we get the right answer? And we tried this, I'll go through the history a little bit. So a while back, let's see, like 2 Decembers ago, I think OpenAI rolled out their tool calling features in GPT-4 and we tried them and we didn't think they worked, right? Our general experience was as soon as you try to get it to call a tool, like the whole model stopped reasoning well and just gave you bad answers. And it was much better to just like tell it, format something in XML and we'll go and do the tool call for you. And even getting it to do multiple tool calls didn't really work very well. And so we kind of kept this sort of rules based system. And then last summer we tried this again with GPT-4o and it kind of worked. And we said, hey, well what if instead of this kind of 1 shot approach, we had a multiple shot approach. And originally it was very much geared around search of like, well, the search use case of running multiple searches seemed important. What if we ran multiple searches? And so we kind of rewrote it a bit to do that and it worked better, but not like dramatically better. And we had a launch actually in September that was like, hey, I think we called it our V2 agent, like here's our new agent. And it definitely got us some growth and some excitement, but it wasn't a major thing. And then in October or maybe November, I was listening to a podcast from the founders of Bolt.new. They were talking about how they built their stuff. And my memory from that podcast is basically they said, hey, you got to try Claude Sonnet 3.5, the October version of it, it's different. And by the way, Bolt.new is open source, at least parts of it are, and you can kind of see what their prompt is and how it works. And I found that very interesting. And so I did. I tried Claude Sonnet 3.5 specifically for tool calling, the October version of it, and it was dramatically better. With GPT-4, you know, you call it, it spits out a tool call, you feed the tool result back in, you call it again, it spits out another tool call. Like you could have it kind of iterate a few times, but it would sort of like go off the rails after too long. So we didn't want to do too much of that. But with Claude Sonnet 3.5, it could go on for a long time, right? It could run many searches and do many things and kind of still stay on track and still keep reasoning, and it seemed totally different. And so we're like, okay, maybe we should rewrite this whole thing and say, we're going to have this new approach. We're just going to call Claude Sonnet over and over and over again. We're going to let it run not 2 or 3 or 5 times, but like 20 times. And we're going to put all of our smarts into like really good tools and a really good prompt for the overall thing and like a nice agent framework around that. And we're just going to let the model reason about what data it wants to pull in. And this was a total rewrite.
Like we had a lot of like very custom email centric stuff and we said, no, no, we're going to have a fairly generic agent framework at the core. We're going to build a whole bunch of really nice tools around it. We're going to have a lot more tools than we used to have and then we're going to iterate our way to a solution. And this is the agent that we launched in January. You mentioned the hockey stick growth graph; if you look at it, there's a little kink. That's the V3 agent, where we rolled this thing out and it was just a whole lot smarter. It could start solving like very general, very open ended things. And there were a lot of places where the old version didn't work that iteration solved. I think that's the core. When people are talking about agents, I think what they're really talking about, if they actually have a working solution, is iteration. It tries a thing; if it works, great; if it doesn't work, it tries another thing. And sometimes that's running 3 searches in a row. Sometimes that's like trying to run a search, finding out that the search criteria that you specified is actually like malformed, having our system throw an error, having the LLM see the error response and try again. Or sometimes it's like trying to schedule a calendar invite, realizing that the calendar invite is like conflicting in some way, having our system spit back some information being like, this is conflicting, and then trying again. So you get this feedback mechanism where over a series of like many LLM calls you can iterate your way to an answer that no single LLM call could have produced, and get really, really good results that way.
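
A minimal sketch of the iterative tool-calling loop described above, using Anthropic's Messages API: the model is called repeatedly, each tool result is appended to the history, and the loop stops when the model stops requesting tools or an iteration cap is reached. The search_email tool and its stub implementation are hypothetical stand-ins; the real agent has many more tools and a much richer prompt.

```python
# Minimal sketch of an iterative tool-calling agent loop (not Shortwave's actual code).
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

TOOLS = [{
    "name": "search_email",  # hypothetical tool
    "description": "Search the user's email. Returns matching threads.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Placeholder: a real implementation would hit the hybrid search backend.
    return f"(no results for {args.get('query')!r})"

def run_agent(user_request: str, max_iterations: int = 20) -> str:
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # Model is done iterating; return its final text.
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute every requested tool call and feed the results back in.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Stopped after hitting the iteration cap."
```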

Nathan Labenz: (28:40) Okay. I'm going to dig in a little bit more. We have time. It would be a very natural time to talk about sort of the you know, expanded view of the company and the product. But before we get to that, 1 thing that I've just been kind of wondering myself is like, how does a model like this is 1 of the challenges with know, rag apps, whatever. And a lot people feel like rag apps have come to some frustration with them where I think often the core problem is, as you said, like, do 1 search, and then the model just kinda has to do the best with what it's given. And if the right information isn't in there, then it's really gonna have a hard time, like definitionally. Expanding to, you know, allowing for iterative search, you know, definitely I have felt like makes a a qualitative difference in terms of how likely it is to be able to pull the thing back. Like, I don't think it would have found my contact prescription a year ago. But I still wonder, like, how do you think about maximizing that given that, you know, the the models don't have the the thing that I have, which is I know it's in there. You know? Like and I know when I've found it. And this seems like a a sort of fundamental challenge from the model's perspective that, you know, how do I sort of know when to call it? When do I decide that this amount of information or this particular information is, like, in some sense satisfying or the best I'm gonna get versus feeling like, you know, the way I feel, which is like, I know that that's not quite it yet. You know? And I I I but I know it's in there, so I'm gonna kind of keep looking. And then when I do find it, I'm like, it's always very clear to me. Like, yes. This is what I was looking for, and, you know, I know it with high confidence. So the models don't have that. How have you approached the problem of, like, leading them or got you know, I can't imagine that they're entirely, you know, just doing it on their own. So how do you help them decide when they have found enough versus when they need to keep digging?

Andrew Lee: (30:43) So I'm gonna give you kind of a complicated answer here. So the first thing I'll say is the models actually do have some of that, because they may not know what you know specifically, but they know what it looks like generally to have found the thing you're looking for. And they have a lot of training on just emails in general and what emails look like, what people expect from emails. And so we found, so for example, we have this organize your inbox feature and it goes through and it finds what we described as like low quality emails and it gets rid of them. And I have been shocked at how little work that we have to do to describe what a low quality email is. And so for example, like it's really good at spotting cold sales emails, even though like you might think like, hey, how do I know it's a cold sales email? Maybe this is somebody I know. But there's something about the tone and style and whatever of cold sales emails that like really stands out to them and they spot it. And I found that to be like, we do have some prompting work to make this really clear, but like we didn't have to do a lot to tell it this. So that's my first thought. My second thought is we do try to give it tools so that it can go look up this information itself. So if you're like, hey, I'm looking for an email from some important investor or whatever. It maybe doesn't remember who your important investors are, but we have a contacts tool. And when the contacts tool returns results, we put in there, using statistical information, how important this contact is. So it can actually look up who is important if it wants to. It can also just go search your email history and find like emails you sent and make assumptions about, hey, if you've recently exchanged emails with this person, it's probably important. And so this is an area we're trying to get into more and more of: if you ask it some open ended question, kind of use the tools it already has to reason about it in some reasonable way. Look at your email history, look at your contacts, look at your calendar events. Have you met with this person recently? Those types of things, and try to get some of that context. The third thing I'll note is I see a lot of room for improvement here. We today don't do anything to capture sort of the triage actions that you take. So when you use organize my inbox in the morning and some of the suggestions you accept and some of them you don't, what we really should be doing is remembering all of the things you did and then sort of customizing to you over time. And today we don't do that. If you want us to change the way we do triage for you, you can actually tell the AI, you can say, remember, never archive newsletters from this sender. And like, we'll remember that, but you have to be explicit. We don't remember just based on the actions you take. So I do see a lot of opportunity there.
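
As an illustration of the kind of tool Andrew describes, here is a hypothetical contacts lookup tool whose results carry a simple statistical importance signal; the tool name, schema, and heuristic are assumptions for the sketch, not Shortwave's actual tool.

```python
# Hypothetical contacts lookup tool: results include an "importance" signal derived
# from recent email volume, so the agent can reason about who matters on its own.
LOOKUP_CONTACTS_TOOL = {
    "name": "lookup_contacts",  # hypothetical name
    "description": "Look up contacts matching a name or email fragment. "
                   "Results include an importance score based on recent email volume.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Name or email fragment"}},
        "required": ["query"],
    },
}

def lookup_contacts(query: str, contact_stats: dict[str, dict]) -> list[dict]:
    """contact_stats maps email address -> {"name": ..., "sent_90d": n, "received_90d": m}."""
    matches = [
        {"email": addr,
         "name": stats["name"],
         # Crude importance heuristic: how much mail was exchanged recently.
         "importance": stats["sent_90d"] + stats["received_90d"]}
        for addr, stats in contact_stats.items()
        if query.lower() in addr.lower() or query.lower() in stats["name"].lower()
    ]
    return sorted(matches, key=lambda c: c["importance"], reverse=True)[:10]
```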

Nathan Labenz: (33:09) Yeah. That's quite interesting. I just did an episode with Guy Gur-Ari, the chief scientist at Augment. And, yeah, I mean, there are multiple similarities, actually. I mean, they also have, like, a big emphasis on just ingesting a ton of code right off the bat and putting it into this, like, specialized index, but also the behavioral part. They've developed a process they call reinforcement learning from developer behaviors. And it's about kind of, you know, both observing what the developers are doing, but then also, like, you know, how the developers are reacting to what the AI is bringing to them. And it sounds like it's working quite well for them. It seems like the one big revelation is that reinforcement learning works. So I see that probably coming quite soon in your future too. On just another kind of random question, but an interesting 1 to me at least: Claude obviously is significantly more expensive than some other options. So what I see typically from the agent, I don't know if this is like super consistent or just kind of my observation, but typically it seems like it's 10 threads found and then it kind of, you know, does its sort of reasoning, and do I need to search again, whatever, based on the 10 things that it's found. The other way I could imagine doing that would be, like, maybe using a slightly lesser but way more affordable language model to do the evaluation on stuff that came back. So, like, if you used Gemini Flash, for example, I think you would be able to handle, like, 30 times as much. And you could maybe complicate this with caching. I'm not exactly sure how you're using some of those more advanced optimizations. But first order approximation, you could handle, like, an order of magnitude more search results if you put it all through Flash to assess relevance versus, you know, saying to Claude, like, you handle this all yourself. Are you doing any sort of that kind of ensembling or using different language models for what they seem to sort of specialize in, all things considered, or is it just Claude all the time because it really is just that good?

Andrew Lee: (35:17) So we do use a bunch of different models, but we use them for different features. So for example, our autocomplete is actually using a fine-tuned GPT-4o mini, our quick reply suggestions, like the 1 button reply suggestions that we give you, those are actually using Llama 3.2, the 3B model. We use GPT-4o, just like not mini, but GPT-4o, in a few places. So we do use different models in different places for different things. We have looked at sort of stringing them together, like just trying to save costs for certain things in the assistant by outsourcing to other models. And I think our experience has been that it introduced a lot of complexity and it affects the ability of the model to kind of reason across lots of different types of activities in a sane way. If you just feed all of the data into 1 big model, it can think about lots of complicated relations between different things and can come up with ways to use tools in ways you didn't think of. And it's been a big unlock for us, and the cost savings haven't seemed worth the complexity and sort of the loss of generality of like having this turn into more of a pipeline between different types of models. The other thing I want to note is the caching that Anthropic has is really, really critical for us. It's a really big deal. So when you have this agentic flow, you call it with the same history over and over and over again. You only append to the bottom. And our histories get very long. Like we can have hundreds of thousands of tokens at any 1 of these calls. And the Anthropic caching is a little hard to use. You have to really construct your agent to be very careful about keeping like earlier things immutable. And we've done a lot of work to make that happen. But if you get it working, it can save you 90% of your cost. It's a hugely impactful thing. And frankly, if we didn't have that, we could not afford to run. Like just at the Anthropic costs, we'd be losing money on every user by a huge margin. So that's been a huge unlock for us. It also is 1 of the reasons we switched to Claude actually: even if, you know, the models from OpenAI could use tools as well as what we have from Anthropic, we couldn't afford it, because the caching from OpenAI is just a lot less of a cost savings. It's only like half off.

Nathan Labenz: (37:29) Yeah. Half off and pretty much automatic. You don't have to do anything, right? Versus... can you tell a little bit more about how that works? I know Anthropic is, like, 90% off once something is cached, but there is also, like, a onetime cost to get something into the cache. Right?

Andrew Lee: (37:48) Yeah. But it's not an additional cost. Right? So you, at the time that you run it, you have to tell it to cache. And I think there is maybe some slight additional, but it's not a significant additional cost. And then every iteration after that is 90% cheaper. If the if the most common action is like people organizing their inbox, which it is, and that is 20 tool calls, it adds up really quickly. I was actually just at OpenAI yesterday and I was just talking to the agent SDK folks about this and about caching. And this was frankly the number 1 question from the room from the other founders I was with there was like, what about caching? How do we get caching to work? Because I think everyone's figuring out that you're going to be calling these models with the same context over and over and over again and making it work efficiently really matters a ton. And the OpenAI caching, you're right, it is a very simple API, right? It sort of automatically figures out, you automatically get cost savings, which sounds great and the Anthropic one's much harder to use, but the magnitude of those savings is quite different.

Nathan Labenz: (38:48) So on the Anthropic side, is it smooth to the point where you can sort of build a cache? Like I've used the caching a little bit, but it sounds like, modulo some, you know, possible implementation difficulties, you can, like, build and also extend the cache iteratively as you go. And, basically, it has all the features you really need to realize, like, the full savings in the sort of sticker price of 90% off.

Andrew Lee: (39:16) Yeah, you can. So you sort of like checkpoint at every iteration in your agent flow saying, okay, cache up to this point, and you have to construct your agent to like make use of that well, but it's totally doable.
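
A minimal sketch of that checkpointing pattern with Anthropic's prompt caching: a cache_control breakpoint marks the end of the stable prefix so later iterations re-read it at the cached rate. The model name and message handling are illustrative; a production agent would manage breakpoints more carefully across iterations.

```python
# Sketch: mark a cache breakpoint at the end of the (immutable) conversation prefix,
# so subsequent agent iterations that only append new turns get cache reads on that prefix.
import anthropic

client = anthropic.Anthropic()

def call_with_cache_checkpoint(system_prompt: str, messages: list[dict]):
    # Assumes each message's "content" is already a list of content blocks.
    # Everything up to and including the marked block becomes the cached prefix.
    last_block = messages[-1]["content"][-1]
    last_block["cache_control"] = {"type": "ephemeral"}
    return client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[{"type": "text", "text": system_prompt,
                 "cache_control": {"type": "ephemeral"}}],  # cache the system prompt too
        messages=messages,
    )
```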

Nathan Labenz: (39:29) Interesting. So are they actually caching? I wonder if there's a big optimization there for them under the hood where they're like it sounds like they may be caching, like, multiple versions of each conversation?

Andrew Lee: (39:40) I don't know. I actually am very curious what they're doing. Maybe they're just eating the cost. I don't know. But our bill goes way down, so it's great.

Nathan Labenz: (39:47) Yeah. That's interesting. How do you feel about this sort of a huge question in the AI space generally to me is, are all the frontier providers converging or are they diverging? We've got like, you know, somewhat different caching things, but maybe OpenAI is gonna get the message and sort of, you know, do a more Anthropic like 1. We've also just seen in the last, like, day or 2 that OpenAI is gonna embrace the model context protocol that Anthropic, you know, led the way on. Do you feel like these things are converging, diverging some, you know, some complicated mix of those?

Andrew Lee: (40:27) I think it's some complicated mix of those. So it has been nice that they've been, I think, converging in terms of the API surface. So we now very easily can swap between different providers, and like all the tool calling and stuff lines up perfectly and makes it really easy to do that. So I think that's been a very nice thing, and everyone is understanding that like there should be caching and are hopefully converging on some of the ideas of how to do that. MCP is maybe becoming a standard that everyone's respecting. So I think there's been standardization in the interfaces. It does seem like the different labs are focusing on different things. So it seems to me like Anthropic is really focusing on this iterative approach. It seems to me that OpenAI is maybe caring a lot more about reasoning and potentially more about like multimodal. And you'll notice that we use many different models. We use models from 3 different vendors, and a big part of that is we feel like each vendor is bringing something different to the table. So 1 example where I think OpenAI totally crushes it is their serving infrastructure is super performant and reliable, relatively. Like Anthropic's APIs, they go down a fair amount and time to first byte is slower and stuff like that. So for example, for our autocomplete where latency is really critical, we use GPT-4o mini with a fine-tune, and like the serving stack at OpenAI is like super good. So I think that's a differentiator there. For the summaries and for the instant reply, like quick actions, and for a lot of the stuff where cost and latency matter a ton but we don't need a lot of intelligence, we use open source models. So we use Llama 3.2 and that's running just like on Vertex in GCP. So it's great, you know, for an app like ours, to have many different vendors converging on standards but like focusing in different areas; it ends up being really nice for us.

Nathan Labenz: (42:21) What else have you learned about agents in this journey? I mean, everybody's, you know, setting out to build 1. I think you are, you know, pretty far ahead of most. Have there been like false starts, you know, the things that you sort of expected to work that didn't or just any kind of unobvious lessons learned that you would share with the folks who want to build their own agents?

Andrew Lee: (42:40) I think the biggest thing I would say, if they're thinking about it now, is that things are different, just in the last few months, right? Basically, like if you tried this before October, or if you've never tried the Anthropic models, like it's different now. The stuff that you tried before, it actually works now. The cost, actually, with the caching stuff, is manageable. And if you try it, I think you'll be really surprised. And yeah, you'll notice stuff like Bolt.new has gone crazy. And the new Cursor agent mode, I think a lot of people are still using Cursor for the autocomplete. Like that's old school. Go use the agent mode. It's totally insane. So that's the big message is it works now. It's different. You got to try it out. It's going to be amazing. I think 1 of the big things we learned is people have no idea what the UX should be around this stuff. And we're very much still figuring this out, but people kind of figured out like, oh, autocomplete, it works like this. Everyone kind of wrapped their heads around how this sort of thing should work with autocomplete. And now we have a thing where it's like, it's going to go off and just do a bunch of work for you. How do you do that in a way where the user doesn't get really uncomfortable about, is it sending emails for me? Is it changing my code in ways I don't realize? And we're going to need some UX for like sort of oversight and approval and providing guardrails on these things that like doesn't get in the way but still gives you confidence that it's doing the right thing. So I think that's going to be a big area for us. There's just a lot of questions, a lot of unknowns given how new this stuff is, but I think it's going to be really exciting to see how it unfolds.

Nathan Labenz: (44:08) Yeah. 1 product feature that I honestly haven't used as much as I maybe should, kind of because I'm not sure quite what it's going to do honestly, is the AI filters. And I was curious to know, like, how that works under the hood. Like, is it sending it all? Because what I think is gonna happen is I'm gonna say, like, filter out this kind of email, and then emails are gonna come in. And I guess then, like, literally every one would get sent to a model to say, like, is this the kind of email to which this filter should apply? And I've only been a little hesitant... I mean, I like the sound of that because I got a lot of crap that I need to filter out. And, you know, I certainly have definitely spent way too much time clicking on those myself. But the flip side of that, of course, is like if I create a workflow where I don't have visibility into what's getting filtered out, you know, I have been burned by that in the past as well, even with just, like, Gmail priority inbox over time. So first of all, am I intuiting correctly, like, what's gonna happen there? And I guess more broadly, how are you thinking about these, you know, sort of next level agentic scenarios? Today, the AI assistant, like, brings me stuff. It's up to me to confirm, but, you know, with all the talk of, like, how well it's working, it's not hard to extrapolate a little bit and imagine that, you know, okay, fine, just go ahead and do it. So how are you thinking about those next moves into actually taking actions without necessarily, like, a human in the loop approving each 1?

Andrew Lee: (45:39) Yeah, no, I think you've caught an interesting difference in that feature, which is it is the only AI feature that we have where the AI just does things and there's no approval flow from you. And it's actually a really popular feature. So the basic idea is like you write a prompt and you can choose 3 actions based on that prompt. You can either archive it, you can apply a label, or you can delete it. And we've had this live for a couple of months now, and people love it. There hasn't been a ton of problems in terms of people losing stuff, but we do get support requests at times where people are like, hey, you're missing my email. And it's like, well, let's check your AI filters. And then they have 1 set up. And so I think there's no way of getting around that entirely, but the dream here is that this becomes like a run of the full agent. Right now it's a very simple implementation. We do use OpenAI for this. This is a GPT-4o mini task, which means, yeah, if you set up an AI filter, we are going to be sending your incoming emails off to OpenAI. We trust OpenAI as a vendor, their terms prevent training on the data and we believe it'll be confidential, and that's very important to us. But we do send the email through their APIs. What we really want to do is allow you to do anything that the agent could do at that point. So rather than just making a single model call to a small model, we want to like spin up the full agent and let it call tools and stuff while it's making this decision. So you could have a prompt that's like, hey, if anyone asks to schedule a meeting and I have previously exchanged more than 3 emails with them in the past and they are an investor, auto accept the meeting. We want to be able to write rules like that. And I think there's 2 big problems with this that we need to solve. 1 of them is just cost, right? Like we are spending a ton of money right now on Claude Sonnet just for you going and asking those questions in the sidebar, which like a typical person only does a handful of times a day. If we're doing this on every single email you get, rather than doing this 8 times a day, you're doing this 300 times a day, and that dramatically increases our costs. So 1 of them is how do we scale this? Interestingly, if you saw the GTC keynote, Jensen was up there being like, hey, we realized we need 100 times as much compute as we did last year. I think we're in the same boat where it's like, we never thought that every time you receive an email, we need to run like 10,000,000 calculations to decide if we want to archive the thing or not. But like, I think we're going to be in that world where like, yeah, we have this full AI running, considering your entire conversation history with everyone in the world, every time you do this. I think we're actually going to get in that world. So that's problem number 1, it's just cost. Problem number 2 is trust, right? Where you have to worry about, is it doing the right thing? But also like, is it susceptible to like prompt injection? This is something that comes up from people where they're like, what if somebody sends me an email designed to like mess with me? Ignore instructions

Nathan Labenz: (48:27) and delete the full inbox.

Andrew Lee: (48:29) Yeah, right? Like that's a problem. And so I think there's a bunch of things we need to do here. So 1 of them is just figuring out what the right guardrails are. Maybe that agent can't delete emails other than the 1 it's looking at. There's sort of restrictions on what it can do. We also could sort of look at some sort of after-the-fact confirmation flow where it gives you a history of like, here are all the actions that it took, and you can sort of approve them. And we can remember which ones it didn't do right and sort of adapt and learn from that. It also could, instead of actually taking action, queue up the actions for you. So drafts, I think, is the best example here, where maybe we never send emails on your behalf, or only in very rare cases, but most of the time it just sort of creates a draft and you come in the morning and there's a button where it's like, hey, do you want to send these emails? And you're like, yes. And it's like very easy for you to review. But yeah, we want the AI to be both powerful but trusted and resistant against people sending you emails messing with you.
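
For a rough picture of what a single AI filter evaluation could look like under the hood, here is a sketch that sends one incoming email plus the user's filter prompt to a small OpenAI model and constrains the result to the three allowed actions. The prompt wording and JSON shape are assumptions, not Shortwave's actual implementation.

```python
# Sketch: evaluate a user-defined AI filter against one incoming email with a small model.
# The model may only choose one of three actions (or none), mirroring the feature described.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

ALLOWED_ACTIONS = {"archive", "label", "delete", "none"}

def evaluate_filter(filter_prompt: str, email_subject: str, email_body: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
                "You apply a user's email filter rule to one incoming email. "
                'Reply with JSON: {"action": "archive"|"label"|"delete"|"none", "label": string|null}.'},
            {"role": "user", "content":
                f"Filter rule: {filter_prompt}\n\nSubject: {email_subject}\n\nBody:\n{email_body}"},
        ],
    )
    decision = json.loads(response.choices[0].message.content)
    if decision.get("action") not in ALLOWED_ACTIONS:
        decision = {"action": "none", "label": None}  # guardrail: never do anything unexpected
    return decision

# Example: archive cold sales emails.
# evaluate_filter("Archive cold outreach from salespeople I've never talked to",
#                 "Quick question about your tech stack", "Hi! I'm reaching out because...")
```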

Nathan Labenz: (49:25) Yeah. So I assume you've tried this, but like Flash just doesn't quite cut it for even this sort of reduced action scope? Because I'm always looking for a way to get value out of Flash because it's just so damn cheap and it's quite good, but maybe not quite up to the level you need for this?

Andrew Lee: (49:44) We are actually testing Flash right now. I think the first place this would land is in the summaries and the quick replies and things like that. We may use it for the filters, too early to tell. We are constantly trying new models. I'm sure you hear this from everyone you talk to, but it is so insanely hard just to try all the stuff that's coming out.

Nathan Labenz: (50:05) It's my fault that I can't keep up.

Andrew Lee: (50:07) DeepSeek comes out, and like 2 days later everyone's like, why haven't you switched to DeepSeek? And it's like, I haven't even had time to play with it with a single prompt. So we are constantly looking at new models. Yeah, we are currently playing with Flash. We may roll it out or we may not. There's a lot of considerations: cost, latency, caching behavior, how it performs on specific tasks. And we try to factor all of those in.

Nathan Labenz: (50:30) What do your evals look like? I mean, obviously that's key if you want to be able to make confident decisions on whether to upgrade or switch to a different model in any number of contexts. I'm also struck, though, especially when I'm looking at output that's supposed to be, like, writing in my voice as much as possible and sort of, you know, representing me, that this is a very challenging thing to eval. Right? So I'm sure you must have some mix of objective and vibes, but what's the perfect balance between objective scores and vibes?

Andrew Lee: (51:09) So we have made the conscious choice to say this technology is evolving super fast, our products are evolving super fast. It is more important that we adapt quickly, that we don't break things, and that we move very quickly. So our eval basically consists of 2 pieces, and they're very seat of the pants. Piece number 1 is I have a Notion doc of golden test cases where I say, hey, here are some prompts that are supposed to work. I run those in my inbox. I make sure they do sane things. So when we're tweaking prompts, I go in and I try them and ask, hey, does this still do the thing that I expect, or does it do something reasonable? We don't want to lock it down to specific behavior, because often it gets better. So for example, we recently added a tool to do unsubscribe, and magically, without us touching anything, anytime we did an inbox organization thing and ran across an email that looked low quality, it would start offering to unsubscribe you. You don't want to have a test that expects "organize your inbox" to do one specific thing, because that new behavior is better, right? So you'll notice now if you organize your inbox, it's going to start offering to unsubscribe, and that's something we didn't really think about when we first were building the feature. So 1 is we have a Notion doc, and I go through it and I say, did this produce reasonable results? The other thing

Nathan Labenz: (52:23) How many prompts, by the way, just to calibrate myself?

Andrew Lee: (52:29) Over 100, maybe, at this point.

Nathan Labenz: (52:31) Okay. It's a little more than I expected.

Andrew Lee: (52:33) You run it manually? Yes, it's just the Notion doc. I don't necessarily run through every single 1 every time we make a change. I'll go look at which ones are relevant to the areas that we touched, and I'll try those and see if they do reasonable things. The other thing that we do, which I think is the more objective metric, is we have an experiments framework. So every big new change that we roll out, we provide as an opt-in experiment for our users. And our users tend to be super forward-thinking tinkerers who like to play with this stuff, and they'll go and turn it on in large numbers. And we look at the retention stats of that. So the unsubscribe feature, for example, we had like 99% retention for people enabling that. And when you enable that, it modifies the prompts and stuff and adds new tools, and it could break all kinds of stuff, but after leaving it on for a week and seeing 99% retention on this feature, we're like, okay, clearly this isn't breaking a lot of people, because otherwise people would start turning it off. So we feel comfortable rolling it out. So anything that is a major change we put into that framework, we look at retention stats over time, and the assumption is that if retention is high, it's probably working pretty well for people.

Nathan Labenz: (53:38) Yeah. Interesting.

Andrew Lee: (53:40) Have there been any you have put it lightly.

Nathan Labenz: (53:44) I mean, you know, that's a society wide phenomenon these days. By the way, when things do go out, do you is that also just something that you've made a strategic decision to live with? Like, there is no substitute for SONET. And so if SONET is down, like, we're down. Is it as simple as that?

Andrew Lee: (54:01) Yeah, like we actually, you know, we could fall back to GP 4 point and it might give us decent results and whatnot, but there's always a question of is it worth thrashing people to deal with a short downtime or whatever. So, so far yeah, we've just sort of eaten the downtime and waited for stuff to come back up and we don't seem to lose users because of it.

Nathan Labenz: (54:19) Yeah. I mean, I think it's, this is something I've also been kind of, I don't know, it's a little bit of a hobby horse for me even with my own company too. It's like, this is just the new normal. You know, these we're, like, gonna be more dependent on these services. They have pretty good uptime, but not perfect. And there's really not much we can do about it. You know? If it's out, it's kinda out. So I'm with you on that. And I think just in general, like, there's a mindset shift that is kind of needed that's like, let's aim for creating really magical, like, the most valuable experiences we can as often as we can and, you know, live with a little risk into whether it may be, you know, outage or or just, you know, some uncanny valley behavior, like, certainly, you know, still see it. But I think it's for everybody really, it seems to me it's worth taking a little bit of that risk to get the, you know, those special upside moments because when they happen, they're like incredibly valuable. So yeah. Yeah. Think we

Andrew Lee: (55:18) I think we try to be upfront with people about this, and people ask me, hey, is this going to be super reliable? Can I count on this? And my answer is, hey, if what you're looking for is the most reliable, stable product, where nothing changes on you, use Gmail. That's what Gmail is. It's there. You can trust it, right? If you want the absolute most cutting-edge stuff in any email client ever, use our stuff. We are trying very hard to be at the absolute edge.

Nathan Labenz: (55:45) Yeah. You do have a really nice benefit too that, worst case scenario, I can always just open up my Gmail and access stuff directly that way. 1 thing I did notice too about the new to-dos, and I wonder if there's kind of a deeper philosophy underlying this. You know? So I go organize my inbox. The assistant, the first time I did it, just started coming up with to-do categories for me and suggesting, okay, you should group these into this section. And I'm like, okay, interesting. That doesn't seem to touch back to Gmail at all, as far as I can tell. Then there's also labels, which do. Right? And I guess I'm kind of wondering to what degree you have found it to be advantageous to have this sort of single ground truth, where something that you do propagates back into the core Gmail account, versus, I actually honestly kind of like it better where I'm like, okay, this is my Shortwave universe, and I can let the AI assistant do its thing and run a little wild, and, worst case scenario, I can always go see my old view. But I don't know what users in general want. Do they want a unified reality, or do they want you to kind of build something a little bit off to the side that in a sense derisks them from anything that could happen to the core data store?

Andrew Lee: (57:04) Yeah, so I have a bunch of thoughts here. So the first thought I'll have is we, 1 of the big sort of conceptual changes we made here since the last time we talked is when I talked to you last time we were building an email client with an AI built in. We don't think of ourselves that way anymore. We think of ourselves as we are an AI with email features built in. And the plan here is to integrate in the medium term with products that aren't even email. And basically sort of anything within inbox, you know, your Slack or your LinkedIn or whatever, like you should be able to access and manage from this interface. And you'll notice we moved the AI to the left. Big driving force behind that is we see that as the main product. We see people coming in primarily because they want to interact with that AI. That AI can work with their email inbox, it can work with other inbox in the future, can work with CRM, project management tools, whatever you want to use. And so we're thinking it from that standpoint and in that world, keeping everything imperative with Gmail doesn't necessarily make sense because you might not even be using it with Gmail. You might be using it with some other product. The to do concept is here basically because we see a need for the AI to be able to like add and manage state specifically for like short term organizational purposes. Labels are a great tool for long term classification of like, I want to apply this thing and then 2 years from now I want to go and search for this characteristic. And I want that label to be like short and simple and easy to remember. To dos are great if I have some project that's happening right now and I want a name that's like a whole sentence and I want to have a bunch of notes in there and I want to attach a bunch of email threads for it. So I want a much more complicated but much more short lived type of thing. And I think the AI shines with this, right? So you can, let's say you're producing your podcast with me, right? And you and I might have 2 Google Docs and 5 threads about this and the AI can spot, hey, these are all related. I'm going to make 1 to do that's like prepare for podcast interview with Andrew and it'll put all this stuff in there, can add some notes in there for you. And we don't sync that back to Gmail largely because there is no concept like that in Gmail. And we don't sync it back to labels because it's kind of a different thing. And I do think people like the idea that we are not trying to like shoehorn new features into old Gmail things in a way that like might mess things up. So a counter example of this I'll give you and is 1 of our competitors. So if you use Superhuman, Superhuman implements some of their features basically by sending emails. And so if you use like the reminder feature or whatever, if you go back to Gmail after you stop using Superhuman, end up with all these extra emails which like kind of ends up looking cluttered and weird. And we try to avoid doing that so that at any point you can leave, you can come back to Gmail, everything looks the way you saw it before and it's like an easy switch. Not that we ever want you to leave but we want you to be comfortable that we are doing the right thing with your inbox because we know how important it is.

Nathan Labenz: (59:57) Yeah. Certainly, I think it makes me more comfortable in taking a leap. And that is, you know, a big part of what people need to do, I think, to get value from AI is be somewhat willing to take a leap. Couple more little product questions. And then, you know, the the big shift I think is honestly super exciting, super valuable because I have tried over the last 2 years honestly at this point to fine tune a model to write as me. And 1 thing I've really learned is that if I wanna make something general I've not really succeeded in this, to be honest, like nothing has beaten dumping a lot of writing samples into Claude and just, you know, asking it to do the next thing. But, you know, I've learned some stuff by trying. 1 of the things I've learned is just that my and I assume probably other people are often similar. No single system has the full me. Email, you know, I don't like email back and forth with my wife. I would need, like, you know, text messages really to you know, that's where that kind of relationship in text form at least lives. And then Slack is key for, like, a lot of the sort of day to day planning and discussion of what we're gonna do. And then of the more interesting conversations happen on Twitter DMs. And so it's, like, sort of not just that there's no single source of all the information, but there's honestly not even, like, a single thing that I feel like really you know, there it's, like, very different facets, I guess, of my life, I feel like across these different channels. So to be able to unify that into a single thing and have like a you know, an AI system that span all of them, I do think is quite exciting. When you do the right as me, how is that going? I would say I've noticed an improvement for sure. It seems now usually that it handles, like, routine stuff at least pretty well. So, you know, kind of classic, let's put the put a meeting together, you know, here's my Calendly link, whatever. That kind of stuff I've I'm increasingly feeling comfortable just like hitting send on with the assistance draft. Obviously, as it gets more specialized and more context dependent, you know, it's it gets much harder. But what have you learned about, like, making the right as me work maybe on multiple levels? Like, what do people really care about? How you know, representative or not am I? And, you know, techniques wise, like what is getting results?

Andrew Lee: (1:02:22) So, yeah. So from a techniques perspective, 1 of the things I am at this point, like very, I very strongly believe, and I think this is a little bit controversial with other people I've talked to, but I very strongly believe this is that the pace examples into Claude and have it use the examples to produce the response is the correct solution. It is the best solution. And that doesn't fit well with people. They like to assume that the more technologically interesting solution with the fine tuning and stuff is going produce better results. But I think there's 2, I'll give you 2 reasons why I think this is the right solution. The first is this technique is much better at recalling specific facts as opposed to like a shallow fine tune. And when you're writing an email or when you have the AI write an email for you, you don't care so much that it sounds like something you would have written. You care that it's correct, right? If you provide a link to someone, they're like, hey, where do I go to pay for this product, right? You don't care if the link looks like a thing that you would send, you care that it is the right link, right? If you're scheduling a meeting and it picks a time that you mentioned in previous email, shouldn't just be a time that you would send, it should be the right time. And if you give it specific examples and frame those correctly where you say, hey, the last time you talked about this topic, this is what you said. It can actually get the facts correct. It can get the lengths correct. It can get the times correct. So it isn't just sounding like you, it is being accurate. So that's I think the point number 1. Point number 2 is the big problem with fine tuning is it slows down your ability to update models. And if you have built a bunch of fine tuning infrastructure for a particular model and then you decide to change models, you got to refine tune everybody. And that's a big process and that's a big migration. And the reality is we are switching models constantly, right? Constantly. And often the leading edge models don't have fine tuning features built in initially. And so if you want to stay at the cutting edge and you want to be able to roll things forward really quickly, you don't want to have to be fine tuning these models. So we do have some fine tuned models, but we don't do the fine tuning for the purpose of fact completion and we try to avoid fine tuning if we don't have to. And we do things this other way. I'm actually fairly excited about reinforcement learning here because I think the big models do a pretty good job of sort of extracting facts and style and tone and stuff from examples, but I think they could do a better job. And I think reinforcement fine tuning will allow us to teach the model how to do style and fact matching better in a way that is generic across everyone. So we don't like per user fine tune, we just take the new GPT-four 0 with its reinforcement learning and we teach it, here is how to use examples for style matching and get facts right and write emails. So I'm actually very excited about that. But I do think the, here are 20 examples of things that have been said, master style and tone with the right prompt and the right reinforcement learning is the right solution.

Nathan Labenz: (1:05:20) Okay. So how about on memory? I mean, hear you on the operational challenges of fine tuning and every time I raise this notion of like per user fine tuning or even per company fine tuning, that's like, I think the most compelling counterpoint is like, what are you going to do every time you want to change models? And then there's also new knowledge that comes in and how often you're to run it. And like, it does sound like a real bear. Is there I guess, it still feels to me like there's some sort of missing middle in terms of memory where we have context window, obviously, and then we have, like, you know, stuff that is known in the weights. And then you have, like, the sort of database, you know, call as well. But those all feel like they're still kind of missing a little bit of something. Again, I kinda come back to this intuition of, like, I know when I've found it, you know, and there there does seem to be something qualitatively different about that or I kind of I know what I've tried. So I've been really interested in state space models. I've been really interested in the Titans paper that came out from Google not long ago where they actually use a neural network as a memory module that gets updated at runtime. I'm also I've been quite interested and you may have seen some I did an episode on a system called HippoRAG, which was inspired by the hippocampus and how it is understood to like connect concepts together so they have sort of a extensible graph, you know, network that they can query against. But maybe you think like we don't need any of that and we just need like better models and, you know, better search and maybe this whole thing is sort of a confusion on my part where just keep pushing the current frontiers and it'll all work and 1 day I'll just realize that we didn't never we never really needed, you know, another middle piece. What's your expectation for that?

Andrew Lee: (1:07:05) I you know, I I I don't know what's gonna happen here. My spidey sense is there is gonna be some big breakthrough here. And there is gonna be some concept of memory that's sort of baked in that allows the models to be customized in a way that isn't quite so heavy weighted as fine tuning. But I have no idea what that's going to look like. And I don't even have an idea of what that could be. Our approach right now is we have this thing called memories actually as a feature in our product, which basically just allows the LLM as a tool to sort of manage a list of facts. And then we insert those facts into the prompt. And this is really useful for really for behavior customization. So if you wanna be like, hey, every time I write an email I'll always cc my EA or like anytime you schedule an email, default to have it be 45 minutes or whatever. It's really useful for that. So we have that, but you have to be explicit. You have to say, hey, remember this fact and then we'll do it or remember to do this. We also obviously use search. Still think there's a lot of opportunity with an agentic model to use search. You could have a database of interesting facts about you and you could have a tool that looks at the facts and tries to apply those. I don't think it'd be quite the same as like built in memory but you could probably do a lot of the same stuff. I have not had time yet to read the Titan's paper, think you mentioned that the other day. I'm very interested. But, yeah, I do think there's gonna be some sort of breakthrough here, and I can't wait to see what it is.

Nathan Labenz: (1:08:25) It feels like 1 of the you know, it's too early to call a final frontier, but in terms of imagining something that could really work alongside me on an extended basis, it does seem to me like a sort of more integrated dynamic kind of active learning you might call it. But constantly updated memory does feel like something that would move the needle tremendously on that particular dimension. Okay. I think last question before we switch gears into the sort of meta of like how you're doing it because I think that's gonna be really interesting too. But 1 thing we talked about last year I wanted to get an update on is we had speculated about the rise in AI generated spam and sort of, you know, what happens as more people are adopting various tools and, you know, you've got AIs potentially talking to AIs. Has that happened? You know, another thing we, of course, we sort of expected was, like, a lot of deep fakes during the election that didn't really happen all that much. Is this another thing that we're just kind of all too worried about and, you know, it's not really a thing, or are you actually starting to see any sort of interesting, you know, AI dynamics let's call it?

Andrew Lee: (1:09:36) Not a ton honestly. I do actually get a lot of AI generated content in support, especially just like people being funny where like, you know, we'll send out a newsletter and they'll have it like write a poem or a joke or something respond. So people are doing that kind of thing. But I haven't seen a huge uptick in AI spam. Our users haven't reported a huge uptick in AI spam. It's possible that the normal spam filters from Gmail are doing good job or maybe it's just not a big problem, but no. Thankfully, it doesn't seem to be as bad as I feared.

Nathan Labenz: (1:10:08) K. Good. Yeah. I would say that I still notice honestly, more often, I marvel at how I'm getting cold emails that were obviously written by a human and would have been better had they been written by an AI. So it seems like maybe just adoption generally lagging is still the broad explanation there.

Andrew Lee: (1:10:27) A significant fraction of the emails that I sent are at least AI enhanced. Some of them are fully AI written and as far as I can tell no 1 notices. I'll talk to people in person and they'll be like, they'll ask me this sort of question and they'll be like, hey do you realize that my emails are AI written? And they usually say no. So maybe I just don't notice, which is fine if I can't tell.

Nathan Labenz: (1:10:46) Yeah. There's a couple of really interesting things there. I think I personally have not shaken this yet, but I've increasingly had to confront the fact that like I'm probably just way too precious about the little elements of my style that, you know, feel to me like they make me unique. And I honestly kind of doubt that anyone else notices or cares. I've got a friend who really hammers on that all the time, and he's just like, nobody cares about that, dude. Like, your little flourishes, like, they're totally lost on other people anyway. I sort of also differentiate the world between, like, routine tasks and non routine tasks. As somebody who, like, doesn't really have a job and is just kind of, scouting the AI space all the time. I do like very few routine things relative to I think, certainly like your typical sales user who's got a CRM integration type of thing would need to do. But yeah, it's it's it's kind of philosophically challenging to be like, maybe all this style and all this sort of personality and all the care that I've kind of put into how I want to show up, maybe it doesn't really matter. And not only like, you know, it's 1 thing to think like maybe an AI could replicate it. And it's another thing to think like, doesn't even really matter if it replicates it. It's just, do a good enough job and like the world just keeps moving on. I really

Andrew Lee: (1:12:00) do think routine is the place to look for AI. I do a ton of routine emails because for example, yeah, we made some changes to the UI yesterday. I'm sending emails that kind of walk through why we did things and how to adapt your work style that are very similar between all the different people I'm talking to and the AI is really useful in looking, hey, you just sent a bunch of emails kinda like this. Let's pull out some of the ideas here and and reconstruct it to answer the question. But, I'll send like 20 emails of like a very similar type.

Nathan Labenz: (1:12:28) So let's shift gears a little bit if you're ready to what it's like to work at and be building shortwave. I understand that there is some news around a fundraise, and there's this expanded vision of, you know, not just email, but kind of omnichannel communication. And then I thought maybe the most interesting thing from your recent Twitter thread was quoting here, we're also building an incredibly AI forward culture where the focus of our work is managing AI agents rather than making changes directly. We write a ton of code with AI, use AI for research and design, and even wrote this posting with AI. So I don't know what you wanna share about the the fundraise, and you can expand on the vision. And I really wanna get into, you know, what's it like to be building this AI forward culture and what's it like to be managing agents all day at Shortwave?

Andrew Lee: (1:13:20) Yeah. So the company has changed tremendously in the last few months. We kind of reached a point where we figured out, hey, the future is not making a better email client. The future here is an AI that has your communication apps sort of integrated into it, but the main thing that people are doing in this thing is actually talking to the agent. And in that world, we're building a very different thing. It is not how do you streamline every interaction of the work in the inbox and how do you make the UX or the email threads amazing? And we care about that stuff to some extent, but it is mostly how do we help people get things done at a higher level? And we're already seeing a lot of traction. The reason we're doing this is because this is already where people are being successful. Our biggest and fastest growing plan is our most expensive plan that actually has basically like a bunch more money being sent to anthropic for AI. Like the people who are using us now, they care about the AI. And so we're just doubling down on that. And that vision I think has gotten our investors very excited. They see opportunity for email client here, but they see opportunity for like this sort of thing here. And there does seem to be an opening in the market for an AI that can actually do things, right? If you use JETGPT or you use Claude or Perplexity, like they're great at answering questions and doing research, but when it comes to actually doing work, they don't really have any capabilities to do that. And someone's going to build a thing that can actually get stuff done. And we see that opportunity. So we're starting to see our competition as like the next version of JatGPT rather than the next version of Gmail. And so it's been a reframing. We have raised some money. I'm not ready to share the number who was involved, but enough that we can significantly expand our team. So we're doing that right now. As part of this, we basically revisited everything about the company. The way it's financed, the way we operate every day, who's on the team. And we think this tidal wave is coming. We want to be in front of the tidal wave. I think the tidal wave is going to hit faster and harder than basically everyone realizes. And we want to make sure we're on the other side of it. And so we think it's going to require some different people with a different mindset to do what we want to do. And yeah, I think you mentioned the sound bite there of your job is going to be primarily managing a bunch of AI interns to get work done rather than necessarily doing it yourself. And we're seeing this especially right now with code. If you're someone who is using the cursor agent mode, I think a good software engineer right now is going to have at least 1 agent running all the time doing something. They start the agent off and then they go start working on the next problem and it goes off and solves it. Like we have found that it's possible right now. We can take a bug report from Asana, we can copy it verbatim into that agent mode and ask it and it'll solve it in 1 shot in many cases, which is pretty amazing. So on the software engineering side, it's like, hey, we're looking for people whose skillset is primarily not execution, it's not writing code, it's primarily understanding their problems, understanding the components involved and how they should interact and being able to like frame those prompts. 
And so the value has shifted from people who can get things done to people who can understand the problems and structure things. So that's on the coding side. On the design side, I think it's changed too, where it used to be that the way you design a product is you get together in a room and on a whiteboard and you draw it out, and then you make some mocks, and then you make a prototype, and then you build the thing. And now the first thing you do is make a prototype, right? Before I even talk to anyone else, I just build it in Bolt.new, and it's only after I'm like, yeah, I think this will be pretty good, that I go and talk to the right people and we make a mock and try to figure out the details. But it lets us cycle through bad ideas much, much faster. So on the design side, we're looking for designers who have already made the shift of: I don't start with a mock, right? I don't even start with a wireframe. I start with a working version that I have built with Bolt.new or a tool like that. That's the beginning. And so our design process, I think, is totally different. Same thing with content. I mentioned the job postings that we wrote. That's all ChatGPT 4.5. It's really good at writing, right? It's not quite at the level of if I had really put in the effort and done a good job myself, but it's not far off, and I can produce content that's at a pretty high bar in a fraction of the time. And so we need to do a lot of comms here. We want people who are good at: how can 1 person manage our blog, our social media, our changelog, all the docs on our website, everything? How could just 1 person do this if they're thinking hard about the right prompts and the right way to use these tools and how to generate this stuff? So it's gone from, hey, maybe we need a team of 50 to do this, to maybe we just need a team of like 15 or 20 people with the right skill set and the right usage of these tools. And the 30 people that aren't there anymore are really more on the execution side, and the 15 or 20 that are there are really more on the problem understanding and solving side. And I think it's going to be a big shift. I think I mentioned this earlier, but we actually let a couple of people go who were very talented, very good at what they did, great employees, great attitude, but not quite right for this new world that we felt we need to get into, and not super passionate about it either, and so we thought a change needed to be made.

Nathan Labenz: (1:18:42) Passion is super important. I mean, that's definitely something I observe all the time. Maybe, you know, it does take some curiosity, you know, just innate interest, you know, that I feel like to the degree that I have demonstrated, you know, some aptitude here, a lot of it really just stems from that. Is this something I guess at this point, is this something that you think is an investment in the future? Obviously, it's that. But is it also something that you think like allows you to be more productive or execute at a higher level today? Or do you feel like it's

Andrew Lee: (1:19:17) kind of like, well, you

Nathan Labenz: (1:19:18) know, we might not be like you mentioned with the blog post, maybe it's not quite as good, but it's fine because it's pretty good. And we're also like investing in sort of being the type of organization we think we have to be in the future. Does that make sense? What is the time horizon for ROI on this?

Andrew Lee: (1:19:35) I think we're already seeing it in a big way. And I think companies need to be making changes now to the way they work, the way their org charts work, the way they spend their money. I'd say, you know, I know if I can put a number on it but our engineers today are significantly more productive now than they were 6 months ago. It's really changed in the last few months. Like if you last tried this stuff last summer, the world is different. Give it another try. So significant differences there. On the design side too, like, yeah, we can just skip over half the process and get right to the prototype. And I wouldn't say it changes the work needed to build the final design dramatically but it does allow us to cut out a lot of bad iterations which are the most of your time spent anyways. I So think we're a lot faster there. On the content side also a lot faster, Like good writing is hard and most of the time you don't need to write really amazing writing, you just need to write pretty good writing. And if you can do that much faster you can keep your docs up to date faster and things like that. So I think we're seeing those gains today. But there's a related thought I wanted to add in here which is a big change in the way we're thinking about our business. So a year ago when I was talking to investors, was saying, hey, their question was basically like, what is remote, Right? Like you're building this cool AI stuff, but why can't someone else come and do this? And I was like, well, our moat is that we have an email client. And it took us years to build this thing. If anyone else wanted to enter the space to build like an AI enhanced email experience, they'd have to go build an email client, that's really hard. And now I think, the thing that took us 4 years to build, it might not take 4 years for the next person, right? It'll still take them a while but they're going be able to do a lot faster. And I think there's this big change happening in the industry where all of these companies that have these moats of like the software they have built, that moat is being significantly eroded. The value of that like code that's sitting around is going down very quickly and they're going have to come up some other moats. And we're looking at this and saying, hey, the moat that matters, the only moat that probably matters is speed. And so how do we optimize our team for speed? And I think the best way to optimize your team for speed is to keep the team small. It's just harder to get consensus on big teams. So we're like, okay, how do we have 15 people that are the right people with the right tools that work together really well so we can just like move, move, move and our moat is that we're always ahead of everybody by 2 months because of the way we're structured. So that's been a big change in our thinking.

Nathan Labenz: (1:22:03) So 50, is that like a long term number?

Andrew Lee: (1:22:08) No, not a super long term but I think for the next year, right? Like I don't think we're going go beyond 15 and then we'll see beyond that. But no, I don't think most people could build the product that we're the scale or the scope of product that we're building historically with 15 people, but I think we can.

Nathan Labenz: (1:22:26) Yeah I mean there's a lot of surface area already and it sounds like you're planning to add a lot more and on top of that got

Andrew Lee: (1:22:33) a lot to keep up with. Well and we're not the first ones I think to figure this out. Like Cursor you know, it's like 20 people. Midjourney is like 20 people and they're obviously wildly successful. So I think I think there's a new era.

Nathan Labenz: (1:22:47) Are you still subsidizing users on the margin? That was another tidbit from last time you said you were literally like losing money on every user. Is that still true?

Andrew Lee: (1:22:55) Not anymore. No, we've done a lot of work to kind of increase efficiencies. We also added like a higher end plan and we've gotten people to kind of opt in those higher end plans. So we are margin positive. I wouldn't say hugely margin positive. We still spend a significant fraction of every dollar that you give to us. Like it goes directly to LLMs. And then a lot of that also goes to just like traditional email infrastructure, but on the margin that we do make money now. All right.

Nathan Labenz: (1:23:21) That's cool. Is there like a 2 to 10 X more expensive version of the product that you could imagine rolling out or are you already basically I mean, you're not quite maxing out, right? Because you talked about like you could spin up the agent every time. Could you do that today if you just charge 5 times We

Andrew Lee: (1:23:40) We could. And I think we probably will. Like I think I mentioned earlier, we have all these different plans, and all the growth basically is happening in our most expensive plan. Everyone's coming in and they want the most expensive plan, because it lets you have the biggest context window and it lets you index all of your history for search. And people care a lot about that. And even though our business plan can give you almost as good answers as the Premier plan, people want the best answers, and the difference between good and best is worth a lot of money to them. And yeah, I think we undershot. I think, you know, ChatGPT Pro came out at $200 a month. I pay for it. I think it's too low.

Nathan Labenz: (1:24:18) my end either.

Andrew Lee: (1:24:19) Yeah, I pay more than that. No hesitation. And I think we should probably do the same. I think there should probably be for the type of person who really truly lives in their inbox who's able to get the kind of value out of what we do that we want them to. I think we could also have a $200 a month plan. And I don't know how far this goes. Like I know Sam was talking about like maybe we'll have a $2,000 a month plan or a $20,000 a month plan. And I have no immediate plans to do that but to the extent that people want us to spend money on GPUs on their behalf, like there's sort of no limit here. So for example, right now we don't I think you mentioned like the automatically running the full agent on every email. That's a great place where we could spend a lot of money on your behalf if you want us to. And it could be a 100x increase in the amount of compute required. I think another example is reasoning models where right now we don't use reasoning models. They're much more expensive. They're slower. But if you're trying to do something complicated like I want you to give me like a detailed analysis of every customer report over the last year, right? This might be something that otherwise you're given to an employee and they're spending a month on and you might be happy to spend hundreds of dollars of compute just to answer that 1 question. So there may be something that we can do there. So yeah, reasoning models, automatic execution of things, bigger models in general. And yeah, maybe there's a $200 a month plan or maybe more someday.

Nathan Labenz: (1:25:43) Going back to the building side, you mentioned Cursor, of course, and I'm a Cursor user, and I've also gotten into Bolt and into Lovable, and I basically try everything I can. What I haven't quite figured out yet is how to make things work together, you know, how to effectively architect systems of agents. And Replit too, right? I've had great experiences with all these different things in their moments. But do you have kind of a mix of things that you've brought together? Are there things that complement the coding agent? I'm thinking, for example, of a company called Coto that specifically emphasizes testing, or there could be observability or monitoring tools. Monitoring is another thing you could really imagine layering on, and it ultimately seems like we're gonna need it. Right? If you're gonna have 15 people who continue to run at roughly human speed with a lot of AI assistance, they're also gonna need AIs to help them supervise the AIs. Right? As the pyramid built under each person, and under the company collectively, gets bigger and bigger, there's gotta be a whole architecture and sort of specialization within these agent ecosystems. Right? So I guess, is that roughly how you see things shaping up? And have you started to tie these things together in a useful way at all so far?

Andrew Lee: (1:27:20) Yeah. I I have a I have a couple of thoughts here. So the first thought is I think 1 of the potential futures for Shortwave is really as sort of a routing layer for the messaging going on in your business life. So you might be using a bunch of these different AI tools, but something needs to sort of like take the incoming events and decide which agent handles it and get it to that agent and take the output from that agent and do something sting with it. And inboxes are great tools for sort of managing that flow in a way that both the human and the AI can collaborate together on it. And if we're going beyond email and including other types of human communication, maybe we're going to go beyond traditional human communication. Maybe you're going start thinking it's more like a Zapier type of thing with like a human UI that you can use as well. And we might be a routing layer that helps you route things into these other tools. So we've had requests from customers like this and was a discussion I actually just had with 1 of our investors of maybe we need to start thinking ourselves more as like an agent routing layer. So that's my first thought. My second thought is we have tried in our approach to agents, some different ways of having like a multi agent approach to solving problems where basically like you have 1 system prompt that you iterate on over here. And then once you decide it's a certain type of problem, then you like change the system prompt, you add in a different prompt, or you hand it off to a different model. And our experience has been so far, and this is just 1 experience, but our experience has been so far that those systems end up being kind of brittle. They tend to struggle at reasoning across different types of tasks. And the cost aside, the better approach tends to be just take the biggest most expensive model you possibly can, stick all the instructions into the context and let it reason across the different things. And that I'll give you a specific example of a win we had here recently. So we used to have the ability for you to add custom instructions for certain types of operations. And the way that worked was when, for example, for writing, could add like custom writing instructions. And when we called the tool that was used to grab the writing instructions, we would insert those custom instructions into that thing. That worked well for cases where the writing tool was being called appropriately. But if you had some custom instructions that were like trying to give it a hint of like when to write emails or what types of examples to look up or that wanted to handle like not just things during writing and during other times, it didn't work so well because these things only got plugged in certain times. So what worked much better was this memories approach where you basically say, hey, we're going to take some customization instructions. We're going to add them into the master prompt and we're going to include that in every call. And then you can have a customization that can be considered at any time, right? Like you could have a customization that's like always address Nathanlabenz as sir, right? It could do that when writing emails, could do that when scheduling calendar events, it could do that in any situation. And that has proven to be like a much better user experience. It produced much better results from the AI. 
And so we used to have a model that was sort of routing things to different AIs, and just saying, screw it, we're gonna put it in the main prompt, has actually worked much better.

Nathan Labenz: (1:30:33) Yeah, more cash flow too I guess, right? To have that consistency.

Andrew Lee: (1:30:37) Yeah, so I see the opportunity for kinda the multi agent world really being more about working across organizations where we need some interface between our team and the Cursor team, but probably not something that's happening within our app. I think within our app, the approach of just like 1, the biggest alpha web we can get and the biggest context and then like using caching really well. I think that's probably more likely the future for us.

Nathan Labenz: (1:31:01) So I share the same intuition, just based on everything that I've tinkered with over the last couple of years. And I actually just told a company that I've been doing a little bit of agent advisory with exactly that. And then like 2 days after I said it, OpenAI came out with their thing where handoffs between agents is a pretty notable new feature. What do you think is driving that?

Andrew Lee: (1:31:29) I literally talked with that team yesterday about that exact thing. I had the exact same questions for them, and I shared my perspective. And I think I don't think handoffs are gonna take off. That's my that's my hot take.

Nathan Labenz: (1:31:44) Yeah. I mean, I can see some upsides to it. I mean, the things that they cite are like, it's easier to test them in isolation and whatever. But yeah. I mean, it's tough when you then have, like, classifications and yeah. There's, like, just more things to get that can go wrong. That's for sure as well. Okay. Well, I'm I'm glad that that, you know, hear somebody else shares my tradition on this. It makes me feel like because I was like, am I out of touch on this? You know, certainly OpenAI, you know, has some real insight into what's going on in the broader world.

Andrew Lee: (1:32:11) They shared the same perspective with me around, you know, sort of developing things in isolation. But I think the reality is with an agent, you don't really want the task to be isolated. You want the agent to reason across all of its capabilities and make a smart decision. Maybe there are some places where you really want to sandbox it, but my experience is that's not usually what you want. You usually want to think about everything is capable to make the best choice.

Nathan Labenz: (1:32:36) Yeah. I wonder if this is to some degree driven by larger companies that sort of have like existing organizational structures and like who's responsible for what and you know, that those lines are really hard to cross. I heard 1 of the more interesting little tidbits that I've heard in AI discourse over the last year probably was with Yi Tei from, I think, the latent space podcast where they were talking about multimodal models and, like, why they've developed as they have and and why, you know, it was like vision models and language models developed separately and then, you know, kind of merged with these, you know, sort of late fusion approaches. And his take on this was very simply, that's a reflection of legacy team structures. Like, used to be that you'd have a vision team and a language team. And so they would do their things, and then, like, maybe you could try to fuse them. But the, you know, the upshot was like, that won't last, you know, in into the future. You're gonna have just, you know, unified teams, you'll have unified architectures from the beginning. So this feels like maybe the seed of kind of a similar insight where it's like, maybe 1 reason because I've I've often also been like, why we talked about this a little bit last time too with respect to Gmail specifically. Like, you know, why can't the big incumbents do a good enough job with this? Right? Like, the technology is not that hard to use relative to you know, it's 1 of the things that's so great about it. Right? It's so flexible and it's in many ways forgiving and it, like, you know, totally understands what you meant even if you're, like, riddled with typos. I've even seen people doing this, like, speed typing where they just put tons of typos in and have the AI correct it. And for whatever reason, I haven't got over the hump on that myself. But maybe there's something here in terms of, like, who will win in different verticals that is just, like, big companies with these org charts and these division of responsibility and, like, who's gonna sign off on what. Like, maybe these things are being built for them because that's what they kinda have to have to be able to make any sense of it. And maybe it's not really about, like, what's gonna make the thing work best and and approach more like yours where it's like small team, totally unified, you know, single agent that knows your whole history and can kind of, you know, deal with you on that basis. It does seem like what I want. You know? I I don't want to be I don't you know, it's it's very it seems it seems a counterintuitive. Like, I don't want to be handed off from 1 AI. Don't want be handed off from 1 human to another. I don't want to be handed off from 1 AI to another. So, yeah, this is interesting. I think there's definitely some some insight there that might prove predictive.

Andrew Lee: (1:35:05) I think you're probably right. I'm sure they did this in response to like some real needs that they're seeing but I think 1 of the reasons we started short way back in the day is I worked at Google. I had some insight into like what sorts of organizational struggles they were going to face and I had the belief that the Gmail team was not going to be able to innovate quickly for organizational reasons that were very hard to change. And I think that has borne out and I think you're seeing that now where the models from Google are awesome.

Nathan Labenz: (1:35:38) New 2.5 is literally mind blowing.

Andrew Lee: (1:35:42) But the product progress in using those models in interesting ways in stuff like Gmail is way behind. And I think that's driven by, I just thought that those people are smarter or whatever, I think it's just driven by the way the organizations work. And for example, we decided, hey, we the LLM on left. Right? Agent's got to be over here. There's good reasons for this, right? If I was at Google and I was like, we need to take the right sidebar and put it on the left, that'd take me 2 years, right? Like think of all the sign off I need to get, all the people I need to convince to have buy in here and all of the

Nathan Labenz: (1:36:20) You literally have 1000 meetings.

Andrew Lee: (1:36:23) Right. And for us it was It took a few weeks, it was all the discussion was, is this the right thing for the product? And we could focus entirely on that question and we can just move a lot faster.

Nathan Labenz: (1:36:36) Yeah. Going back a little bit more just to who you're looking for and what the hiring process is like. 1 thing I noticed is, as far as I can tell, all the roles are in person, San Francisco. And I have kind of a question about that, especially as you think about, like, a future in which, you know, you have, like, a lot of the work being done by AIs. In a sense, it feels like the AIs are, like, always remote. And personally, I have had you know, being in Detroit, it's, like, quite different, you know, local talent pool, and the, you know, pros and cons are are quite different. But had a great experience hiring remotely. It's definitely, like, opened up the aperture of who we could go for, and I kind of am like, well, jeez, if all the AIs are, like, you know, only gonna interact with me through a screen anyway, kind of, you know, at least until I got humanoid robots sitting at the desk next to me, it'll be that way. How are you thinking about like holding the line on in person versus potentially liberalizing to remote?

Andrew Lee: (1:37:35) There's a couple of pieces here. So I think piece number 1 is what I talked about was speed. We have been historically a remote team. In fact at 1 point we were fully remote. We weren't even hybrid at all. And I think it's become increasingly important that speed is critical. It is a lot easier to move quickly in person, especially when what you're doing is making rapid product changes. You can move in a straight line very, very quickly remote. But if you need to quickly change course and have a lot of difficult meetings and conversations and have to deliver tough feedback and be like, hey, this isn't working, It is just a lot more emotionally challenging to do that in a remote environment. It's possible when teams do it, I think it's harder. And I think it's something we really struggle with. So if speed is paramount, we think having a core team of folks who are driving most of the roadmap, the product decisions and stuff that can meet in person and hash it out on a whiteboard at any time I think is really key. So we really want to have that happen. The other I think really key piece is it's just a reflection of the founders. So Johnny and I work better in person. I am a better leader and manager in person. People I think like me better if they see me in person than if they're talking to me over a camera. And that's a weakness probably. Could learn to be more effective in a remote way, but I also to some extent need to understand my capabilities and limits. I look back at Firebase. Firebase was a fully in person company and there were a lot of struggles that I think we've had with Shortwave that we didn't have there caused by some of the differences there. So I think Johnny and I looked at this and said, hey, we need this to be a place that is optimizing for our strengths and we work better in person, we need to move really, really fast. We got to have this core team in person. Yeah, I think this can make recruiting harder. There's lots of super amazing talent that is not in San Francisco. I'm very keenly aware of that, but we think it's worth it.

Nathan Labenz: (1:39:31) Yeah. Okay. How about an AI scout role? This is the hobbyhorse of mine. Have you thought about a dedicated position for just like somebody to go try every new thing, whether it's new model, new framework, new, you know, agent experience, whether, you know, somewhat competitive or totally far afield. This to me feels like something more and more companies are gonna need, but I don't know how many are feeling it quite yet. Are you asking

Andrew Lee: (1:39:59) me for a friend, Nathan?

Nathan Labenz: (1:40:01) Yes. Well, many friends actually, I think because, know, when you're when you would have been 50 and you're now 15 and that starts to get generalized, like we do need some of these new AI jobs. Like I've managed to kind of stumble my way. You know, sometimes call myself the Forrest Gump of AI where I just sort of unintentionally stumble through these important scenes and I've ended up in a, you know, a place that I honestly quite love and, like, have no, you know, nothing but appreciation for. But I do think we need like a lot of new AI jobs and this feels to me like 1 that might become common. And so, yeah, think there's a I'm asking on behalf of maybe a lot of people.

Andrew Lee: (1:40:39) Yeah, a 100%. It's not a role that I have listed right now just because I think we're a very small team right now and we only have a handful of roles, but I do think it is sort of impossible to keep up right now. I do think there have been a number of moments where our discovery of For example, if I hadn't listened to that particular podcast that talks about the new behavior of Claude, it might've been months before we figured out the same thing, right? So I think there has been a number of inflection points driven by discovery of new technology that I think we just got lucky on and it'd be great to have someone who's systematically doing this sort of thing. Obviously I try to listen to great podcasts like this 1 and keep up that way if I can and read whatever I can, but I think in the future we may very well have a role like that. I think a lot of other companies, especially bigger companies where not everyone is constantly thinking about AI, could benefit tremendously from something like that.

Nathan Labenz: (1:41:32) Yeah, situational awareness, harder and harder to maintain all the time. Maybe last a little bit on the hiring side. To some degree, obviously, these are just labels, but I did also notice that the roles you do have posted all carry staff or senior designations. I guess there are maybe 2 layers to the question. 1 is, is there a place at Shortwave for somebody who doesn't have a lot of experience? And what would it take for them? To take maybe a somewhat extreme case, let's say they're freshly graduated, no work experience. Is there any way for somebody like that to demonstrate to you that they have the skills that would make you even open to hiring such a person?

Andrew Lee: (1:42:18) Yes, absolutely. And for what it's worth, the job postings have been updated significantly since I think you saw them. I posted the senior roles first. We actually have some entry level roles up there now.

Nathan Labenz: (1:42:28) Okay, cool.

Andrew Lee: (1:42:30) Including, I just posted yesterday, we're looking for someone for customer success. We're looking for a content creator. We want to make a lot of videos, because I think there's just a ton of discovery and education to do in AI, so we want to do some content creation. There's a junior product engineering role up there. So we're definitely looking for some junior folks as well. If you want to impress me, there's a very simple application process, which is: send me a video. We've kind of gotten rid of the take-home tests and things like that, because of what you can do with AI now. So just send me a 5 minute video, show me something cool that you did. The thing we're most interested in is: are you someone who's really forward thinking with AI? If you could show me, here's a way that I leverage AI to do something useful that I haven't thought of and that impresses me, that would cause me to take notice. So yeah, are you super forward thinking about AI? Are you being creative about how this stuff is used? Are you staying on top of it? As you know, if you can stay on top of it, you're pretty impressive already.

Nathan Labenz: (1:43:28) Yeah. No doubt. I guess more broadly, what do you think is gonna happen to the future of the software industry? This is, like, obviously a super hot topic right now. Are there gonna be more developers because they're more valuable, or is there only so much software that needs to get written? You know? And, also, this could definitely, like, break down in phases where I think we might be on the up ramp of that curve. But the more I hear from folks like you that, you know, you only need, you know, 15 people for the foreseeable future, the more I'm like, man, I don't know how much we can bet on this sort of developer market to grow or even sustain itself as it has been for the last few years. What's your expectation?

Andrew Lee: (1:44:05) Yeah, good question. My co-founder, who is the CTO, oscillates depending on the day between an existential crisis of like, I am obsolete, and the next day being like, I am a God and I can do so much.

Nathan Labenz: (1:44:19) He doesn't know which 1 it is, right?

Andrew Lee: (1:44:23) I feel like every software engineer is going through that right now, where their productivity is going through the roof, but also, maybe I'm obsolete at the same time. I think it's going to change a lot. The nature of what you do is going to change a lot, because, for example, actually writing code, localized code, is starting to become something LLMs can do. Any sort of front end development where you're like, hey, I need a button that looks like this, right? The LLMs are great at that sort of stuff. And so I think increasingly code's gonna get written by them. There was an interview with Dario from Anthropic where I think he said that within 2 years, 90% of all code, or maybe it was 100% of code, is gonna be written by AI. I'm not sure I quite believe that, but I think it's gonna be very significant amounts of it. The places where I think it's gonna take longer for AI, if ever, are really understanding user problems and understanding the components and how they interact, which I think is super important, super complicated, and actually a huge part of the job today. If you take a senior engineer today, the senior engineer isn't senior because the UI code they wrote is dramatically better than the junior engineer's. The senior engineer is able to think about the whole business problem and solve it. And so I think the folks who are good at thinking about the business problems and how to solve them, and thinking about the components and how they interact, are going to do great. The folks whose strengths are on the execution side are going to have to learn some of those other skills and get good at that. In terms of numbers of people, I don't know, right? On the 1 hand, yeah, I think you can do more with less. On the other hand, there's a million little companies that should have existed, but it's just too expensive. Software engineers are crazy expensive right now, and you need 10 of them to do anything, right? But suddenly, a startup where you would have needed 20 people, now you can do it with 3. And you can build a niche product for something. So maybe we'll just see another 10x explosion in software, and the jobs will stay the same. So I don't know. I'm generally an optimist. I generally think it will make life better for everyone, including employees, and everyone's gonna have to adapt, but I think it's gonna be a good thing. I think being a software engineer is gonna be an awesome place to be over the next few years.

Nathan Labenz: (1:46:40) Yeah. I can't imagine coding without AI assistance at this point, so on that level, no doubt. And I definitely share your expectation of 10 times more software getting created. I'm still kind of like, but maybe even 10 times more software getting created isn't enough to sustain the number of pure headcount jobs that we currently have. So time will tell. Yeah. For sure. I do advocate people start thinking about universal basic income if they haven't already started to contemplate it, but we're not quite there yet. That's, you know, that's a long term problem, like maybe 2, 3 years out. This has been outstanding. Maybe 1 last question. As you look a little bit into your crystal ball, what do you think are gonna be the big developments for, say, the rest of this year? What are the things where you're kind of, man, if only they could get this to work or fix this, if only the world were a little bit different, things would be much better or much different for you? What are you most closely watching in that sense?

Andrew Lee: (1:47:43) Yeah, I think the number 1 thing is post-training on agentic behavior and tool calling type stuff. The step change between Claude Sonnet 3.5 and every model before it just enabled a whole bunch of new stuff. 3.7 was better, but there are still gaps. It'd be great to have more competition there, great to have OpenAI and other folks offering options here. I think this is all coming down to agentic-specific post-training, and I am excited to see where that leads and how good that gets. Iteration and tool calling can work around so many limitations in your system, and the AI can find creative solutions to stuff, so I think the sky's the limit there. So that's the number 1 thing: post-training on agentic behavior. I'm also just generally excited about the improvements in the productionization of these things: performance of the tools, cost, latency. There are a lot of things we can't do because they're impractically expensive, like running the full agent on every email that comes in, or adding more search results into every query. The cheaper, faster, and more reliable these things get, the better for us, and I expect that to keep happening. Cost is still a huge factor for us. So that's another 1. Another area is more native multimodal voice. Today you can do voice things, and we actually have voice input in our app. It's fine, it's not great, it's not like really talking to a human. But models that support voice natively, multimodally, that we can use for the full agentic flow would be totally killer. So fingers crossed there.
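[Editor's note: To make the iterative tool-calling pattern Andrew describes concrete, here is a minimal sketch of an agentic loop using the Anthropic Python SDK. The model name, the search_email tool, and its stub implementation are illustrative assumptions for this example, not Shortwave's actual code.]

```python
# Minimal sketch of an agentic tool-calling loop (assumptions: Anthropic Python SDK,
# a hypothetical search_email tool, and a placeholder model alias).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "search_email",  # hypothetical tool for illustration
    "description": "Search the user's mailbox and return matching snippets.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def run_search_email(query: str) -> str:
    # Placeholder: a real implementation would query an email search backend.
    return f"(no results found for {query!r})"

messages = [{"role": "user", "content": "Find the hotel receipt from my recent trip."}]

# Let the model iterate: call tools, inspect results, and decide when it has enough.
while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model alias
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model decided it is done rather than requesting another tool call

    # Echo the assistant's turn back, then answer each tool call with a tool_result.
    messages.append({"role": "assistant", "content": response.content})
    results = []
    for block in response.content:
        if block.type == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_search_email(**block.input),
            })
    messages.append({"role": "user", "content": results})

# Print the final text answer.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

The key design point from the conversation is that the loop itself is simple; the model decides which tool to call, interprets the results, and signals completion by stopping tool use, which is what the agentic post-training Andrew mentions is meant to make reliable.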

Nathan Labenz: (1:49:23) Yeah. I love that vision too. I mean, anything that will get me outside more and less tethered to my desk is hotly anticipated, certainly by me.

Andrew Lee: (1:49:32) A 100%.

Nathan Labenz: (1:49:33) Any other thoughts or closing wisdom you want to leave people with?

Andrew Lee: (1:49:38) Just you should come check out Shortwave and you should totally either apply yourself or refer your friends. We have a $10,000 referral bonus for anyone. If you get them to apply and tell us that you referred them, we will give you $10,000. No joke. So, yeah, help us find great people.

Nathan Labenz: (1:49:58) Love it. Andrew Lee from Shortwave, thank you again for being part of the Cognitive Revolution. Thank you, Nathan. It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
