E41: [Bonus Episode - Latent.space] Building the AI x UX Scenius with Linus Lee of Notion AI

Listen to Episode Here


Show Notes

[Bonus Episode] Latent.space hosts, Alessio and Swyx, sit down with Linus Lee of Notion AI to discuss Linus’ experience starting the AI/UX community, prompt engineering at Notion, and designing AI interfaces and agents.

The latent.space podcast aims to be the first place where AI engineers hear about the latest AI news and technology trends. We’ve had several guests in common – including Shreya Rajpal of Guardrails, Jonathan & Abhi from MosaicML (recently acquired for $1.3B), and Riley Goodside from Scale AI – and we’ve also explored similar themes, including the question of which AI companies have durable business moats, and the flurry of experimentation going into AI agents right now. We recommend checking them out!

Keep up to date with Latent.space here:

We're hiring across the board at Turpentine and for Erik's personal team on other projects he's incubating. He's hiring a Chief of Staff, EA, Head of Special Projects, Investment Associate, and more. For a list of JDs, check out: eriktorenberg.com.

RECOMMENDED PODCAST:

The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade-offs, and dynamics of constructing high-performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix’s culture deck, Patty McCord.

https://link.chtbl.com/hrheretics

TIMESTAMPS

(05:52) Starting the AI / UX community

(12:23) Most knowledge work is not text generation

(18:43) Finding the right constraints and interface for AI

(21:27) Linus' journey to working at Notion

(25:51) The importance of notations and interfaces

(28:29) Setting interface defaults and standards

(34:58) The challenges of designing AI agents

(42:00) Notion deep dive: “Blocks”, AI, and more

(53:22) Prompt engineering at Notion

(01:04:22) Lightning Round

SPONSOR:

Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, Brooklinen, and millions of other entrepreneurs across 175 countries. From their all-in-one e-commerce platform to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions – using AI. Sign up for a $1/month trial period: https://shopify.com/cognitive

This show is produced by Turpentine: a network of podcasts, newsletters, and more, covering technology, business, and culture — all from the perspective of industry insiders and experts. We’re launching new shows every week, and we’re looking for industry-leading sponsors — if you think that might be you and your company, email us at erik@turpentine.co.



Full Transcript

Transcript

Nathan Labenz: (0:00) Hello, and welcome back to the Cognitive Revolution. Today, we have a special bonus episode to share from our friends at the Latent Space podcast. Hosted by Alessio Fanelli, partner and CTO in residence at Decibel Partners, and Swyx, writer, speaker, developer advocate, and creator of the popular Smol Developer project, the Latent Space podcast aims to be the first place where AI engineers hear about the latest AI news and technology trends. Since launching in February, right around the same time that we started the Cognitive Revolution, they've followed a remarkably similar and at times outright overlapping path in their simultaneous quest to understand everything going on in AI. We've had several guests in common, including Shreya Rajpal of Guardrails AI, Jonathan and Abhi from MosaicML, recently acquired for $1.3 billion, and also Riley Goodside from Scale AI, the world's first staff prompt engineer. We've explored some similar themes, including the question of which AI companies have durable business moats and also the flurry of experimentation going into AI agents right now. My belief, as you probably know, is that the AI capabilities overhang—that is, the gap between what is possible with current AI models and what has actually been made to work in practice today—is huge and arguably still growing, and that much more high-quality AI education is badly needed to help close the gap so that we can better understand how AI systems work and begin to realize the benefits of AI deployment before even more powerful systems are created. With that in mind, I'm happy to introduce the Latent Space podcast with a guest that I've also tried to book but haven't been able to so far, Linus Lee of Notion AI. Linus is a prolific developer and writer whose work has been very influential for me and many others in the space as he's prototyped and publicly shared a variety of writing assistants and other creative user interfaces and experiences all designed to help integrate large language models into practical workflows. As you'll hear, he's one of the deepest and most sophisticated thinkers about the future of knowledge and creative work. Beyond this, you can subscribe to the Latent Space podcast feed for weekly updates, including a very lively recent discussion with George Hotz of the Tiny Corporation, which among other things contained some juicy rumors about the nature of GPT-4. I hope for now you enjoy this introduction to the Latent Space podcast, and we'll see you back next time for another edition of the Cognitive Revolution.

Alessio: (2:33) Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO in residence at Decibel Partners. I'm joined by my cohost, Swyx, writer and editor of Latent Space. And today, we're not in our regular studio. We're actually at the Notion New York headquarters. Thanks to Linus. Welcome.

Linus Lee: (2:50) Thank you. Thanks for having me. Excited to be here.

Swyx: (2:52) Thanks for having us in your beautiful office. It is actually very startling how gorgeous the Notion offices are, and it's basically the same aesthetic.

Linus Lee: (3:01) It's a very consistent aesthetic. It's been the same aesthetic in San Francisco and the other offices, and it's been that way for many, many years.

Swyx: (3:08) Yeah. You put a lot of craft into everything that you guys do.

Linus Lee: (3:12) Yeah. I think it—we can, I'm sure, talk about this more later—but there is a consistent focus on taste that I think flows down from Ivan and the founders into the product.

Swyx: So I'll introduce you a little bit, but also you're a very hard person to introduce because you do a lot of things. You got your BA in computer science at Berkeley. Even while you were at Berkeley, you were involved in a bunch of interesting things at Replit, Catalyst X, the Hack Club, and Dorm Room Fund, and I always love seeing people come out of Dorm Room Fund because they tend to be very entrepreneurial. You were a product engineer at IdeaFlow and a resident at Betaworks. You took a year off to do independent research, and then you finally found your home at Notion. What's one thing that people should know about you that's not on your typical LinkedIn profile? Just on the personal side.

Linus Lee: (4:01) Wow. Putting me on the spot. I think, I mean, just because I have so much work out there, I feel like professionally, at least, anything that you would probably want to know about me, you can probably dig up. But I'm a big city person who doesn't come from the city. I grew up in Indiana, in the middle of nowhere, near Purdue University, in a little suburb. I only came out to the Bay for school, and then I moved to New York afterwards, which is where I am currently. I'm at Notion New York. But, you know, I still carry within me a love and affection for small town Indiana, small town flyover country.

Swyx: (4:33) Okay. We do have a bit of indulgence in this. I'm from a small country, and I think, Alessio, you also identified with this a little bit. What's something that people should know about Purdue? Purdue chickens.

Linus Lee: (4:45) Yeah. Purdue has one of the largest international student populations in the country. I don't know exactly why, but because it's a state school, the focus is a lot on STEM topics, and Purdue is well known for engineering, so we tend to have a lot of folks from abroad, which is particularly rare for a university in an otherwise predominantly white, Midwestern American state. That makes Purdue and the surrounding area a younger, more diverse, international island within the broader world that is Indiana.

Swyx: (5:21) Fair enough. We can always dive into flyover country or small town insights later. But you and I, all three of us actually, recently connected at AI UX SF, which is the first AI UX meetup. Essentially, it just came out of a Twitter conversation. You and I have been involved in HCI Twitter, which is how I think about it, for a little bit. And when I saw that you were in town, Geoffrey Litt was in town, and Maggie Appleton was in town, all on the same date, I was like, we have to have a meetup, and that's how this thing was born. What did it look like from your end?

Linus Lee: (5:52) From my end, it looked like you did all of the work.

Swyx: (5:57) Well, you got us the Notion office.

Linus Lee: (5:58) Yeah, yeah. It was also in the Notion office. It was in the San Francisco one. Thereafter, there was a New York one that I couldn't make. But yeah, from my end, and I'm sure you were too, I was really surprised by both the mixture of people that we ended up getting and the number of people that we ended up getting. Obviously, there's a lot of attention on the technology itself of GPT and language models and so on, but I was surprised by the interest specifically in trying to come up with interfaces that were outside of the box, and by the people that were interested in that topic. And so we ended up having a packed house and lots of interesting demos. I've heard multiple people comment on the event afterwards that they were positively surprised by the mixture of both the ML/AI focused people at the event as well as the interface/HCI focused people.

Swyx: (6:47) Yeah. I see you as one of the leading, I guess, AI UX people. So I hope that we're maybe starting a new discipline.

Linus Lee: (6:52) Thank you. Yeah. I mean, there is this growing contingent of people interested in exploring the intersection of those things, so I'm excited for where that's going.

Swyx: (7:04) I don't know if you want—if it's worth going through favorite demos. It was a little while ago, so I don't know if—

Linus Lee: (7:11) There was—I forget who made it—but there was this new document writing tool where you could apply brushes to different paragraphs. Amelia's—

Swyx: (7:18) Yeah. Yeah. Where you could set a tone, both in terms of writer inspiration and the tone that you want, and then you could drag and drop different tones onto paragraphs and have the model rewrite them. It was the first time that it's—you know, it's not just autocomplete. There's more to it, and it's not asking it in a prompt. It's like this funny dragon emoji over it.

Swyx: (7:43) I actually thought that you had done some demo where you could select text and then augment it in different moods. But maybe it wasn't you. Maybe it was just someone else.

Linus Lee: (7:51) I had done something similar with slightly different building blocks. I think Amelia's demo was that there was a preset palette of brushes and you applied them to text. I had built something related last year: I prototyped a way to give people sliders for different semantic attributes of text. So you could start with a sentence, and you had a slider for length, a slider for how philosophical the text is, and a slider for how positive or negative the sentiment in the text is, and you could adjust any of them and have the language model produce the text. Yeah. Similar, but continuous control versus distinct brushes, I think, is an interesting distinction there.
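
For readers who want a concrete picture, here is a minimal sketch of how slider-style continuous control might be approximated today with plain prompt rewriting. The slider names, prompt wording, and the `complete` function are illustrative assumptions, and this is not a claim about how the original prototype worked under the hood:

```typescript
// A rough sketch of slider-style continuous control over text, approximated
// with prompt-based rewriting. `complete` is an assumed stand-in for whatever
// language model API you use.

declare function complete(prompt: string): Promise<string>;

type Sliders = {
  length: number;        // 0 = very short, 1 = very long
  philosophical: number; // 0 = very concrete, 1 = very philosophical
  sentiment: number;     // 0 = very negative, 1 = very positive
};

function describe(value: number, low: string, high: string): string {
  if (value < 0.33) return `very ${low}`;
  if (value > 0.66) return `very ${high}`;
  return `moderately ${high}`;
}

async function rewriteWithSliders(text: string, s: Sliders): Promise<string> {
  const prompt = [
    "Rewrite the following text.",
    `Make it ${describe(s.length, "short", "long")},`,
    `${describe(s.philosophical, "concrete", "philosophical")},`,
    `and ${describe(s.sentiment, "negative", "positive")} in sentiment.`,
    "",
    `Text: ${text}`,
  ].join("\n");
  return complete(prompt);
}

// Re-running the rewrite whenever a slider moves gives the feel of continuous
// control, even though each step is a discrete generation.
```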

Swyx: (8:26) I should add, for listeners: if you missed the meetup, which most people will not have seen, we actually did a separate post with timestamps of each video, so you can look at that.

Alessio: (8:35) Sorry, Linus. This is unrelated, but I think you've built over a hundred side projects or something like that. I think there's a lot of people—

Linus Lee: (8:43) I don't know. Is it a hundred?

Swyx: (8:45) I think it's a lot of them.

Linus Lee: (8:46) Oh, yeah. A lot of them are small.

Swyx: (8:49) Yeah. Well, I mean, it still counts. I think there's a lot of people that are excited about the technology and want to hack on things. Do you have any tips on how to scope what you want to build? How do you decide what goes into it? Because with all of these things, you could build so many more things on top of them. How do you decide when you're done?

Linus Lee: (9:06) So my projects actually tend to be—I think especially when people approach project building with a goal of learning, a common mistake is to be overambitious and not scope things very tightly. A classic failure mode is you say, I'm really interested in learning how to use the GPT-4 API, and I'm also interested in vector databases, and I'm also interested in Next.js. And then you devise a project that's going to take many weeks, and you glue all these things together. It could be a really cool idea, but then, especially if you have a day job and other things that life throws your way, it's hard to actually get to a point where you can ship something. And so one of the things I got really good at was, one, knowing exactly how quickly I could work, at least on the parts of the technologies that I knew well, and two, only adding one new unknown thing to learn per project. So it may be that for this project, I'm going to learn how the embedding API works, or for this project, I'm going to learn how to do vector stuff with PyTorch or something. And then I would scope things so that they fit in one chunk of time, like Friday night to Sunday night or something like that, as much work as I could fit into a two-day period, so that at the end of that weekend I could ship something. Then afterwards, if I wanted to add something, I had time and a chance to do that. But it's already shipped, there's already momentum, people are using it or I'm using it, and so there's a reason to continue building. So only adding one new unknown per project, I think, is a good trick.

Swyx: (10:36) I first came across you, I think, because of Monocle, which is your personal search engine. And I got very excited about it because I always wanted a personal search engine, until I found that it was written in a language that I'd never seen before.

Linus Lee: (10:47) Yeah. Yeah. There's a whole tower of little tools and knowledge stacks that I built for myself. Oh, one of the other tricks to being really productive when you're building side projects is just to use a consistent set of tools that you know really, really well. And so for me, that's Go and my language and a couple other libraries that I've written that I know all the way down to the bottom of the stack. And then I barely have to look anything up because I've just debugged every possible issue that could come up, and so I could get from start to finish without getting stuck in a weird bug that I've never seen before. But, yeah, it's a weird stack.

Swyx: (11:20) It also means that you probably are not aiming for, let's say, open source glory or whatever. Right? Because you're not publishing in the JavaScript ecosystem.

Linus Lee: (11:29) Right. Right. I mean, I've written some libraries before, but a lot of my projects tend to be—the way that I approach it is less about building something that other people are going to use en masse.

Swyx: (11:38) And make yourself happy.

Linus Lee: (11:39) Yeah, it's more about, here's the thing that I built, if you want to look at it. And often I learn something in the process of building that thing. So with Monocle, I wrote a custom full text search index. And I thought a lot of the parts of what I built were interesting, and so I just wanted other people to be able to look at it and see how it works and understand it. But the goal isn't necessarily for you to be able to replicate it and run it on your own.

Swyx: (11:59) Well, we can dive into your other AI UX thoughts. As you've been diving in, you tend to share a lot on Twitter, and I just pulled out some of your greatest hits. This is relevant to the demo that you picked out, Alessio, and what we're talking about, which is: most knowledge work is not a text generation task. That's funny, because a lot of what Notion AI is right now is text generation. Maybe you want to elaborate a little bit.

Linus Lee: (12:23) Yeah. I think the first time you look at something like GPT, the shape of the thing you see is, oh, it's a thing that takes some input text and generates some output text. And so the easiest thing to build on top of that is a content generation tool. But I think there are a couple of other categories of things that you could build that are progressively more useful and more interesting. Besides content generation, which requires the minimum amount of wrapping around ChatGPT, the second tier up from that is things around knowledge, I think. This is the hot thing with all these vector DB things going around: if you have a lot of existing context around some knowledge about your company or about a field or all of the internet, you can use a language model as a way to search and understand things in it and combine and synthesize them. And that synthesis, I think, is useful. At that point, I think the value that that unlocks is much greater than the value of content generation, because in most knowledge work, the artifact that you produce isn't actually about writing more words. In most knowledge work, the goal is to understand something, synthesize new things, or propose actions, or other kinds of knowledge-to-knowledge tasks. And then the third category, I think, is automation, which I think is what people are looking at most actively today, at least from my vantage point in the ecosystem: things like the ReAct prompting technique and, in general, letting models propose actions or write code to accomplish tasks. But that's also moving far beyond generating text to doing something more interesting. So much of the value of what humans sit down and do at work isn't actually in the words that they write. It's all the thinking that goes on before you write those words. And so how can you get language models to contribute to those parts of work?
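
As a rough illustration of that third category, here is a minimal sketch of a ReAct-style loop. The `complete` function and the exact Thought/Action/Observation format are assumptions standing in for whatever model API and prompt conventions you use:

```typescript
// Sketch of a ReAct-style loop: the model alternates between reasoning
// ("Thought"), proposing an action ("Action: tool[input]"), and reading the
// result ("Observation") until it emits a final "Answer". `complete` is an
// assumed stand-in for a language model API.

declare function complete(prompt: string): Promise<string>;

type Tool = (input: string) => Promise<string>;

async function reactLoop(
  question: string,
  tools: Record<string, Tool>,
  maxSteps = 5
): Promise<string> {
  let transcript = `Question: ${question}\n`;
  for (let step = 0; step < maxSteps; step++) {
    // The model continues the transcript with its next Thought and either an
    // Action to take or a final Answer.
    const output = await complete(transcript + "Thought:");
    transcript += "Thought:" + output + "\n";

    const answer = output.match(/Answer:\s*([\s\S]*)/);
    if (answer) return answer[1].trim();

    const action = output.match(/Action:\s*(\w+)\[([^\]]*)\]/);
    if (action && tools[action[1]]) {
      const observation = await tools[action[1]](action[2]);
      transcript += `Observation: ${observation}\n`;
    }
  }
  return "No answer within the step budget.";
}
```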

Swyx: (14:05) I think when you first tweeted about this, I don't know if you had already accepted the job, but you tweeted about this, and then the next one was, this is a Notion AI subtweet. Right?

Linus Lee: (14:15) So I didn't realize that.

Swyx: (14:17) It's so funny. The best thing that I see is when people complain and then they're like, okay, I'm going to go and help make the thing better. So what are some of the things that you've been thinking about? I know you talked a lot about the flexibility versus intuitiveness of the product. The language is really flexible, right, because you can say anything. And it's funny, the models never ignore you. They always respond with something. So no matter what you write, something is going to come back. But sometimes you don't know how big the space of action is, how many things you can do. So as a product builder, how do you think about the trade-offs that you're willing to take for your users? Or, okay, I'm not going to let you be as flexible, but I'm going to create these guardrails for you. What's the process of thinking about the guardrails and how you want to funnel users to the right actions?

Linus Lee: (15:09) Yeah. I think this trade-off you mentioned around flexibility versus intuitiveness gets at one of the core design challenges for building products on top of language models. A lot of good interface design comes from tastefully adding the right constraints in place to guide the user towards the actions that you want them to take, or just, as you add more guardrails, the obvious actions become a bit more obvious. And one common way to make an interface more intuitive is to narrow the space of choices that the users have to make and the number of choices that they have to make. And that intuitiveness, that source of intuitiveness from adding constraints, is directly at odds with the reason that language models are so powerful and interesting, which is that they're so flexible and so general. You can ask them to do literally anything, and they will always give you something. But most of the time, the answer isn't that high quality. And so there's a distribution of—there are clumps of things in the action space of what a language model can do that the model's good at, and there are parts of the space where it's bad. So one high-level framework that I have for thinking about designing with language models is: there are actions that the language model's good at and actions that it's bad at. How do you add the right constraints carefully to guide the user and the system towards the things that the language model's good at? And at the same time, how do you use those constraints to set the user's expectations for what it's going to be good at and bad at? One way to do this is just literally to add those constraints and to set expectations. A common example I use all the time is: if you have some AI system to answer questions from a knowledge base, there are a couple of different ways to surface that in a hypothetical product. One is you could have a thing that looks like a chat window in a messaging app, and then you could tell the user, hey, this is for looking things up from a database. You can ask a question and it'll look things up and give you an answer. But if something looks like a chat, and this is a lesson that's been learned over and over by anyone building chat interfaces since 2014, 2015, if you have anything that looks like a chat interface or a messaging app, people are going to put some weird stuff in there that just doesn't look like the thing that you want the model to take in, because the expectation is, hey, I can use this like a messaging app. People will send in hi, hello, weird questions, weird comments. Whereas if you take literally the same input box and put it in a thing that looks like a search bar with a search button, people are going to treat it more like a search window. And at that point, inputs look a lot more like keywords, or a list of keywords, or maybe questions. That simple act of contextualizing the input in different parts of an interface resets the user's expectations, which constrains the space of things that the model has to handle. And you're adding constraints, because you're really restricting your input to mostly things that look like keyword search. But because of that constraint, you can have the model fit the expectations better. You can tune the model to perform better in those settings. And it's also less confusing and perhaps more intuitive, because the user isn't stuck with this blank page syndrome problem of: okay, here's an input. What do I actually do with it?
When we initially launched Notion AI, one of my common takeaways from talking to a lot of my friends who had tried it (obviously there were a lot of people getting lots of value out of using it to automate writing emails or marketing copy, and a ton of people using it to write Instagram ads and then paste them into the Instagram tool) was that, among my friends who had tried it and did not use it as much, a frequently cited reason was: I tried it, it was cool, it was cool for the things that Notion AI was marketed for, but for my particular use case, I had a hard time figuring out exactly how it would be useful in my workflow. And I think that gets back at the problem that it's such a general tool that, just presented with a blank prompt box, it's hard to know exactly how it could be useful for your particular use case.
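
To make that chat-versus-search contrast concrete, here's a hedged sketch of the same model call behind two different surfaces. The function names and prompt templates are illustrative assumptions, not any product's actual implementation:

```typescript
// Same underlying model, two interface framings. The surface determines the
// prompt template and the kind of input the model has to be robust to.
// `complete` is an assumed stand-in for a language model API.

declare function complete(prompt: string): Promise<string>;

// Chat framing: inputs can be anything (greetings, meta-questions, noise),
// so the prompt has to budget for all of it.
async function answerAsChat(message: string, knowledge: string): Promise<string> {
  return complete(
    "You are a helpful assistant for a knowledge base.\n" +
      "If the message is small talk, respond briefly and redirect.\n\n" +
      `Knowledge:\n${knowledge}\n\nUser message: ${message}\nAssistant:`
  );
}

// Search framing: inputs look like keywords or terse questions, so the prompt
// can be tuned narrowly for lookup-and-synthesize.
async function answerAsSearch(query: string, knowledge: string): Promise<string> {
  return complete(
    "Answer the query using only the knowledge below.\n" +
      "Queries are keywords or short questions.\n\n" +
      `Knowledge:\n${knowledge}\n\nQuery: ${query}\nAnswer:`
  );
}
```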

Swyx: (18:44) What do you think is the relationship between novelty and flexibility? I feel like we're in a prompting honeymoon phase where the tools are new and people just want to do whatever they want to do. And so it's good to give these interfaces because people can explore. But if I go forward three years, ideally, I'm not prompting anything. The UX has been built for most products to already have the intuitive happy path built into it. Do you think there's merit in that? If you think about ChatGPT, the reason why it went so viral is people were doing things that they didn't think a computer could do, like writing poems and solving riddles and all these different things. How do you think about that, especially at Notion, where Notion AI is a new product inside an existing thing? How much of it for you is letting that happen and seeing how people use it? And then at some point being like, okay, we know what people want to do. The flexibility was cool before, but now we just want you to do the right things with the right UX.

Linus Lee: (19:49) I think there's value in always having the most general input as an escape hatch for people who want to take advantage of that power. At this point, Notion AI has a couple of different manifestations in the product. There's the writer. There's a thing we call an AI block, which is a thing that you can always re-update as a part of a document. It's like a live little portal inside the document that an AI can write into. We also have a relatively new thing called AI autofill, which lets an AI fill an entire column in a Notion database. In all of these things, speaking of adding constraints, we have a lot of suggested prompts that we've worked on and curated and that we think work pretty well for things like summarization and writing drafts of blog posts. But we always leave a fully custom prompt for a few reasons. One is that if you are actually a power user and you know how language models work, you can go in and write your custom prompt; if you're a power user, you want access to the power. Another is for us to be able to discover new use cases. One of the lovely things about working on a product like Notion is that there's such an enthusiastic and lively community of ambassadors and people that are excited about trying different things and coming up with all these templates and new use cases. Having a fully custom action or prompt whenever we launch something new in AI lets those people really experiment and helps us discover new ways to take advantage of AI. I think it's good in that way. There's also a complement to that, which is that if we want to use feedback data or learn from those things and improve the way that we prompt the model, or the models that we're building, having access to that fully diverse, fully general range of use cases helps us make sure that our models can handle the full generality of what people want to do.
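
The general shape of that design, a curated set of prompt templates plus a fully custom escape hatch, might look something like the sketch below. This is an illustration of the pattern, not Notion's actual code, and all names are assumptions:

```typescript
// Curated actions make the common cases obvious; the custom action keeps the
// model's full generality available as an escape hatch.

type AIAction =
  | { kind: "suggested"; id: "summarize" | "draft_blog_post" | "action_items" }
  | { kind: "custom"; prompt: string };

const SUGGESTED_PROMPTS: Record<string, (doc: string) => string> = {
  summarize: (doc) => `Summarize the following document:\n\n${doc}`,
  draft_blog_post: (doc) => `Draft a blog post based on these notes:\n\n${doc}`,
  action_items: (doc) => `List the action items in this document:\n\n${doc}`,
};

function buildPrompt(action: AIAction, doc: string): string {
  if (action.kind === "suggested") {
    return SUGGESTED_PROMPTS[action.id](doc);
  }
  // Custom prompts double as a discovery channel: prompts that recur across
  // users are candidates for promotion into the curated list.
  return `${action.prompt}\n\nDocument:\n${doc}`;
}
```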

Swyx: (21:29) I feel like we've already segued a lot into the Notion conversation, but I wanted to bridge that a little bit with your personal journey into Notion before we go into Notion proper. You spent a year on a sabbatical, on your own self-guided research journey, and then decided to join Notion. There are a lot of engineers out there thinking about doing this who maybe don't have the internal compass that you have, or don't have the guts to basically make no money for a year. Maybe just share with people how you decided to go on your own independent journey, and what got you to join Notion in the end?

Linus Lee: (22:05) Yeah, what happened? So, for a little bit of context for people who don't know me, I was working mostly at seed stage startups as a web engineer. I actually didn't really do much AI at all prior to my year off. And then I took all of 2022 off, and it ended up, in retrospect, becoming the "Linus pivots to AI" year, which was beautifully well timed. But in the beginning of the year, there was one key motivation and one key question that I had. The motivation was that I was at a privileged and fortunate enough place where I had some money saved up, saved explicitly to be able to take some time off and investigate my own questions, because I was already working on lots of side projects and I wanted to spend more time on them. I think I also, at that point, felt like I had enough security in the companies and folks that I knew that if I really needed a job on short notice, I could go and find some work to do, and I wouldn't be completely on the streets. And so that security, I think, gave me the confidence to say, okay, let's try this experiment. Maybe it'll only be for six months, maybe it'll be for a year. I had enough money saved up to last a year and change, and so I had planned for a year off. And I had one big question that I wanted to explore. Having that single question, I think, was actually really helpful for focusing the effort, instead of just saying, I'm going to side project for a year, which I think would have been less productive. And that big question was: how do we evolve text interfaces forward? So much of knowledge work is consuming walls of text and then producing more walls of text. And text is so ubiquitous, not just in software, but in general in the world. There are signs and menus and books. It's ubiquitous, but it's not very ergonomic. There's a lot about text interfaces that could be better, and so I wanted to explore how we could make that better. A key part of that ended up being, as I discovered, taking advantage of these new technologies that let computers make sense of text information. And so that's how I ended up sliding into AI. But the motivation in the beginning was less focused on learning a new technology and more on exploring this general question space.

Swyx: (24:17) Yeah. You have the quote, text is the lowest common denominator, not the endgame.

Linus Lee: Right. I mean, I think if you look at any specific domain or discipline, whether it's medicine or mathematics or software engineering, in any specific discipline where there's a narrower set of abstractions for people to work with, there are custom notations. One of the first things that I wrote in this exploration year was this piece called Notational Intelligence, where I talk about this idea that so much of—as a total sidebar, there's a whole other fascinating conversation that I would love to have at some point, maybe today, maybe later, about how to evolve a budding scene of research into a fully fledged field. I think AI UX is kind of in this weird stage where there's a group of interesting people that are interested in exploring this space, but how do you design for this newfangled technology? And how do you take that and go and build best practices and powerful methods and tools?

Swyx: We should talk about this.

Linus Lee: We should talk about that at some point. But in a lot of established fields, there are notations that people use that really help them work at a slightly higher level than just raw words. So notation for describing chemicals and notations for different areas of mathematics that let people work at higher level concepts more easily. Logic, linguistics. And I think it's fair to say that some large part of human intelligence, especially in these more technical domains, comes from our ability to work with notations instead of work with just the raw ideas in our heads. And so text is a kind of notation. It's the most general kind of notation, but it's also, because of its generality, not super high leverage if you want to go into these specific domains, and so I wanted to try to improve on that frontier.

Swyx: You said in our show notes, one of my goals over the next few years is to ensure that we end up with interface metaphors and technical conventions that set us up for the best possible timeline for creativity and inventions ahead. So part of that is constraints, but I feel like that is one part of the equation. What's the other part that engenders creativity?

Linus Lee: Tell me a little bit about that. What are you thinking there?

Swyx: I feel like we talked a little bit about how you do want to constrain, for example, the user interface to guide people towards things that language models are good at. And creative solutions do arise out of constraints. But I feel like that alone is not sufficient for people to invent things.

Linus Lee: I mean, there's a lot of directions I think we could go from that. The origin of that thing that you're quoting is when I decided to come help work on AI at Notion. A bunch of my friends were actually quite surprised, I think because they had expected that I would have gone and worked on—

Swyx: I was eyeing that for you. I mean—

Linus Lee: —at a lab or at my own company or something like that. But one of the core motivations for me joining an existing company, and one that has lots of users already, is this exact thing: in the aftermath of a new foundational technology emerging, there's kind of a period of a few years where the winners in the market get to decide what the default interface paradigm for the technology is. So minicomputers, personal computers: the winners of that market got to decide what windows are and how scrolling works and what a mouse cursor is and how text is edited. Similarly with mobile: the concept of a home screen and apps and things like that, the winners of that market got to decide. And that has profound effects—I think it's difficult to overstate the importance of, in those few critical years, the winning companies in the market choosing the right abstractions and the right metaphors. And AI, to me, seemed like it's at that pivotal moment. It's a technology that lots of companies are adopting, and there is this well recognized need for interface best practices. Notion seemed like a company that had this interesting balance of being able to move quickly enough and ship and prototype quickly enough to try interesting interface ideas, while also having enough presence in the ecosystem that if we came up with the right solution, or one that we felt was right, we could push it out, learn from real users, iterate, and hopefully be a part of that story of setting the defaults and setting what the dominant patterns are.

Swyx: Yeah. It's a special opportunity. One of my favorite stories or facts is that it was a team of about 10 people that designed the original iPhone. And so all the UX that was created there is essentially what we use on smartphones today.

Linus Lee: Right.

Swyx: Including predictive text, because they found that people were missing the right letters, so they just enlarged the hit area for certain letters based on what you're typing.

Linus Lee: I mean, even just the idea of, we should use QWERTY keyboards on tiny smartphone screens. That's a weird idea, right?

Swyx: Yeah. QWERTY is another one. I have RSI, so this actually affects me. QWERTY was specifically chosen to maximize travel distance. It's actually not ergonomic by design, because you wanted the typewriter keys to not stick. But we don't have that constraint anymore, and we're still sticking to QWERTY. I'm still sticking to QWERTY. I could switch to the other ones. I forget—

Linus Lee: Oh, Colemak.

Swyx: Yeah. I could switch anytime, but I don't, just because of inertia.

Linus Lee: I have another thing like this. Going even farther back, people don't really think enough about where this concept of buttons comes from. The concept of a push button as a thing where you press it and it activates some binary switch—I mean, buttons have existed for a long time, mechanical buttons have existed for a long time—but really, this modern concept of a button that activates a binary switch gets popularized with the advent of electricity. Before electricity, if you had a button that did something, you would have to construct a mechanical system where, if you press down on a thing, it affects some other lever system that affects the final action. This modern idea of a button that is just a binary switch gets popularized with electricity, and at that point a button has to work in the way that it does in an alarm clock: when you press down on it, there's a spring that makes sure the button comes back up and that it completes a circuit. And so that's the way that buttons work. And then when we started writing graphical interfaces, we just took that idea of a thing that could be depressed to activate a switch. All the modern buttons that we have today in software interfaces are simulating electronic push buttons that you press down to complete a circuit, except there's actually no circuit being completed. It's just a square on a—

Swyx: All virtualized.

Linus Lee: Right. And then you control the simulation of a button by clicking a physical button on a mouse. Except if you're on a trackpad, it's not even a physical button anymore. It's a simulated button in hardware that controls a simulated button in software. And it's all this cascade of conceptual backwards compatibility that gets us here. I think buttons are interesting.

Swyx: Where are you on the skeuomorphic design love-hate spectrum? There are people that have high nostalgia for the original YouTube icon on the iPhone, with the knobs on the TV.

Linus Lee: I think a big part of that is, at least the aesthetic part of it, is fashion. Fashion taken very literally, in the same way that the early Y2K nineties aesthetic comes and goes. I think skeuomorphism as expressed in the early iPhone or Windows XP comes and goes. There's another aspect to this, which is the part of skeuomorphism that helps people understand and intuit software, which has less to do with skeuomorphism making things easier to understand per se and more about—a slightly more general version of skeuomorphism is, there should be a consistent mental model behind an interface that is easy to grok. And then once the user has the mental model, even if it's not the full model of exactly how that system works, there should be a simplified model that the user can easily understand and then sort of adopt and use. One of my favorite examples of this is how volume controls that are designed well often work. On an iPhone, when you make your iPhone volume twice as loud, the sound that comes out isn't actually physically twice as loud. It's on a log scale. When you push the volume slider up on an iPhone, the speaker uses 4 times more energy. But humans perceive it as twice as loud. And so the mental model that we're working with is, okay, if I make this volume control slider have 2 times more value, it's going to sound 2 times louder, even though actually the underlying physics is on a log scale. But what actually happens physically is not actually what matters. What matters is how humans perceive it in the model that I have in my head. And there, I think there are a lot of other instances where the skeuomorphism isn't actually the thing. The thing is just that there should be a consistent mental model. And often, the easy consistent mental model to reach for is the models that already exist in reality, but not always.
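
Taking the figures in that example at face value (each perceived doubling of loudness costing roughly four times the energy), the mapping a well-designed volume slider hides might look like the following sketch:

```typescript
// The slider exposes perceived loudness; the hardware needs power. Using the
// rough figure from the example above (doubling perceived loudness takes ~4x
// the energy), power grows as the square of perceived loudness.

function powerForPerceivedLoudness(loudness: number): number {
  // loudness = 1 is the reference level; loudness = 2 "sounds twice as loud".
  return Math.pow(loudness, 2);
}

console.log(powerForPerceivedLoudness(2)); // 4  (twice as loud => 4x power)
console.log(powerForPerceivedLoudness(4)); // 16 (4x as loud => 16x power)

// The user's mental model stays linear ("slider at 2 means twice as loud")
// while the implementation quietly does the nonlinear work.
```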

Swyx: I think the other big topic, maybe before we dive into Notion, is agents. I think that's one of the toughest interfaces to crack, mostly because the text box, everybody understands. The agent has kind of this human-like feeling, where it's like, okay, I'm delegating something to a human. I think you gave the example of a Calendly or a SavvyCal. It's an agent because it's scheduling on your behalf. That's actually—

Linus Lee: A really interesting example because it's pretty deterministic. There's no real—

Swyx: Deterministic. But it works.

Linus Lee: But it is an agent in the sense that you're delegating to it and automating something.

Swyx: Yeah. It does work without me. It's great.

Linus Lee: So that one we figured out. We know what the scheduling interface is.

Swyx: Well, that's the state of the art now. But, for example, the person I'm corresponding with still has to pick a time from my calendar, which some people dislike. Sam Lessin famously says it's a sign of disrespect. I disagree with him, but it's a point of view. There could be some intermediate AI agents that would send emails back and forth like a human would, to give the other person, who feels slighted, that sense of respect or the personalized touch that they want. So there are always ways to push it.

Swyx: Yeah. Other stuff that I think about: I was doing prep for another episode, and I had an agent do background prep on the person, and it just couldn't quite get the format that I wanted. The only way to prompt that is give a text example, give a text example, give a text example. What do you think the interface between humans and agents will be in the future? Do you still think agents are this open-ended thing that are objective-driven, where you say, hey, this is what I want to achieve? Versus, I only trust this agent to do X, and this is how X is done. I'm curious, because that seems like a lot of mental overhead, remembering each agent for each task. Whereas if you have an executive assistant, they'll do a random set of tasks and you can trust them because they're human. But I feel like with agents, we're not quite there.

Linus Lee: Agents are hard. The design space is just so vast. Since all of the early agent stuff came out around Auto-GPT, I've tried to develop some kind of a thesis around it, and I think it's just difficult because there are so many variables. One framework that I usually apply to existing chat-based prompting, and that I think also applies just as well to agents, is this duality between what you might call trust and control. Just now you brought up this example where you had an agent try to write up some prep document for an episode and it couldn't quite get the format right. One way you could describe that is to say, the agent didn't exactly do what I meant and what I had in my head, so I can't trust it to do the right job. But a different way to describe it is: I have a hard time controlling exactly the output of the model, and I have a hard time communicating exactly what's in my head to the model. And those are two sides of the same coin. I think if you can somehow provide a way to, with less effort, communicate and control and constrain the model's output and behavior a little bit more, that would alleviate the pressure for the model to be this fully trusted thing, because there's no need for trust anymore; there are just guardrails that ensure that the model does the right thing. So developing ways and interfaces for these agents to be a little more constrained in their output, or for the human to control their output or behavior a little bit more, I think is a productive path. Another more recent revelation that I had while working on this AI autofill thing inside Notion is the importance of zones of influence for AI agents, especially in collaborative settings. Having worked on lots of interfaces for independent work during my year off, one of the surprising lessons I learned early on when I joined Notion was that collaboration permeates everything, which is great for Notion, because when you collaborate with an AI, you can reuse a lot of the same metaphors as collaborating with humans. So one nice thing about this autofill feature, which also applies to AI blocks, another thing that we have, is that it alleviates this problem of having to ask questions like, is this document written by an AI or is this written by a human? This need for auditability. Because the part that's written by the AI is just in the autofilled cell or in the AI block, and you can tell that it's written by the AI, and for things outside of it, you can reasonably assume that they were written by a human. Anytime you have an unbounded action space for models, like agents, it's especially important to be able to answer those questions easily and to have some sense of security. In the same way that you want to know whether your coworker or collaborator has access to a document or has modified a document, you want to know whether an AI has permissions to access something. And if it's modified something or made some edit, you want to know that it did it. So as a complement to constraining the model's action space proactively, I think it's also important to communicate, and to have the user easily understand, what exactly the model did. And I think that helps build trust as well.
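
One way to picture that auditability requirement is that every AI-made edit carries provenance, just as a human collaborator's edit would. A minimal sketch, with illustrative field names that are assumptions rather than any product's schema:

```typescript
// Sketch: edits carry provenance, so "did an AI touch this, and where?" is
// always answerable.

type Author =
  | { kind: "human"; userId: string }
  | { kind: "ai"; model: string; triggeredBy: string };

interface Edit {
  blockId: string;   // the zone of influence: which block was touched
  author: Author;
  timestamp: Date;
  summary: string;   // human-readable description of the change
}

function aiEditsSince(log: Edit[], since: Date): Edit[] {
  return log.filter((e) => e.author.kind === "ai" && e.timestamp >= since);
}

// Batching these for review ("approve these 12 AI edits?") is one answer to
// the notification-deluge problem that comes up a little later in the
// conversation.
```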

Swyx: Yeah. I think for Auto-GPT and those kinds of agents in particular, anything that is destructive, you need to check in with the user. I know, it's overloaded now. I can't say that.

Linus Lee: Confirm with the user.

Swyx: I need to confirm with the user. Yeah. Exactly.

Linus Lee: That's tough too though because you don't want to—one of the benefits of automating these things is that you can in theory scale them out arbitrarily. I can have a hundred different agents working for me, but that means I'm just spending my entire day in a deluge of notifications, and that's not ideal either.

Swyx: Yeah. So then it could be a reversible destructive thing with some kind of timeout or time limit. So you could reverse it within some window. I don't know. I've been thinking about this a little bit because I've been working on a small developer agent.

Linus Lee: Right. Or maybe you could batch a group of changes and sort of summarize them with another AI and approve them in bulk or something.

Swyx: Which is surprisingly similar to the collaboration problem. Yeah.

Linus Lee: Yeah. Exactly. I'm telling you, a lot of the problems with collaborating with humans also apply to collaborating with AI. There's a potential pitfall to that as well, which is that you end up missing out on some of the core advantages of AI if you just fully anthropomorphize it into a human-like collaborator. But—

Swyx: Do you have a strong opinion on that? Do you refer to it as "it"?

Linus Lee: Oh, yeah. I'm an "it" person, at least for now. In 2023.

Swyx: Yeah. So that leads us nicely into introducing what Notion and Notion AI is today. Do you have a pet answer as to what is Notion? I've heard it introduced as a database, a WordPress killer, a knowledge base, a collaboration tool. What is it?

Linus Lee: Yeah. I mean, the official answer is that Notion is a connected workspace. It has a space for your company docs, meeting notes, a wiki for all of your company knowledge. You can also use it to orchestrate your workflows if you're managing a project, if you have an engineering team, if you have a sales team. You can put all of those in a single Notion database. And the benefit of Notion is that all of them live in a single space, where you can link to your wiki pages from your onboarding docs, or you can link to a GitHub issue through a task from your documentation on your engineering system. And all of this existing in a single place, this kind of unified—yeah, single workspace, I think has lots of benefits. That's the official line. There's an asterisk that I usually enjoy diving deeper into, which is that the whole reason this connected workspace is possible is that underlying all of it is this really cool abstraction of blocks. In Notion, everything is a block. A paragraph is a block. A bullet point is a block, but also a page is a block. And the way that Notion databases work is that a database is just a collection of pages, which are really blocks. You can take a paragraph and drag it into a database, and it'll become a page. And you can take a page inside a database and pull it out, and it'll just become a link to that page. So there's this core abstraction of a block that can also be a page, that can also be a row in a database, like an Excel sheet. That fluidity, and this shared abstraction across all these different areas inside Notion, I think, is what really makes Notion powerful. This LEGO building block theme permeates a lot of different parts of Notion. Some fans of Notion might know that when you join Notion, you get a little LEGO minifigure, because it's LEGO building blocks for workflows. And then every year you're at Notion, you get a new block that says you've been here for a year, you've been here for 2 years. And then Simon, our cofounder and CTO, has a whole crate of LEGO blocks on his desk that he just likes to mess with, because he's been around for a long time. But this LEGO building block thing, this shared, all encompassing single abstraction that you can combine to build various different kinds of workflows, is really what makes Notion powerful. And one of the background questions that I have for Notion AI is: what is that kind of building block for AI?
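
A hedged sketch of what that recursive block abstraction might look like as a data type, reconstructed from the description above rather than from Notion's actual schema:

```typescript
// Everything is a block: a paragraph is a block, a page is a block whose
// children are blocks, and a database is a block whose rows are pages. The
// same type recurses all the way down, which is what makes "drag a paragraph
// into a database and it becomes a page" possible.

type BlockType = "paragraph" | "bullet" | "page" | "database";

interface Block {
  id: string;
  type: BlockType;
  content: string;                      // text for paragraphs, title for pages
  children: Block[];                    // containers hold blocks; leaves hold none
  properties?: Record<string, string>;  // database rows carry structured fields
}

// Dragging a paragraph into a database: the same block, promoted to a page.
function promoteToPage(block: Block): Block {
  return { ...block, type: "page", properties: block.properties ?? {} };
}
```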

Swyx: Well, we can dive into that. So what is Notion AI? So I kind of view it as a startup within the startup. Could you describe the Notion AI team? How seriously is Notion taking the AI wave?

Linus Lee: The most seriously. The way that Notion AI came about, as I understand it, because I joined a bit later: I think it was around October, all of the Notion team had a little offsite. And as a part of that, Ivan and Simon went into a little hack weekend, and the thing that they ended up hacking on inside Notion was the very, very early prototype of Notion AI. They saw this GPT-3 thing. The early, early motivation for starting Notion, for building Notion in the first place, was grounded in this utopian end user programming vision, where software is so powerful, but there are only so many people in the world that can write programs, and everyone could benefit from having a little workspace or a little program or a little workflow tool that's programmed to fit their use case. So how can we build a tool that lets people customize the software tools that they use every day for their use case? And AI to them seemed like such a critical part of facilitating that, bridging the gap between people who can code and people who need software. So they saw that, and they tried to build an initial prototype that ended up becoming the first version of Notion AI. They had a prototype in, I think, late October, early November, before ChatGPT came out, and evolved it over the following months. What ended up launching was in line with their initial vision, I think. And once they had it, they wanted to keep pushing it. So at this point, AI is a really key part of Notion's strategy and of what we see Notion becoming going forward. In the same way that blocks and databases are a core part of Notion that helps enable workflow automation and all these important parts of running a team, collaborating with people, or running your life, we think that AI is going to become an equally critical part of what Notion is. And it won't be that Notion is a cool connected workspace app that also has AI. It'll be that what Notion is, is databases. It has pages. It has space for your docs. And it also has this comprehensive suite of AI tools that permeate everything. One of the challenges of the AI team, which is, as you said, kind of a startup within a startup right now, is to figure out exactly what that all permeating abstraction means, which is a fascinating and difficult open problem.

Swyx: How do you think about what people expect of Notion versus what you want to build in Notion? A lot of this AI technology kind of changes what we talked about: the relationship between text and humans, and how humans collaborate. Do you put any constraints on yourself? Like, okay, people expect Notion to work this way with these blocks, so maybe I have this crazy idea and I cannot really pursue it because of that. I think it's a classic innovator's dilemma kind of thing. And I think a lot of founders out there are in a similar position, where you're a Series C, Series D company, not quite yet the super established one. You're still moving forward, but you have an existing following and something that Notion stands for. How do you wrangle with that?

Linus Lee: Yeah. That is, in some ways, a challenge in that Notion already is a kind of a thing, and so we can't just scrap everything and start over. But I think it's also—there's a blessing side of it too in that because there are so many people using Notion in so many different ways, we understand all of the things that people want to use Notion for very well. And so we already have a really well defined space of problems that we want to help people solve, and that helps us. We have it with the existing Notion product, and we also have it by sort of rolling out these AI things early and then watching, learning from the community what people want to do with them. And so based on those learnings, I think it actually sort of helps us constrain the space of things we think we need to build because otherwise the design space is just so large with whatever we can do with AI in knowledge work. And so watching what people have been using Notion for and what they want to use Notion for, I think helps us constrain that space a little bit and make the problem of building AI things inside Notion a little more tractable.

Swyx: I think also just observing what they naturally use things for helps. And it sounds like you do a bunch of user interviews, where you hear people running into issues or describing them. The way that I describe myself, actually, is that I feel like the problem is with me: I'm not creative enough to come up with use cases for Notion AI or any other AI.

Linus Lee: Which isn't necessarily on you. Again, it goes way back to the thing that we touched on early in the conversation around, if you have too much generality, there are not enough guardrails to obviously point to use cases.

Swyx: Blank piece of paper. I don't know what to do with this. Yeah. So I think a lot of people judge Notion AI based on what they originally saw, which is write me a blog post or do a summary or do action items. Fun fact for Latent Space: my very, very first Hacker News hit was reverse engineering Notion AI. I actually don't know if I got it exactly right. I think I got the easy ones right, and then apparently I got the action items one really wrong. So there's some art to doing that. But you've since launched a bunch of other products, and maybe you've already hinted at AI autofill. Maybe we can just talk a little bit about what the scope or suite of Notion's AI products has been so far, and what you're launching this week.

Linus Lee: (47:15) Yeah. So we have, I think, 3 main facets of Notion AI at the moment. There's the first thing that ever launched with Notion AI, which helps you write. Going back to earlier in the conversation, it's a writing, content generation tool. If you have a document and you want to generate a summary, it helps you generate a summary, pull out action items, or draft a blog post. It can help improve your writing, and it can help fix grammar and spelling mistakes. Under the hood, it's a fairly thin layer of prompts; otherwise, it's a pretty straightforward use case of language models. So there's that: a tool that helps you write documents. Then there's a thing called an AI block, which is a slightly more constrained version of that. One common way that we use it inside Notion: we take all of our meeting notes in Notion, and frequently, when you have a meeting and you want other people to be able to go back to it and reference it, it's nice to have a summary of that meeting. So all of our meeting notes templates, at least on the AI team, have an AI block at the top that automatically summarizes the contents of that page. Whenever we're done with a meeting, we just press the button and it re-summarizes it, including things like what the core action items are for every person in the meeting. That block, as I said before, is nice because it's a constrained space for the AI to work in, and we don't have to prompt it every single time. And then the newest member of this collection of AI features is AI autofill, which brings Notion AI to databases. If you have a whole database of user interviews and you want to pull out what the companies are, what their core pain points are, what their core feature requests are, maybe what competitor products they use, you can just make columns. In the same way that you write Excel formulas, you can write a little AI formula, basically, where the AI will look at the contents of the page and pull out each of these little key pieces of information. The slightly new thing that autofill introduces is this idea of a more automated, background kind of AI. With the writer, the AI in your document, and with the AI block, you always have to ask it to update. You have to ask it to rewrite. But if you have a column or a property in a Notion database, it would be nice if, whenever someone went back and changed the contents of the meeting note, or something updated about the page, or maybe it's a list of tasks you have to do and the status of a task changes, the summary of that task or the detail of the task would update.
And so you can set up an autofilled Notion property so that anytime something on that database row or page changes, the AI will go back and update the autofilled value. And that, I think, is a really interesting part that we might continue leaning into of even though there's AI now tied to this particular page, it's doing its own thing in the background to help automate and alleviate some of that pain of automating these things. But yeah, writer, blocks, and autofill are the 3 cornerstones we have today.
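[To make that pattern concrete, here is a toy sketch of a change-triggered autofill: a hook that re-runs an AI prompt per property whenever a page changes. This is not Notion's actual implementation; the page shape, property prompts, and model choice are all invented for illustration.]

```python
# Toy sketch of the AI Autofill pattern: when a database page changes,
# recompute each AI property from the page contents. All names here are
# illustrative; Notion's real system is not public.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Each AI property is just a prompt, like a little Excel formula.
AI_PROPERTIES = {
    "Company": "Which company is this user interview about? Answer in a few words.",
    "Pain points": "List the core pain points mentioned, as short phrases.",
}

def on_page_changed(page: dict) -> None:
    """Hypothetical hook, called whenever a row/page in the database is edited."""
    for name, prompt in AI_PROPERTIES.items():
        answer = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"{prompt}\n\nPage contents:\n{page['content']}"}],
        ).choices[0].message.content
        page["properties"][name] = answer  # the autofilled value stays in sync
```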

Swyx: (50:05) You know, there used to be this glorious time where Roam Research was the hottest knowledge company out there, and then Notion built backlinks. I don't know if we are to blame for that. But how do backlinks play into some of this? I think most AI use cases today are kind of single-page: this document, I'm helping with this. Do you see some of these tools expanding to make changes across things? We just had Itamar from Codium on the podcast, and he talked about how agents can tie together the spec for a feature, the tests for the feature, and the code for the feature, so the 3 entities are tied together. I do see backlinks helping AI navigate the knowledge base of a company, where you might have the document the product team uses, but also the document marketing uses to announce it, and as you make changes, the AI can work through the different pieces of it.

Linus Lee: (51:04) Definitely. If I may get a little theoretical from that: one of my favorite ideas from my last year of hacking around, building text augmentations with AI for documents, is the realization that when you look at code in a code editor, what it is at the lowest level is just a text file. A code file is a text file, with maybe a list of functions inside it. But the way that you understand it is not as a file, like a Word document; it's as a graph. You have a function, you have call sites to that function, places where that function is tested, different definitions for that function, maybe a type definition tied to it. So it's a graph, and if you want to understand that function, there are advantages to being able to traverse that whole graph and fully contextualize where the function is used. Same with types and same with variables. So even though code is represented as text files, it's really a graph, and a lot of the key interface innovations behind IDEs are about surfacing that graph structure in the context of a text file: things like go-to-definition, or the little inline window VS Code shows when you look at references. An interesting idea that I explored last year was: what if you bring that to text documents? Text documents are more unstructured, so the graph idea is fuzzier, but if you're reading a textbook and there's a new term, there are other places where that term is mentioned, probably a few places where it's defined, maybe some figures that reference it. If you have an idea, there are other parts of the document that might disagree with that idea or cite it. So there's still a graph structure, a little fuzzier, that ties together a body of knowledge, and it would be cool if you had some kind of text editor or knowledge tool that let you explore that whole graph. Or maybe even AI could explore that whole graph. And so, back to your point: backlinks are a part of it, but so is the fact that inside Notion all of these pages exist in a single workspace, a shared context, a connected workspace. You can take any idea and look it up anywhere to fully contextualize what a part of your engineering system design means, or what we know about pitching a customer at a company. Or, if I wrote down a book, what other places has that book been mentioned? All these graph-following things, I think, are really important for contextualizing knowledge.
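[As a rough illustration of the "documents as a fuzzy graph" idea, here is a minimal sketch that indexes where each key term appears across a set of documents, so a reader, or a model, could hop between mentions the way an IDE jumps between references. The terms and documents are invented; a real system would extract terms rather than hard-code them.]

```python
# Minimal sketch: build a fuzzy "references" graph over prose documents,
# mapping each term to every document that mentions it.
from collections import defaultdict

docs = {
    "chapter-1": "A transformer uses attention to mix information across tokens.",
    "chapter-2": "Attention lets the model weigh tokens; transformers stack it in layers.",
    "glossary": "Attention: a weighted average over token representations.",
}
terms = ["transformer", "attention", "token"]

mentions: dict[str, list[str]] = defaultdict(list)
for doc_id, text in docs.items():
    lowered = text.lower()
    for term in terms:
        if term in lowered:
            mentions[term].append(doc_id)

# mentions["attention"] -> ["chapter-1", "chapter-2", "glossary"]:
# every place the term is used or defined, one hop away.
print(mentions["attention"])
```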

Swyx: (53:25) Part of your job at Notion is prompt engineering. You're maybe one of the more advanced prompt engineers I know, and you've often commented on the state of PromptOps tooling. What is your process today? What do you wish for?

Linus Lee: (53:41) There's a lot here. The prompts inside Notion right now are not complex in the sense that agent prompts are complex, but even a problem as simple as "summarize a page" is complex in its own way. A page could contain anything from no information, if it's a fresh document, to a fully fledged news article. Maybe it's a meeting note. Maybe it's a bug filed by somebody at a company. The range of possible documents is huge, and you have to distill all of it down and always generate a summary. Describing that task to the AI comprehensively is pretty hard. There are a few things that I, and we as a team, ended up leaning on for the prompt engineering part of it. One of the early transitions we made was that the initial prototype for Notion AI was built on the classic instruction-following models, text-davinci-003 and so on. At some point, we switched to chat-based models like Claude and GPT-3.5 Turbo. That was an interesting transition: it actually made few-shot prompting a little easier, in that you can give the few-shot examples as previous turns in a conversation and then ask the real question as the next follow-up turn. I've come to appreciate few-shot prompting a lot more, because it's difficult to fully and comprehensively explain a particular task in words, but it's pretty easy to demonstrate 4 or 5 different edge cases that you want the model to handle. A lot of times, if there's an edge case you want the model to handle, few-shot prompting is just the easiest, most reliable tool to reach for. One challenge in prompt engineering that Notion often has to contend with is that we want to support all the languages Notion supports, so all of our prompts have to be multilingual-compatible, which is tricky because our instructions are written in English. With a naive approach, the model tends to output English even when the document you want to translate or summarize is in French. One way to attack that problem is to tell the model to answer in the language of the user's query, but it's actually a lot more effective to give it examples: not just English documents, but maybe summarizing an English document, summarizing a ticket filed in French, summarizing an empty document where the output is supposed to be in Korean. So a lot of our few-shot prompts in Notion AI tend to be very multilingual, and that helps support our non-English-speaking users. The other big part of prompt engineering is evaluation.
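[Here is a minimal sketch of the few-shot-as-chat-turns pattern Linus describes, including a multilingual example. These are not Notion's actual prompts; the example documents and model choice are invented for illustration.]

```python
# Few-shot examples supplied as prior chat turns; the final user turn is
# the real question. Example documents are invented, not Notion's prompts.
from openai import OpenAI

client = OpenAI()

SYSTEM = "Summarize the user's document. Always answer in the document's language."

# Each edge case is a fake (user, assistant) exchange the model can imitate.
FEW_SHOT_TURNS = [
    {"role": "user",
     "content": "Document:\nWeekly sync: shipped autofill, found a bug in exports.\n\nSummarize it."},
    {"role": "assistant",
     "content": "The team shipped autofill and found a bug in exports."},
    # A French document, so the model learns to answer in French:
    {"role": "user",
     "content": "Document:\nRéunion produit : le lancement est reporté à mai.\n\nSummarize it."},
    {"role": "assistant",
     "content": "Le lancement du produit est reporté à mai."},
    # An empty-document edge case:
    {"role": "user",
     "content": "Document:\n(empty)\n\nSummarize it."},
    {"role": "assistant",
     "content": "The document is empty, so there is nothing to summarize."},
]

def summarize(document: str) -> str:
    messages = ([{"role": "system", "content": SYSTEM}]
                + FEW_SHOT_TURNS
                + [{"role": "user", "content": f"Document:\n{document}\n\nSummarize it."}])
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return resp.choices[0].message.content
```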

Swyx: (56:09) Mhmm.

Linus Lee: (56:10) The prompts that you exfiltrated out of Notion AI many weeks ago were, surprisingly, pretty spot on, at least for the prompts we had then, especially things like summary. But they're also outdated, because we've evolved them a lot more and we have a lot more examples. Some of our prompts are just really, really long; they're thousands of tokens long. So every time we go back and add an example or modify the instruction, we want to make sure we don't regress any of the previous use cases we've supported. We put a lot of effort in, and we're increasingly building out internal tooling and infrastructure, for what you might call unit tests and regression tests for prompts, with handwritten test cases as well as tests driven by feedback from Notion users who have chosen to share it with us.

Swyx: (56:54) And you just have a hand-rolled testing framework, or use Jest or whatever, and nothing custom out there? You basically said you've looked at so many PromptOps tools and you're sold on none of them.

Linus Lee: (57:06) So that tweet was from a while ago. I think there are a couple of interesting tools these days, but at the moment Notion uses pretty hand-rolled tools. Nothing too heavy: it's basically a for loop over a list of test cases. We do quite a bit of using language models to evaluate language models. Our unit test descriptions are kind of funny, because the test is literally just an input document and a query, and then we expect the model to say something. And our qualification for whether that test passes or not is to just ask the language model again whether the output looks like a reasonable summary, or whether it's in the right language.
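[A hand-rolled harness like the one Linus describes might look something like this sketch: a for loop over test cases, with a second model grading the output. It reuses the summarize() helper and client from the earlier sketch; the test cases and grading prompt are invented.]

```python
# "A for loop over a list of test cases," with a language model grading
# the language model. Reuses summarize() and client from the sketch above.
TEST_CASES = [
    {"doc": "Standup notes: API migration done, onboarding flow is next.",
     "language": "English"},
    {"doc": "Ticket : l'export PDF échoue sur les gros documents.",
     "language": "French"},
]

def passes(doc: str, summary: str, language: str) -> bool:
    verdict = client.chat.completions.create(
        model="gpt-4",  # spend the eval budget on a stronger model
        messages=[{"role": "user", "content": (
            f"Document:\n{doc}\n\nSummary:\n{summary}\n\n"
            f"Is this a reasonable summary, written in {language}? Answer YES or NO."
        )}],
    ).choices[0].message.content
    return verdict.strip().upper().startswith("YES")

failures = [case for case in TEST_CASES
            if not passes(case["doc"], summarize(case["doc"]), case["language"])]
print(f"{len(failures)}/{len(TEST_CASES)} regression cases failed")
```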

Swyx: (57:41) Do you use the same model? Do you have Anthropic criticize OpenAI, or OpenAI criticize Anthropic?

Linus Lee: (57:48) That's a good question.

Swyx: (57:49) Do you worry about a model being biased towards its own output?

Linus Lee: (57:52) Oh, no. That's not a worry that we have. I actually don't know exactly whether we use different models. If you have a fixed budget for running these tests, I think it would make sense to use more expensive models for evaluation rather than generation. But I don't remember exactly what we do there.

Swyx: (58:07) And then one more follow-up: you mentioned some of your prompts are thousands of tokens. That takes away from my budget as a user. Isn't that a trade-off that's a concern? There's a limited context window.

Linus Lee: (58:19) Yes.

Swyx: (58:19) Right? Some of that is taken by you as the app designer or product designer deciding what system prompt to provide. And then the remainder is what I, as a user, can give you to actually summarize as my content.

Linus Lee: (58:32) In theory, yes. In practice, there are a couple of trends that make it not an issue. For things like generating summaries, a summary is only going to be so many tokens long; if our prompts are generating you 3,000-token summaries, the prompt is not doing its job anyway.

Swyx: (58:49) But the source doc...

Linus Lee: (58:49) The source doc could be longer, yes. If you wanted to translate a 5,000-token document, you do have to truncate it, and there is a limitation there. It's not something we're super focused on at the moment, for a couple of reasons. There are techniques that, if we need them, can help us compress those prompts, things like parameter-efficient fine-tuning. And it seems like the dominant trend is that context lengths are constantly getting longer and cheaper; Anthropic recently announced their 100,000-token context model. So I think in the longer term this will be taken care of by models becoming more accommodating of longer contexts; it's more of a temporary limitation.
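[As a concrete illustration of that truncation, here is a small sketch using the tiktoken tokenizer: reserve room for the fixed prompt and the model's output, and cut the document to whatever budget remains. The specific numbers are made up.]

```python
# Fit a document into a fixed context budget; the overhead and reserve
# numbers below are invented for illustration.
import tiktoken

CONTEXT_LIMIT = 4096     # model's context window, in tokens
PROMPT_OVERHEAD = 1500   # instructions + few-shot examples
OUTPUT_RESERVE = 500     # room left for the model's answer

def truncate_to_budget(document: str, model: str = "gpt-3.5-turbo") -> str:
    enc = tiktoken.encoding_for_model(model)
    budget = CONTEXT_LIMIT - PROMPT_OVERHEAD - OUTPUT_RESERVE
    tokens = enc.encode(document)
    return document if len(tokens) <= budget else enc.decode(tokens[:budget])
```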

Swyx: (59:26) Cool. Shall we talk about the professionalizing of a scenius?

Linus Lee: (59:30) Yeah. A helpful bit of context when thinking about HCI and AI in particular is that, historically, they have been competing disciplines: competing very specifically in the sense that they often fought for the same sources of funding, the same kinds of people, and the same attention throughout the history of computer science. HCI and AI both came from the same, or at least very aligned, parallel motivations: we have computers; how do we make computers work better with humans? One way to do it was to make the machine smarter. Another was to design better interfaces. Through the AI booms and busts, when an AI boom was happening, HCI would get less funding, and during AI winters, HCI would get a lot more attention, because it was the alternative solution. Now that there's renewed attention on how to build better interfaces for AI, I think it's interesting that it's kind of a scenius: there are podcasts like this where I get to talk about interfaces and AI, but it's definitely not a fully fledged field. My favorite definition of what distinguishes a field comes from Andy Matuschak. I'm going to butcher the quote, but he said something to the effect of: a field has at its disposal a powerful set of established tools, methods, and standards, and a shared set of core questions it wants to answer. If you look at machine learning, which is obviously a dominant, established field: if you want to evaluate a model, or solve a particular task, or build a model that serves a particular task, there are powerful methods we have, like gradient descent and specific benchmarks, for building solutions and then evaluating them. Or if you have a more expansive problem, there are surely attempts that have been made before, attempts people are making now, and frameworks for thinking about these things. In AI and UX, we're very early in the evolution of that space and that community. There are a lot of people excited and a lot of people building, but we have yet to come up with a set of best practices, tools, methods, and frameworks for thinking about these things. Those will surely arise, and as they do, I think we'll see the evolution of the field. In prompt engineering and using language models in products at large, I think that community is a little farther along. It's still very fast-moving because it's really young, but there are established prompting techniques, like ReAct and distillation of large instruction-following models, and these techniques, I think, are the beginnings of best practices and powerful tools at the disposal of this language-model-using field.

Swyx: (1:02:06) Yeah. And mostly it's just following Riley Goodside; that's how I've learned about prompting techniques, right?

Linus Lee: (1:02:11) Right. Yeah. Pioneers.

Swyx: (1:02:13) But yeah, I am actually interested in this. We've recently somewhat rebranded the podcast and the newsletter towards this term "AI engineer," which I view as somewhere between machine learning researcher and software engineer, some kind of in-between mix. And I think creating the media, creating meetups, creating a de facto conference for it, creating job titles, and then that core set of questions that everyone wants to get better at: I think that's essentially how this starts.

Linus Lee: (1:02:47) Creating a space for the people that are interested to come together, I think, is a really key part of it. Whenever I come back to it, I'm always amazed by how, if you look at the golden era of theoretical physics in the early twentieth century, or the golden era of early personal computing, there were maybe 2 dozen people who contributed all of the significant ideas to that field, and they all kind of knew each other. I always found that really fascinating. And I think the causal relationship actually goes the other way: it's not that all these people happened to know each other. It's because there was a core set of people who were very close to each other, shared ideas often, and were colocated that the field was able to blossom. And so I think creating that space is really critical.

Swyx: (1:03:31) Yeah. There's a very famous photo of the Solvay Conference in 1927, where Albert Einstein, Niels Bohr, Marie Curie, all these physics names are...

Linus Lee: (1:03:41) How many Nobel laureates are in that photo, right?

Swyx: (1:03:43) Yeah. And when I tweeted it out once, people were like, I didn't know these people all lived at the same time. They all knew each other; they must have exchanged so many ideas.

Linus Lee: (1:03:51) I mean, it's similar with the artists and writers who help a new kind of period blossom.

Swyx: (1:03:57) Is it going to be San Francisco and New York, though?

Linus Lee: (1:03:59) That's a spicy question.

Swyx: (1:04:03) I don't know.

Linus Lee: (1:04:04) We'll see.

Swyx: Well, we're glad to at least be a part of your world, whether it's on either coast. But it's also virtual, right? We have a Discord; it's happening online as well, even if you're in a small town in Indiana. Yeah. Cool. Lightning round.

Linus Lee: (1:04:21) Awesome. Yeah. Let's do it.

Swyx: We've only got 2 questions for you: one on acceleration, one on exploration, and then a final takeaway. The first one we always like to ask is: what is something that happened in AI that you thought would take much longer than it has?

Linus Lee: (1:04:36) Prices coming down. Prices coming down, and being able to get a lot more bang for your buck. Things like GPT-3.5 Turbo being, I don't know the exact figure, 10 or 20 times cheaper...

Swyx: (1:04:48) Than davinci-003.

Linus Lee: (1:04:50) Than davinci-003 per token, yeah. Or the super-long-context Claude, or MPT StoryWriter: these long-context models that would theoretically take a lot of compute to run, but are accessible to us now. I think they're surprising because, before these things came out, I would have thought that cost per token and scaling context length were core constraints you would have to design your AI systems around. And it turns out, if you just wait a few months, OpenAI will figure out how to make these models 10 times cheaper, or Anthropic will figure out how to make models that can take 100,000 tokens. The speed at which that's happened has been surprising, and a little bit frightening, because it invalidates a lot of the assumptions I was operating with, and I have to recalibrate.

Swyx: (1:05:34) Yeah. There's this very famous law called Wirth's law, also known as Gates's law, that basically says software engineers will take up whatever hardware engineers give them. And I feel like there's a parallel law right now: AI UX people are going to take up all the improvements that the language model people give them. So while the language model people are improving costs by a single order of magnitude, you, with your Notion AI Autofill, are increasing the amount of consumption by orders of magnitude, right?

Linus Lee: (1:06:02) Exactly. Before the show started, we were just talking about how, when I was prototyping AI Autofill, just to make sure that things scaled up okay, I ended up running Autofill on a database with 6,000 pages, just generating summaries. And these are usually fairly long pages, so I ended up running through something like 2 or 3 million tokens in a matter of 20 minutes. Which luckily is not too expensive, because a lot of it has gotten cheaper, but it is like 5 or 6 dollars. The concept of running a test on my computer and spending the price of a nice coffee is still kind of a weird thing that I'm getting used to.
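[That figure is roughly consistent with the pricing of the time: at about $0.002 per 1,000 tokens for GPT-3.5 Turbo, 3 million tokens works out to 3,000,000 ÷ 1,000 × $0.002 = $6.]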

Swyx: (1:06:37) And Notion AI is currently $10 a month, something like that. So there are ways to make Notion lose money.

Linus Lee: (1:06:44) Negative. You just get negative gross margins on the test.

Swyx: (1:06:46) Not sanctioned by Notion, but, I mean, obviously you'll figure it out. You should use it to improve your life and support your workflows in whatever ways are useful. Okay. The second question is about exploration. What do you think is the most interesting unsolved question in AI?

Linus Lee: (1:07:03) Predictability. Reliability. In AI broadly, I think that's much harder, but with language models specifically, I think figuring out how to build dependable systems is really important. If you ask Notion AI, or ChatGPT, or Claude for, say, a bulleted list of X, Y, Z, sometimes it'll make those bullets with the Unicode center dot, sometimes with a dash. Sometimes it'll add a title. Sometimes it'll bold random things. All of those are fine, but it's a little jarring if the answer is a little stochastic every time. I think this is a much bigger concern when you're automating tasks or having the model make decisions by itself. So much of the software that runs the world is behind-the-scenes decision-making programs that run inside enterprises, automate systems, and make decisions for people, and auditability and dependability are just so critical to all of them. One avenue of work that I'm really intrigued by is, in these decision-making systems, not having the model make decisions internally as a black box, but having the model synthesize code that makes decisions. For things like summarization and other natural-language tasks, you have to ask the model. But let's say you have a document and you want to filter out all the dates. Instead of asking the model, hey, can you grab all the dates, you can ask the model to write a regular expression that captures the particular set of date formats you care about. At that point, the output of the model is a program, and the nice thing about a program is that you can check it. There are lots of nice things: one is that it's much cheaper to run afterwards; another is that you can verify it. And the program becomes what in design we call a boundary object: a shared thing that exists both in the sphere of the human and the sphere of the computer. You can iterate on it to fix bugs, and you can co-evolve this object that is now a representation of the decision you want the model, the computer, to make. But it's auditable, dependable, and reliable. So I'm pretty bullish on code generation and other program synthesis and program verification techniques, with the model writing the initial program and helping people maintain the software.
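[Here is a minimal sketch of that idea: have the model synthesize a regular expression once, then verify and reuse it as an ordinary, auditable program. The prompt, test strings, and client usage are illustrative, not from any real product.]

```python
# Synthesize a program (a regex) with the model, then verify and run it
# locally. The prompt and test dates are invented for illustration.
import re
from openai import OpenAI

client = OpenAI()

def synthesize_date_regex() -> "re.Pattern[str]":
    prompt = ("Write one Python regular expression (pattern only, no code, no "
              "explanation) matching dates like 2023-07-04 and 07/04/2023.")
    pattern = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content.strip()
    compiled = re.compile(pattern)  # fails loudly if the model returned junk
    # The output is a program, so we can audit it on known cases before trusting it.
    assert compiled.search("Shipped on 2023-07-04.")
    assert compiled.search("Due 07/04/2023.")
    return compiled

# Cheap and deterministic on every subsequent document, no model call needed:
dates = synthesize_date_regex().findall("Kickoff 2023-07-04, review 07/11/2023.")
```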

Swyx: (1:08:59) Yeah, I'm so excited by that. Just in terms of reliability, I'll call out our previous guest.

Linus Lee: (1:09:04) Shreya Rajpal?

Swyx: (1:09:05) Yeah. Yeah. And she's working on Guardrails AI. There's also LMQL, and then Microsoft recently put out Guidance, which is their custom language thing. Have you explored any of those?

Linus Lee: (1:09:13) I've taken a look at all of them, and I've spoken to Shreya. This general space of adding constraints to generative systems, adding program verification, all of these things I think are super fascinating. I also personally like it a lot because, before I was spending most of my time in AI, I spent a bunch of time looking at programming languages, compilers, and interpreters, and there is just so much amazing work that has gone into automated ways to reason about a program: compilers and type checkers and so on. It would be a real shame if the whole field of program synthesis and verification just became "ask GPT-4." But actually, they work together: you synthesize a program with GPT-4 from human constraints and human descriptions, and then we have this whole set of powerful techniques we can use to more formally understand and prove things about programs. I'm excited to see the synergy of them.

Swyx: (1:10:09) Awesome. This is great, Linus. Our last question is always: what's one message you want everyone to remember today about the space and its exciting challenges?

Linus Lee: (1:10:19) We're at the beginning. Maybe this is really cliché, but...

Swyx: (1:10:22) It's okay.

Linus Lee: (1:10:23) One thing I used to say when I was working on text interfaces last year was that I would be really disappointed if, in 1,000 years, humans are still using the same kind of writing tools and writing systems that we use today. It would be pretty surprising if, 1,000 years from now, we're still writing documents the same way and the language and the writing system haven't evolved at all. If humans plan to be around for many thousands of years into the future, and writing has really only been around for 2,000 or 3,000 years in its modern form, then we should care a lot more about building flexible, powerful tools than about backwards compatibility. So whether we look at something as simple as language models or as expansive as humans interacting with text documents, I think it's worth reminding yourself often that the things we have today are sometimes this way for a reason, but often just an artifact of the way we got here. Text can look very different. Language models can look very different. I personally think that in a couple of years we're going to have something better than transformers. All of these things are going to change, and I think it's important to keep your eyes looking over the horizon at what's coming far into the future.

Swyx: (1:11:47) Nice way to end it.

Swyx: (1:11:48) Yeah. Well, thank you, Linus, for coming on.

Linus Lee: (1:11:51) This was great. Thank you. This was lovely. Thanks for having me.
