Interfacing with AI, with Linus Lee of Notion | Transcription

Transcription for the video titled "Interfacing with AI, with Linus Lee of Notion".

Note: This transcription is split by topics. All paragraphs are timed to the original video. Click on the time (e.g., 01:53) to jump to the specific portion of the video.

Welcome Linus Lee (01:00)

Nathan: Hello, and welcome back to The Cognitive Revolution. Today, my guest is Linus Lee, AI product leader at Notion and AI explorer extraordinaire. I've followed Linus online for a couple of years now, fascinated by his many groundbreaking projects and his unique way of thinking about AI systems. From creating novel interfaces that visualize and manipulate generative models in their latent spaces, to developing techniques for semantically editing text and images, Linus Lee has been a pioneering tinkerer. In this wide-ranging conversation, we dig into the details of how Linus goes about his explorations. He shares his toolkit, from PyTorch as a foundation to the custom tools he's built over time for data visualization, model evaluation, and rapid experimentation.

We also discussed the importance of spending time with raw data and failure cases, and the value of building your own tools to deeply understand the problems you're trying to solve. Linus Lee also offers his perspective on the current capabilities of language models, and where he sees the biggest opportunities for improvement. Beyond the high-level goal of better general reasoning, he emphasizes hallucination reduction, better instruction following, and cost efficiency. We also speculate about the future, considering scenarios like models communicating with other models via high-dimensional embeddings, techniques to connect latent spaces across modalities, and the societal implications as AI capabilities continue to advance. Linus Lee articulates a vision centered on amplifying rather than replacing human intelligence, a principle he believes should guide the development and deployment of AI-powered products.

Throughout this episode, Linus Lee demonstrates the curiosity, resourcefulness, and thoughtfulness that have made him such an influential figure in the AI community. His work inspires me to engage more deeply with these systems, to build the tools I need to understand them, and to steer their development in a direction that deepens rather than diminishes human creativity and agency. As always, if you find value in the show, please share it with others who might appreciate it. Particularly right now, as we're establishing the new feed, a tweet or a comment on YouTube would be especially valuable. And of course, we invite you to reach out with feedback or suggestions on our website. Now, please enjoy this deep dive into the art and science of exploring AI systems with Linus Lee of Notion AI. Linus Lee of Notion AI and general all-purpose AI explorer, welcome to The Cognitive Revolution.

Linus: Thank you. Thanks for having me. I'm really excited about this.

Nathan: I have followed you online for a couple of years now, and I've been fascinated by all your many projects and just the way that you think about AI systems. I really genuinely think you're a one-of-a-kind thinker from whom I've learned a lot. And so, going back to when I set out to do this a year ago, you were on the shortlist of guests that I maybe didn't want to have right away because I kind of wanted to get my, you know, get things set up a little bit and have a little more credibility, a little more comfort. But I definitely always knew that I wanted to get you on the show and pick your brain. So I'm excited to get into it.

Linus: Thanks. That's very generous of you and kind of you to say. Likewise, I'm excited to talk about both AI and everything else that I've worked on and kind of what I see coming up ahead.

Nathan: Well, very well deserved. The way I thought we would maybe approach this is, I wanted to give people at the top a little bit of a sense of the kinds of things that you build, and then try to kind of peel back some layers from that to then understand like how you're thinking about that in conceptual terms. Also, how you are doing the work in very, very practical terms, and then kind of a sense for where you think we're going from here, as well as perhaps some very practical tips based on your experience at Notion and elsewhere for people that are looking up to you as an AI application developer. So for starters, you have built some of the most interesting novel interfaces that folks have seen. And, you know, these can be found on your Twitter profile in various places. And you've shown a few off in different talks that you've given. And a few of them jumped out to me that you've demoed recently, but I'd love to kind of just introduce a few of the unique interfaces that you've created, just to give people a very tangible jumping-off point to understand the sorts of things that you put together.

Linus: Yeah, definitely. I think we can talk through them. I think that the more polished demo that I can do off the cuff right now is probably not as good as the ones that are more polished online, so you can find them on YouTube. I can talk about what the demos are concretely, and then I can maybe later talk about my framing for them, because I think that's a part of my work is trying to figure out and evolve exactly what the framing should be.

Controlling generative models in their latent space (06:08)

Linus: And I think that'll come up later in our conversation, but concretely, there are a couple of directions that I've explored. And one that I've been on for the longest time is to try to visualize and control generative models in their latent space, rather than in the input space that we're familiar with. So if you take image generation, for example, the common way of generating images is you describe an image, like a cow sitting in a grassy field with sunlight, long shot, you know, 35 millimeter film, whatever, and then the model generates an image. But another way you could imagine generating or improving upon an image is you start with an image and then you find a different image that has another element that you'd like. Like maybe you like the lighting from another image or maybe you like the framing, or maybe you have a character that you want to bring in from another image. And if there was some way to tell the model, can you merge all of the attributes that you see in both of these images into a single image and kind of blend it together in the semantic space rather than in the pixel space? That could be an interesting way to kind of stitch together images. So, a couple of my demos are around combining images and text in this semantic space of an image model called CLIP to generate, for example, start with a human face and then add emotion to it to make it smile or make it frown or starting with a photograph and then adding different kinds of style to it by just adding vectors in the latent space of a model together. You could do something like this with text as well. So, I have a model that's capable of embedding text into a latent space and embedding space, but then also inverting that embedding to go from the embedding to what that embedding represents as text. And it turns out, not just in that model, in a lot of embedding models that are expressive enough, this transformation is pretty invertible with minimal kind of meaning loss. 
And so I take that model and then I can start to manipulate the embedding itself to, for example, combine two different sentences together semantically and produce a new sentence or interpolate between sentences. So you have one that's maybe very optimistic and happy and bright, and another that's maybe a little more mellow. And I can take those two sentences and interpolate between them to generate sentences that are in the middle. And I think that kind of thing could potentially allow for other ways of controlling or editing or generating text or even images that let people control output using levers that are hard to verbalize with text. And there are more rigorous and precise versions of this that I've been working on more recently that I can also talk about, but that's the general gist of things that I've been interested in.
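The two operations Linus describes here, interpolating between two sentences and adding an attribute vector, come down to simple vector arithmetic once text has been embedded. The sketch below is only an illustration under stated assumptions: the four-dimensional vectors are made up, and the encoder and decoder that would turn text into an embedding and back are not shown, since the transcript does not specify them.

```python
def lerp(a, b, t):
    """Linearly interpolate between two embeddings: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def add_attribute(embedding, direction, strength):
    """Move an embedding along an attribute direction (e.g. a 'smile' vector)."""
    return [x + strength * d for x, d in zip(embedding, direction)]


# Hypothetical embeddings standing in for the output of a real text encoder.
optimistic = [0.9, 0.1, 0.4, 0.0]
mellow = [0.1, 0.7, 0.4, 0.2]

# A point halfway between the two sentences in embedding space.
midpoint = lerp(optimistic, mellow, 0.5)

# A hypothetical attribute direction, applied at half strength.
formal_direction = [0.0, 0.0, 1.0, 0.0]
more_formal = add_attribute(midpoint, formal_direction, 0.5)
```

In a full system, `midpoint` and `more_formal` would be passed to a decoder that inverts the embedding back into a sentence.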

Visualizing a high-dimensional latent space (08:30)

Nathan: And sometimes there are some pretty creative front-end experiences on these as well. You want to describe the one also where you have kind of little tiles of text on seemingly an infinite canvas. And then you can kind of drag into different directions and create, like, take that text, but now move it in this direction. You're kind of making that a spatial computing sort of experience, almost.

Linus: Yeah. So the two that I think are most interesting visually, the one that you just described, actually came out of a problem that I ran into where one of my initial prototypes of this involved just having, imagine this high-dimensional latent space of a model, and you want to visualize it on a screen, so you cut a 2D slice through it. And you cut a 2D slice through this high-dimensional space such that the two dimensions that you see on the screen, the bottom edge of your screen, the left edge of your screen, correspond to two different kinds of targets that you move towards. And so maybe the bottom right corner of your screen is a happy sentence about weddings, and the top left corner is about dinosaurs or something. And you can pick any point in this plane, and it'll be a sentence that combines elements of those sentences. That works well for kind of demonstrating the seed of this idea, but the problem that you quickly run into is there are actually way more properties that you want to control when you're generating or editing text than just between two arbitrary anchor sentences. And when you have this fixed plane, you quickly run out of directions that you can move in. And so instead of fixing the levers that you can pull to the edges of your screen, the next revision, which you referenced, involves kind of flipping how the dimensions are presented. So, the way I used to describe it is, imagine you're playing an instrument, and when you play an instrument with one hand, this is very crude, but if you're playing a string instrument with your left hand, you control the strings and you control the pitches of the notes that you're playing, and with your right hand, you kind of control the volume or the amplitude of the sound in a very crude way. And this is a way to kind of transfer that into a spatial interface. 
So this is an infinite canvas where, with one hand holding down keys on a keyboard, you can choose what specific attribute you want to control for text, whether that's like a positive, negative emotion, or whether it's written in fancy English versus simplified English, or what kind of topic it is, is it more about sci-fi or so on. And then with your right hand, you can click down on a sentence that you've put down on this plane, and you can drag it out. And the further you drag out, the more you're kind of increasing the particular attribute that you've picked. So, for example, if you put a sentence down on this 2D canvas, and then you hold down the "make longer" button, and you pull the sentence farther from where you clicked down on it, the farther you pull, the longer that sentence will be. And over time, as you explore this latent space of sentences, you can see all of the branches that you took in this exploration and where each variation came from. And it naturally forms a kind of branching edit history through which you can follow all of the different kinds of movements that you made through latent space and what the resulting sentences are.

Nathan: When you think of these projects, I think they probably have multiple functions for you, right? One is obviously you're learning by doing all this stuff. How much do you actually use these tools versus viewing them as kind of proof of concept for some further synthesis on top of this later? Do you think these kinds of interfaces are ready for people to actually get utility out of? And do you personally?
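The two canvas designs described above can be sketched geometrically. This is a minimal illustration, not the actual prototype code: `slice_point` maps screen coordinates onto a plane through two anchor embeddings (the first design), and `drag_edit` scales a chosen attribute direction by how far the pointer has been dragged (the second design). The function names, the `gain` parameter, and all vectors are hypothetical.

```python
import math


def slice_point(origin, anchor_right, anchor_up, u, v):
    """Point on the 2D slice: u moves toward anchor_right, v toward anchor_up."""
    return [o + u * (r - o) + v * (t - o)
            for o, r, t in zip(origin, anchor_right, anchor_up)]


def drag_edit(embedding, direction, drag_dx, drag_dy, gain=0.01):
    """Move along an attribute direction in proportion to drag distance in pixels."""
    strength = gain * math.hypot(drag_dx, drag_dy)
    return [x + strength * d for x, d in zip(embedding, direction)]


# Made-up 3-d embeddings for a base sentence and two anchor sentences.
base = [0.0, 0.0, 0.0]
weddings = [1.0, 0.0, 2.0]   # anchor reached by moving fully to the right
dinosaurs = [0.0, 1.0, 2.0]  # anchor reached by moving fully up

blend = slice_point(base, weddings, dinosaurs, 0.5, 0.5)

# Holding a hypothetical "make longer" key selects this direction; dragging
# 50 pixels (30 across, 40 up) applies it at strength 0.5.
longer_direction = [0.0, 0.0, 1.0]
edited = drag_edit([1.0, 1.0, 1.0], longer_direction, 30, 40)
```

The fixed plane only ever exposes two directions, while the drag design lets any number of attribute directions share the same canvas, which is the limitation Linus says motivated the change.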

Prototypes (12:05)

Linus: Most of the prototypes that I've built for research purposes, I would say, are not good enough to actually be useful. So the kinds of edits that you can make with this latent space canvas UI, the edits are very crude. And the models that I'm using are too small, I think, to let you make very, very precise edits. And I think one of the continued problem areas in which I want to do more research is to try to improve upon that. There are other prototypes that I've made that have been more useful. And so I think it just depends on how good the particular technology or technique that I'm working on is and whether that is close enough to utility or not. Within the kind of set of prototypes that I've built for working in this area of building interfaces for AI, the eventual goal is always to build something that is useful. And for any individual prototype, the technology or technique may be too early to actually be useful. So the prototype may be more about identifying incremental improvements, or identifying problem areas, or identifying what directions to build more in. Or if the technique is more mature, then it may be close enough to utility that I build it more as something that I can use day-to-day and actually get benefit from. But there's always an underlying element of it which is like, this is going to help me better understand what it's for, how it can be useful, how to make it, how to improve on the technique, the quality of the outputs, and so on. An example of a prototype that I built a long time ago that I actually ended up using day-to-day for a long time is kind of an Obsidian or Roam Research style outliner note-taking app. That was my daily note-taking app for a long time where keywords would automatically kind of identify themselves, or entities would automatically identify themselves and connect them to other entities in the graph. 
And that was using mostly not language models, mostly kind of dumber techniques, but the outputs were good enough that I ended up finding those connections very useful and generative in my kind of creative thinking. And I used it for a long time.

Spatial Computing (13:51)

Nathan: That's cool. The 2D Canvas one, in particular, of the recent demos that I've seen, has a sort of Xerox PARC vibe about them, where you're like, this feels like, especially after I went and tried the Apple Vision Pro, it was like, I see something here. And it feels like there is, you know, spatial computing as, you know, spoon-fed to you. And it's amazing. It's, you know, incredibly, like a breathtaking technology experience to go put the Apple Vision Pro on your eyes. But it's very much like a consumer experience that you start off with. But seeing your 2D canvas, imagining that becoming a higher-dimensional space that you could potentially move around in, and then envisioning these sorts of hardware to actually make that feel very experiential starts to give me a little bit of a sense of how some of these future spatial computing notions might be. You know, more than just like, "Oh, look, I've got my YouTube screen over here, and I've got my email over here." But actually kind of using the brain's spatial intuition to do things that are currently hard to do.

Linus: Yeah, using all of the full kind of awareness, awareness of space, and the awareness that we have of how to move around. This is all very intuitive. We learn it when we're a child, and most of the ways that we work currently don't really take advantage of that. I spent a long time iterating on, and even still, every few months, I get a new version of exactly how to describe the motivation for building out these weird spatial demos that are technically not quite there yet but gesturing towards something interesting. The most recent version of this, I think, is actually somewhat relevant to how humans interact with space, among other things.

Spectrograms (15:37)

Linus: The closest metaphor that I've found for the reason that this direction of research is so interesting to me—I think language models work so well because they work with a kind of alternative, better representation for ideas than humans work with. The closest analogy that I have is like—spectrograms are important when people are dealing with audio. Normally, sound is like a wave in space. It's just a single kind of, I'm imagining, like a single string vibrating back and forth over time. And if you work with audio, that's the base thing that you work with. It's a basic representation. But if you're a professional with audio, then you actually, most of the time, work in a different representation space where you don't look at vibrations over time, but you look at a space of frequencies over time, or what's called a spectrogram, which is a visualization of frequencies over time. There is still a time axis on the one hand, but the other axis is no longer the amplitude of the vibration. It's like the prominence of different frequencies. So, if you imagine like—on the left side of your graph, every row is a different pitch, and every row gets a little mark when there's a sound that hits that pitch. It's a little bit like Western music notation, but it's like colorful gradients often appear in these spectrograms. And what you can see, because you've broken out the prominence of every different kind of pitch, every different frequency, is you can see things like, "Oh, here's a musical pattern that repeats where here is where the bass comes in, or here's where the tone shifts from this kind of very full sound to this more mellow sound." All these things you can notice, and then you can start to manipulate the sound in the frequency space. 
So you can do things like bring down the lows or add reverb only to the human voice range of sound, all these kinds of transformations that you're not really making to the sound waves themselves, you're making to this kind of projected out version of the sound that exists in this alternative frequency space, if that makes sense. The kind of interface that I'm eventually building towards is a tool that lets you edit text or work through ideas, not in the native space of words and characters and tokens, but in the space of actual meaning or features, where features can be anything from, "Is this a question? Is this a statement? Is this uncertain or certain?" to topical things like, "Is this about computers versus plants?" or to probably other kinds of features that we don't really even have words for because language models implicitly understand them. We implicitly understand that, but we don't need to ever verbalize it. So, an alternative way of projecting or viewing what's going on in language, and then maybe a way to start editing them.
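To make the spectrogram analogy concrete, here is a minimal sketch that turns a waveform into frequency-over-time frames and then "brings down the lows" in that frequency representation. It is an illustration only: it uses a naive discrete Fourier transform with no windowing, whereas real audio tools use optimized FFT libraries, and the sample rate, frame size, and test tone are arbitrary assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each frequency bin for one frame (naive DFT, no window)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_size, hop):
    """Rows are time frames; columns are frequency bins."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop)]


def attenuate_lows(spec, cutoff_bin, gain):
    """Scale every bin below cutoff_bin by gain, leaving higher bins untouched."""
    return [[m * gain if k < cutoff_bin else m for k, m in enumerate(row)]
            for row in spec]


SAMPLE_RATE = 800   # samples per second (assumed)
FRAME_SIZE = 64     # samples per frame, so each bin spans 12.5 Hz

# One second of a pure 100 Hz tone.
tone = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

spec = spectrogram(tone, FRAME_SIZE, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE

quieter_lows = attenuate_lows(spec, cutoff_bin=10, gain=0.5)
```

The edit in `attenuate_lows` never touches the waveform directly. It operates on the projected representation, which is the property Linus wants for text: edit in a space of features rather than in characters and tokens.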

Spectrograms as a bridge between modalities (18:04)

Linus: That's like the closest metaphor that I've found for describing the kind of thing that I want. And then once we have that representation, the one reason that spectrograms work so well is because humans intuitively understand color as a way of representing range and positions in space, so we are representing range because we have this geometric intuition. And so, maybe rather than looking at ideas as just kind of like ink scribbles on paper, maybe we can also look at them in a spatial way. Canvas was one kind of way of exploring that. I think there are a bunch of other directions that we can go once we have a better, more rigorous understanding of what's going on inside these language models. But eventually, I think we should be able to interact with ideas by bringing them out into the physical space that we occupy, or being as close to that as possible, because we want to bring our kind of full senses and proprioception and sense of scale and sense of distance into how we work with ideas, rather than just kind of like eyeballs and pixels on the screen.

Nathan: Yeah, that's really interesting. I'm just looking up to try to figure out, was it Riffusion, which might've been the project that people would know most of all, that was a Stable Diffusion fine-tune, right, on these spectrogram images? So definitely check that out for further intuition there, because it's also an interesting kind of foreshadowing of one of the things I want to talk about a little bit later, which is just how all these spaces seem to be very bridgeable to one another in many cases through relatively simple adapters. Here you have a vision model trained to generate these spectrograms, and then that is just converted to music. The AI part stops at the generation of the spectrogram image. But at one point, this was one of the leading text-to-music techniques, all working through this vision backbone.

Linus: I was going to say that the model is generating audio, but not in the typical space that we think of for presenting audio, which is waves in space, but it's generating in this spectrogram space. But it's actually a little bit more convoluted than that even because most speech generation, speech synthesis models, or even a lot of music generation models, they actually, a lot of them do operate in this kind of frequency space, in this spectrogram space. I don't know what the state of the art is now, but until very recently, the way that you built speech recognition systems was also improved by feeding an input in this frequency space rather than just kind of like waves in space. And so, audio models used to naturally operate in this domain. And then what Riffusion does is it actually takes a model architecture that is meant for generating pixels in a 2D image, using convolutions, and applies that to generating this thing that's supposed to represent audio. So it's coming through with kind of like two jumps in representation, and it still manages to work quite well.

The high hit rate of ideas in AI (20:50)

Nathan: It's amazing. That's one of my big kind of themes and takeaways from, you know, a few years now of increasingly obsessive study of the whole space is just what an incredibly high hit rate there is on ideas that actually turn out to work. And obviously, not every idea works, but as somebody who spent a little time in catalytic chemistry once upon a time, I can say it's definitely probably at least two orders of magnitude higher in AI right now than it is in many more well-mined corners of the physical sciences.

Linus: If you have a lot of data and a lot of compute, and you have a way of efficiently putting that compute to work and squeezing a model through that process, the model architecture is kind of like a final corrective term or coefficient in how well this whole thing works. It turns out if you can make gradient descent run on a lot of numbers very quickly with a lot of data that's high quality, most ideas will work within an order of magnitude, maybe, of efficiency. And so the success of a lot of these ideas, I think, I attribute to that. And then I attribute the recent advancements to the fact that we have now built up kind of all of the open-source infrastructure and know-how of how to do this model training thing at industrial scale, rather than at research lab scale, which to me is like a few GPUs, a few hours, maybe a few days. And then industrial scale is like thousands of GPUs, many, many weeks, and going from one to the other is like how synthesizing a little bit of chlorine in a lab is very different from building a chlorine factory. Running these models at scale, I think, is a similar kind of huge qualitative jump in terms of the kinds of knowledge that we have to have. But we have a lot of it now. And so we can apply that compute to any arbitrary problem, given there's enough data and given there's an architecture that doesn't suck super hard. And I think the result of that is that we've suddenly seen an explosion of models that perform very well across all these modalities and cross modalities.

The importance of data, quality, and scale (22:48)

Nathan: Yeah, compute, you've got to have data. The quality is critical, and scale is also critical. And then the algorithm just kind of determines how much compute you have to have to make full use of the data. That's like, if I set my brevity to max, that's about as brief as I could give, hopefully a decent account.

Linus: Yeah. As long as the architecture is not bottlenecking what information can propagate through the network, everything else is just kind of like, how many floating-point operations can you do?

Nathan: Going back to your exploration of all this technology, so you've been at it for a couple of years now. I'm a little bit unclear on the timeline, but I have to say I feel like I encountered quite a few ideas first through some of your experimental projects. And then later, it seems like they have become more formal as like mechanistic interpretability research publications. Some of the, you know, a couple of canonical results from the last handful of months, the representation engineering paper from Dan Hendrycks and collaborators and the Towards Monosemanticity paper out of Anthropic. I feel like you were basically doing that kind of stuff with small models well before those things kind of hit mainstream.

Sources of inspiration (24:06)

Nathan: How would you describe your own sources of inspiration for that? Or maybe I have it wrong, and there were more academic papers coming out that you were able to take inspiration from, but how did you get to these kinds of advanced middle-layer manipulation concepts as early as you did?

Linus: So I don't come from academia; my actual technical background is in product engineering, building web apps, which is like the furthest thing you can be from training models. And so I have to kind of step my foot in slowly into the research side of the community, the ML community, and figure out not only how that side works and kind of how to engage people in that world, but also how ideas propagate through both of these parts of the community. My old mental model of how research works used to be that researchers are always at the forefront, and they try these things, and they come up with ideas, and if the ideas are good, they get written about, and then eventually the industry picks them up. And then eventually it percolates down to smaller and smaller, kind of like perhaps less resourced groups. And I think that's maybe true in other fields like biology, where there's perhaps a larger gap in capability between what labs can do and companies can do. Even there, I don't think it's actually perfectly true, but certainly in machine learning, because all of the tools are so accessible, especially in the last few years, my picture now of how ideas percolate is actually a lot more nuanced, which is that at any moment, there are a large group of people that kind of know all of the things that we know about models and what you can do with them. And some of those people exist in the industry with the tools that they have. Some of those people exist in academia. Some of these people are like hobbyists that are just kind of tinkering on the side, even when their day job is doing something else. And different people might stumble upon the same techniques or the same ideas at the same time because most of what results in the next ideas that get discovered is actually, I think, a function of what is already known rather than the specific ideas that individual people have. 
And so ideas like the intuition that there was a linearized kind of feature space in the latent space of these language models that you could do edits in, I think there was an idea that was kind of intuitively known for a lot of people working in this space. And some people like me, I think, chose to kind of prove that out by doing what I personally enjoy doing the most, which is building interactive prototypes on the web, because that's kind of my expertise. And then I think other people perhaps it took longer to be a lot more rigorous and theoretically robust, and write things about it and get it out that way. And there's a kind of different way you speak about ideas there versus speaking about ideas in the interface world. But I think most of the ideas that come about are just a function of what was already known in different parts of the community, both in research and in industry, that just happened to talk about them in different terms and perhaps reach other parts of the community at different times.

How do you work? (26:51)

Nathan: I mean, you've had to solve a lot of problems, if I understand correctly, right? And this gets into the "How you work" section, I suppose, but like, if I understand correctly, you are, for example, figuring out your own features, right? Like, I think you've done a mix of things where sometimes you can take a straight embedding, which is just kind of a one-hot lookup, and then do that again and mash those up. And that's relatively straightforward to do. But then, as you get deeper into the network, you're also doing, at least in some of your projects, you're making edits farther down through the layers of the model. And there you have a much more conceptually challenging problem of, okay, I may have an intuition that I could probably steer around in this space and steer the outputs, but how do I begin to figure out what direction is actually what? If I understand correctly, you've largely had to figure those sorts of things out for yourself, right? You weren't able to really use too much in the way of open-sourced GPT-2 or GPT-J or whatever. These things didn't come with mid-layer feature sets.
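One simple baseline for the question Nathan raises, working out which direction in activation space corresponds to which attribute, is to take the difference between the mean activation of examples that have the attribute and the mean of examples that lack it. The sketch below illustrates that baseline only. The conversation does not say this is the method Linus used, and the three-dimensional activations are made up.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def difference_of_means(with_attribute, without_attribute):
    """Candidate edit direction: mean(with) minus mean(without)."""
    positive = mean_vector(with_attribute)
    negative = mean_vector(without_attribute)
    return [p - n for p, n in zip(positive, negative)]


def projection(vector, direction):
    """How far a vector lies along a direction (scalar projection)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(v * d for v, d in zip(vector, direction)) / norm


# Made-up activations for sentences labeled happy and sad.
happy = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.2]]
sad = [[-1.0, 0.2, 0.0], [-0.8, 0.4, 0.2]]

sentiment_direction = difference_of_means(happy, sad)
score = projection([0.5, 1.0, 1.0], sentiment_direction)
```

This approach needs labeled examples for every attribute, which is one reason unsupervised methods for discovering directions are attractive.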

Model autopsy (28:03)

Linus: Exactly. My favorite mental model for research is actually related to the exact path that I followed to hit that wall and then go past it, which is that when I first started getting into deep learning models, most of what I did was read and try to understand things that were in the open-source world. So, like, I had a hard time reading academic papers in the beginning in deep learning because I didn't have all of the kind of vocabulary and the way that people write about these things in my head and in my hands. And so I would just go to the Hugging Face Transformers repository, and I would download GPT-2 and get the minimal version running. And I'd be like, "Oh, this is generating text that I recognize. It seems to work. Now I have a live entity that is working on my computer." And I understand Python, so I can put print statements everywhere and see exactly what's happening. And so that's kind of what I did. I basically did model autopsy on the things that were running on my computer because that felt like something that I could understand and wrap my head around. And then I would try to understand things that way. And once I understood those models, I would try to understand things that were a little bit more on the frontier. And eventually, what you hit is you hit a wall where, for a while, you can get away with like, "Oh, I hit this problem. I'm going to Google how other people solved it." And you arrive at a Hugging Face forum post. And then maybe you get a little more esoteric and you arrive at a GitHub issue. And you're like, "Okay, this is kind of like, there isn't consensus, but here are some things that people tried and maybe they work." And then at some point, you hit a wall where you're like, there's nothing on the internet that's an established consensus for how to solve this problem. And you have to go find a paper that somebody wrote three months ago that purports to solve this. 
And then sometimes it's well-established and well-cited, and you can trust it. And other times you're like, well, this paper has two citations, and I don't even know if it actually works. And so you have to then find an implementation that has one star on GitHub that tries to reproduce it and run it, or oftentimes those are broken too, and so you have to then write your own implementation, and you're not even sure if it's the correct one, if it's the same one that the researchers used. So you hit a wall at which point you're kind of at the frontier of what we know as a research community or as an industry, but there are still problems to solve. I still have things that I want to do that I don't know how to do. And so, in the same way that if you hit a weird Python bug, you keep digging and digging, and then maybe you do some debugging, and eventually, maybe you have to go into the Python interpreter, I hit a wall with trying to find interesting features in these models, or trying to even build models that were amenable to these edits. And I hit a wall, and I had to eventually read a bunch of papers and develop my own intuition and try to combine the pieces that I found to say, "Here's an approach that might work based on what I know." There's no obvious consensus on it, but it might work, so I'll try it. And that's been the process most obvious to me, first, for building the autoencoder that I use for all of these embedding space manipulations, the model that I talked about that lets you go between text and embedding fluidly. And then second, for the work that I'm currently most focused on, which is a more unsupervised, kind of scalable approach to finding these edit directions, which builds on top of a lot of the mechanistic interpretability work that organizations like Conjecture and Anthropic have done. But that's also quite on the frontier.
And there are some techniques there, but they haven't really been applied to embedding spaces yet. They've mostly been applied to kind of normal GPT language models. And so I've had to take a lot of those techniques and adapt them as well. And a lot of that is, like, literally, I was doing this earlier this morning, finding weird GitHub repositories that have like three stars and trying to transplant the code over and hoping that I'm not messing up the implementation, and seeing how it works.

What is a transformer? (31:32)

Nathan: So, before getting into a little bit more detail on how exactly you go about that, how would you characterize, you've obviously built up a lot of intuition over the last few years of doing this. How would you describe what's going on inside a transformer, or perhaps what's going on inside a diffusion model? You can kind of break that down however you want. I don't just mean like, you know, describe the blocks, but like, how is the information being changed, you know, in more sort of... I guess another related question there is, you said you think that the models have a better representation of a lot of ideas than we do. I'm very interested to hear how, as you describe what's going on in a model, maybe you could contrast that against what you think is going on for us. I feel like I have a pretty similar setup in some ways to a transformer where it seems maybe I'm just like so, you know, prisoner of the AI paradigm now that I just see myself through that, which would be a weird irony. But I do feel like there is this sort of nonverbal, high-dimensional... You know, there's kind of only semi-conscious churning going on in some middle layers somewhere for me that then kind of gets done and feels like, "Okay, I'm happy with that now. Now you can speak." And then it kind of, you know, in the last layer, so to speak, it gets cashed out into tokens. But yeah, so what's going on in a language model at kind of a, you know, conceptually descriptive level or representationally descriptive level?

How do we sample from the output of a transformer? (33:04)

Linus: My mental model for how humans work, I think, includes a really important piece, a piece that I've found to be more and more important over time. I think the default assumption, the intuition for how we think our brains work, is that we think through things, and then we draw conclusions, and then we act upon them. But I don't actually think that's true at all. And I think what happens instead is our brain and our body take actions, and then, observing those actions, our brains retroactively build up a narrative for why we performed those actions in a way that seems causally consistent with our understanding of the world, which is a whole other interesting thing. And maybe this is what you'd like to call consciousness. But on the model side, there are different levels at which you can look at this. And I think for transformer language models, the two levels that are most interesting are, how is the transformer processing information within a single forward pass, generating a single token? And then, how are we sampling from the output of the transformer? So I'll talk about the sampling first. The job of any language model is, it's called a language model because it's a statistical model, and the thing that it's modeling is the probability distribution over all possible outputs. And in a lot of cases, this is a conditional probability distribution. So the probability distribution that the model is modeling statistically is, here's some start of a sentence, give me a probability distribution over all possible ways this input can continue. And this is kind of like a cloud of probability over every possible branch that the model can take. And obviously, that's way too many outputs. If we had infinite resources, we would simply ask the model for the probability of every single possible continuation and just pick the one that the model thought was the most likely.
But we can't do that because we just don't have the compute. And so instead, we have to randomly sample from this distribution, this cloud of possible continuations. And there's a bunch of different ways to do the sampling. And I think the most common one is to just sample every token and immediately kind of commit it. So, ask the model what's the most likely token to come next. The model says, "Here are the top 50 most likely continuations," and you randomly pick one of them according to their probabilities, and you just do that over and over and over again. But there's a bunch of other ways to sample from this. You could have a kind of running buffer of n branches and then only set things in stone when you're like n steps in, and you think that that's the most likely continuation. A technique like that is called beam search. There are other ways to try to take into account not getting the model to repeat itself as much, or taking the probability distribution and kind of warping it in interesting ways to try to emphasize the... There's an interesting technique where you have a small model and a large model, and you assume that the large model always improves upon the small model. And so, you take the gap between their distributions and you amplify the difference. And then you try to approximate what an even smarter model would predict. So, a bunch of techniques for sampling from this distribution. But ultimately, once you have a good probability distribution that the model outputs, you're trying to find the continuation of a text that maximizes the likelihood. So that's the how we sample from the distribution part.
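Editor's sketch: the "sample a token from the top candidates and commit it immediately" loop described above can be written in a few lines of plain Python. This is a minimal illustration, not anything from the conversation: `toy_model` is a hypothetical stand-in for a real language model, and the names `sample_top_k` and `generate` are made up for this example. It shows top-k sampling only, not beam search or the small-model/large-model technique.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, k, rng):
    """Draw one token id from the k most likely candidates."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top_ids = ranked[:k]
    weights = [probs[i] for i in top_ids]
    # random.choices treats weights as relative, so no renormalizing is needed.
    return rng.choices(top_ids, weights=weights, k=1)[0]


def generate(next_token_logits, prompt, steps, k, seed=0):
    """Sample one token at a time and commit it immediately."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(sample_top_k(next_token_logits(tokens), k, rng))
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for a real model over a 5-token vocabulary.

    It strongly prefers the token id that follows the previous one.
    """
    favored = (tokens[-1] + 1) % 5
    return [3.0 if i == favored else 0.0 for i in range(5)]


print(generate(toy_model, [0], steps=4, k=1))  # k=1 is greedy: [0, 1, 2, 3, 4]
print(generate(toy_model, [0], steps=4, k=3))  # k=3 lets less likely tokens through
```

With `k=1` the loop always takes the single most likely token, which is greedy decoding; larger `k` trades some likelihood for variety.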

How does a transformer process information within a single forward pass? (36:27)

Linus: And then there's the how we get the distribution part, which is what's happening in a single forward pass of a transformer. And this is very transformer specific, but the best mental model that I have, a lot of which I owe to the excellent write-ups that groups like Anthropic have done, is a transformer takes in a bunch of tokens, and a transformer is basically a stack of mini neural networks, a collection of mini neural networks where each token gets its own little mini neural network. Within each little mini neural network, there's a bunch of layers where there's a main kind of artery where the information just flows down, and then every once in a while, there's an offshoot where the transformer can go off and do a little bit of computation to iterate on its understanding of what's going on and then merge that change back into this main artery. Small models have like 12 to 24 of these kind of branch and merge paths within each kind of mini transformer in a token. Very large models can have up to like a hundred of these little blocks. But imagine you have a sentence, and for each token in that sentence, you have a little mini transformer that, over time, continues to iterate on the model's understanding of what the token represents. And then, after each branch and merge back into this artery, all of the arteries of the transformer, the mini transformers, can exchange information with each other. The way that little token-level mini transformers exchange information with each other is they broadcast some information about what information they contain and also what information they're looking for from other mini models, mini stacks of the transformer. And then, based on what they have and what they're looking for, they learn to fetch information from other token stacks in this transformer. And then they merge those back into the main artery.
So, a transformer is a repeated cycle of, here's some information I have, fetch more information from other tokens, and then do some iterative computation on it. Fetch more information, iterate on that, get more information, iterate on that, until you get to the bottom of the stack, and then that gets put through this thing called the softmax, which gives you the final probability distribution. But ultimately, a transformer is a thing where for every token you do a bunch of iterative information processing, and then you exchange information with other mini transformers behind each token.
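Editor's sketch: the "main artery with branch-and-merge offshoots" picture can be made concrete with a toy decoder block in plain Python. This is a simplified illustration under stated assumptions, not a faithful GPT implementation: it uses a single attention head, omits layer normalization and positional information, uses random untrained weights, and every name here (`attention`, `mlp`, `block`, `next_token_distribution`) is chosen for this example. The artery is commonly called the residual stream, the exchange step is attention, and the per-token computation is the MLP.

```python
import math
import random


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def attention(streams, p):
    """Each token broadcasts a key (what it has) and a query (what it wants)."""
    queries = [matvec(p["w_q"], s) for s in streams]
    keys = [matvec(p["w_k"], s) for s in streams]
    values = [matvec(p["w_v"], s) for s in streams]
    scale = math.sqrt(len(keys[0]))
    updates = []
    for t, q in enumerate(queries):
        # Causal mask: token t may only fetch from positions 0..t.
        scores = [sum(a * b for a, b in zip(q, keys[s])) / scale
                  for s in range(t + 1)]
        weights = softmax(scores)
        updates.append([sum(w * values[s][d] for s, w in enumerate(weights))
                        for d in range(len(values[0]))])
    return updates


def mlp(stream, p):
    """Per-token computation: expand, apply ReLU, project back."""
    hidden = [max(0.0, h) for h in matvec(p["w_in"], stream)]
    return matvec(p["w_out"], hidden)


def block(streams, p):
    # Branch: exchange information across tokens. Merge: add into the artery.
    streams = [add(s, u) for s, u in zip(streams, attention(streams, p))]
    # Branch: compute on each token alone. Merge: add into the artery again.
    return [add(s, mlp(s, p)) for s in streams]


def next_token_distribution(streams, layers, w_unembed):
    for p in layers:
        streams = block(streams, p)
    # The last token's stream is projected to vocabulary scores, then softmaxed.
    return softmax(matvec(w_unembed, streams[-1]))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden, vocab = 4, 8, 6
layers = [{"w_q": random_matrix(d_model, d_model, rng),
           "w_k": random_matrix(d_model, d_model, rng),
           "w_v": random_matrix(d_model, d_model, rng),
           "w_in": random_matrix(d_hidden, d_model, rng),
           "w_out": random_matrix(d_model, d_hidden, rng)} for _ in range(2)]
w_unembed = random_matrix(vocab, d_model, rng)
streams = random_matrix(3, d_model, rng)  # three made-up token embeddings
probs = next_token_distribution(streams, layers, w_unembed)
print(len(probs), round(sum(probs), 6))
```

Because every branch is added back rather than overwriting the stream, each block can only adjust what earlier blocks wrote, which is what makes the "iterate on its understanding" framing fit.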

The main artery of a transformer (38:44)

Linus: Does that make sense?

Nathan: Yeah, I think the Anthropic work that you've alluded to has certainly been very influential for me too. I think when you say artery, main artery, you know, synonyms for that would include the residual stream and the skip connection. And I think, I don't know who all was involved with this work. I'm sure multiple people, but I also definitely associate this with Neel Nanda and watching some of his tutorials where he'll say things like, "The residual stream, it's really important, man."

Linus: And you're like, "If you say so, I believe you." The residual stream is the name for the artery. And then within each transformer block, attention is the mechanism by which the little token-level residual streams exchange information with one another. And then the MLP layer is the thing where, in my mind, most of the actual computation and meaning association happens. There's a lot of other weird nuances about this. When humans think of a computation step, we think of a discrete mapping from one input to another, like mapping from Paris to the Eiffel Tower. But it turns out it seems as though a lot of times this meaning formation may be distributed across multiple blocks over time. And so there's a lot of strange things about exactly what human-interpretable computation happens and how it's spread out across all of these mechanisms. But in terms of just like raw bits flowing through, I think that the clearest picture that I've found is the one that I just described.

Nathan: Yeah, that's good. So now let's go back to a little bit more of the practical, how you actually do this work. And then we can kind of go to a higher order concept again and talk about some of the capabilities and big questions around what they can actually do, and where we're confused and where we're confident about what's actually happening. So, okay, so you've just described a sort of flow of information. You've got this main kind of path, and then you've got these sidebars where the computation happens. There's this like attention block, which is the place where each of the mini streams, each token gets its own stream. They can put out a vector indicating what information they contain. They also put out a vector indicating what information they are looking to receive. Those all get crossed, and that is how the information passes from token to token. And then they continue on, and each token kind of loads in various associations that are encoded in the weights. And that's like, where the information is stored in some way, some weird way.

Frameworks for working with transformers (41:20)

Nathan: Okay, cool. So now that has to get instantiated in code. What libraries, frameworks, templates, tutorials do you use to have the right level of convenience and abstraction, but also kind of clarity on what it is that you're setting up?

Linus: Most of the time, I'm working in PyTorch on a GPU, like the Linux instance that I have on AWS. I guess the main alternatives to PyTorch would be JAX and TensorFlow. It seems like TensorFlow is in a bit of a maintenance mode at the moment from Google, and JAX, I think, is nice for many other reasons. I guess I should describe a little bit my intuition for what these things are, how they differ. So PyTorch, in my intuition, gives you the most concrete way of interacting directly with the vectors and matrices. When you instantiate any tensor or any matrix or any vector in PyTorch, you can look at the numbers and it's a thing that feels like you can hold on to it. It's a concrete value. And the way you write a program or a model in PyTorch is you have some concrete value, and you apply operations to it, which is the way that normal programs are written. The mental model that I have for writing programs in JAX is it's a little more like a compiler in that you describe the model as a series of computation steps. So you're kind of describing a model as a function with these 12 steps you apply to the input, or these 96 steps you apply to the input. And then once you have a model, you can compile it, and then you can put an input through it, and then out comes the output. There are some affordances for this now, but it's harder in libraries like JAX to just poke inside of the model and say, "Okay, what is the actual matrix value here? Let me look at all the numbers, and let me poke inside of it and play with it." Because the library isn't constructed that way, for performance reasons. If you can describe a model as a kind of straight-through control flow graph, or in a functional way, it's a lot easier to apply complex transformations and compilation steps to it to get higher performance. But the trade-off, then, is that you can't concretely play with the values.
And I think, especially if you're doing lots of interpretability work, it pays to just be able to look at a specific value at any random place in the network and say, "Okay, what does this number correspond to? What happens if I tweak this number to be this other number?" And for that reason, in general, in the ML world, between JAX and PyTorch, I think most researchers use PyTorch. And then if you're from Google or ex-Google or using Google Infra, you use JAX. And then I think in interpretability, things are a lot more skewed towards PyTorch and these other tools that let you play directly with the values because it just pays to have more observability into the values.

Complementing PyTorch with other libraries (44:02)

Nathan: My general understanding is PyTorch is basically totally dominant outside of Google and has, for all the reasons that you mentioned, I mean, it's kind of won the open-source battle. To what degree do you need to complement that with other libraries? Like, I know, for example, Neel Nanda has a library that is kind of meant for these sorts of visualizations, but I don't know how much visualization you're doing of activations, you know, as the forward pass is happening versus how much you're able to just get by with the more human-readable inputs and outputs. Do you actually spend a lot of time visualizing what's going on inside?

Linus: I do. I try to. I would a lot more if the tools made it a lot easier. I think the kinds of visualizations that are beneficial for studying autoregressive transformers are slightly different than the ones that are useful for studying embeddings. Parts of the existing infrastructure I can use, parts I have to create on my own. Thankfully, visualizing things is something that I'm relatively better at compared to like building models. And so it doesn't take me as much time to build something to visualize an embedding space or graphically explore a bunch of outputs of a model. I can whip that together with some React front-end pieces, because that's mostly what I do at work. This is something that I think holds true for both our AI work at Notion and also my work independently, especially when you're early on in a field or early on in an investigation, and I would claim that we are early on with language models in general. Nobody really knows exactly what they're doing. The quality of the tools and how much you can iterate on the tools, I think, bottlenecks how much you can iterate on the thing that you're working on with the tools. And so it pays to be able to quickly tweak the tool or add the functionality that you need to see something new, whether that's a tool that's for evaluating models or running models or visualizing things either in the outputs or in the training behavior. And because of that, I think I've mostly defaulted to building my own little tools whenever I needed them. And then eventually, if I realize that I'm building the same kind of tools a bunch of times, I consolidate that into a little library or a script or something that I can reuse. To me, it's that flow of quickly whipping up something that works, and then, once you realize it's being repeated, being able to smoothly turn that into a reusable component. In some sense, this is all software engineering is. But being able to do that really fluidly, I think, is the meta bottleneck.
And the speed at which you can do that, I think, is the thing that bottlenecks the quality of the tools, which is then the thing that bottlenecks the actual quality of research you can do. So a lot of my own work uses tools that I've built over time for myself. When there are reusable pieces, obviously I reuse, but I try not to adopt anything that is very hard to see through. Like the Hugging Face Transformers library is very hard to see through because it's meant to serve like a million different use cases. And it's meant to host like a billion different models. And because of that, there's tons of indirection. It's very hard to just ask a simple question, like when the model is at this block, what are all the vector operations that are happening? It's like impossible to answer because the answer to the question depends on a bunch of different questions, and there's like 12,000 if branches in any single target-based function. And so the conclusion that I'm trying to get to is that building my own tools that are much simpler and just fit for my use case lets me iterate quickly and understand what's going on a little better.

Nathan: So just to make sure I understand that correctly, I think I do. It's PyTorch, and that is basically a super rock-solid infrastructure foundation. And then after that, it's very little else aside from things that you roll your own to address particular needs and then only selectively build anything more than just kind of an ad hoc problem-solving addendum.

Linus: There are pieces that at this point I've reused enough and consolidated enough that they're like a staple. Like, when I create a new folder in my personal model repo, there's a bunch of stuff that I import by default because they're like libraries for components, UI components that are reused, little code for like running evals on models, training models. If I'm not importing them directly, then I'm certainly copying and pasting them over. And then one of the things that I've learned in doing more research things over building products is that in research land, I just do not feel guilty about copying and pasting code because you have no idea how the thing is going to change. And it may be that copying and pasting is just kind of saving you from having to overgeneralize anything. So there were a lot of pieces that I ended up reusing over time or importing over time, but those tend to be like things that I built up over time, rather than someone saying, "Here's a library that helps you do interpretability research," and then me importing it and learning how to use that.

Models used for interpretability research (48:25)

Nathan: Okay, cool. That's very interesting. What sort of models do you use today? Obviously, you know, when you started, the available models were, interestingly, probably much closer to the frontier, like GPT-2, I think you started in the kind of GPT-2, GPT-J era, right? So at that time, you had both way worse models than we have today, but way closer to the frontier capability than probably what you can manage to use in any sort of convenient way today. So what models are you using now when you want to do a new experiment?

Linus: Yeah, GPT-J was like the biggest open-source model that was available when I first started doing this. And then since then, I've upgraded parts of my infrastructure to like Llama 2 and stuff. So in general, I think most interpretability research these days is using GPT-2. There are so many interesting things about GPT-2's place in time, and also GPT-3, but that's not open source, so it's harder for people to work with. But GPT-2 and GPT-3 were trained at a time when you didn't have to do a lot of data filtering over the internet because there was no existing language-model-generated data. And so it's kind of like pre-radiation steel in that it's trained on things that are guaranteed to be not AI-generated. And there's a lot of other interesting things about those models. All of them have been very well documented. It's a little bit like the fruit fly of the language model world. And so GPT-2 is very popular in general.

Llama, Pythia, and T5 (49:47)

Linus: I think a lot of people are doing interesting work with Llama 2 or Llama models because those models are also popular. I think a lot of serious, more academic interpretability work, especially more theoretical work, has moved on to this family of models called Pythia, which is specifically trained for interpretability research. The way that it's specifically trained for that is that it has many more checkpoints through the course of training that you can look at, so you can look at how features evolve over time. It's fully reproducible, all the data is available, the training code is well known. That's in general what you find. A lot of my work, because again, I deal a lot of the time with embeddings, I work with open-source embedding models, or if I need to do reconstruction specifically, I have a custom model that I've trained for reconstructing text out of embeddings that's based on T5, which also was a cutting-edge model when I first trained this model, but obviously now it's not anymore. But T5 continues to be, I think, a very rock-solid model if you want to fine-tune or if you want like a medium-sized model to do anything complex. And so I have a T5-based embedding model that I've trained once for this, like, going back and forth between text and embedding. And then I continue to use that for a lot of different things. And at this point, using that is easy because, again, I've built a lot of scripts and tools for working with that particular model, and I have a bunch of datasets that I've generated with this model that take a bunch of time to generate. So again, I feel like when I sit down to work now, I have a kind of sprawling workstation, or a little workshop environment where I've put all of the little tools and ingredients exactly where I need them at arm's length. And if I pulled in a new model, then I'd have to redo a bunch of stuff.
But now that I have a few stable sets of models that I'm working with, I can just reach for them and put pieces together really quickly. And so my custom model is the main one.

Exporting a Personal Work Environment (51:38)

Linus: And then, given the quality and the popularity of open-source embedding models that are at the cutting edge, or OpenAI embedding models, for example, sometimes I work with those as well.

Nathan: How sort of exportable do you think that personal work environment is? You know, obviously a lot of people, I think, are very curious about this kind of exploration and would like to be able to do some of it. And honestly, I would include myself in this to a degree. Like I'm, I think I have a pretty good conceptual understanding of most of these questions these days, but I'm not that facile with the code to be able to go in and just run whatever experiment. It's definitely still a barrier for me to have a hypothesis and then be like, I can go run that experiment myself. I would find that to be broadly hard and slow. Do you think that everybody just kind of has no choice but to, if they want to get there, you just have to slog through it and ultimately end up kind of creating your own thing? Or do you think I could like drop into your cockpit and learn to fly your plane, so to speak?

Linus: It's a balance, right? I think that there are bits and pieces of my tooling that are in exportable form. Some of them I designed that way from the start. Some of them I've found to be kind of an isolated piece that I can now share. Some of them are under the Notion umbrella, so I can't open-source them that easily, but some of them I might in the future. My mental model for this, again, is a watchmaker with a watchmaking workshop set up. Even if you're an expert in another expert's workshop, you wouldn't expect to be able to just drop into someone else's workshop and then immediately get to work. You either have to learn the way that things are laid out, or you have to customize the layout to fit exactly the way you work. Not because the tools are that different fundamentally or anything like that, but just because the things are not there where your hand is used to reaching for them. So, more concretely, what that metaphor means in terms of my working style is like, I have a local GPU rig, which is where I am right now. I have a couple of rigs in the cloud. In those computers, there's a specific folder layout with specific datasets in these places. And I have scripts that assume that those things live there. And I have scripts that, when a model's trained, upload it to S3 and all this stuff. And it's kind of a spiderweb, but it's a spiderweb that works. The exportable bits are more like if there's a specific problem to be solved. I wrote a little custom mini language model training library for myself to make it really easy and efficient to run a lot of GPT-4 inferences in parallel. And those kinds of things are solving an isolated problem, so I can isolate them and then package them and maybe share them. But in general, a lot of the tooling that I've built over time isn't actually solving a particular hard technical problem.
It's like, I want this thing to be there when I reach for it, or I want to not have to worry about where exactly my dataset is. So it's very much about the way that things are laid out, and I don't know how much that is shareable. If I was at a company and I had to unify everyone under the same tooling, maybe we could all agree to be like, okay, all the datasets go in this folder, all the model outputs go in this S3 bucket, and then maybe there could be a shared spiderweb. But in public and open source, I think that's much harder to achieve.

Using language models to assist in code writing (54:49)

Nathan: How much do you use language models to assist in your personal code writing these days?

Linus: A lot, but I don't think any more than most other people. I have Copilot and have used ChatGPT. I use Copilot, GitHub Copilot. I've also tried Sourcegraph Cody. Most of the time they're just useful for kind of writing boilerplate code that you knew you were going to write anyway. And so a lot of times I'll just write things and I'll pause for Copilot to do its thing, and then I'll hit tab. They are never really useful for writing code where you're not sure of the correct form. So a lot of the time I'm doing kind of complex matrix algebra in the codebase, and Copilot would be like, "Hey, here's the exact five-line operation that you need to do to finish this function." I'm like, I don't... I don't know if there's a bunch of arbitrary numbers in here. I don't know if there's a bunch of arbitrary dimensions in here. I don't know if those are correct. And so for those, I'll either work it out on paper myself or I'll frequently do this thing where I give some sample code to GPT-4, and I'll ask it, "Can you find the bug in this code?" And if there is one, it'll find it. If not, it'll say, "I don't see a bug in this code." And that gives me some confidence. But ultimately, all of those are kind of accelerants. And I think that you always have to do the final verification of, like, running the code and seeing that the output is exactly what you expect, which again is where PyTorch is much better suited than other more opaque tools. Because when you're debugging things, especially, you can print the outputs at every step and see exactly whether the outputs match your expectations.

Model capabilities (56:10)

Nathan: So let's zoom out again, or go up a level, and talk about model capabilities. I mean, I guess maybe one little tiny follow-up before we do that. That definitely relates. What scale of models are you using? And do you have any concern about needing to upgrade the core models that you work with to make sure that the investigations you're doing are, you know, on the most relevant systems? And this is something that, you know, we've mentioned Anthropic a couple of times. And as I understand it, a big part of the reason that they feel it necessary to train frontier models, even though like their founding ethos is very AI safety-centric, is that they feel like they can only do the frontier interpretability work if they have frontier models. And, you know, they just feel like they're qualitatively different. So I'm kind of transitioning into the qualitative, you know, potentially qualitatively different capabilities that the latest systems have and wondering, to what degree you feel like you need to run the latest models to have those, whatever those may be, to make sure that they're happening on the system that you're studying.

Linus: Most of this is speculation, right? It's based on the things that we know about smaller models. And so that umbrella caveat applies to everything that I will say next. I think there are two categories of facts that we can try to learn about models. One thing we can learn about is how neural networks work in general, or how transformers work in general when trained under a particular objective. So this is more like the physics of a neural network under training. An example of something that belongs in this category is the linear feature hypothesis: a lot of current interpretability work is happening under the assumption that a model represents specific facts it knows about the input as directions in its vector space. That seems like the kind of thing where, if you can demonstrate it theoretically in a rigorous mathematical way, in a proof, and if you can also observe it in smaller models, you can make a pretty good conjecture that it will generalize to larger models. And I think a lot of my work is mostly predicated on that kind of thing, where it's about the physics of how these systems behave. And so even as you scale systems up, the way that models represent things and the way that information flows through these models are probably going to stay roughly similar. The other category of things that we can try to know about models is, for a given specific model, what are the things that it's doing, and what are the things that it knows? So, for a specific model, what are the kinds of features that it's recognizing in the input? What are the kinds of circuits that it has that are useful to implement? How large are these circuits? What is the distribution of features? Are most features about topics, or are most features about grammar, things like that?
And I think that's the category of facts where we have very little to go off of for projecting forward to say, what would the distribution of features look like for a hundred-billion-parameter-scale model? There are interesting analogs in vision models, where interpretability work has been going on for a longer period of time. Something that's interesting about vision models is that when you have very small models that are not super capable, the features that they use to represent inputs tend to be not very interpretable. And then as you scale them up, there's a kind of sweet spot where the features tend to be the most human-interpretable. You can see features like, oh, this is a cat, this is a dog ear, this is a wheel of a car. And then perhaps as you scale the models up further, maybe it's possible that the models think in terms of even higher-level features that don't necessarily correspond to the way that we conceptualize the world. And so that's the kind of thing that changes with scale. And that's the kind of thing that I expect we'll see in language models also. There's some interesting work, I think OpenAI has done some, in the space of how you supervise and interpret and understand models even under the assumption that they are thinking in terms of things that humans don't have words for. But a lot of my current work is in terms of how information flows through models and how models represent information.
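To make the linear feature hypothesis mentioned above a little more concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the three-dimensional "activation", the feature direction, and the helper names are invented, and real interpretability work operates on learned, high-dimensional activations. The idea it shows is only that, if a feature is a direction, its strength can be read with a dot product and changed by adding a multiple of that direction.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to length 1 so dot products read as 'amount of feature'."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def read_feature(activation, direction):
    """Project an activation onto a (unit-length) feature direction."""
    return dot(activation, direction)


def steer(activation, direction, amount):
    """Add `amount` of the feature direction to the activation."""
    return [a + amount * d for a, d in zip(activation, direction)]


# Hypothetical feature direction and activation vector.
feature_direction = unit([3.0, 4.0, 0.0])
activation = [1.5, 2.0, 7.0]

before = read_feature(activation, feature_direction)
steered = steer(activation, feature_direction, 2.0)
after = read_feature(steered, feature_direction)
print(before, after)
```

Because the direction has unit length, steering by 2.0 raises the read-out by 2.0 while leaving the component orthogonal to the feature (here the third coordinate) untouched.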

Reasoning abilities in frontier language models (01:00:26)

Linus: And I think for that, there are pretty solid assumptions that will hopefully generalize to larger models.

Nathan: So, how do you think about some of the, I mean, here are some hot-button topics or, you know, kind of flashpoint vocabulary words: things like reasoning capabilities in frontier language models. Obviously, the one extreme view is it's all just correlation and there's no, you know, quote-unquote, real reasoning going on. It's just statistical association and, you know, they're all stochastic parrots. My sense is, to put my cards on the table, I think that is not really true at the frontier. And it seems like I've seen plenty of interesting examples of problem-solving, you know, that seem to be meaningfully outside of the sorts of token-level correlations that might be expected to be found even in super broad training data. How would you give an account of reasoning abilities or things like grokking or things like emergence as we go to these frontier scales?

Linus: When we talk about what models can do, I think there are three levels at which you can ask this question. Level one is, does there exist a transformer? Is it possible to have a transformer model that accomplishes this task? In other words, if you were an omniscient being and you knew how to set every single parameter in a large transformer, could you manufacture a transformer that did this task? That's level one. Level two is, is it possible for, like, a reasonably randomly initialized transformer model to, through some gradient descent-based optimization process, arrive at a set of parameters that accomplish this task with the model. And then the third level is, when we are training language models in the real world with limited precision floating-point numbers on internet-scale data, will the resulting model be sufficiently close to the ideal model that it performs the task? And I think for most definitions of reasoning, and problem-solving and generalization, the answer to part one is almost always a yes. I think for any given logical deduction problem or any given math problem, it's generally possible to, given lots of parameters, write a program inside a transformer, manually engineer all the parameters, and have a model that's capable of doing it. My intuition, my non-rigorous intuition, is that given sufficient data and sufficient time and sufficient compute, it's probably possible to also train transformers in the infinite compute limit that do any arbitrary computing task. I say that because for non-trivial math problems, if you have enough examples, you can train very small transformers to fully solve them. Not fully solve them just in the sense that you can give it a bunch of unseen problems and it will solve them, but also fully solve them in the sense that if you look inside a transformer, you can see how the algorithm is implemented inside the transformer, and you can show that that is the correct algorithm that always generalizes. 
So, I think in a lot of cases, it's possible to have a transformer that, through gradient descent, arrives at something that we would call reasoning. The big question that I have personally in my head is, do we have enough data to train on such that for these extremely large models that have so many parameters, we could train them sufficiently long that they would go from starting to memorize to eventually starting to look like something that we would call fully reasoning rather than relying on a mix of reasoning and pattern matching. I think that's a big challenge. I think it's also very difficult to show in extremely large models currently that, for example, a model solved this problem by reasoning it within a circuit in some way over kind of like pattern matching. Because the diversity of the problems that we're going to show it for is really large, and because there are so many parameters, and relative to the number of parameters that we have, I think the datasets that we're using are still relatively small. And so it's very possible for the models to mostly memorize and still do really well. And so I don't know if we'll ever get to that point because data is really messy. And I think to get to the regime where you train a model and you can show that the model is perfectly generalizing or perfectly reasoning, I think you need extremely clean data that basically only does the correct thing all of the time. And you need that for a long enough period that the model, to anthropomorphize it a little bit, feels comfortable forgetting its memorized solutions and only relying on its generalized solution. And so there's a lot of issues, I think, in terms of practically can we produce models that we would be able to show is reasoning. But I think it's, in theory, possible to have models that do this.

Tiny stories (01:05:03)

Linus: And perhaps with research breakthroughs, we will get better at closing the gap between levels two and three.

Nathan: I think one of the best conversations I've had about a topic like this was an episode that we did with the authors of the Tiny Stories paper, which was out of Microsoft Research, Ronen and Yuanzhi. They did something where they observed that the order of learning seems to be syntax first, like just getting the part of speech right from one token to the next, then kind of facts and associations, and then reasoning. This is at least how you could frame their work; it's maybe not their exact framing, to be clear. But the way I've thought about it ever since is that they wanted to get that reasoning starting to happen as soon as possible and at the smallest scale possible. So they created this vastly reduced dataset called Tiny Stories, which is stories that are short and written for, you know, a three-year-old or a third-grader, whatever, with a reduced vocabulary. It's a pretty simple little universe, but because the vocabulary size was way lower and the facts that you might encounter were also greatly reduced, you could start to see some of these very early reasoning abilities starting to emerge, even with just a few million parameters. I think the biggest model they created was 30 million parameters. And by the time they got to 30 million, which is still, whatever, maybe 2% to 3% the size of GPT-2, they were starting to see some of these very simple abilities, like negation, for example. You know, GPT-2 really struggles with it, but they were able to observe it somewhat consistently, at least in their very small models, by just reducing the world.

Synthetic data (01:07:00)

Nathan: I have a big expectation that curriculum learning, in general, is going to be a big unlock for this. And, you know, we've heard from OpenAI and others, I think, at this point too, that training on code really helps with general reasoning ability. My sense is that at this point they're probably doing a bunch of stuff like that, much like that paper we saw recently from DeepMind on geometry, where they generated a ton of provably correct geometry statements and said, "Okay, we're going to start with this, and you'll learn what a correct geometry statement looks like, and then we'll apply you to this problem." I sort of suspect that the datasets at frontier labs these days have a lot of just pure reasoning in them, especially in the early going when they're trying to establish these core reasoning circuits, but that's on somewhat speculative authority for sure.

Linus: On synthetic data, something related to this that I thought about recently is OpenAI's new video model. This is purely observational speculation, but sometimes you look at the motions in the videos that are generated, or the lighting, or the shadows, and it looks like a video game. Partly based on that mostly groundless suspicion, and partly based on the confidence that I think we both have about the importance of synthetic data, I think they're probably training, at least early in training, on a huge amount of game engine-generated physics and motion and movement, as another example of synthetic data. Tiny Stories is also super interesting. I think it's an example of how, in general, as you said, models are very lazy about what they have to learn, and they only learn the thing that you want them to learn when they've run out of options. They've exhausted all the other options that they have to try to minimize their loss, and the only remaining option is to finally learn the thing that you want them to learn. In language data broadly, I think it's so difficult to get to that point. Even if you think about the math proofs that occur naturally on the internet, for example, there are a bunch of proofs on the internet that are just incorrect. And so, in order for the model to perfectly learn how to solve a math problem, it has to first, or at least simultaneously, also learn how to tell whether the proof that follows some text is likely to be wrong or correct. There's a bunch of noise like this, and I think either you can have relatively small amounts of very high-quality synthetic data, or you can have vast, vast amounts, like in the trillions of tokens range, of extremely noisy data that is only kind of filtered. But getting these very large models to reason in the way that, I think, humans want to reason with natural language is going to require bridging that gap.
And that's a really interesting data problem for sure.

Current capabilities (01:09:45)

Nathan: In the lightning round. So we've just been talking about these kinds of... You know, what exactly are the current frontier capabilities, you know, exactly how much reasoning is going on. You know, I agree, it's hard to say. I think if I could summarize our relative positions, I think I probably see more reasoning in it than you do today. Although it sounds like, I definitely, it sounds like we're together in that I definitely don't think of it as a binary. It is very, for me, like, it's not that it's all this or all that. I always say AIs defy all the binaries we try to force them into. And instead, it's like some mix, right? Like I would say, it seems that there are times when it can reason and there are other times when it can't reason. And, you know, mapping that boundary, which, by the way, might be fractal for some other results we've seen recently, is definitely a super hard challenge.

Notion AI (01:10:35)

Nathan: To get practical on this for a second: you're obviously working at Notion AI, a product leader there, and bringing frontier models to the masses. You've shared in detail in other places some of the functions and gone into how you're building it. I really liked the podcast you did with Redpoint, by the way, for those that want to get into how Notion is made. I thought that was a really good one. So, skipping over all of that kind of stuff and now just asking about the stuff that it's not able to do yet as well as you would like, and as well as you would hope to provide to Notion users: what do you tell the model developers are the shortcomings? Where do they need to improve to give the quality of experience that you think they should be able to get to next and would be most useful at that intersection of feasibility and value?

What are the top problems that need to be solved in AI? (01:11:31)

Nathan: I can run down a list, but you could just respond freeform.

Linus: This is a list that's always top of mind for me as well, because everyone on the team talks to companies building models, and they always ask us this question. The main ones that are always top of mind are: we want models that hallucinate less; we want models that are cheaper and faster, with lower latency; and we want models that follow instructions better. And there's a fourth one, which is a big one, but a very hard one, which is that we want models that are better at general reasoning. All three of these, so not hallucinating, lower latency, and following complex instructions more faithfully, are, I think, areas where, for example, Claude 2 or GPT-4 have shown marked improvements over the previous generation models, except obviously they're both more expensive. But I think the combination of those three, or a couple of those three, has enabled entirely new kinds of products. So, for example, Notion has a product called Q&A that helps you answer questions based on all the information in your knowledge base. And to do that requires not hallucinating as much, not just from the model's own memory, but from information that it has read in its context. But it also requires following extremely complex instructions, because we have massive books of instructions, metaphorically, that are just dozens and dozens of bulleted lists of: here's how you should format your answer. Here's what you should do. Here's what you should not do. Here's exactly what you should do in this situation versus another situation. And earlier models have had a lot of trouble following that kind of 1,000, 2,000, 5,000 token long instruction. And so long instruction following, I think, is an area where we've continued to see improvements, even in the latest iterations of GPT-3.5. And it's very essential. Hallucination reduction makes a lot of sense. And then, just as instruction following enables a bunch of use cases that were impossible before, I think lower cost and faster latency enable a bunch of use cases that were hard to pull off before.
Notion also has a feature called AI autofill, where you can use a language model, kind of like an Excel formula, where you have a data table or a database of interviews, companies, customers, or your classes, or whatever. And then, instead of filling in a column with a formula, you can fill in a column with language model prompts. And to be able to do that at the scale of tens of thousands of rows that some of our customers have in their tables requires models that are both fast and also efficient because we want to be able to send OpenAI or Anthropic thousands of these requests in a matter of a few minutes and have them be fulfilled reliably. And so that's an example of a use case where as the models continue to get better at this Pareto frontier of latency and cost versus capability, we're going to continue to be able to make new kinds of products.
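The bulk, per-row usage pattern described above can be sketched roughly as follows. This is not Notion's implementation: `fake_model` is a hypothetical stand-in for a real model API call, and the row fields, prompt template, and concurrency limit are invented for illustration. The sketch only shows the general shape of filling a column by running one prompt per row with bounded concurrency.

```python
import asyncio


async def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model API call; it just echoes the prompt."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"summary of: {prompt}"


async def autofill_column(rows, prompt_template, max_concurrency=8):
    """Fill one column by running the templated prompt once per row."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def fill(row):
        async with semaphore:
            return await fake_model(prompt_template.format(**row))

    # gather returns results in the same order as the input rows
    return await asyncio.gather(*(fill(row) for row in rows))


# Illustrative table rows only.
rows = [{"company": "Acme"}, {"company": "Globex"}]
filled = asyncio.run(autofill_column(rows, "Describe {company} in one line."))
print(filled)
```

With thousands of rows, the per-request latency and cost of the underlying model dominate how quickly and cheaply the column fills, which is the trade-off described in the passage above.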

Why isn't general reasoning ability the top priority? (01:14:15)

Nathan: I often have this debate with product people where I say, I think it was in your fourth or fifth spot before you get to general reasoning ability, general kind of capability. I always put that first. And I say that as somebody who's also kind of scared of an AI that's smarter than me. But definitely, I feel like on my personal utility, you know, the results that I get day to day, I'm always like, it's pretty fast, you know, and it's pretty cheap. It would be nice if it was faster and cheaper, but what I really want is for it to be smarter. What I really want is for it to write as me better, you know, better in context learning or better kind of style imitation on the fly. So it can kind of put something together in the way, you know, or at least close to the way that I would want it. Why isn't that like the very number one thing? I mean, if you had that, couldn't you charge like hundreds of dollars a month for Notion AI? Is there something I'm missing about that?

Linus: Part of the reason why that's at the end of the list is because it's, I think, much harder to get to basically. Again, obviously, this is something that new models continue to get better at. And I think like between going from GPT-3.5 Turbo to GPT-4, for example, we saw improvements in how people were using it and the quality of feedback that we were getting. Better at reasoning is obviously a goal, but also if we go to Google and say, "Hey, we want models that are better at reasoning," their engineers are going to be like, "What, what do we do with this? What do we do with this piece of feedback? Like, obviously, we're trying to make models that are reasoning, but how do we measure that? What exact kind of reasoning?" And so, in some ways, following instructions and not hallucinating are facets of reasoning that I feel are most tractable in like a few months to a year timeline, and also to have them unblock the most urgent current problems with existing products that we have. Another way to look at this is I think a part of your desire for better reasoning is a function of the way that language models have been packaged for you, which is you interact with it in a single-threaded way, one at a time, as the model is generating the tokens in real time. And you ask it questions. And I think there's a lot of other ways that language models can be used that isn't packaged that way. So, for example, the autofill feature that we mentioned, that's a bulk operation. Someone might upload a 2,000-row CSV and fill in five different columns. And wouldn't it be better? First, wouldn't it be good if you could do that, which is a cost problem? And then wouldn't it be good if you could write a prompt, run that, and a few seconds later, have the entire table be filled, and you can filter it by category or feature or where people are and so on? And those kinds of experiences are not really bottlenecked by reasoning. 
Most of the time, they're bottlenecked by things like speed and latency. And especially if you're trying to do this at the scale of billions of tokens generated per day, this is a huge, huge problem. There are categories of products that, for example, smaller startups might be able to launch, but we might not be able to launch just because for us to deliver that same thing would cost us enough money to not be feasible. And then I also think there's a lot of useful work that doesn't need novel reasoning. I always kind of joke that most business documents that most people write, or most academic documents that most people write, are just interpolations between other documents that have previously been written. And that's true, I think, for something like 95% of all documents being written inside Notion, not because there's nothing novel in there, but just because, given enough information and diversity in all the inputs that these models have seen, most instances of problem-solving or doing important work or communicating well are within the training set.

Gemini 1.5 (01:17:45)

Linus: And so, for those, I don't actually think the biggest problem is getting the model to reason outside of what it's already seen, but just being better at following instructions so that we can steer it towards the right region in the space of events that it's already seen.

Nathan: Perfect transition to my next question, around the next big model upgrade. You kind of alluded to it a little bit, but what does that look like? And, you know, one of the big theories I have right now is that we have, as an industry, as an AI app industry, if you will, been building a ton of stuff that basically tries to compensate for the language model's weaknesses, right? They hallucinate, so we ground them in database access. They have very limited context, so we've got to re-rank the results to try to make sure we're getting the right stuff into context. Obviously, every part of that pipeline can fail. And then, you know, you might just have something come along that changes the game. And maybe it just did in the form of Gemini 1.5, with up to a million and maybe up to 10 million tokens of context, and some pretty wowing recall demos. I guess, first of all, I'd be interested to hear what kind of process you plan to go through. I'm sure you have a standard set of evaluations that you would run to get systematically comfortable that, okay, we're not regressing. Interested to hear what that's like. But then I kind of also wonder, do you see that the next model, like a Gemini 1.5, could perhaps lead to a significant leap in performance given all that scaffolding, or possibly even make some of that scaffolding not necessary anymore? Like, do you think the system itself maybe gets simplified because the new model is just more tolerant of, "Hey, just throw some more stuff in it. You know, we don't have to work so hard on some of these previously hard problems."

Multimodal use cases (01:19:40)

Linus: I'm very curious what use cases motivated the million-token context window for the new Gemini models. My hunch is that it's actually mostly multimodal use cases. I can imagine a long piece of audio or a long piece of video filling up the context. But I think most of the time, if you're purely in the text realm, it's difficult to imagine. There are a lot of benefits to retrieving limited context rather than just putting everything in a model window. Some of them include observability. So if you give the model 10,000 inputs and it gives you the wrong answer, how do you debug that? Maybe you can look at things like attention maps and so on, but that is an interpretability problem in itself. Whereas if you have a pipeline that gives you maybe the top 10 documents and has a language model answer the question from those, then if it got it wrong, you could ask useful questions like, did the answer exist in the documents that it saw? Was it at the beginning or the end of the context? If you swap this out, does the model get a different answer? And so, having a more pipelined system, I think, helps you debug. I think it helps with costs, obviously, and latency for a given compute budget. It also lets you incrementally upgrade different parts of the system. So, having a better language model while keeping the same retrieval pipeline versus having a better retrieval pipeline with the same language model both improve the results. So, I think there's a lot of just structural benefits to having this pipeline model. And the Gemini results, in particular, have a lot of these tests that are kind of needle-in-the-haystack tests: if I have a million tokens and there's an anomaly somewhere in the million, can you find it? And in the real world, a lot of the complex retrieval that models have to do is actually so much more nuanced. There are a couple of examples of something more nuanced.
One example is the model might have to not just find the information but do some reasoning to figure out which of the pieces of information to use.

Retrieval (01:21:31)

Linus: In Notion's case, in the retrieved documents, some documents may be out of date. Some documents may be written by someone who's not authoritative. Some documents may be written in a different language, and they might conflict with information written in the canonical document written by the HR team in a company or something like that. And we have instructions for how to deal with all of these cases. And so the model has to do some reasoning over exactly what to use, and that looks very different than just "find this word in this million tokens." Another example of a more complex case is when there needs to be synthesis. So if you're inside Notion's internal Notion, and you ask a question like, has the AI team been productive this week? That's a complex, multi-step question that requires knowing not just a single answer to a question, but in general, what's been going on. And then that requires maybe reasoning through, okay, what does it mean when this person is productive? Who are all the people on the team? Maybe there are people that are out of office? And so again, there are a lot of problems that are not just about what information is in the context, but actually about how the model performs. Of all the biggest challenges that we've faced so far in optimizing retrieval, it's mostly those kinds of things, those more reasoning-related or edge-case situations where it's unclear exactly what the model should do based on our existing instructions. And the best ways that we've found to attack those problems involve a lot of stepping through all of the steps in the chain and saying, "What's the answer found at this step? What's the answer found at this step?" And all of that debugging is just much easier when you have like 10 examples to look at instead of 10,000.
So at least for this particular use case, I think context length is not the most dire need. But obviously, there are lots of other use cases, like for example, audio, where I think it could be a big changer.
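The debugging questions raised above (did the answer exist in the retrieved documents, and where in the context was it?) can be sketched with a toy pipeline. All of this is illustrative: the word-overlap scorer is a deliberately crude stand-in for a real retriever, and the documents, query, and function names are invented for this note.

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a toy scorer, not a real retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def diagnose(retrieved, expected_answer):
    """Report whether the expected answer was in the retrieved context, and where."""
    for position, doc in enumerate(retrieved):
        if expected_answer.lower() in doc.lower():
            return {"answer_in_context": True, "position": position}
    return {"answer_in_context": False, "position": None}


# Hypothetical workspace documents, including an out-of-date draft.
documents = [
    "The office closes at 6pm on Fridays.",
    "Vacation policy: employees receive 20 days of vacation per year.",
    "Old vacation policy draft: employees receive 15 days.",
]

retrieved = retrieve("how many days of vacation do employees receive", documents, k=2)
report = diagnose(retrieved, "20 days")
print(report)
```

With only a handful of retrieved documents, a wrong final answer can be traced to either retrieval (the answer never reached the context) or generation (it was there but the model ignored it or preferred the stale draft), which is the kind of step-by-step inspection described above.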

Nathan: Yeah, very interesting. I'll be very keen to spend more time figuring out exactly what I can do with a million tokens. And I totally agree with you that the needle-in-the-haystack results that we've seen are not enough of an answer to how it really performs to be confident just dropping it in. Although I aim to find out exactly how far this thing might have advanced. And it does hold a lot of promise for me, just because, first of all, this podcast, you know, might be more than GPT-4 Turbo or Claude 2 can really handle. The transcript probably fits in the 128k; it certainly would fit in the 200k. But it's getting to the point where the recall isn't great. And just to be able to take my podcast transcripts and throw them into something and get reliable answers, you know, answers that I feel are better than what I could pretty reliably delegate, that alone is super exciting. And then it'll be really interesting to see how the synthesis part of that works with those.

Linus: Yeah, I mean, there's a part of me that wonders, like, training a model of such long context, I think requires probably architectural inventions, probably some engineering work, definitely a lot of extra data work. There are only so many examples that are high quality that are that long. And so there had to have been some good use case motivation for Google to go ahead and train that model. And I'm very curious, even just like internally, what they're using that context for.

Agents (01:24:43)

Nathan: Yeah. Agents, I think, is going to be one also super interesting application. Like, to what degree can you just kind of append your past failures and continue to roll forward in a sensible way, learning from what happened today? Obviously, you know, you can't just drop all your API failures into a single context window; you'll run out. You'll still run out even with the 10 million eventually, but it could make a big difference for that sort of thing. How about just any big tips that you have for AI app developers, things that you know to work that you just don't see people doing enough of?

Data (01:25:21)

Nathan: This could be in the prompting domain. It could be in the RAG domain. It could be in how people set up their evals or even just the user experience and the interface that is presented.

Linus: Both of these are kind of boring things, but I think they're things you could always be doing more of. The first is, regardless of what exact kind of model you're training or working with, it obviously always pays to spend time with data. And I don't just mean run an evaluation, look at the charts. I mean read the logs, exactly. Make a table of a hundred inputs and a hundred outputs and ask yourself, for this input, what output would you, the human, generate, and then think about all the steps that you go through and reason through that and look at edge cases, look at where the models fail. I've also found it personally interesting, if a little dubious in value, to look at pre-training data, like raw pieces of text from the Pile dataset. Most of the text on the internet is quite garbage, and there's a lot of stuff in there that I think explains, for example, the reason that language models are good at certain kinds of output formats and not at others. But the general theme, I think, is to spend a lot of time with your data, in particular with the input data that you're giving the models and the tasks, and then also with failure cases in the wild. In the beginning with Notion AI, we spent some time setting up a system for us to have human-annotated logs and a more scaled, automated system for detecting errors and fixing them. And eventually, what we've settled on for a lot of our features instead is that the engineers have scheduled time on our calendar every week where we go into a meeting room, and we just stare at a Notion database of all the bad cases, individual outputs that were bad that were reported by our users. And we ask ourselves, for each input, what is the exact step in the pipeline where this failed? What category does this belong in? We kind of treat it like a software bug. And we say, is this already being fixed? Is this an instance of a new bug?
Is there a systematic issue in a pipeline that requires us inventing something new to fix this? And so, spending a lot of time with the failure cases in the data, I think, pays off. And then, sort of on a similar theme, I think investing early in internal tools to quickly and easily run evaluations, quickly and easily generate synthetic datasets, visualize outputs, sort through them.

Tips for improving a product (01:27:32)

Linus: I think that helps a lot. And there are a few companies out there doing stuff like this. I think for working with your own data and visualizing it in particular, the tools generally shouldn't be so complex that you can't just have an engineer kind of like whip them up in a couple of days. And I think that pays off in the long run by you being able to customize them over time, as I alluded to before. So that's kind of what we've done at Notion. Obviously, your mileage may vary, but I found those two things to be particularly worth their time in terms of improving the product.

Nathan: I think those are very solid tips. And I agree with you that, you know, while it may not be exactly what people are looking to hear, the admonition, or reminder at a minimum, to read the logs, just look at the raw data, look at the failure cases, I have certainly been served extremely well by that over time too. Like there's just no substitute for looking at the raw inputs and outputs.

Linus: Once you understand the problem at a deep level, then you can start thinking about, like, we're running out of time during the week to do this. How can we scale this out? How can we automate this? But the automation is not really possible until you have a really solid understanding of exactly what you're trying to monitor for or automatically fix or protect. Yeah.
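
As a rough illustration of the weekly triage described above (which pipeline step failed, what category the failure belongs in, whether it is already being fixed), here is a minimal Python sketch. The `FailureCase` fields, the step and category labels, and the `summarize_failures` helper are all assumptions made for illustration; the conversation does not describe Notion's actual schema or tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureCase:
    """One user-reported bad output. All field names are illustrative."""
    input_text: str
    output_text: str
    pipeline_step: str   # where in the pipeline the output went wrong
    category: str        # free-form label for the kind of failure
    known_bug: bool      # True if a fix is already in progress


def summarize_failures(cases):
    """Count failures per pipeline step and collect cases that are new bugs."""
    by_step = Counter(case.pipeline_step for case in cases)
    new_bugs = [case for case in cases if not case.known_bug]
    return by_step, new_bugs


# Hypothetical examples of rows from a table of bad cases.
cases = [
    FailureCase("summarize this page", "(empty)", "retrieval", "missing context", True),
    FailureCase("translate to French", "English text", "prompting", "ignored instruction", False),
    FailureCase("fix grammar", "rewrote everything", "prompting", "over-editing", False),
]
by_step, new_bugs = summarize_failures(cases)
for step, count in by_step.most_common():
    print(f"{step}: {count}")
print(f"new bugs to file: {len(new_bugs)}")
```

A tally like this is only useful after the cases have been read and labelled by hand, which is the point made in the conversation: the automation follows the understanding, not the other way around.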

Nathan: Okay, cool.

Nathan: All right, last section of things that could possibly get weird. Two big trends, I'd say one, you know, kind of predates the other a little bit, but both are increasingly well established at this point. One big trend is that it seems to me like all the latent spaces can sort of be mapped onto one another, right? We saw this kind of arguably first in a really powerful way with CLIP, where image space and text space were kind of brought together. And now we have text-image space, like amazing. But that was done in a high-scale way. And then it wasn't too long after that, that we got things like, for me, BLIP-2 was really a major moment. I'm sure it wasn't the first, but it was one of the first that I really read and understood deeply and realized that, holy moly, like they've got a frozen language model here and a frozen image model trained totally separately. And then just a small bridging connector model between them that just took like a couple of GPU days to train is enough to unlock, you know, capability that was even beyond CLIP level. But we just see so many examples of this now, right, where one latent space to another, often just a linear projection, sometimes a small connector model is kind of all it takes. And then another big trend has been the realization that a lot of models can be kind of merged together in seemingly unprincipled ways, or like not very principled ways, right? We've seen some going back to, I think it was called relative representations. It was a paper that showed that models initialized differently, trained with different shuffles of the data or whatever, still seem to converge on pretty consistent isomorphic representations of data that are maybe a rotation away from each other, or maybe a dilation and a rotation away from each other, but ultimately look very similar in data visualization.
And now we're getting to people saying, "Well, hey, what if we just train model A over here on one dataset and B over here and just add their weights together?" And it's like, wait a second, that works? It doesn't maybe always work, but to a surprising degree, these sorts of things are starting to work. So you can comment on that in any and all ways, but the way that I'm most interested to ask you about is, is there a risk or a concern, or maybe you would see it as a good thing, that AI systems may start to communicate with each other in a higher-dimensional, more embedding-mediated way, or middle layer activation sort of way? As opposed to through different forms of communication that we could actually read in a native way, right? It seems like the models don't really need to generate text to, you know, send the text over to one another. Instead, they probably are going to perform better with high-dimensional, you know, vector communication. But I kind of worry about that at the same time, even as it may boost performance in various ways, I kind of worry that like, much like you said earlier with the, "How do you debug that?" And I was like, "Well, I can't even read the messages that they're passing back and forth." So, how big do you think those trends are? How excited are you about those trends? Do you see, you know, downsides along with the upsides, you know, all things about kind of weird embedding-based communication and model merging?
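
To make the "often just a linear projection" idea concrete, here is a small Python sketch that fits a linear map from one embedding space to another using paired examples. Everything in it is an assumption for illustration: the data is synthetic, the dimensions are tiny, and `fit_linear_map` is a hypothetical helper that uses plain gradient descent. It is not the method used by CLIP, BLIP-2, or any experiment mentioned in the conversation.

```python
import random


def apply_linear(W, x):
    """Map vector x (length d_in) through W (d_in rows by d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def fit_linear_map(X, Y, lr=0.5, epochs=200):
    """Fit W by gradient descent on the squared error between X @ W and Y."""
    n, d_in, d_out = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(epochs):
        grad = [[0.0] * d_out for _ in range(d_in)]
        for x, y in zip(X, Y):
            pred = apply_linear(W, x)
            for j in range(d_out):
                err = pred[j] - y[j]
                for i in range(d_in):
                    grad[i][j] += err * x[i] / n  # averaged over the n pairs
        for i in range(d_in):
            for j in range(d_out):
                W[i][j] -= lr * grad[i][j]
    return W


# Synthetic stand-ins for paired embeddings of the same items in two spaces.
random.seed(0)
W_true = [[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]]  # hidden relation to recover
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
Y = [apply_linear(W_true, x) for x in X]

W_fit = fit_linear_map(X, Y)
max_err = max(abs(W_fit[i][j] - W_true[i][j]) for i in range(3) for j in range(2))
print(f"largest weight error: {max_err:.2e}")
```

In this toy setup the two spaces are exactly linearly related, so the fit recovers the map almost perfectly. Real embedding spaces are only approximately related, and a linear adapter would be judged by how well it works on held-out pairs.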

Model Merging (01:32:04)

Linus: Yeah, model merging is truly weird, and I don't have a great principled understanding of exactly why it works so well the way it does. I have a general intuition of why it works decently, which is that a lot of these models are fine-tuned from a single base model. And when you fine-tune these models, especially on tasks that look similar to the original task, like mostly doing natural language continuation, the models mostly tend to exist in a kind of linear subspace of the original model space. And so when you merge them, you're mostly doing linear interpolation between weights of these models. And so perhaps it makes sense that the resulting model will just kind of inherit behavior from a bunch of different models. It's still very weird, but when you view it that way, it doesn't seem quite like black magic. It just seems like something that we have to understand. Relative representations and other kinds of projections between spaces, I think, are really, really cool. It's cool, obviously, at a theoretical level. I think it's also really cool to observe it at a technical level, an empirical level. One of the other experiments that I did late last year was to find mappings between embedding spaces. I had a model that was capable of inverting and finding features in one embedding model. And then I just trained an adapter between OpenAI's embeddings and my model, and it was a linear adapter, and I could start to read out OpenAI embeddings, even without spending so many tokens training a custom kind of inverter for OpenAI's embeddings. So all of these things, I think, are super fascinating.
I think where I place this technique in the kind of like pantheon of all these different things you can do with language models is as a parameter-efficient or compute-efficient way of fine-tuning or customizing a model, where instead of fully tuning a new image-to-text model or fully tuning a new way to invert an embedding, you could take a model that's close enough in what it represents and then maybe tune a few parameters to get to the final destination. To your question about whether models will communicate in latent spaces: if you think about it, GPT-3 is just kind of like 100 GPT-2s talking to each other in activation space. Obviously, it's not quite that. There are denser connections. But if we can manage to precisely understand exactly how different layers in a transformer, different token residual streams in a transformer, communicate with each other, I think a lot of those techniques will definitely generalize to understanding, like, mixture-of-experts models, or understanding ways that fully tuned models like BLIP-2 that you mentioned communicate with each other through kind of like mapping of representation space. In some ways, it's actually easier because, unlike a transformer residual stream, where the concepts that it may be representing could be really weird (you can imagine a concept that's really useful for predicting the next token in a Python program but not really useful for humans in general in life), a lot of times when you're mapping between, kind of like, fully formed representation embedding spaces, between like an image embedding space and a text embedding space, I intuitively expect most of those concepts to be pretty human-interpretable. And so a lot of the kind of mechanistic techniques that I think people are working on today will probably generalize to understanding them.
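
The "linear interpolation between weights" intuition can be sketched in a few lines of Python. This is a toy under stated assumptions: each model is represented as a dictionary of parameter names mapped to flat lists of floats (a stand-in for real tensors), both models share the same parameter names because they are imagined to be fine-tunes of one base model, and `merge_weights` is a hypothetical helper rather than any particular library's merging API.

```python
def merge_weights(weights_a, weights_b, alpha=0.5):
    """Interpolate two models' parameters: (1 - alpha) * A + alpha * B."""
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Two hypothetical fine-tunes of the same base model, flattened to lists of floats.
model_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
model_b = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
merged = merge_weights(model_a, model_b, alpha=0.5)
print(merged)
```

With `alpha=0.5` this is a plain average of the two models; `alpha=0.0` returns the first model's weights unchanged. Whether the merged model behaves sensibly is an empirical question, which matches the hedged framing in the conversation.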

How weird will things get? (01:35:00)

Linus: And so I just view it as kind of an exciting, efficient way to build more interesting systems.

Nathan: How weird do you think things might get over the next couple of years in general? I mean, it seems to me like we're headed for at least one more, and probably more than one more, notable leap in model capabilities. And I feel like I have a sort of rough intuition for what GPT-5 would be, which I would probably describe as like smarter, to borrow a word from Sam Altman, you know, better general reasoning capabilities and probably more long-term coherence as another big aspect of that. You know, they've seen how many people are trying to make web agents and various kinds of agents on the platform, and it's just not quite working. So I would expect to see like smarter and sort of more readily goal-directed as kind of two big advances for the next generation. Beyond that, it starts to get honestly kind of hard to figure out like what would even happen, and you know what it would mean. But do you have kind of a sense for like how big is your like personal Overton window or kind of, you know, cone of possibility for the next few years? Like how far do you think AI might go in a few years' time?

Linus: I mean, everything monotonically improves from here, right? I think that's the scary part. MKBHD has this good video on Sora where he utters this phrase like, "This is the worst that this technology is going to be from here on out." And I think that's a really succinct way of expressing the fact that, okay, maybe you think GPT-4 is not super, super smart, but this is like, if you look back at the history of smartphones, every phone when it came out, it's like the worst that smartphones are ever going to be from that point on out. And it's only gotten monotonically better. And I think when you think about that, the fact that language models are monotonically improving so rapidly from here on out, as a trend line, is interesting and scary.

Long-term planning (01:36:54)

Linus: The models' long-term coherence and, I think, goal-directedness in particular is really interesting. Right now, with every GPT iteration, I think OpenAI has done a little bit of not just making the base model smarter, obviously, but some opinionated tweaks to the final tuning objective to make the model more useful for certain use cases. Obviously, the big, most recent one was like API use; before that, it was chat. But I could imagine OpenAI or other model makers tuning their models to be better at, kind of, not just having the model expect the world to end after the next turn in the conversation, but expecting that there are further turns and maybe planning for, okay, if I assume that I'm given, you know, infinite turns into the future, what might I start to do? So that kind of like long-term planning, I think, is something that's missing in current models, which makes them very hard to use for agents. So I agree a lot with that. And then, beyond models themselves, there's obviously lots of corners of culture that these models touch. And I think that's a much harder, much more complex, dynamical system. I think it's much harder to predict exactly what will happen. Like, to the concept of copyright, for example, or even to our concept of exactly what a single piece of creative artifact is. There's a really good TEDx talk by creativity and HCI researcher Dr. Kate Compton, where she talks about the idea of an image generation model. When a human produces a piece of art, you make the thing, and then that is the concrete object.
But when you have a model that's capable of producing a bunch of images for a single piece of text, or like millions of images at once, one way to look at it is that the model is just producing art faster than a human, but a different way to look at it is that the model is a tool to map out the space of all possible outputs of a certain style of art or for a certain prompt. And so it starts to kind of change exactly what we imagine a single artifact to be, from like a single blob of pixels to, like, here's now a kind of subspace of all possible outputs that, as a bundle, is a form of creative expression. So there's a lot of cultural stuff that I think is just much harder to predict, that I'm, frankly, not equipped to have too much smart commentary on, but I think it will be very interesting to watch, and it will probably have ripple effects beyond the model capabilities themselves.

A positive vision for the future (01:38:56)

Nathan: Do you have a sort of positive vision for the future, for just like your own life? I find that this is in very short supply, and certainly, you've been one who has been so up close and personal with the AIs as they've developed. You might not, but if you do, I'd be interested to hear your sort of day in your own life, you know, three years from now, five years from now: what is AI doing for you? You know, what are you able to do that you couldn't do before? You know, who knows what Hollywood's doing or how the entertainment industry has evolved, but like, what does the day look like for Linus as things really start to hit, you know, the key thresholds for utility?

Linus: The high-level concept that I gravitate towards when I think about this is like, you can take a base technology and express it in a way that's agency-taking, agency-expanding, or agency-amplifying. An example of a tool that takes away agency from a human is like a dishwasher. But that's fine because I don't actually care about creatively washing my dishes or exactly which order I wash my dishes. I just want them washed, or like a laundry machine or a car maybe. And then there's a bunch of technology, packaging, ways of packaging technology where preserving agency really matters. Writing tools are an obvious example, but maybe also more subtle things like which emoji show up first in an emoji keyboard or predictive text keyboards, or obviously social media algorithms. These are kind of like somewhere in between agency taking and agency amplifying. And one thing that I'm kind of concerned about right now is that I don't think people are thinking enough about whether the ways that language models are packaged is amplifying human agency or taking away from it. That's something that I think I want to talk and think more about, and perhaps push other people building in the space to be better at. Assuming that we can steer the way we package language models to respect agency where it is required and only take agency where we want the models to take agency, I'm generally a pretty optimistic person about technology, and I think I have a lot of optimism for where this leads. As long as the way we package these things is more humanist rather than just kind of like automate all of the things. When you look at different kinds of AI companies, you see companies situated at different points in the spectrum between you want models to automate things in a way that takes away agency, i.e., replacement, or do you want models that amplify? For example, I think OpenAI is very much on the replacement side. 
Literally, their definition of, I think, AGI is something like a thing that can take over a single full human's job, whereas if you look at a company like Runway, or Adept perhaps, a lot of their framing of usefulness is about extending the agency of what you want to express. So there's a healthy amount of diversity here, and I think it's just a matter of where the winners sort of end up lying. Assuming we get that right, I have a lot of optimism for where we're going.

Nathan: It's a big question, but I agree that's a key one. We want to guard our agency probably increasingly jealously, especially the more different AI systems might want to usurp it. That could be a great note to end on.

Closing (01:42:10)

Nathan: Anything else you want to touch on that we haven't?

Linus: No, I think we covered everything. Interfaces, capabilities, interpretability, all the things I spend my time thinking about.

Nathan: How people can be more like you, of course, a key highlight as well. All right. I love it. Well, thank you very much for doing this. You've been really generous with your time and insights, and I definitely count you among the must-follows in the space for all sorts of new and very generative ideas. So it is my honor to have you. And I will say in closing, Linus Lee, thank you for being part of the Cognitive Revolution.

Linus: Thank you. It was my pleasure.

Nathan: It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please, don't hesitate to reach out via email, or you can DM me on the social media platform of your choice.
