Interfacing with AI, with Linus Lee of Notion
In this episode, Linus Lee, AI product leader at Notion, joins us to discuss his groundbreaking projects and unique approach to exploring AI systems.
He shares his toolkit and insights on language model capabilities, along with his vision for the future of AI, which is centered on amplifying human intelligence. Linus inspires listeners to engage deeply with AI and to steer its development in ways that enhance human creativity and agency.
RECOMMENDED PODCAST: How Do You Use ChatGPT? with Dan Shipper via @EveryInc
Dan Shipper talks to programmers, writers, founders, academics, tech executives, and others to walk through all of their ChatGPT use cases (including Nathan!). They even use ChatGPT together, live on the show. Listen to How Do You Use ChatGPT? from Dan Shipper and the team at Every, wherever you get your podcasts: https://link.chtbl.com/hdyucha...
SPONSORS:
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off: https://www.omneky.com/
The Brave Search API can be used to assemble a data set to train your AI models and to help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API free for up to 2,000 queries per month at https://bit.ly/BraveTCR
Plumb is a no-code AI app builder designed for product teams who care about quality and speed. What is taking you weeks to hand-code today can be done confidently in hours. Check out https://bit.ly/PlumbTCR for early access.
Squad gives you access to global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention "Turpentine" to skip the waitlist.
CHAPTERS:
(00:00) Introduction
(10:16) AI and Creativity
(14:37) Sponsor: Omneky
(14:57) AI Research Development
(20:05) Bridging Modalities
(32:38) Sponsor: Brave / Plumb / Squad
(35:36) Transformer Models and Techniques
(52:32) Personal AI Research Setup
(58:55) Leveraging Language Models for Coding
(01:02:05) AI Model Development and Notion AI
(01:22:05) Future of AI Models and App Development
(01:32:51) Emerging Trends in AI
Full Transcript
Linus Lee (00:00) The kind of interface I'm eventually building towards is a tool that lets you edit text or work through ideas, not in the native space of words and characters and tokens, but in the space of actual meaning or features, where features can be anything from, is this a question? Is this a statement? Is this uncertain or certain? To topical things like, is this about computers versus plans? Or probably other kinds of features that we don't really have words for. I'm generally a pretty optimistic person with technology, as long as the way we package these things is more humanist, rather than just kind of, like, automate all of the things. You see companies situated at different points in the spectrum between: do you want models to automate things in a way that takes away agency, i.e., replacement, or do you want models that amplify? Like, I think OpenAI is very much on the replacement side. Literally, their definition of AGI, I think, is something like a thing that can take over a single full human's job. Whereas if you look at a company like Runway, a lot of their framing of usefulness is about extending the agency of what you want to express.
Nathan Labenz (01:00) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, our guest is Linus Lee, AI product leader at Notion and AI explorer extraordinaire. I've followed Linus online for a couple of years now, fascinated by his many groundbreaking projects and his unique way of thinking about AI systems. From creating novel interfaces that visualize and manipulate generative models in their latent spaces, to developing techniques for semantically editing text and images, Linus has been a pioneering tinkerer. In this wide-ranging conversation, we dig into the details of how Linus goes about his explorations. He shares his toolkit, from PyTorch as a foundation to the custom tools he's built over time for data visualization, model evaluation, and rapid experimentation. We also discuss the importance of spending time with raw data and failure cases, and the value of building your own tools to deeply understand the problems you're trying to solve. Linus also offers his perspective on the current capabilities of language models and where he sees the biggest opportunities for improvement. Beyond the high-level goal of better general reasoning, he emphasizes hallucination reduction, better instruction following, and cost efficiency. We also speculate about the future, considering scenarios like models communicating with other models via high-dimensional embeddings, techniques to connect latent spaces across modalities, and the societal implications as AI capabilities continue to advance. Linus articulates a vision centered on amplifying rather than replacing human intelligence, a principle he believes should guide the development and deployment of AI-powered products. Throughout this episode, Linus demonstrates the curiosity, resourcefulness, and thoughtfulness that have made him such an influential figure in the AI community.
His work inspires me to engage more deeply with these systems, to build the tools I need to understand them, and to steer their development in a direction that deepens rather than diminishes human creativity and agency. As always, if you find value in the show, please share it with others who might appreciate it. Particularly right now, as we're establishing the new feed, a tweet or a comment on YouTube would be especially valuable. And, of course, we invite you to reach out with feedback or suggestions on our website, cognitiverevolution.ai. Now, please enjoy this deep dive into the art and science of exploring AI systems with Linus Lee of Notion AI. Linus Lee of Notion AI and general all-purpose AI explorer, welcome to the Cognitive Revolution.
Linus Lee (03:54) Thank you. Thanks for having me. Excited to chat.
Nathan Labenz (03:56) I'm really excited about this. I have followed you online for a couple years now and really been fascinated by all your many projects and just the way that you think about AI systems. I really genuinely think you're a 1 of a kind thinker from which I've learned a lot. And so going back to when I set out to do this a year ago, you were on the shortlist of guests that I maybe didn't wanna have right away because I kinda wanted to get my, you know, get things set up a little bit and have a little bit more credibility, a little more comfort. But I definitely always knew that I wanted to, get you on the show and pick your brain. So I'm excited to get into it.
Linus Lee (04:32) Thanks. That's very generous of you and kind of you to say. Likewise, I'm excited to talk about both AI and everything else that I've worked on and kind of what I see coming up ahead.
Nathan Labenz (04:41) Well, very well deserved. The way I thought we would maybe approach this is I wanted to give people at the top a little bit of a sense of the kinds of things that you build and then try to kind of peel back some layers from that to then understand, like, how you're thinking about that in kind of conceptual terms. Also, how you are doing the work in very, very practical terms, and then kind of a sense for where you think we're going from here as well as perhaps some very practical tips based on your experience at Notion and and elsewhere for people that are looking up to you as an AI application developer.
Linus Lee (05:19) That sounds good to me.
Nathan Labenz (05:21) So for starters, you have built some of the most interesting novel interfaces that folks have seen. And, you know, these can be found on your Twitter profile in various places, and you've you've shown a few off in in different talks that you've given. And a few of them jumped out to me that you've demoed recently, but I'd love to kind of just introduce a few of the unique interfaces that you've created just to give people a very kind of tangible jumping off point to understand, the sorts of things that you put together.
Linus Lee (05:49) Yeah, definitely. I think we can kind of talk through them. I think the the more polished the demo that I can do off the cuff right now is probably not as good as the ones that are more polished online. So you can find them on YouTube. But I can talk about what the go what concretely the demos are, and then I can maybe later talk about my framing for them, because I think that's a part that I view part of my work is trying to figure out and evolve exactly what the framing should be. And I think that'll come up later in your conversation. But the concretely, there are a couple of directions that I've explored in 1 that I've sort of been on for the longest time is to try to visualize and control generative models in their latent space rather than in the input space that we're familiar with. So if you take image generation, for example, the common way of generating images is you describe an image like a calysoon, a grassy field with sunlight, long shot, know, 35 millimeter film, whatever, and then the model generates an image. But another way you could imagine generating or improving upon an image is you start with an image, and then you find a different image that has another element that you like. Maybe you like the lighting from another image, maybe you like the framing, or maybe you have a character that you want to bring in from another image. And if there was some way to tell the model, can you merge all of the attributes that you see in both of these images into a single image and kind of blend it together in the semantic space rather than into pixel space, that could be an interesting way to kind of stitch together images. 
So a couple of my demos are around combining images and text in this semantic space of an image model called CLIP, to, for example, start with a human face and then add emotion to it to make it smile or make it frown, or start with a photograph and then add different kinds of style to it, by just adding vectors in the latent space of a model together. You could do something like this with text as well. So I have a model that's capable of embedding text into a latent space, an embedding space, but then also inverting that embedding, going from the embedding to what that embedding represents as text. And it turns out that for a lot of embedding models that are expressive enough, this transformation is pretty much invertible with minimal meaning loss. And so I take that model, and then I can start to manipulate the embedding itself to, for example, combine two different sentences together semantically and produce a new sentence, or interpolate between sentences. So you have one that's maybe very optimistic and happy and bright, and another that's a little more mellow, and I can take those two sentences and interpolate between them to generate sentences that are in the middle. And I think that that kind of thing could potentially allow for other ways of controlling or editing or generating text or even images, ways that let people control output using levers that are hard to verbalize with text. And there are more rigorous and precise versions of this that I've been working on more recently that I can also talk about, but that's the general gist of things that I've been interested in.
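The interpolation Linus describes reduces to vector arithmetic on embeddings. A toy numpy sketch of one common approach, spherical interpolation (slerp): the three-dimensional "embeddings" here are made-up stand-ins, and the real embedding model and its text inverter are assumed, not shown.

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two embedding vectors.

    Plain linear interpolation can drift off the region of embedding
    space the model saw in training; slerp follows the arc of the
    hypersphere, which tends to decode into more coherent outputs.
    """
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a  # vectors are (nearly) identical
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# Stand-ins for the embeddings of a "happy" and a "mellow" sentence.
happy = np.array([0.9, 0.1, 0.2])
mellow = np.array([0.1, 0.8, 0.3])

# Decoding this midpoint through an inverting embedding model would
# give a sentence "in the middle" of the two.
midpoint = slerp(happy, mellow, 0.5)
```

Sweeping `t` from 0 to 1 traces the path between the two sentences; each intermediate vector is a candidate "in-between" sentence once decoded.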
Nathan Labenz (08:32) And sometimes there are some pretty creative front end experiences on these as well. You wanna describe the 1 also where you have kind of little tiles of text on seemingly an infinite canvas, and then you can kind of drag into different directions and create, like to take that text, but now move it in this direction, you're kind of making that a spatial computing sort of experience almost.
Linus Lee (08:55) Yeah. So the the 2 that I think are most interesting visually, the 1 that you just described is actually came out of a problem that I ran into, where 1 of the initial prototypes of this involves just having imagine this high dimensional latent space of a model, and you want to visualize it, and you want to visualize it on like a screen, and so you cut a 2 d slice through it. And you cut a 2 d slice through this high dimensional space such that the 2 dimensions that you see on the screen, the bottom edge of your screen or the top, the left edge of your screen correspond to 2 different kinds of targets that you move towards. And so maybe the bottom right corner of your screen is like a happy sentence about weddings, and the top left corner is about like dinosaurs or something. And you can pick any point in this plane, and it'll be a sentence that combines elements of those sentences. That works well for kind of demonstrating the seed of this idea, but the problem that you quickly run into is there are actually way more properties that you want to control when you're generating or editing text than just like between 2 arbitrary anchor sentences. And when you have this fixed plane, you quickly run out of directions that you can move in. And so instead of fixing the directions, the levers that you can pull to just the top and bottom edges of the screen, the next revision, which you referenced, involves kind of flipping how the dimensions are presented. So the way I used to describe it is imagine you're playing like an instrument. And when you play an instrument with 1 hand, this is very crude, but if you're playing like a string instrument with your left hand, you control the strings and you control the pitches of the notes that you're playing. And with your right hand, you kind of control the volume or the amplitude of the sound in a very crude way. And this is a way to kind of transfer that in a spatial interface. 
So this is an infinite canvas, where with one hand holding down keys on a keyboard, you can choose what specific attribute you want to control for text, whether that's, like, positive versus negative emotion, or whether it's written in fancy English versus simplified English, or what kind of topic it is, is it more about sci-fi, and so on. And then with your right hand, you can click down on a sentence that you've put down on this plane, and you can drag it out. And the further you drag out, the more you're increasing the particular attribute that you picked. So for example, if you put a sentence down on this 2D canvas, and then you hold down the, like, make-longer button, and you pull the sentence farther from where you clicked down on it, the farther you pull, the longer that sentence will be. And over time, as you explore this latent space of sentences, you can see all of the branches that you took in this exploration and where each variation came from. And it naturally forms a kind of branching edit history through which you can follow all of the different movements that you made through latent space and the sentences that resulted from them.
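The drag interaction described above comes down to simple vector arithmetic: map the drag distance to a strength, then push the sentence's embedding along the chosen attribute direction before decoding it back to text. A toy numpy sketch, where the "sentence embedding," the "make longer" axis, and the pixels-per-unit scale are all hypothetical stand-ins (a real system would use directions found in an actual model's latent space):

```python
import numpy as np

def drag_edit(embedding: np.ndarray,
              attribute_dir: np.ndarray,
              drag_px: float,
              px_per_unit: float = 100.0) -> np.ndarray:
    """Push an embedding along an attribute direction.

    `drag_px` is how far the user pulled the tile on the canvas;
    a farther drag pushes the embedding harder along the attribute
    (e.g. "longer", "happier") before decoding back to text.
    """
    direction = attribute_dir / np.linalg.norm(attribute_dir)
    strength = drag_px / px_per_unit
    return embedding + strength * direction

# Stand-in values: a 4-dim "sentence embedding" and a "make longer" axis.
sentence = np.array([0.2, -0.1, 0.4, 0.3])
make_longer = np.array([0.0, 0.0, 1.0, 0.0])

slightly_longer = drag_edit(sentence, make_longer, drag_px=50)
much_longer = drag_edit(sentence, make_longer, drag_px=300)
```

Holding a different key just swaps in a different `attribute_dir`, which is what lets one canvas expose many more levers than a fixed two-axis plane.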
Nathan Labenz (11:42) When you think of these projects, are they I mean, I can I think they probably have multiple functions for you? Right? 1 is, obviously, you're learning by doing all this stuff. How much do you actually use these tools versus viewing them as kind of proof of concept for some further synthesis on top of this later? Like, do you think these kinds of interfaces are ready for people to actually get utility out of it? And do you personally?
Linus Lee (12:07) Most of the prototypes that I've built for kind of, like, research purposes, I would say are not good enough to actually be useful. So the kinds of edits that you can make by with this, like, kind of latent space Canvas UI, the edits are very crude, and the models that I'm using are too small, I think, to let you make very, very precise edits. And I think 1 of the the kind of continued problem areas in which I want to do more research is to try to improve upon that. There are other prototypes that I've made that have been more useful. And so I think it just depends on how how good the particular technology or technique that I'm working on is and whether that is close enough to utility or not. Within the kind of set of prototypes that are built for working in this area of building interfaces for AI, the eventual goal is always to build something that is useful. And for any individual prototype, it may be the technology or technique may be too early to actually be useful, so it may be the prototype may be more about identifying incremental improvements or identifying problem areas or identifying what directions to build more in. Or if the technique is more mature, then it may be close enough to utility that I build it more as something that I can use day to day and actually kind of get benefit from. But there's there's always an underlying element of it, which is like, this is going to help me better understand of what it's for, how it can be useful, how to make it, how to improve on the technique, the quality of the outputs and so on. An example of a prototype that I built a long time ago that I actually ended up using day to day for a long time is kind of a Obsidian or Roam Research style outliner note taking app. That was my like daily note taking app for a long time, where keywords would automatically kind of identify themselves or entities would automatically identify themselves and connect them to other entities in the graph. 
And that was using mostly not language models, mostly kind of dumber techniques, but the outputs were good enough that I ended up finding those connections very useful and generative in my creative thinking, and I used it for a long time.
Nathan Labenz (13:54) That's cool. Hey. We'll continue our interview in a moment after a word from our sponsors. The the 2 d Canvas 1 in particular of the recent demos that I've seen has a sort of Xerox PARC vibe about it where you're like, this this feels like especially after I went and tried the, Apple Vision Pro. It was like, I see something here, and it feels like there is, you know, spatial computing as, you know, spoon fed to you, and it's amazing. It's, you know, incredibly, like, a like, a breathtaking technology experience to go put the Apple Vision Pro on your eyes, but it's very much like a consumer experience that, you know, you you start off with. But seeing your, you know, 2 d canvas, imagining that becoming a higher dimensional space that you could potentially move around in, and then, you know, envisioning these sort of, you know, hardware to actually make that feel very experiential starts to give me a little bit of a sense of, like, how some of these future spatial computing notions might be, you know, more than just like, oh, look. I've got my YouTube screen over here, and I've got my, email over here. But actually kind of using the brain's spatial intuition to do things that are currently, like, hard to do. Yeah. Using using all of the full kind of awareness awareness of space and
Linus Lee (15:14) the awareness that we have of how to move around. Like, this is all very intuitive. We learn it when we're when we're a child. And most of the ways that we work currently don't really take advantage of that. I spent a long time iterating on and I, like, even still, every, like, few months, I get a kind of a new version of exactly how to describe the motivation for building out these kind of weird spatial demos that are technically not quite there yet, but like gesturing towards something interesting. The most recent version of this, I think, is actually somewhat relevant to how humans interact with space, among other things. The closest metaphor that I found for the reason that this direction of research is so interesting to me is that I think language models work so well because they work with kind of an alternative, better representation for ideas than humans work with. The closest analogy that I have is like spectrograms when people are dealing with audio. So normally sound is like a wave in space, it's just a single kind of, I imagine, like a single string vibrating back and forth over time. And if you work with audio, that's like the base base thing that you work with. It's a basic representation. But if you work professionally with audio, then you actually most of the time work in a different representation space where you don't look at vibrations over time, but you look at space of frequencies over time, or what's called a spectrogram, which is a visualization of like, there's still a time axis on the 1 hand, but the other axis is no longer amplitude of the vibration, It's like the prominence of different frequencies. If you imagine, like, on the left side of your kind of graph, every like, the row is a different pitch. And every row gets a little mark when there's like a sound that hits that pitch. It's a little bit like Western music notation kind of staff, but it's just like colorful gradients often appear in these spectrograms. 
And what you can see, because you've broken out the prominence of every different pitch, every different frequency, is things like, oh, here's a musical pattern that repeats, or here is where the bass comes in, or here's where the tone shifts from this very full sound to this more mellow sound, all these things you can notice. And then you can start to manipulate the sound in the frequency space. So you can do things like bring down the lows, or add reverb only to the human voice range of sound, all these kinds of transformations that you're not really making to the sound waves themselves, you're making to this kind of projected-out version of the sound that exists in this alternative frequency space, if that makes sense. The kind of interface I'm eventually building towards is a tool that lets you edit text or work through ideas, not in the native space of, like, words and characters and tokens, but in the space of actual meaning or features, where features can be anything from, is this a question, is this a statement, is this uncertain or certain, to topical things like, is this about computers versus plans, or to probably other kinds of features that we don't really even have words for, because language models implicitly understand them. We implicitly understand them, but we don't ever really need to verbalize them. So it's an alternative way of projecting or viewing what's going on in language, and then maybe a way to start editing it. That's the closest metaphor that I've found for describing the kind of thing that I want. And then once we have that representation, one reason that spectrograms work so well is because humans intuitively understand color, and positions in space, as ways of representing range, because we have this geometric intuition. And so maybe rather than looking at ideas as just, like, ink scribbles on paper, we can also look at them in a spatial way.
And the canvas was one way of exploring that. There are a bunch of other directions that we can go once we have a better, more rigorous understanding of what's going on inside these language models. But eventually, I think we should be able to interact with ideas by bringing them out into the physical space that we occupy, or as close to that as possible, because we want to bring our full senses and proprioception and sense of scale and sense of distance into how we work with ideas, rather than just, kind of, eyeballs and pixels on a screen.
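The spectrogram representation described above is just a short-time Fourier transform over the waveform. A minimal sketch with scipy, using a synthetic two-tone signal as a stand-in for real audio:

```python
import numpy as np
from scipy.signal import stft

sr = 8000  # sample rate in Hz
t1 = np.arange(sr) / sr  # one second of time samples
# Synthetic stand-in for a recording: a low tone, then a high tone.
audio = np.concatenate([np.sin(2 * np.pi * 220 * t1),
                        np.sin(2 * np.pi * 880 * t1)])

# STFT: rows are frequency bins, columns are time frames --
# exactly the "pitch rows over time" picture described above.
freqs, times, Z = stft(audio, fs=sr, nperseg=256)
spectrogram = np.abs(Z)

# The dominant row at each moment tracks the pitch that's sounding.
loudest = freqs[spectrogram.argmax(axis=0)]
```

Edits in frequency space, like bringing down the lows or boosting the voice range, are then just scaling the corresponding rows of `spectrogram` before inverting back to a waveform.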
Nathan Labenz (19:07) Yeah, that's really interesting. I'm just looking up to try to figure out, was it refusion which might have been the project that people would know most of all that was a stable diffusion fine tuned, right, on these spectrogram images? So definitely check that out for further intuition there because it's it's also an interesting kind of foreshadowing of 1 of the things I want to talk about a little bit later of just kind of how all these spaces seem to be very sort of bridgeable to 1 another in off in many cases through like relatively simple adapters. Here you have a vision model trained to generate these spectrograms, and then that is just converted to music. And, like, the AI part stops at the generation of the spectrogram image. But at 1 point, this was, like, 1 of the leading text to music techniques all working through this, like, vision backbone. So
Linus Lee (20:01) I was going to say that the model's generating audio, but not in the typical space that we think of for presenting audio, which is waves in space, but it's generating in this spectrogram space. But it's actually a little bit more convoluted than that even, because most like speech generation, speech synthesis models, or even a lot of music generation models, they actually, a lot of them do operate in this kind of like frequency space in this like spectrogramming space. I don't know how the state of the art is now, but until very recently, the way that you built speech recognition systems was also by feeding an input in this frequency space rather than just waves in space. And so audio models used to naturally operate in this domain. And then what refusion does is it actually takes a model architecture that is meant for generating pixels in a 2 d image using convolutions and applies that to this, like generating this thing that's supposed to represent audio. So it's coming through kind of like 2 jumps in representation, and it still manages to work quite well.
Nathan Labenz (20:57) It's amazing. That's 1 of my big kind of themes and takeaways from, you know, a few years now of increasingly obsessive study of the whole space is just what an incredibly high hit rate there is on ideas that actually turn out to work. And obviously, you know, not every idea works, but as somebody who spent a little time in catalytic chemistry once upon a time, I can, you know, I could say it's definitely probably at least 2 orders of magnitude higher in AI right now than it is in, you know, many more, well mined corners of the physical sciences.
Linus Lee (21:28) If you have a lot of data and a lot of compute, and you have a way of, like, efficiently putting that compute to work and, like, squeezing a model through that process. Model architecture is kind of like a final kind of corrective term or coefficient in how well this whole thing works. It turns out if you can make gradient descent run on a lot of numbers very quickly with a lot of data that's high quality, most ideas will work within within like an order of magnitude maybe of like efficiency. And so a lot of these ideas, I think are just I I attribute to that. And then the recent advancements to the fact that we have now built up kind of all of the open source infrastructure and know how of how to do this model training thing at industrial scale rather than in research lab scale, which is maybe research lab scale to me is like a few GPUs, a few hours, maybe like a few days. And then industrial scale is like thousands of GPUs, many, many weeks. And go go from 1 to the other is in the same way that you can want to synthesize a little bit of chlorine in a lab. It's very different than like building a chlorine factory. Running these models at scale, I think, is a similar kind of like huge qualitative jump in terms of the kinds of knowledge that we have to have, but like, we have a lot of it now. And so we can apply that compute to like any arbitrary problem, given there's enough data and given there's an architecture that like doesn't suck super hard. And I think the the result of that is that we've suddenly seen an explosion of, like, models that perform very well, all these modalities, across modalities.
Nathan Labenz (22:56) Yeah. Compute, you gotta have data. The quality is critical, and scale is also critical. And then the algorithm, it just kinda depends how determines how much compute you have to have to make full use of the data. That's like a if I'd set my brevity, to max, that's about as brief as I could give, hopefully, a decent account.
Linus Lee (23:17) Yeah. As long as the architecture is not bottlenecking what information can propagate through the network, everything else is just kind of like how many floating point operations can you can you do?
Nathan Labenz (23:28) So going back to your exploration of all this technology. So you've been at it for a couple years now. I'm I'm a little bit unclear on the timeline, but I have to say I feel like I encountered quite a few ideas first through some of your experimental projects. And then later, it seems like they have become more formal as, like, mechanistic interpretability research publications. Some of the, you know, the couple kind of canonical results from the last handful of months, the representation engineering paper from Dan Hendrickson collaborators and the toward monosemanticity paper out of Anthropic. I feel like you basically were doing that kind of stuff with small models well before those things kind of hit mainstream. Like, how how would you describe your own sources of inspiration for that? Or maybe I have it wrong and there were more academic papers coming out that you were able to take inspiration from, but how did you get to this these kind of advanced, like, middle layers manipulations concepts as early as you did?
Linus Lee (24:32) So I don't come from academia. Like, my my actual technical background is in, like, product engineering building web apps, which is the furthest thing you can be from training models. And so I have to kind of step my foot in slowly into the research side of the community, the ML community, and figure out not only how that side works and kind of how to engage people in that world, but also how ideas propagate through both of these parts of the community. My old mental model of how research works used to be that, like, researchers are always at the forefront and they try these things and they come up with ideas. And if the ideas are good, they get written about, and then eventually the industry picks them up. And then eventually it like percolates down to smaller and smaller kind of like, perhaps less resourced groups. And I think that's maybe that's true in other fields like biology, where there's perhaps a larger gap in capability between what labs can do and companies can do. Even there, I don't think it's actually perfectly true. But certainly in machine learning, because all of the tools are so accessible, especially in the last few years, my picture now of how ideas percolate is actually a lot more nuanced, which is that at any moment, like, there are there is a large group of people that kind of know all of the things that we know about models, and what you can do with them. And some of those people exist in industry with the tools that they have. Some of those people exist in academia. Some of these people are like hobbyists that are just kind of like tinkering on the side, even when their day job is doing something else. And different people might stumble upon the same techniques or the same ideas at the same time because of because like, of what results in the next ideas that get discovered is actually, I think, a function of what is already known rather than the specific ideas that individual people have. 
And so ideas like the intuition that there was a linearized kind of feature space in the latent space of these language models that you could do edits in, I think that was an idea that was intuitively known to a lot of people working in this space. And some people like me chose to prove that out by doing what I personally enjoy doing the most, which is building interactive prototypes on the web, because that's my expertise. And then other people perhaps took longer to be a lot more rigorous and theoretically robust, and wrote things about it and got it out that way. There's a different way you speak about ideas there versus in the interface world. But I think most of the ideas that come about are just a function of what was already known, and different parts of the community, both in research and in industry, just happen to talk about them in different terms and perhaps reach other parts of the community at different times. You've had to solve a lot
Nathan Labenz (27:00) of problems, if I understand correctly. Right? And this gets into the how-you-work section, I suppose. But if I understand correctly, you are, for example, figuring out your own features. Right? I think you've done a mix of things, where sometimes you can take a straight embedding, which is just kind of a one-hot lookup, and then do that again and mash those up, and that's relatively straightforward to do. But then as you get deeper into the network, in at least some of your projects, you're making edits farther down through the layers of the model. And there, you have a much more conceptually challenging problem of, okay, I may have an intuition that I could probably steer around in this space and kind of steer the outputs, but how do I begin to figure out which direction is actually what? If I understand correctly, you've largely had to figure those sorts of things out for yourself. Right? You weren't able to really use too much in the way of open-sourced GPT-2 or GPT-J or whatever. These things didn't come with mid-layer feature sets.
Linus Lee (28:10) Exactly. My favorite mental model for research is actually related to the exact path that I followed to hit that wall and then go past it, which is that when I first started getting into deep learning models, most of what I did was read and try to understand things that were in the open-source world. I had a hard time reading academic papers at the beginning, because I didn't have all of the vocabulary and the way that people write about these things in my head and in my hands. And so I would just go to the Hugging Face transformers repository, and I would download GPT-2 and get the minimal version running. And I'd be like, oh, this is generating text that I recognize, it seems to work. Now I have a live entity that's working on my computer, and I understand Python, so I can put print statements everywhere and see exactly what's happening. And so that's kind of what I did. I basically did model autopsy on the things that were running on my computer, because that felt like a thing that I could understand and get my head around. And once I understood those models, I would try to understand things that were a little bit more on the frontier. And eventually you hit a wall. For a while, you can get away with, oh, I hit this problem, I'm going to Google how other people solved it. You arrive at a Hugging Face forum post, and then maybe you get a little more esoteric and you arrive at a GitHub issue. You're like, okay, there isn't consensus, but here are some things that people try and maybe they work. And then at some point you hit a wall where there is nothing on the internet that's an established consensus for how to solve this problem, and you have to go find a paper that somebody wrote three months ago that purports to solve it.
And then sometimes it's well established and well cited and you can trust it, and other times you're like, well, this paper has two citations, and I don't even know if it actually works. And so you have to then find an implementation that has one star on GitHub that tries to reproduce it and run it, or oftentimes those are broken too, and so you have to write your own implementation, which you're not even sure is the correct one, the same one that the researchers used. So you hit a wall, at which point you're at the frontier of what we know as a research community or as an industry. But there are still problems to solve. I still have things that I want to do that I don't know how to do. And so in the same way that if you hit a weird Python bug, you keep digging and digging, and then maybe you do some debugging, and eventually maybe you have to go into the Python interpreter, I hit a wall with trying to find interesting features in these models, or trying to even build models that I could use for these things, that were amenable to these edits. And I had to eventually read a bunch of papers and develop my own intuition and try to combine the pieces that I found to say, here's an approach that might work based on what I know. There's no obvious consensus on it, but it might work, and then try it. And that's been the process, most obviously for first building the autoencoder that I use for all of these embedding-space manipulations, the model that I talked about that lets you go between text and embedding fluidly. And then second, for the work that I'm currently most focused on, which is a more unsupervised, scalable approach to finding these edit directions, which builds on top of a lot of the mechanistic interpretability work that organizations like Conjecture and Anthropic have done.
But that's also quite on the frontier, and there are some techniques there, but they haven't really been applied to embedding spaces yet. They've mostly been applied to normal GPT language models. And so I've had to take a lot of those techniques and adapt them as well. And a lot of it is finding weird GitHub repositories (literally, I was doing this earlier this morning) that have like three stars, trying to transplant them over, hoping that I'm not messing up the implementation, and seeing how it works.
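To make the idea of an "edit direction" concrete, here is a minimal, hypothetical sketch in PyTorch of the simple supervised version of the technique: take the difference of mean embeddings between examples that do and don't exhibit some trait, then nudge an embedding along that direction before decoding it back to text. This is the difference-of-means idea that papers like the representation engineering work formalize, not Linus's unsupervised method; the function names and dimensions are made up for illustration.

```python
import torch

def edit_direction(pos: torch.Tensor, neg: torch.Tensor) -> torch.Tensor:
    """pos, neg: (n_examples, dim) embeddings of texts with/without a trait.
    Returns a unit vector pointing from the 'neg' concept toward 'pos'."""
    direction = pos.mean(dim=0) - neg.mean(dim=0)
    return direction / direction.norm()

def steer(embedding: torch.Tensor, direction: torch.Tensor, strength: float) -> torch.Tensor:
    """Nudge a single embedding along the direction; a text decoder
    (like the T5-based model discussed later) would turn the result back into text."""
    return embedding + strength * direction
```

The hard part Linus is describing is finding good directions without labeled example sets, which is where the unsupervised, interpretability-inspired approaches come in.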
Nathan Labenz (31:39) Hey. We'll continue our interview in a moment after a word from our sponsors. So before getting into a little bit more detail on how exactly you go about that: you've obviously built up a lot of intuition over the last few years of doing this. How would you describe what's going on inside a transformer, or perhaps what's going on inside a diffusion model? You can kind of break that down however you want. I don't just mean, you know, describe the blocks, but how is the information being changed? And I think I embedded another related question there: you said you think that the models have a better representation of a lot of ideas than we do. I'm very interested to hear, as you describe what's going on in a model, maybe you could contrast that against what you think is going on for us. I feel like I have a pretty similar setup in some ways to a transformer, where maybe I'm just so prisoner of the AI paradigm now that I see myself through it, which would be a weird irony. But I do feel like there is this sort of nonverbal, high-dimensional, only semi-conscious churning going on in some middle layers somewhere for me, that then kind of gets done and feels like, okay, I'm happy with that now, now you can speak. And then in the last layer, so to speak, it gets cashed out into tokens. But yeah. So what's going on in a language model, at a kind of conceptually descriptive level or representationally descriptive level? My mental model for how humans work,
Linus Lee (33:19) I think, includes a really important piece, a piece that I've found to be more and more important over time. The default intuition for how we think our brains work is that we think through things, and then we draw conclusions, and then we act upon them. But I don't actually think that's true at all. I think what happens instead is that our brain and our body take actions, and then, observing those actions, our brain retroactively builds up a narrative for why we performed those actions in a way that seems causally consistent with our understanding of the world, which is a whole other interesting thing, and maybe this is what you'd call consciousness. But on the model side, there are different levels at which you can look at this. And I think for transformer language models, the two levels that are most interesting are: how is the transformer processing information within a single forward pass, generating a single token? And then, how are we sampling from the output of the transformer? So I'll talk about the sampling first. The job of a transformer language model, or of any language model: it's called a language model because it's a statistical model, and the statistics that it's modeling is the probability distribution over all possible outputs. In a lot of cases, this is a conditional probability distribution. So the probability distribution that the model is modeling statistically is: here's some start of a sentence, give me a probability distribution over all possible ways this input can continue. And this is kind of like a cloud of probability over every possible branch that the model can take. And obviously, that's way too many outputs. If we had infinite resources, we would simply ask the model for the probability rating of every single possible continuation and just pick the one that the model thought was the highest.
But we can't do that, because we just don't have the compute. And so instead, we have to randomly sample from this distribution, this cloud of possible continuations. And there's a bunch of different ways to do the sampling. I think the most common one is to just sample every token and immediately kind of commit it. So ask the model what's the most likely token to come next. The model says, here are the top 50 most likely continuations, and you randomly pick one of them according to their probabilities, and you just do that over and over and over again. But there's a bunch of other ways to sample from this. Like, you could have a running buffer of n branches, and only set things in stone when you're n steps in and you think that that's the most likely continuation; there's a name for that technique, beam search. There are other ways that try to keep the model from repeating itself as much, or that take the probability distribution and warp it in interesting ways. There's an interesting technique where you have a small model and a large model, and you assume that the large model always improves upon the small model, and so you take the gap between the distributions and amplify the difference, and then you try to approximate what an even smarter model would predict. So there's a bunch of techniques for sampling from this distribution. But ultimately, once you have a good probability distribution that the model outputs, you're trying to find a continuation of the text that maximizes the likelihood. So that's the how-we-sample-from-the-distribution part. Then there's the how-we-get-the-distribution part, which is what's happening in a single forward pass of a transformer.
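The common "sample one token at a time and commit immediately" scheme Linus describes first can be sketched in a few lines of PyTorch: keep the top-k most likely tokens, renormalize their probabilities, and draw one at random. This is a hypothetical illustration; the function name and defaults are made up, and in a real loop the logits would come from a language model's forward pass.

```python
import torch

def sample_next_token(logits: torch.Tensor, k: int = 50, temperature: float = 1.0) -> int:
    """logits: (vocab_size,) raw scores for the next token from a language model."""
    probs = torch.softmax(logits / temperature, dim=-1)
    top_probs, top_ids = probs.topk(k)       # keep the top-k continuations
    top_probs = top_probs / top_probs.sum()  # renormalize over the k survivors
    choice = torch.multinomial(top_probs, num_samples=1)  # draw one at random
    return int(top_ids[choice])
```

Beam search and contrastive schemes replace this per-step commitment with bookkeeping over multiple candidate continuations, but the core softmax-then-sample step looks the same.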
And this is very transformer specific, but the best mental model that I have, a lot of which I owe to the excellent work that groups like Anthropic have done, is that a transformer takes in a bunch of tokens, and a transformer is basically a stack of mini neural networks, a collection of mini neural networks where each token gets its own little mini neural network. And within each little mini neural network, there's a bunch of layers, where there's a main kind of artery where the information just flows down. And then every once in a while, there's an offshoot where the transformer can go off and do a little bit of computation to iterate on its understanding of what's going on, and then merge that change back into this main artery. Small models have like 12 to 24 of these kind of branch-and-merge paths within each mini transformer for a token. Very large models can have up to like a hundred of these little blocks. But imagine you have a sentence, and for each token in that sentence, you have a little mini transformer that over time continues to iterate on the model's understanding of what the token represents. And then after each branch and merge back into this artery, all of the arteries of the transformer, the mini transformers, can exchange information with each other. And the way that the little token-level mini transformers exchange information with each other is that they broadcast some information about what they contain and also what information they're looking for from the other mini stacks in this transformer. And then based on what they have and what they're looking for, they learn to fetch information from other token stacks in the transformer, and they merge those back into the artery.
And so a transformer is a repeated cycle of: here's some information I have, fetch more information from other tokens, and then do some iterative computation on it; get more information, iterate on that; get more information, iterate on that, until you get to the bottom of the stack, and then that gets put through this thing called the softmax, which gives you the final probability distribution. But ultimately, a transformer is a thing where for every token, you do a bunch of iterative information processing, and then you exchange information with the other mini transformers behind each token. Does that make sense?
Nathan Labenz (38:57) Yeah, I think the Anthropic work that you've alluded to has certainly been very influential for me too. I think when you say artery, main artery, synonyms for that would include the residual stream and the skip connection. And I don't know who all was involved with this work, I'm sure multiple people, but I also definitely associate this with Neel Nanda and watching some of his tutorials, where he'll say things like, the residual stream, it's really important, man. And you're like, if you say so, I believe you.
Linus Lee (39:31) The residual stream is the name for the artery. And then within each transformer block, the attention is the mechanism by which the little tokenized residual streams exchange information with one another, and then the MLP layer is the thing that, in my mind, does most of the actual computation and meaning association. There's a lot of other weird nuances about this. When humans think of computation, we think of a discrete mapping from one input to another, like mapping from Paris to Eiffel Tower. But it turns out, it seems as though a lot of the time this meaning formation may be distributed across multiple blocks over time. And so there's a lot of strange things about exactly what human-interpretable computation happens and how it's spread out across all of these mechanisms. But in terms of just raw bits flowing through, I think the clearest picture that I've found is the one that I just described.
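The artery picture maps directly onto code. Here is a hypothetical, stripped-down PyTorch transformer block illustrating the structure described above: the residual stream flows straight through, and the attention sublayer (tokens exchanging information) and the MLP sublayer (per-token computation) each branch off and merge their results back in by addition. All names and dimensions are illustrative, not those of any particular model.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One branch-and-merge unit; real models stack 12 to 100+ of these."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, tokens, dim)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)      # tokens broadcast what they have / want
        x = x + a                      # merge attention output back into the artery
        x = x + self.mlp(self.ln2(x))  # per-token computation, merged back in
        return x                       # the residual stream continues downward
```

The additive merges are why the residual stream acts like a shared communication channel that every sublayer reads from and writes to.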
Nathan Labenz (40:30) Yeah, that's good. So now let's go back to a little bit more of the practical how-you-actually-do-this-work, and then we can go higher-order concept again and talk about some of the capabilities and the big questions around what they can actually do, and where we're confused and where we're confident about what's actually happening. Okay. So you've just described a sort of flow of information. You've got this main path, and then you've got these sidebars where the computation happens. There's this attention block, which is the place where each of the mini streams (each token gets its own stream) can put out a vector indicating what information they contain. They also put out a vector indicating what information they are looking to receive. Those all get crossed, and thus is the information passed from token to token. And then they continue on, and each token kind of loads in various associations that are encoded in the weights. And that's where the information is stored, in some weird way. Okay, cool. So now that has to get instantiated in code. What libraries, frameworks, templates, and tutorials do you use to have the right level of convenience and abstraction, but also clarity on what it is that you're setting up?
Linus Lee (41:54) Most of the time, I'm working in PyTorch, on a GPU Linux instance that I have on AWS. I guess the main alternatives to PyTorch would be JAX and TensorFlow. It seems like TensorFlow is in a kind of maintenance mode at the moment from Google, and JAX I think is nice for many other reasons. I should describe a little bit of my intuition for what these things are and how they differ. PyTorch, in my intuition, gives you the most concrete way of interacting directly with the vectors and matrices. When you instantiate any tensor, any matrix or any vector in PyTorch, you can look at the numbers; it feels like a thing you can hold on to, a concrete value. And the way you write a program or a model in PyTorch is you have some concrete value and you apply operations to it, which is the way that normal programs are written. The mental model that I have for writing programs in JAX or TensorFlow is that it's a little more like a compiler, in that you describe the model as a series of computation steps. So you're describing a model as a function with these 12 steps you apply to the input, or these 96 steps you apply to the input. And then once you have a model, you can compile it, and then you can put an input through it, and then out comes the output. There are some affordances for this now, but it's harder in libraries like JAX to just poke inside of a model and say, okay, what is the actual matrix value here? Let me look at all the numbers and poke inside of it and play with it. Because the library isn't constructed that way, for performance reasons. If you can describe a model in this straight-through control-flow-graph or function way, it's a lot easier to apply complex transformations and compilation steps to it to get higher performance. But the trade-off is that you can't concretely play with the values.
And I think, especially if you're doing lots of interpretability work, it pays to be able to look at a specific value at any random place in the network and say, okay, what does this number correspond to? What happens if I tweak this number to be this other number? And for that reason, in general in the ML world, between JAX and PyTorch, most researchers use PyTorch, I think. And then if you're from Google, or ex-Google, or using Google infra, you use JAX. And then in interpretability, things are a lot more skewed towards Torch and other tools where you play directly with the values, because it just pays to have more observability into the values.
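As a hypothetical illustration of what this kind of concrete poking looks like in PyTorch: a forward hook captures (and could even overwrite) the activation at any chosen submodule during a normal forward pass, so you can inspect the actual numbers directly. The toy model and names here are made up for the example.

```python
import torch
import torch.nn as nn

# A stand-in for any real network; hooks work the same on a transformer.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
captured = {}

def save_activation(module, inputs, output):
    # Stash the intermediate activation so we can look at the numbers.
    captured["hidden"] = output.detach()

handle = model[1].register_forward_hook(save_activation)  # hook the ReLU
out = model(torch.randn(3, 8))  # one ordinary forward pass
handle.remove()  # clean up so later passes run unobserved
```

After this runs, `captured["hidden"]` holds the concrete post-ReLU values, which is exactly the kind of direct observability that's harder to get in a compile-first framework.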
Nathan Labenz (44:14) My general understanding is PyTorch is basically totally dominant outside of Google, and it's kind of won the open-source battle for all the reasons that you mentioned. To what degree do you need to complement that with other libraries? I know, for example, that Neel Nanda has a library that is kind of meant for these sorts of visualizations. But I don't know how much visualization you're doing of activations as the forward pass is happening, versus how much you're able to just get by with the more human-readable inputs and outputs. Do you actually spend a lot of time visualizing what's going on inside?
Linus Lee (44:53) I do. I try to. I would a lot more if the tools made it a lot easier. I think the kinds of visualizations that are beneficial for studying autoregressive transformers are slightly different than the ones that are useful for studying embeddings. Parts of the existing infrastructure I can use; parts I have to create on my own. Thankfully, visualizing things is something that I'm relatively better at compared to building models, and so it doesn't take me as much time to build an embedding-space visualization or graphically explore a bunch of outputs of a model that I can wire together with some React front-end pieces, because that's mostly what I do at work. This is something that I think holds true for both our AI work at Notion and also my work independently: especially when you're early on in a field or early on in an investigation, and I would claim that we are early on with language models in general, nobody really knows exactly what they're doing. The quality of the tools, and how much you can iterate on the tools, I think bottlenecks how much you can iterate on the thing that you're working on with the tools. And so it pays to be able to quickly tweak the tool or add the functionality that you need to see something new, whether that's a tool for evaluating models or running models or visualizing things, either in the outputs or in the training behavior. And because of that, I've mostly defaulted to building my own little tools whenever I needed them. Then eventually, if I realize that I'm building the same kind of tool a bunch of times, I consolidate that into a little library or a little script or something that I can reuse. To me, that flow of quickly whipping up something that works, and then, once you realize it's being repeated, being able to smoothly turn that into a reasonable component.
In some sense, this is all software engineering is, but being able to do that really fluidly, I think, is the meta bottleneck. And the speed at which you can do that is the thing that bottlenecks the quality of the tools, which is then the thing that bottlenecks the actual quality of research you can do. So a lot of my own work is tools that I've built over time for myself. When there are reusable pieces, obviously I reuse them, but I try not to adopt anything that is very hard to see through. The Hugging Face transformers library is very hard to see through, because it's meant to serve a million different use cases, and it's meant to host a billion different models. Because of that, there's tons of indirection. It's very hard to just ask a simple question: when the model is at this block, what are all the vector operations that are happening? It's impossible to answer, because the answer depends on a bunch of different questions; there's like 12,000 different branches in any single Hugging Face function. And so the conclusion I'm trying to get to is that building my own tools that are much simpler and just fit for my use case lets me iterate quickly and understand what's going on a little better.
Nathan Labenz (47:20) So just to make sure I understand that correctly, and I think I do: it's PyTorch, and that is basically a super rock-solid infrastructure foundation. And then after that, it's very little else, aside from things that you roll your own to address particular needs, and then only selectively build anything more than just a kind of ad hoc problem-solving addendum.
Linus Lee (47:47) But there are pieces that at this point I've reused enough and consolidated enough that they're like a staple part of my setup: when I create a new folder in my personal model repo, there's a bunch of stuff that I import by default, because there are libraries for components, UI components that are reused, little pieces of code for running evals on models, training models. If I'm not importing them directly, then I'm certainly copy-pasting them over. And one of the things that I've learned in doing more research things over building product is that in research land, I just do not feel guilty about copy-pasting code, because you have no idea how the thing is going to change, and it may be that copy-pasting is just going to save you from having to overgeneralize anything. So there are a lot of pieces that I end up reusing over time or importing over time. But those tend to be things that I built up over time, rather than someone saying, here's a library that helps you do interpretability research, and then me importing it and learning how to use it.
Nathan Labenz (48:36) Okay, cool. That's very interesting. What sort of models do you use today? Obviously, when you started, the available models were interestingly probably much closer to the frontier, like GPT-2. I think you started in the kind of GPT-2, GPT-J era. Right? So at that time, you had both way worse models than we have today, but way closer to frontier capability than probably what you can manage to use in any sort of convenient way today. So what models are you using now when you want to do a new experiment?
Linus Lee (49:10) Yeah. GPT-2 was the biggest open-source model that was available when I first started doing this. And since then, I've upgraded parts of my infrastructure to Llama 2 and such. In general, I think most interpretability research these days uses GPT-2; there are so many interesting things about GPT-2's place in time. GPT-3 is not open source, so it's harder for people to work with. But GPT-2 and GPT-3 were trained at a time when you didn't have to do a lot of data filtering over the internet, because there was no existing language model data. And so it's kind of like low-background steel, in that it's trained on things that are guaranteed to not be AI generated. And there's a lot of other interesting things about those models; they're all very well documented. It's a little bit like the fruit fly of the language model world. So GPT-2 is very popular in general. I think a lot of people are doing interesting work with Llama 2 or Llama models because they're also popular research models. And a lot of serious, more academic interpretability work, especially the more theoretical work, has moved on to this family of models called Pythia, which is specifically trained for interpretability research. The way that it's specifically trained for that is that it has many more checkpoints through the course of training that you can look at, so you can look at how features evolve over time. It's fully reproducible. All the data is available. All the training code is well known. That's in general what you find.
A lot of my work, because again I deal a lot of the time with embeddings, is with open-source embedding models. Or, if I need to do reconstruction specifically, I have a custom model that I've trained for reconstructing text out of embeddings that's based on T5, which also was a cutting-edge model when I first trained it, but obviously now it's not anymore. But T5 continues to be, I think, a very rock-solid model if you want to fine-tune, or if you want a medium-sized model to do anything complex. And so I have a T5-based embedding model that I've trained once for this task of going back and forth between text and embedding, and I continue to use it for a lot of different things. At this point, using it is easy, because again, I've built a lot of scripts and tools for working with this particular model, and I have a bunch of datasets that I've generated with this model that took a bunch of time to generate. And so, again, I feel like when I sit down to work now, I have a kind of sprawling workstation or little workshop environment where I've put all of the little tools and ingredients exactly where I need them, at arm's length. If I pulled in a new model, I'd have to redo a bunch of stuff. But now that I have a few stable models that I'm working with, I can just kind of reach for them and put pieces together really quickly. And so my custom model is the main one. Then, just given the quality and the popularity of open-source embedding models at the cutting edge, or OpenAI embedding models, for example, sometimes I work with those as well.
Nathan Labenz (51:52) How exportable do you think that personal work environment is? Obviously, a lot of people, I think, are very curious about this kind of exploration and would like to be able to do some of it. And honestly, I include myself in this to a degree. I think I have a pretty good conceptual understanding of most of these questions these days, but I'm not that facile with the code, to be able to go in and just run whatever experiment. It's definitely still a barrier for me to have a hypothesis and then be like, I can go run that experiment myself. I would find that to be broadly hard and slow. Do you think that everybody just kind of has no choice, that if they wanna get there, you just have to slog through it and ultimately end up creating your own thing? Or do you think I could, like, drop into your cockpit and learn to fly your plane, so to speak?
Linus Lee (52:48) It's a balance, right? There are bits and pieces of my tooling that are in exportable form. Some of them I've designed that way from the start. Some of them I've found to be an isolated piece that I can now share. Some of them are under the Notion umbrella, so I can't open source them that easily, but some of them I might in the future. My mental model for this, again: imagine you're an expert watchmaker with a watchmaking workshop setup or something. Even if it's an expert in another expert's workshop, you wouldn't expect to be able to just drop into someone else's workshop and immediately get to work. You either have to learn the way that things are laid out, or you have to customize the layout to fit exactly the way you work. Not because the tools are that different fundamentally or anything like that, but just because the things are not there where your hand is used to reaching for them. More concretely, what that metaphor means in terms of my working style is: I have a local GPU rig on my desk where I am right now, and I have a couple of rigs in the cloud. On those computers, there's a specific folder layout with specific datasets in specific places. And I have scripts that assume that those things live there. And I have a script such that when the models are trained, they get uploaded to S3, and all this stuff. It's kind of a spider web, but it's a spider web that works. The exportable bits are more the ones where there's a specific problem to be solved. Like, I wrote a little custom mini language model training library for myself, to make it really easy and efficient to run a lot of GPT-4 inferences in parallel. Those kinds of things are solving an isolated problem, so I can isolate them and then package them and maybe share them.
But in general, a lot of the tooling that I've built over time isn't actually solving a particularly hard technical problem. It's like, I want this thing to be there when I reach for it, or I want to not have to worry about where exactly my dataset is. So it's very much about the ways that things are laid out, and I don't know how much of that is shareable. If I was at a company and had to unify everyone under the same tooling, maybe we could all agree: okay, all the datasets go in this folder, all the model outputs go in this S3 bucket. Then maybe there could be a shared spider web. But in public and open source, I think that's much harder to achieve.
Nathan Labenz (55:00) How much do you use language models to assist in your personal code writing these days?
Linus Lee (55:08) A lot, but I don't think any more than most other people who have Copilot and have used ChatGPT. I use GitHub Copilot. I've also tried Sourcegraph's Cody. Most of the time, they're just useful for writing boilerplate code that you knew you were going to write anyway. So a lot of times, I'll just write things, pause for Copilot to do its thing, and then hit tab. They're never really useful for writing code where you're not sure of the correct form. A lot of the time I'm doing complex matrix algebra in the codebase, and Copilot will be like, hey, here's the exact five line operation you need to finish this function. And I'm like, there are a bunch of arbitrary dimensions in here; I don't know if this is correct. For those, I'll either work it out on paper myself, or I'll frequently do this thing where I give some sample code to GPT-4 and ask it, can you find the bug in this code? If there is one, it'll find it. If not, it'll say, I don't see a bug in this code. That gives me some confidence. But ultimately, all of those are accelerants, and I think you always have to do the final verification of running the code and seeing that the output is exactly what you expect, which again is where PyTorch is much better suited than other more opaque tools. Because when you're debugging, especially, you can print the outputs at every step and see exactly whether they match your expectations.
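The "check every step" habit Linus describes amounts to asserting your expectations about intermediate shapes and values as you go. A stdlib-only toy of that workflow (in his real work this would be PyTorch tensors and `.shape`; the list-of-lists matrices here are just so the sketch runs anywhere):

```python
def shape(m):
    """(rows, cols) of a list-of-lists matrix."""
    return (len(m), len(m[0]))

def matmul(a, b):
    ra, ca = shape(a)
    rb, cb = shape(b)
    # Fail loudly at the step where dimensions stop matching,
    # rather than trusting a multi-line completion wholesale.
    assert ca == rb, f"inner dims must match: {shape(a)} @ {shape(b)}"
    return [[sum(a[i][k] * b[k][j] for k in range(ca)) for j in range(cb)]
            for i in range(ra)]

x = [[1.0, 2.0, 3.0]]          # (1, 3)
w = [[1.0], [0.0], [-1.0]]     # (3, 1)

h = matmul(x, w)
# Verify the intermediate against expectations before moving on.
assert shape(h) == (1, 1), shape(h)
print(h)  # [[-2.0]]
```

The point is not the arithmetic but the placement of the assertions: every intermediate gets checked against what you believe it should be.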
Nathan Labenz (56:22) So let's zoom out again, or go up a level, and talk about model capabilities. Actually, maybe one little follow-up before we do that that definitely relates: what scale of models are you using, and how concerned are you about upgrading the core models that you work with, to make sure the investigations you're doing are on the most relevant systems? We've mentioned Anthropic a couple of times, and as I understand it, a big part of the reason they feel it necessary to train frontier models, even though their founding ethos is very AI safety centric, is that they feel they can only do frontier interpretability work if they have frontier models, because those models are qualitatively different. So I'm kind of transitioning into the potentially qualitatively different capabilities that the latest systems have, and wondering to what degree you feel you need to run the latest models, whatever those capabilities may be, to make sure they're happening on the system that you're studying.
Linus Lee (57:42) Most of this is speculation, right, based on the things that we know about smaller models. So that umbrella caveat applies to everything I'll say next. I think there are two categories of facts we can try to learn about models. One is things we can learn about how neural networks work in general, or how transformers work in general when trained under a particular objective. This is more like the physics of a neural network under training. An example of something in this category is the linear feature hypothesis: a lot of current interpretability work is happening under the assumption that a model represents specific facts it knows about the input as directions in its vector space. That seems like the kind of thing that, if you can demonstrate it theoretically in a rigorous mathematical proof, and can also observe it in smaller models, you can make a pretty good conjecture that it'll generalize to larger models. A lot of my work is predicated on that kind of thing, the physics of how these systems behave. So even as you scale systems up, the way models represent things and the way information flows through them are probably going to stay roughly similar. The other category of things we can try to know about models is, for a given specific model, what are the things it's doing, and what are the things it knows? For a specific model, what are the kinds of features it's recognizing in the input? What are the kinds of circuits it uses to implement them? How large are these circuits? What is the distribution of features? Are most features about topics, versus most features about grammar? Things like that. That's the second category of facts.
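The linear feature hypothesis can be sketched in a few lines: a fact is "represented" insofar as the hidden state has a large component along a fixed direction, and you can steer by adding that direction back in. This is a toy illustration with made-up vectors; a real setup would read activations out of an actual transformer.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def feature_score(hidden_state, direction):
    """Projection of a hidden state onto a feature direction (unit-normalized)."""
    return dot(hidden_state, direction) / norm(direction)

def steer(hidden_state, direction, alpha):
    """Push a hidden state along a feature direction (activation steering)."""
    d = norm(direction)
    unit = [x / d for x in direction]
    return [h + alpha * u for h, u in zip(hidden_state, unit)]

# Hypothetical 4-d activations and a hypothetical "positive sentiment" direction.
positive_direction = [1.0, 0.0, 1.0, 0.0]
state = [0.2, 0.5, 0.1, -0.3]

before = feature_score(state, positive_direction)
after = feature_score(steer(state, positive_direction, 2.0), positive_direction)
print(before, after)  # steering raises the projection by exactly alpha
```

Under this hypothesis, reading a feature is a dot product and writing one is vector addition, which is why so much interpretability tooling reduces to linear algebra on activations.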
We have some of this, but we have very little to go off of for projecting forward to say, what would the distribution of features look like for a 100 billion parameter scale model? There are interesting analogs in vision models, where interpretability work has been going on for a longer period of time. Something interesting about vision models is that there's a kind of sweet spot. When you have very small models that are not super capable, the features they use to represent inputs tend to be very uninterpretable. As you scale them up, there's a sweet spot where the features tend to be the most human interpretable: you can see features like, this is a cat, this is a dog ear, this is a wheel of a car. And then as you scale the models up further, it's possible that the models think in terms of even higher level features that don't necessarily correspond to the way we conceptualize the world. That's the kind of thing that changes with scale, and the kind of thing I expect we'll see in language models also. I think OpenAI has done some interesting work in this space of how you supervise and interpret and understand models, even under the assumption that they're thinking in terms of things humans don't have words for. But a lot of my current work is about how information flows through models and how models represent information, and for that, there are pretty solid assumptions that I think will hopefully generalize to future models.
Nathan Labenz (1:00:46) So how do you think about some of the, I mean, here are some hot button topics, kind of flashpoint vocabulary words. Things like reasoning capabilities in frontier language models. Obviously, one extreme view is that it's all just correlation and there's no, quote unquote, real reasoning going on; it's just statistical association, and they're all stochastic parrots. My sense, to put my cards on the table, is that that's not really true at the frontier. I've seen plenty of interesting examples of problem solving that seem to be meaningfully outside the sorts of token level correlations that might be expected to be found even in super broad training data. How would you give an account of reasoning abilities, or things like grokking, or emergence, as we go to these frontier scales?
Linus Lee (1:01:46) When we talk about what models can do, I think there are three levels at which you can ask the question. Level one is: does there exist a transformer that accomplishes this task? In other words, if you were an omniscient being and you knew how to set every single parameter in a large transformer, could you manufacture a transformer that did this task? That's level one. Level two is: is it possible for a reasonably randomly initialized transformer to arrive, through some gradient descent based optimization process, at a set of parameters that accomplish this task? And the third level is: when we are training language models in the real world, with limited precision floating point numbers, on internet scale data, will the resulting model be sufficiently close to the ideal model that it performs the task? I think for most definitions of reasoning and problem solving and generalization, the answer at level one is almost always yes. For any given logical deduction problem or math problem, it's generally possible, given enough parameters, to write a program inside a transformer, manually engineering all the parameters, and have a model that's capable of doing it. My non rigorous intuition is that, given sufficient data and sufficient time and sufficient compute, it's probably possible in the infinite compute limit to also train transformers that do any arbitrary reasoning task.
I say that because, for non trivial math problems, if you have enough examples, you can train very small transformers to fully solve them. Not just in the sense that you can give one a bunch of unseen problems and it will solve them, but fully solve them in the sense that if you look inside the transformer, you can see how the algorithm is implemented, and you can show that it is the correct algorithm that always generalizes. So I think in a lot of cases, it's possible to have a transformer that, through gradient descent, arrives at something we would call reasoning. The big question I have personally is: do we have enough data such that, for these extremely large models with so many parameters, we could train them sufficiently long that they would go from memorizing to eventually looking like something we would call fully reasoning, rather than relying on a mix of reasoning and shallow correlation? I think that's a big challenge. It's also very difficult to show in extremely large models currently that, for example, a model solved a problem by reasoning within a circuit rather than by pattern matching, because the diversity of the problems we show them is really large, and because there are so many parameters. Relative to the number of parameters, I think the datasets we're using are still relatively small, so it's very possible for models to mostly memorize and still do really well. And I don't know if we'll ever get to that point, because data is really messy. To get to the regime where you train a model and can show that it's perfectly generalizing or perfectly reasoning, I think you need extremely clean data that basically only does the correct thing all of the time.
And you need that for a long enough period that the model, to anthropomorphize it a little, feels comfortable forgetting its memorized solutions and relying only on its generalized solution. So there are a lot of practical issues with producing models that we could show are reasoning. But I think it's in theory possible to have models that do this, and perhaps with research breakthroughs, we'll get better at closing the gap between levels two and three.
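Level one of Linus's taxonomy, a transformer whose parameters are engineered by hand, can be illustrated with a single attention head. This toy "induction head" is my own construction, not anything from his tooling: one-hot embeddings, hand-chosen query/key/value maps, and a sharp softmax together implement "predict whatever token followed the previous occurrence of the current token," with no training at all.

```python
import math

VOCAB = ["A", "B", "C", "D"]

def one_hot(tok):
    return [1.0 if t == tok else 0.0 for t in VOCAB]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def induction_head(tokens, beta=20.0):
    """Predict the token that followed the previous occurrence of tokens[-1].

    Every 'parameter' is set by hand: the query is the current token's
    embedding, each key is the embedding of the token *before* position j
    (the classic previous-token composition), and each value is the token
    at position j. A large beta makes the softmax nearly one-hot.
    """
    q = one_hot(tokens[-1])
    positions = range(1, len(tokens))
    scores = [beta * dot(q, one_hot(tokens[j - 1])) for j in positions]
    weights = softmax(scores)
    out = [0.0] * len(VOCAB)
    for w, j in zip(weights, positions):
        v = one_hot(tokens[j])
        out = [o + w * x for o, x in zip(out, v)]
    return VOCAB[max(range(len(VOCAB)), key=lambda i: out[i])]

print(induction_head(["A", "B", "C", "A"]))  # "B": B followed the earlier A
```

This is exactly the "omniscient being sets every parameter" move: existence of a transformer for the task is shown by construction, with no claim about whether gradient descent would find it.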
Nathan Labenz (1:05:21) One of the best conversations I've had about a topic like this was an episode we did with the authors of the TinyStories paper, out of Microsoft Research. Ronen and Yuanzhi observed that the order of learning seems to be syntax first, just getting the part of speech right from one token to the next, then facts and associations, and then reasoning. So, in order to get to reasoning, well, at least as you could frame their work, and this is maybe not their exact framing to be clear, but the way I've thought about it ever since: to get reasoning to start happening as soon as possible and at the smallest scale possible, they created this vastly reduced dataset called TinyStories, which is short stories that a, I don't know if it was a 3 year old or a third grader, could understand. Reduced vocabulary, pretty simple little universe. Because the vocabulary size was way lower, and the facts you might encounter were also greatly reduced, you could start to see some of these very early reasoning abilities emerge even with just a few million parameters. I think the biggest model they created was around 30 million parameters, which is still maybe 2 to 3 percent the size of GPT-2. And they were starting to see some simple abilities, negation, for example. GPT-2 really struggles with negation, but they were able to observe it somewhat consistently in their very small models, just by reducing the world.
I have a big expectation that curriculum learning in general is going to be a big unlock for this. We've heard from OpenAI and others at this point that training on code really helps with general reasoning ability. My sense is that they're probably doing a bunch of stuff like that, much like that paper we saw recently from DeepMind on geometry, where they generated a ton of provably correct geometry statements and said, okay, we're going to start with this, and you'll learn what a correct geometry statement looks like, and then we'll apply you to this problem. I suspect that the datasets at the frontier labs these days include a lot of just pure reasoning, especially in the early going, when they're trying to establish these core reasoning circuits. But that's on somewhat speculative authority, for sure.
Linus Lee (1:08:08) Something related to synthetic data that I thought about recently is OpenAI's new video model. This is purely observational speculation, but sometimes you look at the motion in the videos it generates, or the lighting or the shadows, and it looks like a video game. Partly based on that mostly groundless suspicion, and partly based on the confidence that I think we both have about the importance of synthetic data, I think they're probably training, at least early in training, on a huge amount of game engine generated physics and motion and movement, as another example of synthetic data. TinyStories is also super interesting. I think it's an example of how, as you said, models are very lazy about what they have to learn. A model only learns the thing you want it to learn when it's run out of options, when it's exhausted all the other ways to minimize its loss and the only remaining option is to finally learn the thing you want it to learn. In language data broadly, I think it's very difficult to get to that point. Even for math proofs that occur naturally, for example, there are a bunch of proofs on the internet that are just incorrect. So in order for the model to perfectly learn how to solve a math problem, it has to first, or at least simultaneously, learn how to disambiguate whether the proof that follows some text is likely to be wrong or correct. There's a bunch of noise like this. I think either you can have relatively small amounts of very high quality synthetic data, or you can have vast, vast amounts, in the trillions of tokens range, of extremely noisy data that is only loosely filtered.
But getting these very large models to reason in the way that humans want to reason with natural language is, I think, going to require bridging that gap. That's a really interesting data problem for sure.
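The "small amounts of very high quality synthetic data" route can be sketched as a generate-and-verify loop. This is a toy of my own, not anything a lab has described: a noisy generator stands in for scraped proofs that are sometimes wrong, and a cheap exact verifier is what makes the surviving data clean.

```python
import random

def generate_candidate(rng):
    """An (a + b = c) example from a noisy source that is wrong ~30% of the time."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    c = a + b
    if rng.random() < 0.3:           # simulate incorrect 'proofs' found in the wild
        c += rng.choice([-2, -1, 1, 2])
    return (a, b, c)

def verify(example):
    """Cheap exact checker; the filter that makes the data high quality."""
    a, b, c = example
    return a + b == c

def build_clean_dataset(n, seed=0):
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        ex = generate_candidate(rng)
        if verify(ex):
            kept.append(f"{ex[0]} + {ex[1]} = {ex[2]}")
    return kept

data = build_clean_dataset(5)
print(data)  # five verified-correct equations, ready for a curriculum
```

The asymmetry is the point: generation can be sloppy as long as verification is exact, which is why domains with cheap checkers (arithmetic, code with tests, formal geometry) are where synthetic data works best.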
Nathan Labenz (1:09:57) In lightning round fashion: we've just been talking about what exactly the current frontier capabilities are, exactly how much reasoning is going on. I agree it's hard to say. If I could summarize our relative positions, I think I probably see more reasoning in it than you do today, although it sounds like we're together in that I definitely don't think of it as a binary. For me, it's not all this or all that. I always say AIs defy all the binaries we try to force them into. Instead, it's some mix: there are times when it can reason and times when it can't, and mapping that boundary, which, by the way, might be fractal, per some results we've seen recently, is definitely a super hard challenge. To get practical on this for a second: you're obviously at Notion AI, a product leader there, bringing frontier models to the masses. You've shared in detail elsewhere some of the functions and how you're building them out. I really liked the podcast you did with Redpoint, by the way, for those who want to get into how Notion AI is made. So skipping over all of that and just asking: for the stuff it's not yet able to do as well as you would like, and as well as you would hope to provide to Notion users, what do you tell the model developers the shortcomings are? Where do they need to improve to give the quality of experience that they should be able to get to next, at that intersection of feasibility and value? I can run down a list, but you could just respond free form.
Linus Lee (1:11:46) This is a list that's always top of mind for me as well, because everyone on the team talks to companies building models, and they always ask us this question. The main ones are: we want models that hallucinate less, we want models that are cheaper and faster, lower latency, and we want models that follow instructions better. And there's a fourth one, which is big but very hard, which is that we want models that are better at general reasoning. The first three, not hallucinating, lower latency, and following complex instructions more faithfully, are all areas where, for example, Claude 2 or GPT-4 have shown marked improvements over the previous generation of models, except that, obviously, they're both more expensive. The combination of those three, or a couple of them, has enabled entirely new kinds of products. For example, Notion has a product called Q&A that helps you answer questions based on all the information in your knowledge base. Doing that requires not hallucinating as much, not just from the model's own memory, but from information it has read in its context. But it also requires following extremely complex instructions, because we have, metaphorically, massive books of instructions, dozens and dozens of bulleted lists of: here's how you should formulate your answer, here's what you should do, here's what you should not do, here's exactly what you should do in this situation versus this other situation. Earlier models had a lot of trouble following that kind of 2,000 to 5,000 token long instruction. So long instruction following is an area where we've continued to see improvements, even in the latest iterations of GPT-3.5, and it's very essential. Hallucination makes a lot of sense too.
And just as instruction following enables a bunch of use cases that were impossible before, lower cost and lower latency enable a bunch of use cases that were hard to pull off before. Notion also has a feature called AI autofill, where you can use a language model call kind of like an Excel formula. You have a data table, a database of interviews or companies or customers or your classes or whatever, and instead of filling in a column with a formula, you fill in a column with a language model prompt. To do that at the scale of the tens of thousands of rows that some of our customers have in their tables requires models that are both fast in latency and also efficient, because we want to be able to send OpenAI or Anthropic thousands of these requests in a matter of a few minutes and have them be fulfilled reliably. So that's an example of a use case where, as models continue to get better along this Pareto frontier of latency and cost versus capability, we're going to continue to be able to make new kinds of products.
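The autofill pattern, one prompt templated per row and fanned out in bulk, can be sketched like this. The names and the `call_model` stub are hypothetical, not Notion's implementation; the shape of the thing is a prompt template applied per row, executed concurrently, with results written back into a column.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    # Stand-in for a real completion API call; echoes so the sketch runs offline.
    return f"summary({prompt})"

def fill_column(rows, template, column, max_workers=8):
    """Fill `column` for every row, like an Excel formula whose
    formula is a language model call."""
    prompts = [template.format(**row) for row in rows]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(call_model, prompts))  # map preserves order
    for row, value in zip(rows, values):
        row[column] = value
    return rows

rows = [{"company": "Acme", "notes": "met at conference"},
        {"company": "Globex", "notes": "inbound lead"}]
filled = fill_column(rows, "Summarize: {company}: {notes}", "summary")
print(filled[0]["summary"])
```

At real scale, the interesting engineering is in what the stub hides: rate limiting, retries, and cost accounting across thousands of in-flight requests.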
Nathan Labenz (1:14:27) I often have this debate with product people. I think general reasoning ability was in your fourth or fifth spot, and I always put it first. And I say that as somebody who's also kind of scared of an AI that's smarter than me. But for my personal utility, the results I get day to day, I'm always like: it's pretty fast, and it's pretty cheap. It would be nice if it was faster and cheaper, but what I really want is for it to be smarter. What I really want is for it to write as me better, better in context learning or better style imitation on the fly, so it can put something together the way, or at least close to the way, that I would want it. Why isn't that the number one thing? If you had that, couldn't you charge hundreds of dollars a month for Notion AI? Is there something I'm missing?
Linus Lee (1:15:26) Part of the reason it's at the end of the list is that it's much harder to get to, basically. Obviously, it's something new models continue to get better at, and going from GPT-3.5 Turbo to GPT-4, for example, we saw improvements in how people were using it and in the quality of feedback we were getting. Better reasoning is obviously valuable. But if we go to, say, Google and say, hey, we want models that are better at reasoning, their engineers are going to be like, what do we do with this piece of feedback? Obviously we're trying to make models that are better at reasoning, but how do we measure that? What exact kind of reasoning? In some ways, following instructions and not hallucinating are the facets of reasoning that I feel are most tractable on a few months to a year timeline, and they also unblock the most urgent problems with the existing products we have. Another way to look at it: I think part of your desire for better reasoning is a function of the way language models have been packaged for you, where you interact with one in a single threaded way, one at a time, as the model generates tokens in real time, and you ask it questions. There are a lot of other ways language models can be used that aren't packaged that way. For example, the autofill feature we mentioned is a bulk operation. Someone might upload a 2,000 row CSV and fill in five different columns. First, wouldn't it be good if you could do that at all, which is a cost problem? And then wouldn't it be good if you could write a prompt, run it, have the entire table be filled a few seconds later, and then filter it by category or feature or where people are, and so on? Those kinds of experiences are not really bottlenecked by reasoning.
Most of the time they're bottlenecked by things like speed and latency. And especially if you're trying to do this at the scale of billions of tokens generated per day, that's a huge, huge problem. There are categories of products that smaller startups might be able to launch but that we can't, just because for us to deliver the same thing would cost enough money not to be feasible. And then I always kind of joke that most business documents people write, or most academic documents people write, are just interpolations between other documents that have previously been written. I think that's true of something like 95% of all documents being written inside Notion, not because there's nothing novel in them, but just because, given enough information and diversity in all the inputs these models have seen, most instances of problem solving, or doing important work, or communicating well are within the training set. For those, I don't think the biggest problem is getting the model to reason outside of what it's already seen, but just being better at following instructions, so that we can steer it toward the right spot in the space of things it's already seen.
Nathan Labenz (1:18:11) Perfect transition to my next question, around the next big model upgrade. You alluded to it a little bit, but what does that look like? One of the big theories I have right now is that, as an AI app industry, if you will, we have been building a ton of stuff that basically tries to compensate for the language models' weaknesses. They hallucinate, so we ground them in database access. They have very limited context, so we have to rerank results to try to make sure we're getting the right stuff into context. And obviously, every part of that pipeline can fail. Then something might come along that changes the game, and maybe it just did, in the form of Gemini 1.5, with up to 1 million and maybe up to 10 million tokens of context and some pretty wowing recall demos. First of all, it'd be interesting to hear what process you'd plan to go through. I'm sure you have a standard set of evaluations you'd use to get systematically comfortable that, okay, we're not regressing. Interested to hear what that's like. But I also wonder: do you see that the next model, like a Gemini 1.5, could lead to a significant leap in performance given all that scaffolding, or possibly even make some of that scaffolding unnecessary? Do you think the system itself gets simplified, because the new model is more tolerant of, hey, just throw some more stuff in it, and we don't have to work so hard on some of these previously hard problems?
Linus Lee (1:19:52) I'm very curious what use cases motivated the million token context window for the new Gemini models. My hunch is that it's actually mostly multimodal use cases. I can imagine a long piece of audio or a long piece of video filling up the context. But if you're purely in the text realm, I think there are a lot of benefits to retrieving into a limited context, rather than just putting everything in the model's window. One of them is observability. If you give the model 10,000 inputs and it gives you the wrong answer, how do you debug that? Maybe you can look at things like attention maps, but that's an interpretability problem in itself. Whereas if you have a pipeline that retrieves maybe the top 10 documents and has a language model answer the question from those, and it got it wrong, you can ask useful questions like: did the answer exist in the documents it saw? Was it at the beginning or the end of the context? If you swap this document out, does the model give a different answer? So having a retrieval pipeline helps you debug. It helps with cost, obviously, and latency, for a given compute budget. It also lets you incrementally upgrade different parts of the system: a better language model with the same retrieval pipeline, or a better retrieval pipeline with the same language model, both improve the results. So I think there are a lot of structural benefits to the pipeline model. And on the Gemini results in particular, a lot of their tests are needle in a haystack tests: if I have a million tokens and there's an anomaly somewhere in the million, can you find it? In the real world, a lot of the complex retrieval that models have to do is actually much more nuanced.
A couple of examples of something more nuanced. One is that the model might have to not just find the information, but do some reasoning to figure out which of the pieces of information to use. In Notion's case, among the retrieved documents, some may be out of date, some may be written by someone who's not authoritative, some may be written in a different language, and one might conflict with information in the canonical document written by, say, the HR team at a company. We have instructions for how to deal with all of these cases, and the model has to do some reasoning over exactly which to follow. That looks very different from just finding a word in some tokens. Another example of a more complex retrieval case is when there needs to be synthesis. If you're in Notion's internal workspace and you ask a question like, has the AI team been productive this week? That's a multi step question. First of all, it requires knowing not just a single answer to a question but, in general, what's been going on. Then it requires reasoning through: what does it mean for this team to be productive? Who are all the people on it? Are some people out of office? So again, those kinds of problems are not just about what information is in the context, but about how the model performs. The biggest challenges we've faced so far in optimizing retrieval are mostly those kinds of things, the reasoning related or edge case situations where it's unclear exactly what the model should do based on our existing instructions. And the best way we've found to attack those problems involves a lot of stepping through all of the steps in the chain and asking, was the answer found at this step?
And all of that debugging is just much easier when you have 10 examples to look at instead of 10,000. So at least for this particular use case, I think context length is not the most dire need. But obviously there are lots of other use cases, multimodal ones for example, where I think it could be a game changer.
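The per-step debugging questions Linus describes ("did the answer exist in the retrieved documents? was it at the beginning or the end of the context?") can be sketched as a small harness. This is purely illustrative: the function and field names are hypothetical, not Notion's actual tooling, and real systems would use fuzzier matching than substring checks.

```python
# Illustrative sketch of the retrieval-debugging questions described above.
# All names are hypothetical; real pipelines would use fuzzier matching.

def debug_retrieval(query: str, retrieved_docs: list[str],
                    model_answer: str, gold_answer: str) -> dict:
    """For one logged example, answer the basic triage questions:
    was the gold answer retrievable, and where in the context was it?"""
    containing = [i for i, doc in enumerate(retrieved_docs)
                  if gold_answer.lower() in doc.lower()]
    n = len(retrieved_docs)
    if containing:
        first = containing[0]
        position = ("beginning" if first < n / 3
                    else "end" if first >= 2 * n / 3 else "middle")
    else:
        position = None
    return {
        "answer_in_context": bool(containing),
        "docs_containing_answer": containing,
        "position_in_context": position,
        "model_correct": gold_answer.lower() in model_answer.lower(),
    }

report = debug_retrieval(
    query="Who owns the launch checklist?",
    retrieved_docs=[
        "The launch checklist is owned by Dana.",
        "Old planning doc from last year.",
        "Unrelated meeting notes.",
    ],
    model_answer="Dana owns it.",
    gold_answer="Dana",
)
```

Running this kind of check over each stage of a pipeline is what makes a 10-document retrieval setup debuggable in a way a million-token context dump is not.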
Nathan Labenz (1:23:23) Yeah. Very interesting. I'll be very keen to spend more time figuring out exactly what I can do with a million tokens, and I totally agree with you that the needle-in-the-haystack results we've seen are not enough of an answer to how it really performs to be confident just dropping it in. Although I aim to find out exactly how far this thing might have advanced. And it does hold a lot of promise for me, because, first of all, this podcast might be more than, like, GPT-4 Turbo or Claude 2 can really handle. The transcript probably fits in the 128k, or certainly would fit in the 200k, but it's getting to the point where the recall isn't great. Just to be able to take my podcast transcripts, throw them into something, and get reliable answers, ones I feel I could delegate to pretty reliably, that alone is super exciting. And then it'll be really interesting to see how the synthesis part of that works with those.
Linus Lee (1:24:30) Yeah. I mean, there's a part of me that wonders about that: training a model with such a long context probably requires architectural inventions, probably some engineering work, and definitely a lot of extra data work. There are only so many examples that are high quality and that long. So there has to have been some good use-case motivation for Google to go ahead and train that model, and I'm very curious, even just internally, what they're using that context for.
Nathan Labenz (1:24:55) Yeah. Agents, I think, are also going to be one super interesting application. Like, to what degree can you just kind of append your past failures and continue to roll forward in a sensible way, learning from what happened? Today, obviously, you can't just drop all your API failures into a single context window. You'll still run out eventually, even at 10 million tokens, but it could make a big difference for that sort of thing. How about just any big tips that you have for AI app developers, things that you know to work that you just don't see people doing enough of? This could be in the prompting domain, in the RAG domain, in how people set up their evals, or even just in the user experience and the interface that are presented.
Linus Lee (1:25:45) Both of these are kind of boring things, but you could always be doing more of them. The first is that regardless of what exact kind of model you're training or working with, it obviously always pays to spend time with data. And I don't just mean run an evaluation and look at the charts. I mean read the logs: make the table of 100 inputs and 100 outputs, and ask yourself, for this input, what output would you, the human, generate? Then think about all the steps you go through to reason through that, and look at edge cases, look at where the models fail. I've also found it personally interesting, if a little dubious in value, to look at pre-training data, like raw pieces of text from The Pile dataset. Most of the text on the internet is quite garbage, and there's a lot of stuff in there that gestures at, for example, the reason that language models are good at certain kinds of output formats and not others. But the general theme, I think, is: spend a lot of time with your data, in particular the input data that you're giving the models and the tasks, and also with failure cases from the wild. In the beginning with Notion AI, we spent some time setting up a system for human-annotated logs and a more scaled, automated system for detecting errors and fixing them. Eventually, what we've settled on for a lot of our features is instead that the engineers have scheduled time on our calendar every week, where we go into a meeting room and just stare at a Notion database of all the bad cases, individual outputs that were bad, reported by our users, and ask ourselves, for each input: what is the exact step in the pipeline where this failed? What category does this belong in?
We kind of treat it like a software bug, and we ask: is this already being fixed? Is this an instance of a new bug? Is there a systematic issue in the pipeline that requires us to invent something new to fix it? So spending a lot of time with failure cases and data, I think, pays off. And on a similar theme, investing early in internal tools to quickly and easily run evaluations, generate synthetic datasets, visualize outputs, and sort through them helps a lot. There are a few companies out there doing stuff like this, but for working with your own data and visualizing it in particular, the tools generally should be simple enough that an engineer can whip them up in a couple of days. That pays off in the long run by letting you customize them over time, as I alluded to before. So that's what we've done at Notion. Obviously, your mileage may vary, but I've found those two things to be particularly worth their time in terms of improving the product.
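The "treat each bad output like a software bug" triage Linus describes can be sketched as a tiny data model. The fields and categories below are illustrative inventions, not Notion's actual schema; the point is just that each failure gets a pipeline step and is either linked to a known bug or flagged as new.

```python
# Illustrative sketch of failure-case triage as bug tracking.
# Field names and categories are hypothetical, not Notion's schema.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class FailureCase:
    input_text: str
    output_text: str
    failing_step: str        # e.g. "retrieval", "ranking", "generation"
    bug_id: Optional[str]    # None until linked to a known issue

def triage_summary(cases: list) -> dict:
    """Group reported failures by pipeline step and count unlinked ones,
    the two questions asked in the weekly review described above."""
    by_step = Counter(c.failing_step for c in cases)
    new_bugs = [c for c in cases if c.bug_id is None]
    return {"by_step": dict(by_step), "new_bug_count": len(new_bugs)}

cases = [
    FailureCase("q1", "wrong answer", "retrieval", "BUG-12"),
    FailureCase("q2", "hallucinated source", "generation", None),
    FailureCase("q3", "stale doc used", "retrieval", None),
]
summary = triage_summary(cases)
```

Even a structure this simple makes the weekly review cumulative: repeated failures in one step point at a systematic pipeline issue rather than a one-off.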
Nathan Labenz (1:28:11) I think those are very solid tips, and I agree with you that, while it may not be exactly what people are looking to hear, the admonition, or reminder at a minimum, to read the logs, just look at the raw data, look at the failure cases, I have certainly been served extremely well by that over time too. There's just no substitute for looking at the raw inputs and outputs.
Linus Lee (1:28:35) Once you understand the problem at a deep level, then you can start thinking about, like, we're running out of time during the week to do this. How can we scale this out? How can we automate this? But like the automation is not really possible until you have a pretty solid understanding of exactly what you're trying to monitor for or automatically fix or or detect.
Nathan Labenz (1:28:52) Yeah. Okay. Cool. Alright. Last section: things that could possibly get weird. Two big trends. One kind of predates the other a little bit, but both are increasingly well established at this point. One big trend is that it seems to me like all the latent spaces can sort of be mapped onto one another. Right? We saw this arguably first in a really powerful way with CLIP, where image space and text space were kind of brought together, and now we have text-image space. Amazing. But that was done in a high-scale way. And it wasn't too long after that that we got things like BLIP-2, which for me was really a major moment. I'm sure it wasn't the first, but it was one of the first that I really read and understood deeply and realized, holy moly, they've got a frozen language model here and a frozen image model trained totally separately, and then just a small bridging connector model between them, which took only a couple of GPU-days to train, is enough to unlock capability beyond the CLIP level. And we see so many examples of this now, right, where going from one latent space to another, often just a linear projection, sometimes a small connector model, is kind of all it takes. The other big trend has been the realization that a lot of models can be kind of merged together in seemingly unprincipled, or not very principled, ways. Right? Going back to, I think it was called relative representations, there was a paper showing that models initialized differently, trained with different shuffles of the data or whatever, seem to converge on pretty consistent, kind of isomorphic representations of data that are maybe a rotation away from each other, or maybe a dilation and a rotation away from each other, but ultimately look very similar in visualization.
And now we're getting to people saying, well, hey, what if we just train model A over here on one dataset and model B over there and just add their weights together? And it's like, wait a second, that works? It doesn't always work, but to a surprising degree, these sorts of things are starting to work. So you can comment on that in any and all ways. But the way I'm most interested to ask you about is: is there a risk or a concern, or maybe you would see it as a good thing, that AI systems may start to communicate with each other in a higher-dimensional, more embedding-mediated way, or a middle-layer-activation sort of way, as opposed to through forms of communication that we could actually read natively? It seems like the models don't really need to generate text to send to one another. Instead, they probably are going to perform better with high-dimensional vector communication. But I kind of worry about that at the same time, even as it may boost performance in various ways. Much like you said earlier: how do you debug that? Now I can't even read the messages they're passing back and forth. So how big do you think those trends are? Are you excited about them? Do you see downsides with the upsides? All things about kind of weird embedding-based communication and model merging.
Linus Lee (1:32:16) Yeah. Model merging is truly weird, and I don't have a good principled understanding of exactly why it works as well as it does. I have a general intuition of why it works decently, which is that a lot of these models are fine-tuned from a single base model. And when you fine-tune these models, especially on tasks that look similar to the original task, mostly doing natural language continuation, the models mostly tend to exist in a kind of linear subspace of the original model space. So when you merge them, you're mostly doing linear interpolation between the weights of these models, and perhaps it makes sense that the resulting model will just inherit behavior from a bunch of different models. It's still very weird, but when you view it that way, it doesn't seem quite like black magic; it just seems like something we have to understand. Relative representations and other kinds of projections between spaces, I think, are really, really cool. It's cool at a theoretical level, obviously, and I think it's also really cool to observe at an empirical level. One of the other experiments that I did late last year was to find mappings between embedding spaces. I had a model that was capable of inverting and finding features in one embedding model, and then I trained an adapter between OpenAI's embeddings and my model. It was a linear adapter, and it could start to read out OpenAI embeddings even without spending so many tokens training a custom inverter for OpenAI's space. So all of these things, I think, are super fascinating.
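The linear adapter Linus describes can be sketched in a few lines: given paired embeddings of the same texts from two models, solve a least-squares problem for a matrix mapping one space into the other. The data below is synthetic stand-in data under the assumption (which the relative-representations line of work motivates) that the two spaces are roughly related by a linear map; this is not Linus's actual experiment.

```python
# Sketch of fitting a linear adapter between two embedding spaces.
# Synthetic stand-in data; illustrative of the idea, not the real experiment.
import numpy as np

rng = np.random.default_rng(0)

# Pretend we embedded 500 texts with two different models.
d_a, d_b, n = 64, 32, 500
emb_a = rng.normal(size=(n, d_a))                   # embeddings from model A
true_map = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)
# Model B's embeddings are (by assumption) roughly linear in model A's.
emb_b = emb_a @ true_map + 0.01 * rng.normal(size=(n, d_b))

# Least-squares fit: W = argmin_W || emb_a @ W - emb_b ||^2
W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)

# The fitted adapter should translate unseen model-A embeddings into B-space.
test_a = rng.normal(size=(10, d_a))
pred_b = test_a @ W
rel_err = (np.linalg.norm(pred_b - test_a @ true_map)
           / np.linalg.norm(test_a @ true_map))
```

The appeal is exactly the compute efficiency discussed next: fitting one `d_a x d_b` matrix on paired embeddings is orders of magnitude cheaper than training a new inverter or decoder for the second space from scratch.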
Where I place this technique in the pantheon of things you can do with language models is as a parameter-efficient or compute-efficient way of fine-tuning or customizing a model: instead of fully tuning a new image-to-text model, or fully tuning a new way to invert an embedding, you can take a model that's close enough in what it represents and then tune a few parameters to get to the final destination. To your question about whether models will communicate in latent spaces: if you think about it, GPT-3 is just kind of like a hundred GPT-2s talking to each other in activation space. Obviously it's not quite that, there are denser connections, but if we can manage to precisely understand exactly how different layers in a transformer, or different token residual streams in a transformer, communicate with each other, I think a lot of those techniques will definitely generalize to understanding mixture-of-experts models, or to understanding the ways that separately tuned models, like the BLIP-2 setup you mentioned, communicate with each other through a mapped representation space. In some ways, it's actually easier, because unlike a transformer residual stream, where the concepts being represented could be really weird (you can imagine a concept that's really useful for predicting the next token in a Python program, but not really useful for humans in general life), when you're mapping between fully formed embedding spaces, between an image embedding space and a text embedding space, I intuitively expect most of those concepts to be pretty human-interpretable. So a lot of the mechanistic techniques that people are working on today will probably generalize to understanding them. I just view it as an exciting, efficient way to build more interesting systems.
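The "merging is mostly linear interpolation between fine-tunes of one base model" intuition can be shown in a toy sketch: parameter-by-parameter averaging of two models with identical architectures. Real merges operate on full checkpoints; the two-parameter "models" here are just stand-ins.

```python
# Toy sketch of model merging as linear interpolation of weights,
# in the spirit of the intuition described above. Stand-in "models" only.
import numpy as np

def merge_weights(model_a: dict, model_b: dict, alpha: float = 0.5) -> dict:
    """Interpolate parameter-by-parameter between two checkpoints that
    share an architecture (e.g. two fine-tunes of the same base model)."""
    assert model_a.keys() == model_b.keys(), "architectures must match"
    return {name: (1 - alpha) * model_a[name] + alpha * model_b[name]
            for name in model_a}

# Two pretend fine-tunes, represented as name -> weight-array dicts.
a = {"layer1": np.ones((2, 2)), "layer2": np.zeros(3)}
b = {"layer1": np.zeros((2, 2)), "layer2": 2.0 * np.ones(3)}
merged = merge_weights(a, b, alpha=0.5)
```

The linear-subspace intuition is what makes this plausible at all: if both fine-tunes stay close to the base model, points on the line between them tend to remain reasonable models rather than noise.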
Nathan Labenz (1:35:18) How weird do you think things might get over the next couple of years in general? It seems to me like we're headed for at least one more, and probably more than one more, notable leap in model capabilities. And I feel like I have a rough intuition for what GPT-5 would be, which I might describe as smarter, to borrow a word from Sam Altman: better general reasoning capabilities, and probably more long-term coherence as another big aspect of that. They've seen how many people are trying to make web agents and various kinds of agents on the platform, and it's just not quite working. So I would expect smarter and more readily goal-directed as the two big advances for the next generation. Beyond that, it starts to get honestly kind of hard to figure out what would even happen and what it would mean. But do you have a sense for how big your personal Overton window, or cone of possibility, is for the next few years? How far do you think AI might go in a few years' time?
Linus Lee (1:36:30) I mean, everything monotonically improves from here. Right? I think that's the scary part. There's a good video on Sora that uses this phrase: this is the worst that this technology is going to be from here on out. I think that's a really succinct way of expressing the fact that, okay, maybe you think GPT-4 is not super, super smart. But if you look back at the history of smartphones, every phone, when it came out, was the worst that smartphones were ever going to be from that point on, and they've only gotten monotonically better. When you think about it that way, language models monotonically improving so rapidly from here on out, just as a trend line, is interesting and scary. Long-term coherence, and goal-directedness in particular, is really interesting. Right now, with every GPT iteration, OpenAI has done a little bit of not just making the base model smarter, obviously, but also some opinionated tweaks to the final tuning objective to make the model more useful for certain use cases. The big most recent one was API tool use; before that, it was chat. I could imagine OpenAI or other model makers tuning their models not to expect the world to end after the next turn in the conversation, but to expect that there are further turns, and maybe to plan: if I assume I'm given infinite turns into the future, what might I start to do? That kind of long-term planning, I think, is something that's missing in current models and makes them very hard to use for agents. So I agree a lot with that. Then, beyond the models themselves, there are obviously lots of corners of culture that these models touch.
And that's a much harder, much more complex dynamical system; I think it's much harder to predict exactly what will happen to, say, the concept of copyright, or even to our concept of what a single creative artifact is. There's a really good TEDx talk by the creativity and HCI researcher Dr. Kate Compton, where she talks about this idea with image generation models. When a human produces a piece of art, you make the thing, and that is the concrete object. But when you have a model that's capable of producing a bunch of images for a single piece of text, millions of images at once, one way to look at it is that the model is just producing art faster than a human. A different way to look at it is that the model is a tool to map out the space of all possible outputs of a certain style of art, or for a certain prompt. So it starts to change what we imagine a single artifact to be: from a single blob of pixels to a kind of subspace of all possible outputs that, as a bundle, is a form of creative expression. There's a lot of culture stuff that is just much harder to predict, and that I'm frankly not equipped to have much smart commentary on, but I think it will be very interesting to watch, and it will probably have ripple effects beyond the model capabilities themselves.
Nathan Labenz (1:39:15) Do you have a sort of positive vision for the future of your own life? I find that this is in very short supply, and you've certainly been one who has been up close and personal with the AIs as they've developed. You might not, but if you do, I'd be interested to hear about a day in your own life three years from now, five years from now. What is AI doing for you? What are you able to do that you couldn't do before? Who knows what Hollywood is doing or how the entertainment industry has evolved, but what does the day look like for Linus as things really start to hit the key thresholds for utility?
Linus Lee (1:39:55) The high-level concept I gravitate towards when I think about this is that you can take a base technology and express it in a way that's agency-taking or agency-amplifying. An example of a tool that takes away agency from a human is a dishwasher. But that's fine, because I don't actually care about creatively washing my dishes, or exactly which order I wash them in; I just want them washed. Or a laundry machine, or maybe a car. And then there are ways of packaging technology where preserving agency really matters. A writing tool is an obvious example, but maybe also more subtle things, like which emoji show up first in an emoji keyboard, or predictive text keyboards, or obviously social media algorithms. These are somewhere in between agency-taking and agency-amplifying. One thing I'm kind of concerned about right now is that I don't think people are thinking enough about whether the ways language models are packaged amplify human agency or take away from it. That's something I want to talk and think more about, and perhaps push other people building in the space to be better at. Assuming we can steer the way we package language models to respect agency where it's required, and only take agency where we want the models to take it, I'm generally a pretty optimistic person about technology. I have a lot of optimism for where this leads, as long as the way we package these things is more humanist, rather than just "automate all of the things." When you look at different kinds of AI companies, you see companies situated at different points on the spectrum between wanting models to automate things in a way that takes away agency, i.e., replacement, and wanting models that amplify. For example, I think OpenAI is very much on the replacement side.
Literally, their definition of AGI, I think, is something like a thing that can take over a full human's job, whereas if you look at a company like Runway, a lot of their framing of usefulness is about extending the agency of what you want to express. So there's a healthy amount of diversity here, and it's just a matter of where the winners end up lying. Assuming we get that right, I have a lot of optimism for where we're going.
Nathan Labenz (1:42:11) It's a big question, but I agree that's a key one. We want to guard our agency, probably increasingly jealously, especially as more different AI systems might want to usurp it. That could be a great note to end on. Anything else you want to touch on that we haven't?
Linus Lee (1:42:24) No. I think we covered interfaces, capabilities, interpretability, all the all the things I spend my time thinking about.
Nathan Labenz (1:42:32) How people can be more like you, of course, a key highlight as well. Alright. I love it. Well, thank you very much for doing this. You've been really generous with your time and insights and definitely count you among the must follows in the space for all sorts of new and very generative ideas. So it is my honor to have you. And I will say in closing, Linus Lee, thank you for being part of the cognitive revolution.
Linus Lee (1:42:53) Thank you. It was my pleasure.
Nathan Labenz (1:42:55) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email, or you can DM me on the social media platform of your choice.