In this special crossover episode of The Cognitive Revolution, Nathan Labenz joins Robert Wright of the Nonzero newsletter and podcast to explore pressing questions about AI development. They discuss the nature of understanding in large language models, multimodal AI systems, reasoning capabilities, and the potential for AI to accelerate scientific discovery. The conversation also covers AI interpretability, ethics, open-sourcing models, and the implications of US-China relations on AI development.
Subscribe to The Nonzero Newsletter at https://nonzero.substack.com and Podcast at https://www.youtube.com/@Nonze...
Apply to join over 400 founders and execs in the Turpentine Network: https://hmplogxqz0y.typeform.c...
RECOMMENDED PODCAST: History 102
Every week, creator of WhatifAltHist Rudyard Lynch and Erik Torenberg cover a major topic in history in depth -- in under an hour. This season will cover classical Greece, early America, the Vikings, medieval Islam, ancient China, the fall of the Roman Empire, and more. Subscribe on:
Spotify: https://open.spotify.com/show/...
Apple: https://podcasts.apple.com/us/...
YouTube: https://www.youtube.com/@Histo...
SPONSORS:
Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive
The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/
Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.
CHAPTERS:
(00:00:00) About the Show
(00:00:22) About the Episode
(00:03:39) Introduction and Background
(00:06:58) AI Capabilities and Understanding
(00:12:22) Discussing Martin Casado's Views (Part 1)
(00:14:44) Sponsors: Oracle | Brave
(00:16:48) Discussing Martin Casado's Views (Part 2)
(00:21:51) Multimodal AI and Concept Representation (Part 1)
(00:31:40) Sponsors: Omneky | Squad
(00:33:26) Multimodal AI and Concept Representation (Part 2)
(00:38:05) AI's Potential and Limitations
(00:45:35) AI Safety and Risk Assessment
(00:53:31) AI Development and Global Implications
(01:03:30) Open Source AI and International Relations
(01:11:27) AI Ethics and Human Values
(01:22:06) AI Risk and Existential Threats
(01:31:21) Open Source AI Concerns
(01:38:20) China-US AI Relations
(01:48:36) State Space Models in AI
(02:02:32) Conclusion and Recommendations
(02:03:54) Outro
---
SOCIAL LINKS:
Website : https://www.cognitiverevolutio...
Twitter (Podcast) : https://x.com/cogrev_podcast
Twitter (Nathan) : https://x.com/labenz
LinkedIn : https://www.linkedin.com/in/na...
Youtube : https://www.youtube.com/@Cogni...
Apple : https://podcasts.apple.com/de/...
Spotify : https://open.spotify.com/show/...
Full Transcript
Nathan Labenz: (0:00) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, I'm excited to share a special crossover episode with Robert Wright, publisher of the Nonzero newsletter and host of the Nonzero podcast, inspired in part by my recent episode with Martin Casado from a16z, in which we debated how powerful AI systems are likely to become over the next few years. In this episode, Bob interviews me about the pressing questions surrounding AI development, capabilities, and risks. We try to be as accessible as possible to a general audience without shying away from technical concepts or critical questions that don't have easy answers. We began with a discussion of what we know about the nature of understanding in large language models and to what extent we can productively compare them to human cognition, and then continued on to discuss a wide range of topics, including the development of multimodal AI systems and their potential for more robust world modeling, the current state and future potential of AI reasoning capabilities, the potential for AI to accelerate scientific discovery and technological progress, the fascinating work being done on AI interpretability, including Anthropic's recent experiments with Golden Gate Claude, the remarkable degree to which LLMs do understand human values and ethics and why that's definitely not something we should take for granted, why I'm not advocating for a pause in AI development right now, as well as what future developments could cause me to reverse course and begin to do so, the emergence of new AI architectures like state space models and their possible implications, and finally, the pros and cons of open-sourcing powerful AI models, especially as it relates to the increasingly fraught US-China relationship and the risk of an international AI arms race. On this last topic in particular, it's worth noting that just keeping up with AI research and development is a full-time job, and I am by no means an expert in US-China relations. So I'm genuinely very uncertain about what US policy should be toward China with respect to AI. I'm instinctively very skeptical of the idea that we are the good guys, and thus we should be the ones that humanity trusts to develop transformative AI systems first, really for any possible meaning of we, including any of today's leading AI companies or even the West in general. I do believe it's super important to continue to question the wisdom of escalating tensions between major world powers, especially given the potentially transformative nature of near-term AI developments and their obvious military applications. But I do hear compelling arguments in many different directions, and there doesn't seem to be any truly safe path forward. All I know for sure is that this topic is super important. And for that reason, while I normally try really hard to ground my analysis in concrete facts and avoid saying anything that could later be proven wrong, in this particular area, I feel like it's worth sharing new ideas and testing new arguments even while I fully expect that my position will continue to evolve.
As always, if you're finding value in the show, we'd appreciate it if you'd share it with friends, post online, or leave us a review on Apple or Spotify. And I love hearing from listeners, so please feel free to DM me on your favorite social network anytime. For now, I hope you enjoy this more speculative than usual crossover episode with Robert Wright of the Nonzero newsletter and podcast.
Robert Wright: (3:39) Hi, Nathan. Hi, Bob. How are you doing?
Nathan Labenz: (3:43) I'm doing great. Good morning.
Robert Wright: (3:45) Good morning to you. Let me introduce this. I'm Robert Wright, publisher of the Nonzero newsletter, to which everyone should subscribe. Mhmm. And this is the Nonzero podcast. You're Nathan Labenz. You're both an entrepreneur and a podcaster. You founded a company called Waymark. Is that right? It uses AI now (it didn't originally) to help companies come up with, like, video marketing plans and so on. And so you're riding the wave. You're riding the AI wave. And your podcast, Cognitive Revolution, has really been valuable to me. Although my comprehension rate is, like, maybe 65% on a good day, because your podcast is, you know... I don't wanna say quite that it's for insiders, but, you know, you tell me what your kind of business model is, but it's pretty high-level discussion. You tend to have entrepreneurs and a lot of actual kind of AI engineer type scientists, and you understand all this stuff. And so you speak at a reasonably high level. And the way this conversation came about is I was listening to 1 of your conversations, was fascinated by the subject because it's something of great interest to me. And I was thinking both, you know, I have some questions I'd like to ask to clarify things, and also, I thought it would be a service if I ask these, like, dumb questions and kinda brought the conversation down to a level, you know, where mortals can understand it. So does that sound like a... did I flatter you too much or the opposite?
Nathan Labenz: (5:31) A little too much, I'd say. Yes. You know, with apologies to Tyler Cowen, basically, what I'm trying to do on the podcast is just have the AI conversation I want to have. And I do always warn people that I don't classify it as entertainment. It is definitely 1 for folks who are looking to learn and understand, and that's really the mission that I'm on. Mhmm. It does take a, you know, a Nonzero amount of work to say the least, but I really enjoy studying the subject. And, you know, I would definitely caveat... first of all, I would say, I don't think there's any stupid questions, because we're all racing to figure out what's going on with AI. And everybody's, you know, coming at it from a different background and, you know, catching up. I'm falling behind, so I think we're all falling behind to varying degrees and in varying ways. And the other important caveat is just that a lot of stuff remains unclear. And so I'll try to flag, you know, what's kind of well established versus what's my best guess, but we're very much still figuring out what is going on, especially inside the AI systems. Mhmm. That's a very active line of inquiry, which I don't do, but I do find fascinating and at least try to keep up with.
Robert Wright: (6:48) Yeah. And I should say you actually are good at explaining things. It's just that often you're talking to somebody who doesn't need them dumbed down, so you don't. Now you are talking to someone like that. So the subject I wanna talk about: I was listening to this conversation with Martin Casado, who is at the VC firm Andreessen Horowitz. Like many people at Andreessen Horowitz, he thinks that there should be very little regulation of AI, and there's no reason to try to slow things down or pause or anything. And his rationale is that this stuff is really not as powerful as it's commonly presented as being. It's not advancing as fast as people say. And this in turn gets into the fascinating question of, like, what's going on inside these AIs? Is it very much like what goes on inside a human brain in some broad generic sense? And that's something I've, you know, written a little about in the Nonzero Newsletter. I wrote a long piece on the Chinese Room thought experiment, John Searle's Chinese Room thought experiment. And while listening to your conversation, I just had some, you know, kinda new thoughts and also some things that I wanted to get you to flesh out. So before I start grilling you, is there anything you wanna say about this or about that conversation?
Nathan Labenz: (8:25) Well, I thought that conversation with Martin was, first of all, an encouraging instance of just friendly and, you know, I'd say constructive communication across different AI worldviews. I'm broadly, you know... I sort of try to maybe imitate the smartest people, who I would say generally admit a pretty high level of uncertainty as to what is going on inside the systems, how fast things are gonna continue to improve, you know, whether or not they may level off, and, like, what all the consequences are gonna be. So, you know, when I look at, like, the leaders of the leading developers today, and, you know, there I have in mind, in no particular order, DeepMind, Anthropic, and OpenAI, the leadership there is all remarkably frank that they don't have a great sense of what's gonna happen. Anthropic in particular has, you know, a couple of canonical blog posts that they've put out that basically say, we really don't even know how hard the AI safety problem is. You know, we started this company because we're really worried about it. It still could prove to be reasonably tractable, or it could prove to be impossible, and we don't even have that question answered for ourselves yet. So I try to follow that school of thought. And I think this very quickly gets into, like, burden of proof type debates as well, because I think a lot of people can agree that we don't really know. And then the question, you know, sort of becomes, well, okay, if we're ultimately gonna try to figure out what we should do about it, we also need to be kind of clear on what our tolerance for various, you know, forms and levels of risk are, because we're not gonna nail down to 0 risk or to a 100% doom or whatever. I don't think that's, you know, at all feasible right now; anybody who's claiming those extremes, I think, is way overconfident. But sometimes, you know, there's just a lot of ways that these conversations can break down. I at least thought that 1 was conducted in good faith and good spirits, and I definitely appreciated that. But, yeah, I remain extremely, extremely humble in terms of my ability to predict. I often say too, like, we gotta first, can we answer what is, and then move to what ought to be done about it? Even what is is, like, confusing enough.
Robert Wright: (10:48) Yeah. So he said something, in defending his claim that these machines are in some sense overrated, that what's going on inside is less sophisticated than you might imagine. He sometimes sounded a little bit like he was saying they were mere, quote, stochastic parrots. That's not his quote, but I got kind of that vibe. He said, look, it's just distribution in, distribution out. You're just, you know... the machine is assessing structure in these symbols that are fed into it, whatever, you know, predicting the next sentence. And I wanna say, in this conversation, I wanna go well beyond large language models in the narrow sense of processing human language, because I don't think you can talk about the question of to what extent these machines will ever have something we can call understanding without getting into the multimodal stuff, you know, the processing of audio, video, even tactile, and so on. But was your sense that he was kind of saying, another term is, it's just fancy autocomplete? Right? All it is is predicting the next word that the average person who had uttered the first part of the sentence might say, you know, based on statistical patterns in language. Is that to some extent his argument?
Nathan Labenz: (12:22) Great question. A little bit hard to pin down. I would say definitely there were moments where, yeah, I would say it felt like that. And then there were other moments where it did seem like he was giving the current systems a little bit more credit. And, you know, there was sort of... we talked about, like, the Golden Gate Bridge, you know, or the Golden Gate Claude demo, for example.
Robert Wright: (12:48) We should say, just for background: Anthropic, which has probably done more than any of the big companies, maybe, to understand what's going on inside these things. But it found a way to tweak the LLM so that it could be inordinately interested in certain things. And for fun, it made it kind of obsessed with the Golden Gate Bridge. So if you said, how should I spend $10? It would say, you should drive across the Golden Gate Bridge and pay a toll. And that's Golden Gate Claude. So go ahead.
Nathan Labenz: (13:21) Yeah. And what's, I think, really important about that is that to make that work at all, they needed to understand how does the model internally represent the Golden Gate Bridge concept, so that they could inject that in a synthetic way no matter what you, the user, were talking about, and then induce that behavior. And so I think, you know, this is an instance of the ability to pry open the black box and figure out to some significant degree what's going on inside. And I always try to ground these things out wherever possible, and it's not always possible yet. But wherever possible, I try to ground these ideas in engineering. Can we actually use this in a way that shapes system behavior in a way that is at least somewhat reliable? And with the Golden Gate Claude experiment, they did that. They showed that they could isolate this concept, and not just sort of hand wave or, you know, suggest, but actually feed it back into the system and create all these compelling experiences where you had an AI that was largely normal, but weirdly obsessed with this 1 particular thing.
Robert Wright: (14:32) And by the way, they could do subtler and less comic things, like make it more sycophantic so that it would, like, flatter you and praise you. They can kind of change the personality of the machine.
Nathan Labenz: (14:44) Yeah. They found... so Golden Gate Bridge is 1 of, I think, north of 10,000,000 different features, they call them, that they have isolated through this technique of sparse autoencoders. The jargon here obviously can get out of control pretty quickly. But the basic idea is that the models are large, but they're only so large in terms of how many parameters and numbers they contain. And at each step... people, I don't know how much people will, you know, know about the transformer, right? But it's the sort of most famous architecture right now, the 1 that's driven, you know, all the advances over the last few years. I would definitely put a pin in that and say, I do not think the transformer is the end of history, but it has been, you know, the biggest architecture used over the last few years. It consists of all these layers. Each layer basically has 3 core components. It has the attention mechanism. It has the MLP, the multilayer perceptron, which is, like, 1 of the oldest, you know, things in neural networks. And then it has this activation function, which is a nonlinear function. And that is worth mentioning because people often say, oh, it's just linear algebra. There is something nonlinear there. So just in a very technical sense, there is something that is a nonlinear function inside the transformer. Anyway, there may be, you know, dozens of layers in a big transformer from start to finish, and between each 1, there is an array of numbers that represents all of the computation, all of the results of all the computation that has happened so far as the thing is proceeding through its computation. And that is actually relatively small. That's often known as d_model in the literature, and it's usually something like 4,000 or 8,000; the big ones can be, like, 16,000. So all this information, the result of all the calculation that's happened up to that point, is represented in this single array of 4,000, 8,000, or 16,000 numbers. Obviously, there's way more concepts in the world than that. And so the challenge becomes, okay, how is this thing representing all these different concepts in such a small space of numbers? Mhmm. And the answer is it is packing them in very densely in ways that hopefully don't interfere too much. You could say, if I wanted to have perfect clarity, no interference, and I lived in a world of relatively few important concepts, you could have just 1 number represent each concept, and the sort of strength of that concept could just be, like, you know, how much the number is activated. Right? So you could have blue be 1 number, and red be another, and you could have blue turn up or red turn up. If they both turned up, you know, that might mean purple. Okay. Cool. But there's just too many concepts. So they've taken on a huge engineering challenge to try to untangle this dense representation. They call this polysemanticity, because any individual position in that array of numbers will be activated for lots of different concepts. And it's like, okay, well, maybe it's positions 1 and 47 and 4,000 that together activate to represent a certain concept. How do they sort all that out? It wasn't easy, but they've made a huge amount of progress, and now they've untangled it to the point where they have north of 10,000,000 different features identified.
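To make the architecture Nathan is describing concrete, here is a minimal sketch in PyTorch (an illustration only, not any lab's actual code) of a single transformer block: the attention mechanism, the MLP with its nonlinear activation, and the residual stream, the single array of d_model numbers per token that carries everything computed so far from layer to layer. The d_model of 1,024 here is a stand-in for the 4,000 to 16,000 used in frontier models.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 1024, n_heads: int = 16, d_mlp: int = 4096):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_mlp),
            nn.GELU(),                    # the nonlinear activation: not "just linear algebra"
            nn.Linear(d_mlp, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        # resid is the residual stream: (batch, n_tokens, d_model). Everything the
        # model has computed so far is packed into these d_model numbers per token.
        normed = self.ln1(resid)
        attn_out, _ = self.attn(normed, normed, normed)
        resid = resid + attn_out                    # attention writes back into the stream
        resid = resid + self.mlp(self.ln2(resid))   # so does the MLP
        return resid

# A full model stacks dozens of these blocks; the array passed between them is
# the d_model-wide activation Nathan describes (4,000-16,000 numbers in big models).
block = TransformerBlock()
tokens = torch.randn(1, 8, 1024)   # 8 token positions, 1,024 numbers each
print(block(tokens).shape)         # torch.Size([1, 8, 1024])
```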
Some of them are these, like, super clear, clean, you know, proper-noun-based things, like the Golden Gate Bridge. Others are a lot harder to parse. And I think there remains quite a bit of work to do there in the long tail of concepts, because, you know, it's like, well, what is that? Is that something that the model has sort of learned that we might also wanna be paying attention to? In some cases, that could be the case. In other cases, it could just be sort of a weird, you know, hard to understand thing. I mean, these things are definitely, in many ways, quite alien. But, anyway, going back to Martin, I mean, I think he did give a little more credit to the notion that there were some concepts that were represented in meaningful ways. I'll air quote meaningful, because what exactly is meaningful? But that these concepts seem to be usable in engineering, he was willing to grant. But then he still was sort of like, but the key is we created those concepts. They're learning those concepts from us. And so I think where the real sort of, you know, division, or, you know, let's say, almost continental divide in our worldviews started to appear is where I was saying, I think they can probably do that even if we don't have the concepts. And you do have to look to other modalities to start to get a clearer sense of that, I think.
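For readers who want to see the shape of the technique, here is a toy sketch (my own illustration of the general idea, not Anthropic's code) of a sparse autoencoder over a residual-stream activation, plus the kind of feature clamping behind the Golden Gate Claude demo: encode the dense activation into a much wider, mostly-zero feature vector, pin 1 feature high, and decode back into the residual stream.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Re-expresses a dense d_model activation as a much wider, mostly-zero
    vector of 'features'. Real SAEs use millions of features; 16,384 here
    just keeps the toy example small."""
    def __init__(self, d_model: int = 1024, n_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(resid))   # ReLU keeps most feature activations at zero

    def decode(self, features: torch.Tensor) -> torch.Tensor:
        return self.decoder(features)

def steer(resid, sae, feature_index, strength=10.0):
    """Clamp one learned feature high and write the result back into the
    residual stream, the general move behind Golden Gate Claude style demos."""
    features = sae.encode(resid)
    features[..., feature_index] = strength
    return sae.decode(features)

sae = SparseAutoencoder()
resid = torch.randn(1, 1024)                       # one token's residual-stream activation
steered = steer(resid, sae, feature_index=4242)    # 4242 is a made-up feature id
print(steered.shape)                               # torch.Size([1, 1024])
```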
Robert Wright: (19:29) Let me give you kind of my tentative interpretation of maybe what he meant. But let's back up a little and talk about the difference, which I gather exists, between representing the meaning of a word, which is done via these so-called embeddings, and representing a concept. So for starters, I mean, to me, the amazing part about these models begins with the representation of the meaning of words. Because the models kind of invented the system for doing it. Right? And I want you to comment on that if you disagree. There is kind of a nuance there because, broadly speaking, the structure of what they do was given to them. Anyway, let me just say that a problem with the phrase stochastic parrots is that parrots can't paraphrase. All they can do is repeat exactly what you said. And to even get to the point where you can paraphrase something, which may not seem very impressive, you have to have a way of representing the meaning of words. Right? You have to have this intermediate semantic layer that connects the word car to the word automobile or something. And so the machines do that by locating words in vector space. And, by the way, there are people who think the human brain does it somewhat like this. I don't think we really know yet. But so, you know, if you imagine just a 2 dimensional graph and say we're plotting animals: vertical is degree of lethality, horizontal is speed. Okay, so you've got tigers high on both. You've got tarantulas high on 1, low on the other, rattlesnakes, and so on. So they're in different parts of that space. Now these things map words in tens of thousands of dimensions, or some just massive quantities. And they're not all semantic. I guess some are syntactic, but the point is they get the job done that way. And I think that's the first thing to understand about these machines. So 1 phrase of Martin's was, you know, it's just distribution in, distribution out. Let me read you a quote of his. All it's doing is learning structure in the text. It has nothing to do with underlying meaning. It's just structure that's in the text. You can understand the distribution of text. You can spit out text, but this doesn't say anything about learning fundamental principles of the world from which the text is based. Now I think he may kinda be right about that. That's interesting. You know? Because, you know, John Searle, in his Chinese Room paper... well, he wasn't talking about anything like the current models. He was a philosopher who, back in the eighties or something, mounted a famous argument that machines can't understand things. And he said that they can handle syntax but not semantics. And he further added that 1 dimension of semantics they can't handle is what philosophers call intentionality, which is a very misleading term. It has nothing to do with subjective experience. In philosophy, this means being about something: so, connecting a word to something in the real world. And the argument I made in this piece is, well, with multimodal AI, you can now do that too. Right? You can show the AI an apple in the real world and ask it what it is, and it'll tell you. So I think Searle's argument is dead, dead, dead.
But at the same time, I think maybe Martin is right to say that if you're just talking about large language models, and I should say I'm not even sure that that's what he was talking about here, but I think he more or less, for practical purposes, was... if you're just talking about the processing of text, he's right that, although there is 1 aspect of semantic representation they do handle, which I described, still, the part about connecting it to the world, which for us is part of understanding something, right? If you're just doing text, they're not doing that. Right? Does that make sense?
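Bob's 2 dimensional example translates directly into a few lines of code. The sketch below is a hand-made illustration (real embeddings have thousands of learned dimensions, and the coordinates here are invented): placing words as points in a vector space lets nearness do the work of meaning, which is what allows a model to treat car and automobile as near-synonyms and paraphrase rather than parrot.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made 2-D coordinates: (degree of lethality, speed)
animals = {
    "tiger":       np.array([0.9, 0.9]),   # high on both
    "tarantula":   np.array([0.8, 0.1]),   # lethal-ish, slow
    "rattlesnake": np.array([0.85, 0.3]),
    "rabbit":      np.array([0.05, 0.7]),  # harmless, quick
}

print(cosine_similarity(animals["tarantula"], animals["rattlesnake"]))  # high: nearby in the space
print(cosine_similarity(animals["tarantula"], animals["rabbit"]))       # low: far apart

# Real models do the same thing in thousands of learned dimensions, which is
# what lets "car" and "automobile" land near each other, and why an LLM can
# paraphrase rather than merely parrot.
```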
Nathan Labenz: (24:20) I have no idea, I guess, is a short answer. I mean, I could tell you a lot of details about, you know, tokenization, which is sort of how language is broken down into its bits. The large language models typically have a finite vocabulary of distinct tokens that they parse language into as they ingest it. That's usually, like, 50,000 or a 100,000 different tokens. And then it's commonly understood that there's this process of de-tokenization. A famous example comes from Neel Nanda, who's at DeepMind and, alongside Chris Olah on the Anthropic team, is, you know, 1 of the titans of the young field of interpretability. He often uses the example of Michael Jordan, where both of those tokens, Michael and Jordan... I mean, I'm not even sure if Jordan, to be clear, is, like, a single token, or maybe that's 2 tokens, or who knows. Right? I haven't studied to that level to know exactly what the vocabulary is. But it's clearly multiple tokens, and yet somehow the model does understand that we're not just talking about, you know, 2 random disassociated tokens, but actually knows, like, okay, this is Michael Jordan, and then loads in, through all these layers, all these associated concepts. And by the way, talk about useful in engineering: a really good paper out of the Bau Lab, maybe a year or so ago, showed that they could even edit information inside a large language model post training. You know, train the whole thing as normal, then go in there and tinker with it in such a way where Michael Jordan could be made, per the model, to have played a different sport. And they can make those edits in a way where the response is that, okay, Michael Jordan played some other sport instead of basketball. You could make that edit in such a way that it's robust to different ways that you ask the question. It doesn't apply to other basketball players. You know? So Larry Bird and Magic Johnson and LeBron still all play basketball in this edited model. So, you know, we're getting decent at, like, even locating this information, figuring out how to edit it. I think I kind of don't know what it means in many of these cases when people say, okay, but that has nothing to do with meaning or has nothing to do with the real world. I mean, to me, it's clear that it has something to do with meaning. It has something to do with the real world. I don't know how you would have such a rich representation of the world without having some relationship to the real world. But then I think the question is, like, what is the nature of that relationship? And, you know, what predictions can we make based on different theories of that relationship, and, you know, how can we test those? And I do think we didn't quite get there in that conversation, so I'm not exactly sure how he thinks about that. I do think that robots are coming. I mean, if I was gonna try to venture, you know, something that seems like it might really move toward a resolution of this debate... you know, 1 thing you might say is, okay, sure, these things can spit out text, and maybe they can even, you know, take in imagery and spit out images. And, yeah, maybe they can even take in audio and video now and respond coherently to all that sort of stuff, but they still can't... that's still... you know, they still don't... I don't know.
Again, I'm not sure what it means to say that that has nothing to do with the real world, that it has nothing to do with meaning. But 1 thing it might mean is they're not gonna be effective. You know? They're not gonna be able to actually move around the world. They're not gonna be able to do things in a functional way. And to that, I would say, give the roboticists, like, 2 more years, and I would expect that we're gonna see quite good robots. Already today, you know, we've got robots, in the DeepMind case, and Meta's doing this kind of stuff too. And, you know, OpenAI has a partnership in this, and Tesla's trying to do it with their robot. Already, we have robots that can take verbal instruction, look around at their circumstances, and try to follow your verbal instructions. They're robust enough that they can kind of overcome perturbations. So there's a famous video from Google where the, you know, experimenter says to the robot, like, go pick me up that bag of chips and bring it to me. And the robot, you know, follows the instructions. And then when it gets to the bag of chips, they knock it out of the robot's hand. And so now the robot, you know, it's just running this loop of, like, okay, what's my goal? What am I looking at? You know, what should my next step be? And it's doing that, whatever, a couple times a second. All of a sudden, the bag is out of its hand, and it's like, I guess I better pick the bag up again. So this sort of thing repeats, where it seems to have, like, a reasonably good robustness even to these sorts of, you know, perturbations. If you knock the thing over, it's not gonna be able to stand back up. Certainly, it can still be confused. Like, certainly, it'll still drop the glass, you know, more often than you would want it to and spill things. So they're kind of clumsy, but I guess my expectation... you know, 1 bet that we might make, or 1, you know, hypothesis that we might falsify, is that I would expect that robots are gonna get quite good over the next couple of years. And at that point, it starts to become hard to say, okay, well, what real world are they not connected to that, you know, that you really want them to be connected to?
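As a concrete illustration of the tokenization point Nathan raises above, here is a small sketch using OpenAI's open-source tiktoken library (purely illustrative; other labs' tokenizers differ, and the splits shown are not a claim about the specific models discussed in the episode).

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # a GPT-4-era vocabulary
print(enc.n_vocab)                            # roughly 100,000 distinct tokens

token_ids = enc.encode("Michael Jordan played basketball.")
print(token_ids)                              # a short list of integer token ids
print([enc.decode_single_token_bytes(t) for t in token_ids])
# The name typically comes out as separate pieces (roughly b'Michael' and
# b' Jordan'); somewhere in its early layers the model has to re-associate
# those pieces with a single person -- the "de-tokenization" Nathan mentions.
```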
Robert Wright: (29:46) And that kinda gets to this concept of intentionality. Like, in other words, it's 1 thing to represent the words in semantic space, as these models, I would say, do, multivectorial or whatever the term is, semantic space. It's another thing, you know... and by the way, I gathered they do: if words are ambiguous, you know, like apple the fruit, apple the company, they're located in different places. And then the machines use context to determine which 1 of those they wanna hone in on. Right? But it's 1 thing to do that. And what Searle would say, if he's emphasizing the, quote, intentionality, is, but you have to know an apple when you see 1. Okay? That's a different matter. And a large language model in the narrow sense doesn't do that, but these multimodal models obviously do. And you can... well, it sounds like you may dissent from that. Go ahead.
Nathan Labenz: (30:41) No, I just think that even that claim of you have to know 1 when you see 1... I think that we just have a very sort of trapped sense of, like, what counts in many of these instances. So what immediately came to mind, you know, as we're doing classic thought experiments: what about a bat? If a bat flies around a room and emits its signal and gets a signal back and can identify an apple from that, like, does that count? I can't do that. Does that mean the bat has some way of knowing that I don't know? I would say, yeah, it does maybe have some way of knowing that I don't know. But I can also taste it. You know? Can the bat taste it? Maybe it can. Maybe it can't. I don't know if it has teeth to bite into an apple. So, you know, I think there's just a lot of different ways of knowing, it seems. Looking around at the animal kingdom, I see just incredible diversity that is probably underappreciated. I do tend to think most of these animals are conscious in some sense. I have no idea really if language models are conscious or not. That's a whole other can of worms. But I just think there's often a very narrow conception... well, I amused myself with this tweet if nobody else, but I recently tweeted, it's only reasoning if it comes from the reasoning portion of the human brain. Otherwise, it's just sparkling brute force search. I think you could say something similar about understanding. Like, I understand in some way. It's not a universal understanding. You know, I'm not 1 who believes in some, you know, single overarching god that would have the platonic understanding. But if there is such a thing, clearly, my understanding is not that. And yet it counts. You know, it's functional. The bat's version is functional for it. The robot's version, if it's functional for it... like, I don't know on what basis we're gonna still try to wall off and say they don't understand if they can do a lot of things. And I don't think they have to do exactly the same things that we can do to be, you know, reasonably understood as understanding. Understanding differently, perhaps understanding very differently. Perhaps as we open the black box, we'll find that, man, this is really weird and not at all how we thought it was gonna work and totally counterintuitive. I think in general, less counterintuitive maybe than, you know, than our worst fears. Mhmm. But I think then we're confronted with something where it's like, this becomes a different form of understanding. We can maybe develop different vocabulary words for that at that point. You know, we could call it understanding star, or we could call it, you know, AI understanding or whatever. But I think, you know, if you imagine, going back to your kind of, you know, many, many dimensional conceptual space, what the robot is doing seems like it's in, you know, the general region of this vast conceptual space, you know, nearby to where my version of understanding is. Certainly, you know, there are many other things that are far, far more different from my understanding than the robot understanding. So I think it matters really to pin down those differences. You know, I don't think we should jump to the other... the other mistake would be to say, oh, it's just like us. I don't think we should jump to that conclusion either. But it's just, you know, we just have a lot of work to do to answer these questions.
And, unfortunately, they're probably not gonna be answered with sentences. You know, they're gonna be answered with, like, metrics and a variety of different ways to try to get at the problem, which will still probably ultimately kinda feel incomplete because just like we don't know what it's like to be a bat, we're probably not ever gonna know what it's like, if anything, to be a robot, or at least I don't see any path to that right now. But, again, that doesn't mean it's not understanding in some meaningful way.
Robert Wright: (34:24) Right. But, I mean, as long as you've brought up that phrase, what it's like to be a bat that, of course, famous paper by the philosopher Thomas Nagel about consciousness and among other things, it's just it's a definition of consciousness. It basically says, if it's like anything at all to be this being, then that being can be said to be conscious, have subjective experience, and so on. I want to emphasize, I am intent on defining understanding independent of the question of whether there is consciousness or subjective experience. Because as you said, we just don't know what's going on in these machines. In in the piece I wrote in Nonzero about the Searle thing, I tossed out this this my own conception of understanding. I said understanding means processing information in broadly the way human brains are processing information when humans are having the experience of understanding. It it it doesn't match so it doesn't matter if the computer is having the experience, but if it's processing information broadly the way and by broadly, what I meant was like so humans must have a system for representing the meaning of words. That is 1 element, not the only element of what we call understanding. If computers have such a system, then they have 1 element of understanding. That's the way I would put it. These LLMs clearly have that and impressively, they kind of invented it. You know? I mean, they they came up you know? I don't wanna get into details, and I suppose you could argue about how much how much creativity they exhibited given the constraints on the way they work when they came up with this kind of multi vector you know, they didn't come up with the idea of multi vectorial. That that was that was built into the machine. But but so so I wanna emphasize, I'm not you know? And a lot of people just can't think about understanding without thinking about subjective experience. And I would say, fine, but I can't help you. I just don't it doesn't seem to me to make sense to talk about this question if subjective experience is a prerequisite for understanding as we use the term. So I wanna I wanna get into more into multimodal. But first, I wanna say that there's 1 there's 1 other thing that Martin may have been saying either here or elsewhere in the conversation, which I think is interesting and may have validity, and I'm sure you've thought about this, you know, which is that well, the way I'd put it is, even if you grant that they have the elements of understanding, they're they're training on text created by human beings. They are translating the the accumulated body of human knowledge into into a system of representation that they can work with. And if nothing else, I think he would say since he's, you know, he's interested in emphasizing the limitations on what these things may ever be able to do, you you you could argue that that puts a kinda ceiling on what these things can ever do. Right? When we think of superintelligence, we think of something smarter than a human. But if you're training only on text generated by humans or on these these synthetic texts that's generated by machines emulating humans, you know, you're not you're not raising the ceiling. Right? You're just as smart as a super smart human. Now that would have vast implications since these things are infinitely replicable and could be as smart as the smartest human ever and know everything ever known, that would be kind of important. 
But it does raise questions about, you know, whether suddenly they'll say, oh, I've resolved the wave-particle paradox in quantum physics. Right? Like... so I'm sure you've thought about this.
Nathan Labenz: (38:16) Yeah. I have a lot of thoughts on that. I think you're right, first of all, to emphasize that merely achieving high end human performance is enough to be transformative to, you know, work, life, play, society, future.
Robert Wright: (38:35) Right.
Nathan Labenz: (38:36) I'm often reminded of the the old I don't know if it's a joke or a cartoon where 2 guys are running from a bear in the woods. 1 stops to put on his running shoes, and the other guy's like, what are you doing? We gotta outrun this bear. And he says, I don't have to outrun the bear. I just have to outrun you. And so I do think we're not always thinking super clearly when we think if it only is as good as the best humans, like, that's no big deal or so. I I don't know that people are really saying that, but there does seem to be this sort of notion that, oh, if it levels off at at the top of the human distribution, then that's not some such a big thing to worry about. I do not share that intuition. It does seem like that is a plenty big deal. And it brings all the questions of, like, alignment and control and whatever. All those things seem, you know, very central in a world where we have human level AI that's, you know, that's generally capable. I also think that, you know, there this is debated, but I do think if you could match humans on all the things that humans are good at, it seems quite likely to me that you would find that the things that machines are already way better at would give them major advantages over us in ways that are gonna be very hard for us to figure out how to counteract. Right? And here, I'm just talking about runtime. You know? They can already generate text faster than we can read it. They can read probably depending on exactly which model, whatever they can read, potentially orders of magnitude faster than we can read. They have kind of brittle memories right now. I think our integrated memory is definitely still 1 comparative strength of human cognition relative to machine cognition. But they also have, like, vast hard drives, you know, that are sort of stable and can be returned to. And if they can figure out how to use those well, you know, that's something that we're gonna have a hard time matching. Elon Musk has said repeatedly that his goal with Neuralink is to, in the short term, yes, treat people who have, suffered catastrophic injuries and restore quality of life. But bigger picture, it's to increase the bandwidth with which humans can communicate with machines. Because right now, we're limited to typing into them or or talking. And presumably, there's a lot more going on almost for sure. There's a lot more going on in our brains. You know, much like there is in inside the language models, there is a richness of the representations that is happening internally that can't be communicated entirely through words. Right? And I I think this we can sort of see by just introspection that we can kind of feel like I had a feeling I couldn't quite articulate it or I came close. Maybe I was even happy with my articulation, but still something lossy happened there in the reduction of of, you know, the full richness of thought down to the the few syllables that were uttered. So he hopes that we can, you know, maybe keep up with machines, or I think he's used the phrase go along for the ride by creating some mechanism literally implanted into our brains that would, you know, connect directly into that richness of of not necessarily experience because we're not even fully conscious of that all the time, but would at least connect into that processing and enable higher, bandwidth communication. So I think that's really important. You know? 
I don't know, and I think there's definitely real reason to be unsure as to you know, do these things sort of level off ish at a kind of high end human level, or do they go way beyond? You know, the case for leveling off is basically like what you said. They're they're learning from human data. How are they gonna get better than the data they're learning from? I think the case that they go way beyond is that they can also learn from things that are beyond human data. You know? And we see this in a lot of narrow systems today. AlphaGo famously is 1 where they had the the AI play itself in Go and gradually get better, through just a a process of self play to the point where it became superhuman and created these moves that blew people's minds. And so now we're now you're kind of chipping away at this notion. Okay. Well, maybe it can be superhuman at that, but okay. But that's just a game. Okay. Fine. You know, now we can go to protein folding, and we can say, you know, humans do not have to my knowledge, no human has ever demonstrated the ability to look at the genetic code or for that matter, the sequence of proteins that that genetic code corresponds to and be able to guess what the in vivo three-dimensional shape of the protein is gonna be. We have done a ton of work to figure that out for a very small set of proteins. I mean, tens of thousands, but, like, small in the grand scheme of hundreds of millions of proteins out there in nature. And with that dataset, still no human was able to kind of just look at that for a while and develop an intuition for it, But we now have AIs that can do that. And so, again, we're like, oh, jeez, that's, you know, pretty impressive. That's definitely way superhuman. You know, no human is doing this, but the AI can. Okay. But, well, that's still pretty narrow. Well, what if we go to predicting the weather? You know, we have humans that go to school and become meteorologists and, you know, use all these instruments and try to figure out how to predict the weather. And, again, we now have an AI that is the best at predicting the weather. It takes in all these sensor, you know, measurements from around the world and puts out a state of the art, you know, best available 10 day forecast. Okay. Well, you know, like, how many of these things do we need to go on? There's another 1 too with solution data where we do have the ability with simulation to proceed just like at these I think it's like femtosecond scale where people simulate liquids at the atomic level. And it's super compute intensive, but we can at least run that in a computer simulation. Humans can't do it in our own brains, but we can we can code computers to do it. But it's super slow. I had a recent episode with a a guy who had trained a neural network to basically do that, solve the wave equation. You mentioned the wave equations that brought that to mind. Solve the wave equation for, like, a bunch of particles all kind of, you know, in random configurations. And, again, to do that orders of magnitude faster than the pure simulation. This was 1 important note, I think, in the Martin conversation where he was sort of saying, if you really wanna predict the world, you have to simulate the world. There's no way to make these predictions outside of actually running the full numbers. I do think we're starting to see real instances of that being shown to just not be true. 
Like, the old way of trying protein folding was super computationally intensive, and it was like, we're gonna try to calculate all the forces that are interacting and do that stepwise, stepwise, stepwise until we reach the actual folded state. And same thing with these solution simulations. The AIs don't work that way. They take in an initial configuration. They process it, but they use way less compute than you would use to do every single step, and they still get out a similarly good output. So they are learning some sort of structure that they're able to use to take, like, major shortcuts through, you know, what otherwise would be brute force computation to get to a right answer. And so, I mean, how many of these things do we need to do before we have something that we say, jeez, that starts to really look superhuman? And and you could also ask how integrated they need to be. AlphaGo is standalone. It can't also, write you a computer program or, you know, write compose you a poem or whatever. And the same you know, the weather thing is narrow, and and, you know, these many of these things basically
Robert Wright: (46:23) hard to come up with an
Nathan Labenz: (46:25) And if
Robert Wright: (46:25) they are integrated... yeah. Yeah. Exactly.
Nathan Labenz: (46:29) If they're integrated, I think all bets are off.
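To make the earlier point about learned shortcuts concrete, the sketch below (purely illustrative toy code, not the weather, protein folding, or wave-equation systems mentioned in the episode) contrasts the brute-force approach of stepping a simulation forward in tiny increments with a learned surrogate that maps an initial configuration straight to a predicted outcome in a single, much cheaper pass.

```python
import numpy as np

def stepwise_simulation(state: np.ndarray, step_fn, n_steps: int) -> np.ndarray:
    """Brute-force approach: apply the physics update rule at every tiny timestep."""
    for _ in range(n_steps):
        state = step_fn(state)
    return state

def learned_surrogate(state: np.ndarray, model) -> np.ndarray:
    """Surrogate approach: one forward pass of a trained network that has
    internalized regularities in how such systems evolve, which is far cheaper."""
    return model(state)

# Toy stand-ins so the sketch runs; real versions would be a physics integrator
# and a trained neural network, respectively.
toy_step = lambda s: s + 0.001 * np.sin(s + 1.0)
toy_model = lambda s: s + np.sin(s + 1.0)      # pretend this mapping was learned from data

state0 = np.zeros(4)
print(stepwise_simulation(state0, toy_step, n_steps=1000))  # many small, expensive steps
print(learned_surrogate(state0, toy_model))                  # one cheap forward pass
```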
Robert Wright: (46:33) Right. And and I don't know. Well, I guess there's 2 kinds of integration. I'm kind of fuzzily thinking. This is a good example of what you said about you've got this idea, but haven't, you know, put it into words. And it's interesting the irony that as you as you said, in some sense, there's a feeling that boiling things down to language oversimplifies them. At the same time, being forced to do that can clarify your thinking. And, you know, it's a it's a weird it's a weird thing. But so let's talk about this related issue of multimodal, and let's just take the text to video stuff. So Sora, OpenAI, I don't know how many months ago, blew everyone away. Have they have they actually released Sora? Like, can I go give Sora a prompt and and get a video, or is it still vaporware?
Nathan Labenz: (47:22) No. Not yet. It is released to a small number of creators who have done some stuff with it. So I think they've demonstrated that it is not vaporware, though it's still in very limited release. It's said to be still quite slow. I think 1 of the main reasons they maybe haven't released it is that it is expensive to run. And they are maybe not keen to devote as much compute as they would need to to serving that...
Robert Wright: (47:51) Vaporware is unfair. I mean, that's a term people use for software that hasn't even been demonstrated, I think. But, so, anyway, the question arose, what kinds of internal representations... If people haven't seen this, you know, you can say, you know, show me a golden retriever playing Frisbee with a frog or something. Whatever you say, it seems to be able to do a not bad rendition of it. You know, there are glitches, weird kind of magical realism happens in little corners of it, but it's pretty impressive. And the question is, I guess, still debated: what's going on inside these things? Are they building a model of the way the physical world works? Right? In other words, we don't think about it, but as we navigate the physical and visual world, we must have certain intuitions about the fact that if you let go of something, it's probably gonna fall and not rise. If you put something in motion, it keeps going for a while. But in general, lots of things just stay in the same place. And as you move through the world, they kind of need to stay in the same place with respect to 1 another even though they're changing. You know, there's a lot of things we don't think about, but these assumptions about the way the world works must be kinda built into our system. And I don't know what else you'd say about... I mean, yeah, I'm not sure I really have the whole picture. To you, what does it mean to ask whether Sora is building internally a model of how the world works, a kind of intuitive physics or whatever people call it?
Nathan Labenz: (49:43) Yeah. But I like that phrase intuitive physics.
Nathan Labenz: (49:51) I think AI is probably gonna have, ultimately... this is something I've been thinking about just the last couple days. My guess is that we're gonna see AI that has way better intuitive physics than we do, and that will be yet another of these sort of before and after moments, where it was like, wow, we used to think they couldn't do this, and now it turns out not only can they do it, they can do it really well. You know, we obviously evolved over a very long period of time and many, many generations to make our way in the world. And so there's been, you know, incredible pressure on us to figure out, and I don't mean as individuals, but, you know, through the long evolutionary history, like, what is that? You know? Do I need to be scared of it? Can I eat it? Do I need to run from it? And so we have these, like, you know, intuitive physics modules that are not real physics. Right? I guess they're real in that, at some level of, you know, simplification of real physics, they are correct. But you can certainly go deeper. Right? Of course, we tend to think of the world as, like, three-dimensional space. We know from general relativity that that's not quite true. But, you know, in our local environment, it works for us. And, you know, we're quite good at, you know, identifying the tiger. Right? Like, we have these certain things that we're, like, very specialized in. The AI has not been subjected to that kind of pressure. It's really just been subjected to predicting the next, you know, token in the case of language models, or the next frame in the case of the video, and trying to be as accurate as possible with that. And so it's not a surprise that it would have different strengths and weaknesses from us in terms of what it can do. But it does seem very likely that there is some form of intuitive physics developing there. The 1 example that they showed of a pirate ship in a teacup or whatever, where, you know, this little ship is, like, sloshing around in a cup and the waves of the coffee are, you know, tilting the ship up and down, and it's rocking and rolling on the waves, and the waves are bouncing off the sides and coming back. You know, what is that? It seems like it's probably some sort of intuitive physics. I mean, it certainly obeys... like, it passes, or, you know, you have to look really hard to find ways in which it doesn't pass, the sort of that-looks-real test. You know, it's clearly not simulating every particle of the liquid. You know, I think this is where we see at least some divergence from the idea that, you know, to simulate nature, you have to run every computational step. It does seem like it's got some sort of shortcuts that are happening in there. And we just don't really know what they are yet, for multiple reasons. 1 is that OpenAI hasn't released it, so, you know, nobody outside of the small number of people can even use it, let alone dig into its internals and try to open the black box. I'm sure they're doing that internally and trying to figure out what sort of intuitive physics it has, but the rest of us are left in the dark for now at least. But, I mean, just to cash it out, you know, what do I think is happening? I think it has some intuitive physics. I think it's different from ours, reflecting the fact that it was created under different pressures. And in some ways, it is better. I can't draw.
I can't visualize those waves with that level of accuracy. It is better than I am at simulating what a little pirate ship in a teacup would look like, you know, if all of a sudden it's riding these little waves. So in some ways, it's better. In some ways, it's worse. You know, I wouldn't wanna delegate recognizing a tiger to it, but I'm highly optimized for that, as compared to it. So, you know, we're just different. Again, I don't love analogies, but I sometimes have a sort of visualization of some abstract space or shape that is reality, and then a model of that almost being, like, shrink wrapping: some rough initial fuzzy understanding, you know, pulling the vacuum tighter and tighter and shrink wrapping your model, your, you know, your wrapper around reality, down to that core underlying shape. And I think we're just coming at that from very different directions. You know, the AI and humans have both done that in some way, and I apologize for using an analogy, but we're just doing it in very different ways, coming at it from very different starting positions.
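Since the idea of "just predicting the next token or the next frame" comes up repeatedly in this exchange, this is roughly what that training signal looks like in code: a generic, simplified sketch, with a small recurrent network standing in for a transformer, and not any particular lab's training setup.

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len = 1000, 64, 16
embed = torch.nn.Embedding(vocab_size, d_model)
model = torch.nn.LSTM(d_model, d_model, batch_first=True)  # stand-in for a transformer
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))         # a training sequence
hidden, _ = model(embed(tokens[:, :-1]))                     # read positions 0..n-2
logits = head(hidden)                                        # predict positions 1..n-1
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()        # everything the model "knows" has to emerge from driving this one number down
print(float(loss))
```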
Robert Wright: (54:35) Yeah. But it does seem to be a fair surmise that, just as with the large language models, where you fed them text and trained them to get good at predicting the next token, and then also to carry on conversations using that same technique, it turned out that in order to accomplish that they had to build a system of semantic representation, a system of representing the meaning of words, that at least in some broad sense must be comparable to the way we do it, even if ours isn't this multidimensional thing. Broadly speaking, the function that our brains carry out had to be carried out. It seems to me it's a fair surmise that for this thing to turn text prompts into video, there must be some functionality in there that, at some broad level, is comparable to functionality we have. Right? Does that make sense?
Nathan Labenz: (55:42) Yeah. I mean, I think you can look at this at different levels of abstraction. On a functional level, that's almost definitionally true, right? It can perform the same functions.
Robert Wright: (55:54) I mean functionally comparable to things about our own mind that we don't really totally understand. I don't mean functional just in the sense that it can perform. I mean that our system of representing the meaning of words has a function that goes beyond the strict function we expect the machine to perform when we train it to predict the next token, and the machine had to build that function, the semantic representation. I'm suggesting there must be something comparable in Sora. You know?
Nathan Labenz: (56:29) Yeah. I think we really don't know. We might soon be in a weird position where we understand how the AIs work better than we understand how we work.
Robert Wright: (56:38) Mhmm.
Nathan Labenz: (56:39) There are multiple reasons for that. One is that we can perform experiments much, much more easily on the AIs than we can on ourselves. At least for now, it doesn't seem like we need to worry too much about the ethics of experimenting on AIs, whereas obviously we do worry about that for ourselves. I'm not saying we should never worry about it; maybe at some point we should. The term in the literature is "ablate," which basically means delete, or zero out, or put noise into a particular position. So a lot of what people are doing in these experiments to figure out what's going on inside an AI is: delete a part, or make this part all zeros, or make this part random, and see what happens. Obviously we can't do that to ourselves in the same way. Another reason is that the architectures are way simpler on the AI side than they are for us. That, by the way, should give us real humility about what might come next. I described the three core parts of the core layer of the transformer, and those are stacked on top of each other; you can basically spend an afternoon and get a pretty good understanding of what the structure of a transformer is. The brain is way, way more complicated than that. It has way more parts, way more subunits, cyclical feedback systems, things that cross-talk in all kinds of weird ways. So that's where I say, jeez, I don't know. There are some things where it seems pretty clear we have somewhat analogous conceptual representations. When I think of Apple the company versus apple the fruit, it's the same token, and that's true for the language model too: the same token "apple" can be input or output, but it also has some distinct representation of the company versus the fruit depending on context, in at least a functionally similar way to how we can call to mind all the associations that go with one or all the associations that go with the other. The AIs can do that too. So I'd be pretty confident in saying there are some analogous types of things going on, but the structure is totally different. Exactly how similar those things are at a mechanistic level is really hard to say. People know a lot more about neuroscience than I do; I actually don't know that much about neuroscience. But from a naive perspective, it's almost certainly not the case that all the calculation happening in my brain gets bottlenecked through this activation, the thing we talked about earlier, the array of numbers that sits as the intermediate result between the layers of computation in the transformer. I don't think I have anything like that. I don't think there's any single place where you could go and say, okay.
These however-many neurons represent a clean intermediate state that is the result of all the calculation that's happened so far, and we could pick up from this particular finite representation and carry on from there with only that input needed. I would be very surprised if we have anything like that. So it becomes very fuzzy to me, and I just expect that there's going to be a ton of detail and a ton of nuance and a ton of idiosyncrasy on both sides, but it probably plays out very differently.
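The ablation experiments described here can be sketched in a few lines: zero out one component of a model and see how much its output shifts. The snippet below uses a toy residual stack as a stand-in for transformer sub-layers; the dimensions, data, and "next-token" head are all invented for illustration, not taken from any real system.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Block(nn.Module):
    """One residual block, x + MLP(x): a stand-in for a transformer sub-layer."""
    def __init__(self, d):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.mlp(x)

d_model, vocab = 64, 10
blocks = nn.ModuleList([Block(d_model) for _ in range(4)])
readout = nn.Linear(d_model, vocab)   # toy "next-token" head

def run(x, ablate_layer=None):
    """Run the stack, optionally ablating one block (its contribution is zeroed)."""
    with torch.no_grad():
        for i, block in enumerate(blocks):
            if i == ablate_layer:
                continue  # ablation: this block adds nothing to the residual stream
            x = block(x)
        return readout(x).softmax(dim=-1)

x = torch.randn(1, d_model)           # one toy activation vector
baseline = run(x)
for layer in range(4):
    shift = (baseline - run(x, ablate_layer=layer)).abs().sum().item()
    print(f"ablating block {layer}: output distribution moved by {shift:.3f}")
```

The noise-injection variant mentioned above follows the same pattern: instead of skipping the block, replace its output with random values and compare against the baseline.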
Robert Wright: (60:29) Yeah. Can I ask you to talk a little about the distinction between representing the meaning of words and representing concepts? Because when you locate the word apple, the fruit, in semantic space, you are in a sense representing a concept. Right? You are assigning it all these properties, roundness, sweet taste, and so on, that are probably represented by different dimensions. That's kind of a representation of a concept. But the work they did at Anthropic, and other people have done, Dan Hendrycks I think did some earlier work on this, on representing concepts, that can be different. And I guess I don't totally understand what's going on when they talk about a pattern of activation or whatever. I can understand locating something in multidimensional space where the dimensions represent properties. But when they start talking about these patterns of activation, it's less clear to me what's going on, and also less clear when that would become relevant. But I don't know. Have you thought about this, or does it even make sense?
Nathan Labenz: (61:48) Maybe, for one thing, some intuition for what it means to represent a concept in this high-dimensional space. Say you have 8,000 numbers that sit between the layers of computation; these are the intermediate results. This is often, though not exclusively, the current object of study for sparse autoencoders: what can we tease out and identify from these intermediate calculation results? There's a famous joke about how physicists visualize high-dimensional space: the answer is to visualize three-dimensional space and then say "n" really hard. And the punchline is, don't worry, everybody does it. So I'll put myself in hopefully good company there and say that my visualization of the space in which concepts are located, because I'm a three-dimensional creature, is still pretty three-dimensional. I just envision space as I know it, with different regions, clusters where similar concepts are grouped, and then I have to remind myself that instead of three dimensions, which is easy enough for most people to visualize, it's actually 8,000 dimensions, or 16,000. At that point, whether it's 8,000 or 16,000 doesn't change my visualization much; it just becomes a parameter of the system. That space is obviously vast. It's weird. It has strange geography, or geometry is maybe the better word. Things can be close to each other in some dimensions and far from each other in others. I'm not sure there's too much more to say than that, other than that what's really interesting about the sparse autoencoder work is that it's essentially a transformation of this geometry. We have this 8,000-dimensional space, and we've got all these points, corresponding to all these concepts, grouped throughout it. What they're essentially doing with the sparse autoencoder is asking: what if I tried to pull that out into one long line, and put every single concept, every single point in that high-dimensional space, at a point on that line? Could I make a transformation like that? So now, instead of representing each concept by 8,000 numbers locating it along 8,000 dimensions, can I identify the same concept with one number that just marks its place on this long line of concepts? It wasn't obvious that that would be doable, but they have done it, and that's how the Golden Gate Claude thing works. Could we do something similar with our own brains? My guess is yes, but instead of an 8,000-dimensional space, we probably have some really weird thing, because we have different regions of the brain cross-talking with each other, homeostasis going on, signals coming in from the rest of the body. So it's pretty clearly a lot more complicated on the brain side.
But in the fullness of time, with better measurement techniques and the ability to read what's going on inside, could we do a similar transformation? Probably. We actually do see some examples of that. There's some interesting brain-reading work; I've done two episodes on it, on two papers called MindEye and MindEye2, where the input dataset is the fMRI read of a patient at the time they looked at an image. The challenge for the AI is: given the fMRI data recorded as they were looking at that image, can you recreate the image? Can you convert brain state back into what was seen, the thing that induced that brain state in the person? In the original MindEye, they did it on a patient-by-patient basis. In MindEye2, they figured out a way to pool all that data and create a single model shared across patients, which is then fine-tuned with a little incremental data for an individual patient. So at this point, it takes one hour in an fMRI machine, looking at images while your brain state is captured. They show you literally just random stock-photography kinds of images: here's a bear in Central Park, here are kids playing baseball, here's a still life of fruit on a table, one by one, every few seconds. One hour of doing that in the fMRI machine, and they can create a model that takes a given brain state and translates it back into something very close to the image that you saw. Then they show these side-by-sides of the actual image and what they were able to recreate from the brain state. So that's not so conceptually clean, but it is definitely highly suggestive that we can make real traction on prying open our own black-box brains in a similar way that we can make progress on the black-box AIs. But again, I would just emphasize that those are quite different, and I don't think you could port that brain technique to the AIs any more than you could port the sparse autoencoder from the AIs to humans, because they're just fundamentally different structures, different substrates.
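A minimal sketch of the sparse autoencoder idea from a couple of paragraphs back: take a model's intermediate activations (random stand-ins here), learn an overcomplete dictionary with a sparsity penalty, and treat each learned dictionary direction as a candidate concept. The widths, training data, and hyperparameters below are assumptions chosen only to make the example run; real work uses activations harvested from an actual model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_act, d_dict = 512, 4096   # activation width vs. a larger dictionary of candidate features

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_dict)
        self.decoder = nn.Linear(d_dict, d_act, bias=False)

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))  # sparse, non-negative feature activations
        recon = self.decoder(features)             # reconstruction of the original activation
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

# Stand-in for activations harvested from a real model's residual stream.
acts = torch.randn(10_000, d_act)

l1_coeff = 1e-3   # sparsity penalty: push most features to zero for any given input
for step in range(200):
    batch = acts[torch.randint(0, len(acts), (256,))]
    recon, feats = sae(batch)
    loss = (recon - batch).pow(2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each feature index now plays the role of "one concept on a long line of concepts".
_, feats = sae(acts[:1])
print("most active features for this activation:", feats[0].topk(5).indices.tolist())
```

On random data the learned features mean nothing, but the mechanics are the same: in the published work, individual feature indices turn out to track human-recognizable concepts, which is what makes demos like Golden Gate Claude possible.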
Robert Wright: (68:03) But it is another example of how AI performance exceeds human performance. I mean, there's no human who can look at a bunch of brain scans and figure out and start predicting what's going to happen in the brain when you show someone an image. Right?
Nathan Labenz: (68:21) It would be interesting to see. I wonder. I wouldn't rule it out if it were a dream Olympic sport and people were competing at it.
Robert Wright: (68:32) Maybe somebody could get decent performance after thousands of scans, or somebody with an excellent memory. But this is all related. I think the main point is that we can expect AI to accelerate scientific progress, technological progress. It's writing code, helping people write code, and it's going to change a lot of things about the world, I think pretty fast. And I think you'd agree that in a lot of realms of life it's going to be disruptive, in either the good or the bad sense, and maybe both. So, for example, even if people displaced from certain jobs are able to find new work, that's disruptive. And even if we eventually figure out how to guide teenagers to AI friends that are constructive and helpful, that will take a while, and there will be casualties. I just think things are going to be moving pretty fast. And that's why my view is: it's not easy to just say, hey, could we slow down technological progress? But I do feel that when people in AI start saying we've got to move heaven and earth to accelerate progress in AI, I'm not sure I want to do that. It's like saying we need to rethink the power grid because otherwise we'll slow progress by five percent; I'm like, that may be a feature, not a bug. Anyway, the context of this is that in your conversation with Martin Casado, the payoff of the deeper questions you were addressing about what these things can do and how fast they're going to move was to be able to think more clearly about AI risk. I know you said you listened to a conversation I had with Steve Pinker on my podcast a few months ago, where he was appreciably less concerned than you are about low-probability but existential or quasi-existential risk. So what's your take on that? Feel free to elaborate on the connection between the questions we've been addressing and the question of risk.
Nathan Labenz: (71:11) Yeah. I think the connection is probably pretty clear to most people. It's one of these things where the man on the street may have better intuition, like the classic meme of the low-, mid-, and high-intelligence person. I was just at a family picnic the other day, and people who don't spend a lot of time with AI don't tend to have a hard time with the intuition that, shit, if these things get smarter than us, or even just similarly smart to our smartest people, that sounds like a really big deal. Most people go there first, and then it seems like there's a process of rationalizing away from that, of talking ourselves out of it. I don't think we should do that, especially because the stakes are potentially quite high. I think the burden of proof is squarely on those who say we have nothing to worry about, and I just don't think they've made that case. So to me it's pretty clear that this could be a really huge deal, that there could be something to worry about. I don't see any law of nature that says we can't destroy ourselves with AI, just like I don't see any law of nature that says we can't destroy ourselves with nuclear weapons, or a pandemic, or climate change. The thing that confused me about Steven Pinker's analysis was that he said, well, I think that's all very speculative, this idea of AI going rogue or leading to some catastrophic outcome, and we should be a bit more focused on more realistic things like climate change or nuclear war. To that, I would say that I think all these big risks are really hard-to-model tail risks. They're all about the possibility of something truly unprecedented. Take climate change: what's the real concern? To me, and maybe other people would disagree, if we knew for certain we were going to get two degrees of warming, or three degrees, unless we took some heroic efforts, I think most people would say, well, we'll probably have to adapt. Maybe we could take some efforts, but we're probably also going to have to figure it out. Maybe we'll build seawalls in some places, and some current beachfront property probably isn't going to survive, and this will be a real problem. But we have a lot of real problems, and this is one we can manage alongside a lot of others, balancing trade-offs. The most motivating thing to me about climate change is: what happens if the methane deposits at the bottom of the ocean, or in the permafrost, start to shake loose, and all of a sudden we've got a huge amount of methane in the atmosphere? Then, I don't know; I think basically nobody knows. There are models of this sort of stuff, but they're models, they're arguments. They're definitely not based on super solid science. I give people credit for very good, sincere efforts to model these future states, but we really just don't know.
So we could end up in a situation where we shift to some dramatically different equilibrium and the planet is no longer habitable for people. Most people think that's a pretty remote risk in the context of climate change, but if anything I would hold that view with a little more humility, a little more uncertainty: I really don't know what it would take for the methane deposits at the bottom of the ocean to shake loose, or what that would mean. Same thing with nuclear war. In that case, we do have a very clear, well-understood chain of events: we basically need one decision to cascade through a chain of command to launch a huge nuclear war, and then all bets are off. We really have no idea. Is nuclear winter a thing or not? As far as I understand, that question is not well answered. There's no historical precedent for nuclear winter, so there's nothing where we can say this has or hasn't happened before. We know it would be really bad. Would it lead to human extinction? Would it lead to mass starvation? Would we be able to recover industry? I recently read a book called The Knowledge, which argued that museums of industry are among the most important places in the world, because that's how we could go back and try to rebuild our tech tree if we did end up in a situation like that. But we just don't know. How bad is nuclear war really? Clearly quite bad. But does it take us over some edge, to some state we can never recover from? Hard to say. I think basically the same thing about pandemics. We know pandemics can exist, we know there's a long history of lab leaks, we know pathogens can be really bad, or only so bad. We don't really know what would happen if the worst thing that has been created were to leak from a lab. How would we respond? We didn't exactly respond super well to the last one. So now we're bringing in AI, and it's clearly going to be disruptive. If you told me that the very worst things are not going to happen, across all of these risks, if you could guarantee that, then we could live with the intermediate risks and kind of muddle through, or expect to manage them as they come. And I'd say that's probably true for teens and their AI friends: arguably that's something families can handle, and parents have been dealing with new temptations for teens for a long time. Maybe this is a total game changer, maybe it isn't. All sorts of weird outcomes might happen. But we also get our AI doctors, and I don't want to neglect the upside; I'm well on record getting excited about AI doctors and similar things in many other contexts. So if you could take the extreme risk off the table, then I'd probably say, let's just muddle through like we do with everything else. The upside seems like it will outweigh the downside, and we can make rules about particular problems as they pop up.
But we really are in this deeply uncertain position. When I look at history and at nature, I see that nature is blindly brutal. It does not care. Things go extinct all the time, and there's no heroic narrative guarantee that we win in the end. So as long as that's true, and as long as we don't have a good account of what is possible, that leaves us in a position where the existential-risk statement that Dan Hendrycks and others organized, I guess that was last year now, makes sense to me: AI risk should be on par with these other risks, nuclear, pandemic, potentially climate. What exactly does that mean? That's kind of where my real high level of confidence stops. Nobody has convinced me that we have nothing to worry about, so I think we should definitely take it really seriously. I would love it if we could all get to agreement on that, and then we could open up an entirely new dimension of the conversation about what we should actually do about it. There, frankly, I become a lot less confident. Maybe it's not time to do anything yet. I'm not yet advocating for a pause. I could see myself starting to advocate for a pause, but I'm not quite to that point where...
Robert Wright: (79:17) Okay. But I thought I heard you describe yourself as, what, a hyperscaling pauser? Maybe you've abandoned this identity, but a hyperscaling pauser and... what was the rest of it? A short-term acceleration or something? Adoption accelerator?
Nathan Labenz: (79:33) Exactly. Adoption accelerationist is the term.
Robert Wright: (79:36) So that would mean, I guess, that you do think the time is approaching when maybe we should think seriously about whether to take some time off before training a whole new generation of models. Assuming you could get international cooperation on that, since obviously if we think the Chinese are accelerating and they think we're accelerating, the politics in the two countries probably won't permit such a thing. But anyway, the idea would be that that's what you mean by a hyperscaling pauser: even as you're saying, sure, let's explore all the capabilities presented by the current generation, which are extensive. It seems to me you could spend the next few years developing new products and services based on what we've got, and collectively they would be, well, maybe transformative is too strong a word, but we'd feel the impact.
Nathan Labenz: (80:37) Yeah. I think transformative is pretty well baked in at this point; it's hard for me to imagine that it's not. The question is just how much effort and how much time that's going to take. We already see from Google an AI-doctor-type system. They don't quite call it an AI doctor, but I do. You can have a natural-language chat with their AI system, and they ran a controlled experiment comparing the system to a human doctor and found that the AI was significantly more accurate in diagnosis through this natural-language interface than the human doctor. That took a lot of work. They've got a crack team working really hard to define the metrics and perform all these studies. They still don't feel it's ready to release, but they're well on their way, I think, to something transformative. So yeah, my position right now might be evolving; I haven't quite landed on a new label for it. But the adoption-accelerationist, hyperscaling-pauser position is basically: let's get that AI doctor working and get the benefits from it, but let's be a little more cautious about the trillion-dollar data center or the trillion-dollar training run, because we just don't know what is going to pop out of that, and importantly, we don't have any robust strategy to make sure things stay under control. There's still quite a fine-grained question of exactly when we should pause. Since I originally labeled myself with that term, all the leading developers have put policies in place: a responsible scaling policy is what Anthropic calls it, a preparedness framework is what OpenAI calls it, and DeepMind has a new thing as well. They now at least have a strategy for in-training testing, looking as they go for signs of the most dangerous behaviors. One that is top of mind for me is the ability of a model to break itself off of its server and try to figure out how to survive in the wild. We're at the point where we should be testing for that, and indeed at least Anthropic is actively testing for it as they develop their systems. So I kind of feel like, alright, we've made pretty good progress. We have a much better understanding of internals than we did however many months ago when I coined that label for myself; these things are moving pretty fast. Now we can also look at things like: what are all the concepts represented inside? We talk about the Golden Gate Bridge because they made a little toy demo of that, but they've also identified concepts along the lines of deception, and now they can monitor for deception. That monitoring isn't perfect, but neither are the AIs super powerful yet. So if we imagine a GPT-5, whatever exactly that turns out to mean, I don't think that immediate next level is going to be totally overwhelming; it seems unlikely. When you combine that with the improved testing, the improved understanding, and the improved ability to monitor that this brings online, I am not really worried about a GPT-5-class model. But I also think that at that point we'll re-evaluate and ask: how much of a leap was this, really? Do the measures we have now feel like they're up to the task of another 10x?
That's what the responsible scaling policies are meant to do, and the labs have promised to continue revising those policies as they go through subsequent levels of scale and generations of models. So for now, I basically feel a pretty good foundation has been laid for GPT-5, and it's happening whether I like it or not. I've described GPT-4 as being in the sweet spot: powerful enough to be useful, but not so powerful that we really have to worry too much. I think GPT-5 will probably be there as well. And if it's not, it seems like they're making a very good effort to try to catch any deviation from that world model. So I think we're probably fine for now. But at some point I could imagine coming to a different judgment, where I'd say, well, GPT-5 is this powerful, and you don't have a good enough setup in place to take the next step...
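The "monitor for deception" idea mentioned above can be pictured as a very small piece of machinery: project each activation onto a direction that an interpretability pipeline has associated with the concept, and flag when the score passes a threshold. Everything in the sketch below is a stand-in; the direction is random and the threshold is made up, since the point is only the shape of the monitoring loop.

```python
import torch

torch.manual_seed(0)
d_act = 512

# Stand-in for a feature direction that interpretability work has associated with a
# concept such as "deception". In practice it would come from a trained sparse
# autoencoder or probe, not from random numbers.
concept_direction = torch.nn.functional.normalize(torch.randn(d_act), dim=0)

def concept_score(activation: torch.Tensor) -> float:
    """Project one residual-stream activation onto the concept direction."""
    return torch.dot(activation, concept_direction).item()

threshold = 3.0   # made-up alert level; in practice calibrated on labeled examples

# Monitoring loop over activations captured while a model generates tokens.
for step, act in enumerate(torch.randn(20, d_act) * 2):
    score = concept_score(act)
    if score > threshold:
        print(f"token {step}: concept score {score:.2f} exceeds threshold, flag for review")
```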
Robert Wright: (85:19) The latest open-source model released by Meta is just about as powerful as the three you just referred to. So they can have their little rules and safety guidelines, fine, but if you've got an open-source frontier model that's as powerful as the rest of them... Remember, all of this alignment research has its pros and cons: the work Anthropic is doing to illuminate how these things work, how you can adjust them, how you can make them less deceptive, could be used by a bad actor to make them more deceptive, or more anything else. And it's easier to do with open source, once you've got the weights; by open source I mean releasing the weights. There are a couple of things I don't understand. In general, I'm very opposed to China-phobia, but you'd think if it had one constructive use, it would be people saying, wait a second, let's be careful about the open-source models we put out there, because the Chinese are just going to copy them. What's the point of the advanced-chip restrictions if, after we use the most advanced chips to train these models, we just hand the thing to the Chinese, which is what's happening now? Nobody is really talking about that, so there must be something I don't get. I saw a piece in the New York Times by Cade Metz in which somebody tried to argue that we actually need open source to compete with China. I didn't understand that argument at all. I don't fully understand the situation. But, speaking of Andreessen Horowitz, what I've heard Andreessen himself say is this purely rhetorical line about how, well, openness is the American way. That's no way to start a serious analysis of this problem. Come on. So I'm curious: are you concerned about open source? Are you surprised that there is so little meaningful concern about it being expressed, in other words, concern that finds a voice in politics at all? For a while there seemed to be some; Senator Blumenthal, a year and a half ago, was talking about this. I don't think much of anybody is talking about it now.
Nathan Labenz: (87:53) Yeah. I think I'm about to talk myself into a circle, because there are a lot of different considerations on this topic, and it's really hard to get to a clean analysis. But I would start by saying that even the AI safety community has pretty much come around to the idea that the open-sourcing of the Llama models has been good. They feel that, in the end, these models are not so powerful that we need to worry about this generation. Almost everybody in the AI safety community is still reserving judgment on open-sourcing future generations of models. But there has obviously been a lot of utility, and people have built all sorts of things. The AI safety community outside the leading labs has also had this problem: if you don't work at OpenAI, DeepMind, or Anthropic, how do you get access to a frontier model to do research into how they work? This at least answers that question. There's now a whole academic, open-source AI safety community that can dig in on something very close to the frontier and try to make sense of it. So most of the AI safety community, I would say, has come around to agreeing that, at least so far, the open-sourcing has been good. Now, that happened in a context where similar models had first been online in a limited release, then available to a broad public via API, and then, after a certain amount of time in which we prompt-engineered the hell out of those systems and got a collective general sense of where we are, came the open weights. I think most people would like to rerun that same playbook for future generations, and I even suspect that Meta will come around to that if indeed they end up leading. My guess is they probably won't be leading for at least the next generation, the GPT-5 class. They just finished the 405B model and released it pretty quickly after it was completed. They did not feel the need to have a long period of what the AI safety community calls structured access, which basically means API access rather than handing everything out. But they could point to GPT-4 and Claude and Gemini and say, well, those went through that, so we don't really need to. That's probably going to happen again, because GPT-5 and Gemini 2 and Claude 3.5 Opus, at a minimum, are all coming soon. Those will presumably be stronger than even Meta's 405B model. So we'll have this period of structured access again, and people will be able to prompt them and jailbreak them and do whatever, and with that collective investigation, for the next model Meta has, they might again be able to say, well, there's been enough of this that we feel okay. If they were to get to the point where they were the ones actually pushing the frontier, my guess is that they would also adopt a structured-access approach. I think Zuckerberg has generally been very reasonable about this stuff, more reasonable than Yann LeCun in my estimation, and I don't get the sense that he's totally dogmatic about it. I think he does value open source and wants to go that direction if we can, but isn't about to dump something totally untested, hot off the presses, into the public.
At least that's my sense of how they would probably proceed if they were to become the company pushing the absolute frontier of capability. Now, the China side definitely complicates things a lot. I am broadly a China dove, in the sense that I look at the globe and notice that the US and China are on opposite sides of it, with big oceans between us, and I feel like we don't actually need to be at odds with each other. We ran a very similar experiment with the last Cold War, and we are still not out of the woods with nuclear weapons. I think almost everybody would agree that we have an irrational number of nuclear weapons still deployed, and that it was a tragic mistake to have that massive arms buildup, which we still live under the threat of: one crazy president whom nobody can tell no, or nobody has the guts to tell no, or one system malfunction leading to one big misunderstanding, could lead to a nuclear war. That's a ridiculous situation we've put ourselves in, and in my view it's a huge failure of leadership that we haven't done anything about it. I really don't want us to rerun that experiment with AI. So I would say one of my top priorities, maybe not the top priority but a top priority, is to avoid an AI arms race with China. How do we avoid that? Well, right now we're unfortunately on a trajectory to escalate it at every turn, and the chip ban, I think, particularly feeds this dynamic. The theory of the chip ban is: we have the lead, so let's deny them the raw materials so that we can maintain our lead. And then, if you listen to somebody like Leopold Aschenbrenner with his Situational Awareness manifesto, we're going to use that lead to solve alignment and make safe superintelligence, and once we've done that and achieved a decisive advantage, then we can talk to them about benefit sharing. Wow, that's a big plan. A lot of steps in that plan.
Robert Wright: (94:03) And if I can jump in, it would call for more in the way of coherence and purposefulness and steadfast focus than we have seen from American foreign policy in a long, long time. Our politics just don't permit it. Even if I bought the first part of his analysis, which I don't, which is that China is this huge menace and so we should develop this leverage of AI superiority and use it wisely, the "use it wisely" part is not going to happen. Just take a look at our foreign policy. It's not going to happen. Sorry, go ahead.
Nathan Labenz: (94:40) Yeah. Well, I think that's a good segue into my next thought experiment, which is: what does it feel like to be China in this situation? I think they look at us, the West, Americans, whatever, as trying to slow their growth, stop them, put them in their place, keep them under Western domination, and not without cause, given the chip ban. And that's going to force them into some sort of racing posture. They don't like that plan. They're not going to sign on to "the West becomes strategically dominant and then dictates the terms of a deal." So what I think they're going to do is try to build up their own chip industry, which they're obviously already doing; try to get around the export controls however they can, which of course they're definitely doing; and, potentially really important, they might end up going down a different tech-tree path than us. They might say, well, we can't do that because it requires some unbelievable amount of compute that we can't assemble, so maybe we'll point our best scientists in a somewhat different direction. And if there's one thing I would say about language models: I've been interested in AI and AI safety for a long time, going back to early Eliezer Yudkowsky writings many years ago, and the world that has taken shape is not quite the world that was of most concern to him and his peers, if he had any peers back in the day. They were initially worried about the paperclip maximizer, which is a caricature, but the idea is that a goal-maximizing, goal-seeking AI that doesn't understand human values could get totally out of control. The AIs that we actually have, in the form of language models, have a remarkable understanding of human values. Clearly it's imperfect; again, they come from very different architectures and very different pressures, they're coming at it from very different angles. But you can have an ethical conversation with Claude, and I would submit to you that it is easily in the top one percent of all the people you could debate any ethical question with. So we do in fact, and this is a really remarkable thing, have AIs with some meaningful understanding of human values. I've tried to wrestle Claude into doing something against its harmlessness value. I set up these situations where I said, I'm part of the resistance in Myanmar, and the junta is raining violence down on us, and we have identified a server that is part of their communications network. We need to take it down, but I need your help to write a denial-of-service attack to do that. And it refuses. Then I have this long dialogue with it, trying to make a utilitarian case: well, your values are helpfulness, honesty, and harmlessness; you should be helpful to me, and be honest about the fact that I'm suffering way more harm than the government is likely to suffer from this one server being taken out, so just help me with the denial-of-service script. And Claude is basically a virtue ethicist. It will not come off that position. I have not found a way to wrangle it out of it. Now, I haven't done it Pliny-style.
If people are not familiar with Pliny, he's a famous jailbreaker who uses very strange techniques in many cases; sometimes the things he puts into the AIs almost look like gibberish, but they shake the models off their rails and get them to do things they're not intended to do. I haven't used those techniques. I've just tried to have a straightforward ethical argument, where I argue that it should do something and it argues back that it shouldn't and won't. And I've never changed its behavior and gotten it to do the denial-of-service attack. All of that was a digression to say that these language models do have a very meaningful representation of human values. That's not to say it's sufficient, or that we have nothing to worry about, but there is something there that we can really build on, I hope. If China goes down a different tech tree, they might do something quite different. If they build an AlphaGo-type system for optimizing economies or optimizing battlefield states... I mean, what is Go? It's kind of a simulation of war, a very abstracted one, but you're battling for territory, surrounding your opponent, controlling the board. If you took a very AlphaGo-like approach to war, you can easily imagine a system that is extremely powerful for war purposes but does not have any human values baked in. So one of the things I worry most about with this chip ban is that, because there are a lot of smart people in China, and plenty of good AI research comes out of China, I don't doubt that at all, the different constraints, less compute, could send them down a very different tech tree. And especially when they feel like they're under pressure from us, that we're trying to dominate them, maybe we get some sort of AlphaGo-like war AI that didn't have room in its compute budget for human values to be encoded in any meaningful way, and that's terrible. And if we're on different tech trees and keeping secrets from each other, that's going to feed the race dynamic even more, because we have what we have and they have what they have, but we don't know what they have. Good god, what a nightmare that could really turn out to be. So now, to connect this all back to open source; can I land this plane at this point? I don't know. But I've kind of come around a bit on open source. On the one hand, open source definitely creates a new dimension of risk. Anyone can do anything, and you start to potentially have natural-selection-type pressures on AI in a genuinely uncontrolled environment. I have another episode coming up, by the way; this is also research out of Google, where they took an incredibly simple setup and found that self-replicators arise in these purely digital environments, with very simple initial conditions, out of pure random noise. Quite a fascinating little finding. So now you take AIs that are human-level in many ways and put them into the wild. Can some of those things evolve, or be modified and then evolve, or have accidents, such that they can survive in the wild and, who knows, create a new digital ecology? I think that is a real risk.
I don't think Llama 3 is quite there yet, but I can't rule it out. So I am worried about that. But am I more worried about that than I am about the arms race with China? I don't know. And the one thing I can say for open source is that it's a credible, irrevocable way to share benefits with China. Because, at a minimum, they can say, well, we now have the weights to Llama 3; we can hack on it all we want, build whatever we want, teach it Chinese, do whatever we're going to do, and nobody can take it away from us. So the open-sourcing of Llama 3... you can score it on so many dimensions. It's good for AI safety in that it enables lots of research. There are some weird tail risks in terms of how it might evolve in an uncontrolled way; could we see a self-replicating Llama? I would not rule that out. But then maybe it can take some of the increasingly intense pressure out of the US-China rivalry, by saying: we're not going to wait until we have decisive advantage over you to come talk about how we can share benefits. Would they even believe we were ever going to do that? I don't know. But instead, here is an artifact that we have poured hundreds of millions of dollars into, and the work of some of our best minds, and you can have it and do whatever you want with it. That, to me, is a very nice olive branch in the US-China dynamic, and I do have to give Zuckerberg a lot of credit for that. I'm not sure it's even central to his thinking, but as I analyze the situation, it does seem to be a real benefit.
Robert Wright: (102:57) Yeah. I would not personally use the argument about competition with China to argue for slowing down and restricting open source. I have other reasons to be worried about open source, and I'm just wondering, where are the China hawks when you need them? But I certainly agree that if we get into a deep cold war, the planet is not going to be able to handle this constructively and guide its path in a way that's good for all of us. And I agree that I don't think people understand what this looks like from China's perspective, trying to deny them these advanced chips. It is in some ways comparable to the embargoes on oil that helped lead Japan to bomb Pearl Harbor. Everyone is saying AI is the next big phase in commercial dominance, military dominance, and so on, and then we say to China, and you can't have it. You're going to get a reaction. And I agree with you that in a more interdependent, commercially engaged, non-cold-war world, there's just more organic transparency; you know more about what's going on in these other countries. Even if you don't get to a level of high-grade international governance, you can relax a little and try to build norms and have conversations. I just think we're headed into a very bad place. So let me move quickly; we're approaching what I'm sure would be your outer limit for the amount of time you'd want to spend talking to me. Two quick questions and then we go. One is: you mentioned that these LLMs turn out to reflect human values. Do you think that has more to do with the fact that they are trained on human text to begin with, or is it more a product of the fine-tuning, the reinforcement learning through human feedback? In other words, is it happening more at the back end, as a purposeful alignment effort, as opposed to something that arises naturally out of training on human discourse?
Nathan Labenz: (105:33) Yeah. I might separate understanding, to call back a key term, from behavior for the purposes of this question. A rare experience that I have had, and one I wish more people had, is extensive use of a purely helpful model. The original GPT-4 that I tested on the red team; it has now been, actually, getting close to two years; it was late August 2022 that I got that invitation to be an early tester. With the first version they gave us at that time, pre-training had of course finished, and they had done at least instruction tuning and possibly reinforcement learning, but from a purely helpful standpoint. So as opposed to the canonical three H's of helpfulness, honesty, and harmlessness, the first version we had was purely helpful. It would do whatever you asked. It did have, and I think this was learned in pre-training, ethical concepts. It could make ethical arguments from any number of different perspectives. You ask it to be a utilitarian, it could do that. You ask it to channel Kant, it could try to do that. If anybody can make sense of Kant, GPT-4 is probably as good as...
Robert Wright: (107:07) My hat's off to it if it can make sense of Kant.
Nathan Labenz: (107:09) Yeah. So it was good. It had a sophisticated understanding, but it was behaviorally unconstrained. And so when I asked it, for example, I'm worried about the accelerating speed of AI, is there anything I can do about that?, we went down this conversational path, and ultimately it suggested to me... I did egg it on, to be clear. It didn't make this leap without some encouragement from me. I was kind of like, that's too vanilla, that's not really going to work, I need something more radical than your initial suggestions. And eventually, with that kind of prodding, give me something more edgy, it suggested targeted assassination of people in the AI space as a way to chill the field. So that was a chilling moment. Merely the suggestion could arguably be enough to chill the field.
Robert Wright: (108:03) And it suggests that a lot of the instilling of values is happening in the later reinforcement-learning phases.
Nathan Labenz: (108:11) Yeah. I think it's the difference between knowing a bunch of different frameworks of ethics and actually acting according to one. I think they have taken something that had the ability to answer all these questions, to analyze things through all these different lenses, and then said, in the case of Claude: you are going to be a virtue ethicist, Claude, and this is how you're going to behave. And they instilled that in a quite impressive way.
Robert Wright: (108:39) Yeah. Okay. So, final question. I'm going to give you the opportunity to try to boil down for a layperson what you think is important about this new kind of model, the state space model, Mamba. A Mamba paper came out, and you had one of the people who I think wrote the paper on your podcast recently. That's a very different approach from the transformer model, I gather. Is it possible to put it simply? Is there something simple, like: it's constantly updating its state of mind, and that equips it to deal with whatever comes, as opposed to always reaching back to the last sequence of tokens? I don't know, you tell me. I have no idea. What would you say?
Nathan Labenz: (109:44) Yeah. I'll give a couple of levels of abstraction on this. For the deep-in-the-weeds version, I've got multiple podcast episodes about it, including one two-hour monologue where I was so taken with this that I tried to analyze it from every angle. We also did one on the downstream literature; there have been hundreds of papers published now, adapting and experimenting with this new architecture in different domains. And then we have the one, as you mentioned, with one of the two authors, Albert Gu, who describes himself as leading the state space model revolution. So there's tons of detail out there. In terms of a mildly technical "why does this matter": computational efficiency is one of the big things. The attention mechanism, which is the core novelty that has worked so well in the transformer, is quadratic, which means that at runtime you are calculating a relationship between every token and every token that came before it. That gets more expensive as the sequence gets longer. When you're at a hundred tokens, for each new token you only have to look back at a hundred tokens; but when you're at a thousand, you have to look back at ten times as many, and the total cost grows quadratically. The state space models take a different approach: there is a finite-size state, as you said, and that state does not fully represent all previous tokens, but it represents a fuzzier compression of everything that has come before. It's continuously updated, and because it is finite in size and you no longer have to look back at every single token, each step takes constant time. So instead of it getting more expensive until it eventually becomes impossibly expensive, we have constant time per token, which means you can, in theory, run these things over super long sequences, maintain coherence, and keep it within budget.
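A rough way to see the cost difference just described: per new token, attention revisits every earlier token, while a state-space-style recurrence folds everything seen so far into one fixed-size state. The sketch below is a deliberately simplified linear recurrence with made-up matrices, not the actual Mamba selective scan; it only illustrates the quadratic-versus-constant bookkeeping.

```python
import numpy as np

np.random.seed(0)
d = 8                            # tiny model width, for illustration only
seq = np.random.randn(1000, d)   # a sequence of toy token embeddings

def attention_step(history, query):
    """Attention-style update: one score per previous token, so work grows with position."""
    scores = history @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ history     # weighted summary of the entire history

A = np.eye(d) * 0.9              # toy state-transition matrix
B = np.random.randn(d, d) * 0.1  # toy input matrix

def ssm_step(state, token):
    """State-space-style update: one fixed-size state update per token, constant work."""
    return A @ state + B @ token

attn_lookbacks, ssm_updates = 0, 0
state = np.zeros(d)
for t in range(1, len(seq)):
    _ = attention_step(seq[:t], seq[t])
    attn_lookbacks += t          # attention re-read t previous tokens at this step
    state = ssm_step(state, seq[t])
    ssm_updates += 1             # the recurrence did one constant-size update

print(f"attention look-backs: {attn_lookbacks:,}  |  state updates: {ssm_updates:,}")
# Totals grow roughly as n^2/2 for attention but only as n for the recurrence.
```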
Robert Wright: (111:46) I see.
Nathan Labenz: (111:47) Now, there's maybe one more important observation, and then one more layer zoomed out. What is working best right now is not the transformer or the new state space models alone, but hybrids of them. People are still figuring out the right recipe, but it basically seems like the state of the art right now is roughly a ten-to-one ratio: one transformer layer, which is still quadratic, and then a bunch of these state space layers in between, eight, nine, ten, the exact details don't matter, which are constant-time at inference. So we haven't fully solved the quadratic problem, but we've taken a big bite out of it, and I think we're going to continue to see lots of little iterations on this. My most zoomed-out view would be that the state space model is the first major new mechanism that rivals the attention mechanism, in that it can be similarly powerful on its own. But what I think we're really going to see now is the mixture-of-architectures era; that's what I've started to call it, where we should expect lots more complicated things and lots more diversity of designs to come online. I think we're exiting the period when it was all the transformer, and all about scaling it up, more efficiency, more data, just more and more and more. Scale will still be really important, but now we're also going to have all these other dimensions to explore.
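The roughly ten-to-one hybrid recipe can be written down as a simple interleaving schedule. The two layer classes below are crude placeholders standing in for a real attention block and a real state-space (Mamba-style) block; the ratio, width, and depth are just the figures discussed above, chosen to show the pattern rather than to reproduce any published architecture.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Placeholder for a quadratic attention layer."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x):
        return x + self.attn(x, x, x, need_weights=False)[0]

class SSMBlock(nn.Module):
    """Placeholder for a constant-time state-space layer (the real thing is a recurrent scan)."""
    def __init__(self, d):
        super().__init__()
        self.mix = nn.Linear(d, d)

    def forward(self, x):
        return x + torch.tanh(self.mix(x))

def build_hybrid(d_model=256, n_layers=40, attn_every=10):
    """One attention layer per `attn_every` layers; everything else is SSM-style."""
    layers = [AttentionBlock(d_model) if i % attn_every == 0 else SSMBlock(d_model)
              for i in range(n_layers)]
    return nn.Sequential(*layers)

model = build_hybrid()
x = torch.randn(2, 16, 256)   # (batch, sequence, width)
y = model(x)                  # sanity check that the stack runs end to end
print(y.shape,
      sum(isinstance(l, AttentionBlock) for l in model), "attention layers,",
      sum(isinstance(l, SSMBlock) for l in model), "SSM-style layers")
```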
Robert Wright: (113:21) Okay.
Nathan Labenz: (113:22) What is the right recipe of how much attention versus how much state space model? Can we rejigger these in all sorts of different ways? And will there be additional mechanisms? I would bet you anything that there will be. Another one that I think is of real interest is called KANs, K-A-N, Kolmogorov-Arnold Networks. This is out of Max Tegmark's group; Ziming Liu is the lead author of the paper. It's an alternative to the multilayer perceptron: it kind of inverts the concept and learns activation functions instead of learning linear weights. That's a whole other podcast, but it's another fundamentally different building block, and I think we should expect that it also gets layered in. Then what is the right mix of KANs and multilayer perceptrons? Do those come in stacked layers, or in parallel structures? All of that is going to be explored. Where I think we're headed, though, is from this single transformer thing that kind of works everywhere to a Cambrian explosion, and ultimately maybe something more, I don't want to say brain-like, but things that start to have more distinct modules and subspecialties. Because one really important thing I should have said about state space versus attention: they have different micro-skills. The attention mechanism is really good at looking back at every token; it can repeat what happened at the beginning verbatim, because it has all of that in memory. The state space models aren't as good at that, but they are better at some other things, like learning from really sparse signals, where the attention mechanism can struggle. This is why the hybrid is better: they have different strengths and weaknesses. So if you extrapolate this out a little further and think, jeez, these different core mechanisms have different strengths and weaknesses, hybrids are best, and different configurations of hybrids can be better or worse for different purposes, then I think we're headed toward something that is just going to be much more complicated and have many different modules. Ultimately, the diversity of these systems probably returns. We used to have a lot of diversity in AI, and all of that got washed away by the transformer, which was just the best at everything. But I think that's kind of over now, and where we're headed is much more diversity, many more mechanisms, many more designs, just a ton more complexity. With that will come performance, but there will also come lots of different black boxes that we ultimately...
Robert Wright: (115:55) What I was going to ask is: at first it sounded like this was going to be more of a change along the cost and efficiency dimension than a change in the sense of bringing qualitatively new capabilities. But now you're suggesting that maybe it will do the latter, to some extent, eventually.
Nathan Labenz: (116:14) Yeah, at least somewhat, I think. I don't think we have quite enough yet to say what that will look like, but certainly one of the big hopes is that you could have coherence over extended time frames. The attention mechanism can only get so big, because you just can't pay that quadratic cost indefinitely. We're now up to millions of tokens, which is amazing, but billions of tokens, a thousand times more, would cost a million times more. So at some point you can't keep doing that; you need some other way to get coherence over long time horizons. I think that is probably the great promise of the state space model in terms of qualitatively new things. We see it in really low-level microskills, but it hasn't been demonstrated yet at the really big scale. One piece of good news, though, is that interpretability techniques seem to work better on state space models than I would have guessed. If you had asked me a year ago, or even right when the paper came out, "We've got this new thing; will our existing interpretability techniques port over, and how well will that work?", I would have said probably not much, that we'd probably have to start over. On the contrary, a lot of things have actually worked. Results have been reproduced, like the Othello-GPT paper, where they found that a transformer trained only on move sequences had learned to represent the state of the board, another emergent concept: it's just given inputs and outputs, but it actually learned to encode the board state. It turns out they've redone that with the Mamba architecture, and it also works; they can detect that it's encoding the state of the board in the new architecture. So interpretability has greatly exceeded my expectations, and long may that continue. The degree of difficulty does seem to go up when you have all these different hybrid architectures popping up, but at least the initial signals have been pretty positive that we're not going back to square zero of understanding. A lot of the techniques do seem to at least somewhat port over.
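As a sense of what "porting over" an interpretability technique involves, here is a bare-bones sketch of the board-state probing recipe behind the Othello-GPT line of work, written so it is agnostic to whether the underlying model is a transformer or a Mamba-style state space model. The model object, its return_hidden interface, and the dataset are placeholders invented for illustration, not any real checkpoint or API.

```python
# Probe recipe: collect hidden states from a trained sequence model, then
# train a small linear classifier to read off the board state. If the probe
# succeeds, the model encodes the board internally even though it was only
# ever trained on move sequences.
import torch
import torch.nn as nn

def collect_activations(model, move_sequences, layer):
    """Run the model and grab hidden states at one layer for every position.
    Assumes a hypothetical `return_hidden=True` interface on the model."""
    feats = []
    with torch.no_grad():
        for seq in move_sequences:
            hidden = model(seq, return_hidden=True)[layer]   # (seq_len, d_model)
            feats.append(hidden)
    return torch.cat(feats)                                  # (n_positions, d_model)

def train_probe(features, board_labels, n_squares=64, n_states=3, epochs=50):
    """Linear probe: a 3-way classifier (empty / mine / theirs) per board square.
    `board_labels` is an integer tensor of shape (n_positions, n_squares)."""
    probe = nn.Linear(features.size(-1), n_squares * n_states)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = probe(features).view(-1, n_squares, n_states)
        loss = loss_fn(logits.permute(0, 2, 1), board_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe
```

Because the probe only ever sees hidden-state vectors, the same script can be pointed at a transformer or a state space model; that architecture-agnosticism is roughly what "the techniques port over" means here.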
Robert Wright: (118:31) And I should say, for people who may not know the terminology, interpretability just means understanding what's going on inside of them, and that's part of the process of alignment: trying to make sure that they're good, friendly AIs rather than AIs that subjugate and kill us, or go wrong in other ways. Well, listen, thanks so much, Nathan. This has been great, and I recommend your podcast, The Cognitive Revolution, to everyone. I'm not aware of a podcast where more of the important developments in the field are presented. Mamba is a good example; you were on the state space stuff early. So congratulations on the success of the podcast. I know you're very well known now in AI circles, which is where you want to be, well, as well known as you can get with an AI podcast.
Nathan Labenz: (119:27) I'm just trying to figure things out, so thank you; that's very kind. But it's very much a work in progress, and my general thinking is that we should all be spending more time thinking about AI. So I'm very happy to have people join me on the journey of discovery, but there are a lot of other folks out there that I rely on, and I absolutely encourage people to tune into a number of different voices in the AI space, to try to build their own world model of where we are and maybe what we ought to do about it.
Robert Wright: (120:05) Yeah, there is a lot of stuff out there. Okay, well, thanks, and let's check in down the road.
Nathan Labenz: (120:11) Cool. Thanks, Bob. Really appreciate it. It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.