The AI Safety Debates with Zvi Mowshowitz
Nathan Labenz interviews Zvi Mowshowitz on AI safety, AI discourse, and key figures in the debate. Explore clear summaries and informed insights.
Watch Episode Here
Video Description
Nathan Labenz sits down with Zvi Mowshowitz, the writer behind Don't Worry About the Vase.
Zvi is an information hyperprocessor who synthesizes vast amounts of new and ever-evolving information into extremely clear summaries that help educated people keep up with the latest news. In this episode, we cover his AI safety worldview, an overview of the AI discourse, and who really matters in the AI debate.
Do you have questions you want us to answer, topic requests, or guest suggestions for upcoming episodes? Email us at TCR@turpentine.co
Also, ICYMI be sure to check out The AI Scouting Report Part 1: The Fundamentals https://www.youtube.com/watch?v=0hvtiVQ_LqQ
TIMESTAMPS:
(00:00) Episode Preview
(05:00) Zvi’s Introduction to AI
(07:04) Weekly 10,000+ words / Weekly newsletter
(12:34) Language models
(18:25) AI Worldview
(27:30) Probability of Doom
(33:10) Inspirations for Content
(39:00) Audience for Writings
(45:25) The impact of influential figures
(48:55) Path of the river
(55:39) Different camps in AI discourse
(01:13:55) The Accelerationist Argument
(01:20:08) Large Language Models Today
(01:27:00) Spending on AI
(01:36:03) Principles / Virtue Ethics
(01:43:30) Human vs Non-human Universe
(01:47:32) AI Safety & “Doomers”
(02:02:10) Expectations of Human/AI Relationship
(02:10:30) Future of Online Laws and Ethics
(02:19:10) What do we do next?
(02:34:50) Sources for learning
(02:42:08) Conclusion
LINKS:
Don't Worry About the Vase: https://thezvi.substack.com/
TWITTER:
@labenz (Nathan)
@thezvi (Zvi)
@eriktorenberg (Erik)
@cogrev_podcast
SPONSOR:
Thank you Omneky (www.omneky.com) for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
MUSIC CREDIT:
MusicLM
Full Transcript
Transcript
Zvi Mowshowitz: 0:00 A lot of people who talk about alignment talk about it as some sort of weird Boolean, where if you figure out how to align a system, suddenly everything is aligned, and if everything is aligned, then everything is fine. And I think this is a serious way people fool themselves into thinking this problem is a lot easier than it sounds like it is. But we're going to tell them things like make the most money, make the most people happy, get me the most dates, or whatever it is. And the moment you say most on top of anything, you get effectively power-seeking behaviors, resource-seeking behaviors by things that are better optimizers than you are. And I think Sam Altman understands the problem pretty well, but it seems clear he hasn't cultivated a culture of safety amongst the people he hired. I think one of the biggest barriers to coordination is people assuming they can't coordinate and therefore not trying. We need to engage with the Chinese, and we need to talk to everybody else as well, and we need to explore, could we coordinate on these ways? Which first requires us to understand that we need to coordinate on these ways.
Nathan Labenz: 1:04 Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost Erik Torenberg. Hello and welcome back to the Cognitive Revolution. Today my guest is Zvi Mowshowitz, author of the influential blog Don't Worry About the Vase, online at thezvi.substack.com. Zvi is an information hyperprocessor with a background in math, trading, economics, game theory, and game design. Over the last three-plus years, first with COVID and now with AI, he's carved out a unique niche for himself as a writer who can rapidly synthesize vast amounts of new and ever-evolving information into extremely clear, comprehensive summaries that help educated people keep up with the latest news. We had a wide-ranging conversation, first talking about Zvi's long-standing interest in AI and his general AI safety worldview, and also comparing notes about how we are each using AI in our respective workflows today. Spoiler: we do both use LLMs for a number of different purposes, but neither of us use it much in our core writing work as of now. We then discuss who really matters in today's AI debates. In other words, who exactly should we be trying to influence with our AI takes? I found Zvi's perspective on this question very interesting. Then Zvi gives an overview of the current state of AI discourse, summarizing the positions of the so-called effective accelerationists, the AI ethics and AI safety camps and the, in my view, unfortunate rivalry between them, and of course the doomers, or as Zvi prefers to call them, the worried. Along the way, we cover Zvi's personal ethical framework, how much value he sees in possibly, but possibly not, sentient AI agents, whether the current AI moment was inevitable, the promise of various approaches to AI safety, the impact of the specter of US-China rivalry on AI discourse, and whether the AI safety movement has been a huge success or perhaps has merely further accelerated AI capabilities progress. As always, if you're finding value in the show, we encourage you to share it with your friends. And also please send us your questions for an upcoming AMA episode, your guest suggestions, especially if you know entrepreneurs or researchers outside the United States who would make great guests, and of course your general feedback is always welcome. You can email us at tcr@turpentine.co or DM me on Twitter where I am @labenz. And finally, don't forget that my new AI scouting report is now available on our YouTube channel. We'll be promoting this more in the coming weeks, but already I've been very pleased to see that a number of early commenters have said that it really has helped them solidify their understanding of AI fundamentals. With that, please enjoy this conversation on AI safety and AI discourse with Zvi Mowshowitz. Zvi Mowshowitz, welcome to the Cognitive Revolution.
Zvi Mowshowitz: 4:21 Thank you. Great to be here.
Nathan Labenz: 4:23 I'm excited about this. You are a leader when it comes to attempting to make sense of the AI discourse, which is, along with everything else, going exponential these days in terms of its volume. So I'm excited to get your worldview on all of this and try to get your sense of the lay of the land, what really matters right now, and hopefully identify the signal that really matters in the increasingly noisy environment that we're in. I usually don't even do this when I'm talking to people who have just published a paper or a project or a product, but because your role is more meta-analyst, I thought it would be good to first just give you a second to introduce yourself, give us a little bit of your history on how you got interested in AI, the perspective that you're coming to it from, because I think that will really help inform people as they listen to the rest of your commentary.
Zvi Mowshowitz: 5:14 Yeah, I want to say none of the things that you said in your introduction seem less strange over time. They will continue to feel bizarre, and I think for a long time. So I've been thinking about AI one way or another for decades now because I was introduced to the Foom debate between Eliezer Yudkowsky and Robin Hanson, which is just how I got into the rationalist discourse in the first place. And so I've been expecting AI on the horizon and wondering what we can do about it and expecting the default outcome to be quite bad if we don't actively do something else, for a very long time at this point, over a decade. But I had made a deliberate decision not to focus on that, not to make it the thing that I was having shower thoughts about or that bothered me so much, because I concluded other people are better equipped to do this. My comparative advantage lies elsewhere. My math is not strong enough to engage in the kind of theoretical math that they were working on back in the 2000s and early 2010s. And so instead, at first I was a competitive gamer before I started worrying about this. Then I was a trader of various types, tried to start a few businesses, and then in 2020 COVID happened, and I started writing about COVID. I'd already had a blog and I was writing a bit about my thoughts, but then I started writing just because that's the way I process the world. You write down what you're thinking, you try to explain it to somebody else. If you can explain it to somebody else, you actually know what you think, and it forces you to check your sources and justify your reasoning, and I needed to process what was happening. And then people found it useful and I kept doing it and it kind of snowballed, and now I'm a professional writer. And when COVID ended, I jokingly said on Twitter, weekly COVID posts replaced by weekly AI posts, as if that was like, what a nightmare, this is going so fast. And everyone said yes please. And I was like, oh. And then I tried it, and I've been writing 10,000-plus words a week ever since.
Nathan Labenz: 7:11 So the 10,000-plus words is a good guide for the uninitiated. My general sense of what you're trying to do in these periodic roundup posts is sort of create the thing that if people were patient enough to not read anything else and then just come read you once a week, they would kind of come away without having missed anything important. Is that how you conceive of the project as well?
Zvi Mowshowitz: 7:35 That's one of the pillars of the project, same as with Cognitive Revolution before. It's a one-stop shop where I've got all the links, so if you wanted to ignore AI entirely, the AI discourse and AI risk and AI policy, except for using it for your own purposes, then once a week you check back, you see what I have to say about the parts that matter, you can figure out if there's something you need to pay attention to, you can click through, you can explore further, or you can not. And the other big thing is to explain the thinking, to build up the world model over the course of weeks and months and many words, and help people think about what's going on, not just lay out, okay, here's what I think, here's what you should think. I think that's not as helpful.
Nathan Labenz: 8:22 So how do you just briefly approach the challenge upfront of just ingesting all the information? I mean, it probably goes without saying that you're a pretty fast reader and just naturally given to it. But even so, there's so much stuff that even you can't probably consume everything you'd like to consume at this point. I saw a tweet that you posted recently about the AI x-risk, pro versus con with Tegmark and Yann LeCun and their debate partners, and you asked like should I watch this? So I thought that was very interesting that you're crowdsourcing what you should watch, a reflection of just how much stuff there is. But maybe more broadly, how do you approach identifying what you want to take in in the first place and then obviously working through towards synthesizing that for the audience?
Zvi Mowshowitz: 9:09 Yeah, I find audio content is especially expensive to consume, because I can consume at about 1.3x and still be able to actually process information and arguments of people who are speaking detailed, complex stuff. And for really tough stuff, I can't speed up at all. I want to go at 1.1x. But that's still slow compared to reading the transcript, which would be at least 2x, probably more like 3x or 4x that speed. And when reading I can deliberately speed up and slow down as there are explicit things I want to think about, and I don't constantly have to take in new information while I'm still processing it. It's much better. So when I have something with audio content, I have to think carefully, do I want to consume this? And I have to make cuts, especially with lots and lots of three-hour podcasts coming out now from the Future of Life Institute, 80,000 Hours, and lots of other people. And for a while, the answer on almost every podcast was no. And a lot of that's really good content, but you have to make choices. For reading, my procedure is I have Twitter where I set up a consistent set of lists. Primarily I have a rationalist list and an AI list, and I have my follows, and I consume those, and then I pop out windows for things that have content I want to consider at a future date. If it's a small thing, I use the do-it-now principle and just do it immediately. If it's going to take more than two minutes to handle it, I'll pop it out. And then at some point later, I'll go through all the AI links and I'll go, okay, what does this have to say? How does this incorporate into what I'm doing this week? Which section would this fall into? And then I build the thing step by step as I have different sources to incorporate. And I also have a Feedly, so I have one for RSS feeds, and that's where I get most of my other stuff. And then keep in mind, between them these two will alert me to anything that isn't on either of them, so I can then go find it and proceed there. Plus there's some things that appear on LessWrong or the Alignment Forum.
Nathan Labenz: 10:58 That's fascinating. So we've got some aspects of our history here that are very similar. I also got initially very interested in AI from the Overcoming Bias, Eliezer and Robin Hanson days back in, like, primarily 2007, I think, was the heart of that. And similarly also felt like, at least as Eliezer was articulating the work to be done at that time, such theoretical math chops seemed required that I basically felt like that wasn't really for me. And obviously the situation has changed quite a bit. I'm very different though actually on the mode of consumption. For me, I can listen to pretty much everything at 2x, and I kind of slot it into my downtime pretty naturally. And I feel like it's more like the actual time looking at a device that feels scarce. That probably also just reflects that I'm not as fast or as fluent of a reader as you probably are.
Zvi Mowshowitz: 11:56 Yeah, different people are very different. Tyler Cowen will read at several times my speed. Otherwise there's no way he could read what he writes about, everything he says he's reading. It's insane. I read faster than most people, but I definitely have trouble with listening, with the auditory processing. I've always had that. I've had problems learning foreign languages for that reason. It's just something that takes more of my energy. And so it's relatively less efficient for consumption. And every time someone says I listened to your writing, this strikes me as absurd. Why would you listen to a podcast of this when you could just read it? But sometimes people do.
Nathan Labenz: 12:33 Is there any role of language models in your workflow today? Like do you use a tool that helps you summarize, or are you dabbling in creating first-draft type of content for yourself, or do you just do it still, all Zvi, top to bottom?
Zvi Mowshowitz: 12:51 The writing is all me. I have found the LLMs are very, very bad at summarizing the things that I'm talking about. They're very bad at expanding your writing in this type of form. I know what style I want. I don't want to transform into a different style. The spell checker basically does most of what I want it to do. I don't think that having it checked for errors would be very convenient. I've experimented a bit with this and been writing Python code and I haven't been able to get anything particularly useful on that. I probably should experiment more, but I don't think they're quite there yet. I think other people will build tools. I'll eventually find those tools useful. I'll start experimenting with those tools. But it doesn't mean I don't use LLMs. I absolutely subscribe to GPT-4. I use GPT-4 and I use Bing and I use Claude sometimes. But the main thing I use them for is for finding out about the world, asking questions about what's going on, asking questions about things I don't understand, checking intuitions, checking to see if the LLM will share my intuitions about a question, and also coming up with more examples of something. If I have an intuition, like today I was starting to write a post about debate, and debate strikes me as something that's kind of like democracy. It's one of these, it's the worst thing except for all the alternatives. And so I asked Claude, name 10 more things that are like that. And it came up with some beautiful examples and I started to lose steam by the end. I got five really good ones. And so stuff like that is really useful. And just, it's better than Google for, well, what was the source of this quote? What's going on with this particular event? Can you narrow this down? Can you extract this particular piece of information? I'm often wanting things that aren't that easy to Google specifically. And also I'm just confused about something and I want to understand it. It's quite useful. And for coding, I still don't code very much, but when I did have something like this going on, my productivity was through the roof when I started using it. That was just obvious.
Nathan Labenz: 14:38 Hey. We'll continue our interview in a moment after a word from our sponsors. Yeah. That all resonates with me as well. I think pretty much down the line, I don't really use it for my writing. I am starting to experiment a little bit with actually the mobile app because the transcription is so good that I am finding there's at least some utility in actually, I was doing this the other day, walking my kids down the street in their stroller, trying to get them to take a nap. And as they're getting close to nap, I'm talking to the mobile app and kind of telling it what I want to do, what I'm trying to create. Here's kind of my thoughts. It'll take a shot. It is typically not close. But then I find, like, just transcribing, recording again and going just scrolling through its output and just giving it raw commentary on its output and saying what I want more of, less of, do and don't like, et cetera. Over 2 to 3 rounds of that, I'm having some successes where I'm getting to 80% kind of there. And it does feel like when I actually do go back to my computer later and want to write that I'm in a better place than I would have been if I had just sat down blank slate or even just because my typing is only so fast. Right? So it's just getting a lot of stuff out that then I can edit. I'm not sold on that workflow yet. I do think it has some trade offs, and there are some things that the LLM does that you wouldn't have done that you maybe kind of wish hadn't been the final product that kind of still creep into the final product. But I don't know. It's at least kind of starting to work for me a little bit in that way.
Zvi Mowshowitz: 16:19 All the discussions about AI and how much AI is useful and how much it will aid productivity come down to the question of where the limiting factor is. What is actually slowing you down? What would help you and not help you? In this case, it's a place where your writing process has a bottleneck in being able to flesh out your thoughts, get a rough idea of what you want to do, and that's just not a bottleneck for me at all. So for me, the bottleneck is getting the information. It's knowing what I want to say, how I think about something, and the actual processing of that. When I start writing, the writing itself is almost free, if anything, while I'm finishing the part where I process the information. And so having it take this kind of first stab wouldn't actually speed up my process at all, because the part where I'm doing that is actually doing work anyway and it's also very quick. So with a lot of AI, when people say it's not very useful, a lot of the time they haven't figured out how to use it. And a lot of that is that they haven't thought carefully about their own process and exactly what parts of things are easy and hard. Don't try to do exactly what everyone else is doing. Do the thing that's useful for you and then iterate on it, and also learn what kind of prompt engineering gets you the thing that you want. I haven't done any of the detailed, paragraphs-or-pages-long prompts, but I figured out some very basic tricks that work for specifically getting me more of the type of things that I want, less of the type of things that I don't want.
Nathan Labenz: 17:46 That all resonates to me. The other use cases, I'm right there with you too. I mean, coding, good god. Similarly, I code some, but not full time every day. And the ability to just import the right stuff and write the right kind of boilerplate and get me, man, that really helps a ton. So I love that. Okay. Let's talk about your worldview, and then we're really going to dig into all this stuff that's going on in AI discourse. But, again, just want to kind of set out your position a little bit. Could you, this is a big question, but could you take a couple of minutes and characterize your overall AI worldview, and specifically, as the safety debate has really heated up, emphasis on that.
Zvi Mowshowitz: 18:26 Yeah. There's a lot of different aspects to how one thinks about this. And a lot of the problem with the debate is that whenever you're debating somebody, you never know which of the 20 things that are unclear they're actually disagreeing with you about, and often it'll manifest as a disagreement somewhere else. And you won't understand that where they're actually disagreeing with you is something they haven't mentioned, because it hasn't even occurred to them that someone might think differently about that. It's so obvious to them. At other times, there will be 17 different disagreements. They don't really have a common cause, or don't have a common logical cause but are motivated by the desire to get to an end result, in various ways, or other things of that nature. Or they're just sort of going off a vibe and don't really have opinions on most of these questions and don't really have a coherent model, and then you have to get them to think about the thing at all in order to have a disagreement so that you can then figure out what part of this is useful. So for me, I would start with, LLMs are this incredibly useful tool. In their present state, they are almost entirely positive. I think that almost all of the fears and worries about misuse or negative effects of LLMs are overblown and, if anything, reversed, and that the effects of current levels of tech are just almost uniformly going to be positive. However, I do think that this changes as AIs approach and surpass human-level intelligence. I think that this is a very different environment for us to live in. I think it's a very different world. I think that lots and lots of very hard to predict things start happening. And if we're not very careful, and almost by default, the outcome is the death of every human on the planet, or at least the loss of control over our future, such that if you ask most people, including myself, they would see the resulting future to have very little or no value compared to certain potential futures, or compared to what you currently might imagine a normal non-AI future to be. And so you have to draw this big distinction. Right now, the AI is a tool. It has some amount of ability to reason about things and figure things out and mimic human intelligence, but it's not more than that. And we can use that to greatly enhance our lives and experience and our progress. And then there's the point where the AI starts to become a rival optimization engine, a rival set of capabilities, a rival intelligence that can eclipse our own. Whether or not there's recursive self-improvement, as Eliezer Yudkowsky talks about, where very quickly after we have something that's human or near-human, it suddenly is deeply superhuman and we have no idea what's going on, and then the entire solar system's atoms get rearranged however the superintelligence would prefer, and that likely doesn't involve us. Or it's something that looks more gradual, looks more slow. I think these are all possibilities, not just possible, but with substantial double-digit probabilities of various outcomes. And all of these scenarios offer ways for us to come out of it very well and ways for us to come out of it not at all, if we're not very careful about it.
And you have to think carefully about how these things work and what problems lie ahead. Almost everyone is thinking very poorly about at least some of the problems involved in this, even if they think well about some of the other problems. And there probably aren't any people who are thinking very well about every step of the way, and that includes me. I'm almost certainly thinking poorly about some of these steps, and that will be exposed over the coming months and years, hopefully, as we learn more and I think more and I change my mind. But essentially the AI safety problem is: if you build a more capable optimization engine, something that's more intelligent, that can do more things than we can, it has inherently many advantages over us. It'll be copyable, it'll be faster, it'll be able to instantiate in various ways, be able to rewind every form of state, it'll have essentially limitless memory, et cetera, et cetera, and take in huge amounts of data, orders of magnitude more than any human could possibly hope for, which has already happened, et cetera, et cetera. So if it can mimic the parts of what we have that it's missing well enough, suddenly its capabilities, the sky's the limit. And then it's going to be a tremendously powerful optimization engine that's going to optimize whatever it gets pointed at to optimize, even if we try to keep it as a chatbot, where we say, oh, we're not going to give it goals. Well, we are going to give it goals. We know this, we've run this experiment. And so it's going to have goals, it's going to have targets that are optimized according to some set of priorities, and we have no idea how to make it do this in a way where we control what those priorities are, and we have no idea what priorities, if we could control it, we could put into it that wouldn't ensure a disastrous outcome even if we had that ability, either individually or collectively. So all of these are problems we have to solve. And we have to solve them knowing that we don't know how to solve these problems very well in current systems, despite the fact that we are smarter than the current systems, and the current systems are not dangerous and are not engaging in particularly adversarial actions. But when we start creating these more intelligent, more capable, stronger optimization engines, on every level we're going to face optimizations for things that we do not want, that respond in adversarial fashion, that require a security mindset, where any way for them to fail is the way they definitely will fail. Every time we try to fix problems, we are correcting for the thing that fools us into thinking we solved the problem or that gets around whatever our solution is. And fundamentally speaking, we're trying to get and keep control of something that is smarter than us and more capable than us and more efficient than us, with competitive advantages over us. And that is a very unnatural thing to want to happen. And if you think that this is impossible, I am deeply confused. If you think it cannot possibly go badly, I think you're just not thinking about this clearly at all. And if you think we will probably be fine, then I am confused why. I think a lot of people are like, well, we'll figure it out. We will solve these problems and it'll probably be fine. And I think we have to solve a lot of virtually impossible problems, impossible not in a literal sense but in the sense of the impossible difficulty level in a game, in order to not have this go badly. I think there are probably solutions to that.
But these solutions are incredibly hard to find. And as Eliezer Yudkowsky emphasizes, they may well have to be solved on the first try, because the moment you transition from a non-dangerous system to a dangerous system, that's exactly when a lot of the things that were working before will stop working, the techniques to align the system, as we call it. And it's exactly the time when, if the system isn't aligned, you are in deep, deep trouble, and you might not, you, meaning the human species, might not be able to recover from the serious mistake. And we also can lock ourselves into various dynamics and equilibriums and world states where, even though we figured out how to align individual AIs in an important sense, we've been able to figure out how to make them act according to something that we told them to act according to, how do the resulting competitive dynamics end well for us? Those are more problems that I think people are not thinking clearly about, that people don't have good solutions to. A lot of people who talk about alignment talk about it as some sort of weird Boolean, where if you figure out how to align a system, suddenly everything is aligned, and if everything is aligned, then everything is fine. And I think this is a serious way people fool themselves into thinking this problem is a lot easier than it sounds like it is. Because yes, part one of the impossible problems is that you have to impossibly figure out how to make the thing do the thing you want at all, but then you have to figure out what you want, and collectively we have to coordinate on figuring out what we want, and we have to make this an enforceable, actually instantiated thing, and we have to not screw it up in a possibly stupid way, because the history of humanity is the history of people constantly making incredibly stupid mistakes and having to fix them by trial and error, even when everybody means relatively well.
Nathan Labenz: 26:51 So let me just try to give the super high level summary of a couple of those things. If I were to try to describe your overall expectations, first order approximation, it almost seems it's uniform probability across everything. Basically, radical uncertainty where there's some chance everything could go great. You don't really quite know how. There's significant chance things go very bad or totally catastrophic. And the details of that are also kind of fuzzy, although you gave a little bit more definition, I think, to how that might actually play out versus how things could go well. But is that basically right? Probability-wise, radical uncertainty?
Zvi Mowshowitz: 27:37 Yes and no. One of the things that people do is they ask, what's your p(doom)? What's your probability of doom? And for me, the first question is, where does that matter? I'd say that if you're in the single digits or below, that's important, because then it starts to be a question of, well, what's the p(doom) of not acting or slowing down, or are there alternate possibilities? What are the gains to be had? And you start to think, well, maybe we're getting close to a point where we might want to start taking some chances on that level in order to get the benefits, or you could argue even that's crazy, and in many other contexts almost everyone would agree. If you think your p(doom) is super high, it's in the 90s or 99s or whatever it is, then you have to start thinking about what actions might make sense given that you can't really hope to have a good outcome by proceeding. And so that starts to change your outlook. But if you're anywhere in the middle, there are very few actions that make sense at 10% doom but not 90% doom, or vice versa, because all of the orders of magnitude of payoffs and costs are just bigger than this one order of magnitude of difference here. And so it's not that you have radical uncertainty, it's that it doesn't really pay to be that precise. Being that precise is not as valuable as people might think. And that's one reason why people end up with these kind of arbitrary probabilities. And also there's so many different things in the process to model, many different ways it can change. You hear a lot of people say 5%, 10%. You don't see that many people close to the middle, but that's actually where I am. I say 50% when I have to make a guess. But I understand that, as a good Bayesian, my numbers should be shifting around constantly, and it's not that clear. I wouldn't say all these things are equally likely. It's more that these are all likely enough that we have to consider them as plausible scenarios, and we have to be willing to spend substantial amounts of time and effort and money and sacrifice utility to make sure that there are good outcomes in those cases. But if I had to guess what the most likely outcome is, I'd say FOOM is less likely than non-FOOM, although again, FOOM is potentially live. And if I had to guess how we lost, I'd say a lot of it is, well, we just failed the alignment problem. Maybe roughly half of it is we just failed the alignment problem utterly. Maybe half of it is that we succeeded at the alignment problem as we talked about the alignment problem, and then we specified something that got us doomed anyway because we didn't understand the dynamics implied in what we were doing. But I don't consider these questions to be worth the detailed time and effort to try and get numerical answers from, because I don't see any actions that change, right? And again, the question is, are these real things that we really have to deal with or not? And the answer to me is, well, all of them, yes. And that's much more important than, well, okay, exactly where do we go wrong more often? To me, it's an interesting but not very useful question.
Nathan Labenz: 30:35 Yeah, okay. I think that all makes a lot of sense to me. I think I'm pretty much right there with you in terms of not spending too much time trying to narrow my confidence intervals on how likely various extreme outcomes are. It just seems like the lower bound on that, the lower end of that interval, is high enough that it is of concern to me. It seems like it should be of concern to most people, regardless of exactly where you end up. So it resonates with me and strikes me as just a generally wise approach to not get nerd-sniped into false precision. I'm going to try to do it again: the simplest version of why this is something that people should really take seriously. And then I do want to get on to the more discourse side of this too. But your view, in its simplest possible form, seems to be that when something is more powerful than you as a general purpose optimizer, that's inherently dangerous. And even if you get to set it up, it still remains dangerous. Is that a good one-sentence summary?
Zvi Mowshowitz: 31:42 Yeah. We're going to give these AIs, even if we manage to give them the goals and ideas that we want, what are called maximalist goals, right? We're going to tell them things like make the most money, make the most people happy, get me the most dates, or whatever it is. And the moment you say most on top of anything, or you want the most probability, the most number of nines of success on anything you could do, or the thing is simply trying to be selected for more instantiations, or we're just seeing more of the copies of the things that successfully convince humans to instantiate them more or that manage to gather resources, or any of these dynamics like that, you get effectively power-seeking behaviors, resource-seeking behaviors by things that, again, are better optimizers than you are. And by default, those things are going to end up with the resources, those things are going to end up with the power, and that's not going to go well for us.
Nathan Labenz: 32:39 Yeah, makes sense. Certainly, I have never heard one good reason to dismiss that as a concern. So let's leave that there for now, and we'll certainly touch back on some of these aspects as we go through the survey of the AI debate and the debates about the debate, as you put it. For starters, I wanted to ask who you think matters as you create content. Obviously, anybody can come to your blog. Among many other points of confusion, I feel like a lot of people are kind of confused about who they are even trying to reach when they put stuff out in the first place. I've got my own quick model of who matters and why, and I want to just bounce that off you and see how you would react to it and think about it maybe differently. Seems to me like the most important people right now, if you could choose who you would want to influence, would be the leaders of the leading AI developers, as they are the ones deciding what to do and not to do and how to approach various problems in actual work making actual systems, which seem to be getting quite powerful quite quickly. So leadership of OpenAI, leadership of Anthropic, leadership of DeepMind, maybe a dozen global companies, would seem to be the most critical decision makers right now. And then my tier 2 would maybe be policymakers on the one hand and the general AI research community on the other. If you think there are maybe 100 decision makers across those top 10 or so companies, there are maybe then 10,000 each in the policymaker and researcher sets, people who could really make an advance in the field. And then the broader circle beyond that would be maybe the discourse: all the people that are sort of echoing stuff back to each other, public intellectuals, venture capitalists, people who have a lot of clout on Twitter but aren't necessarily making object-level decisions in the short term. And then finally, beyond that, there's the general public. And they feed back into who the policymakers are going to be, and that, on some distant level, controls whether companies will be allowed to do stuff or not. How would you compare and contrast how you think about it versus that starting point?
Zvi Mowshowitz: 34:57 I think the discourse is incredibly complicated, and you can't think about it in isolation. You have to think about it together with questions like, what needs to happen for things to go well? What, if it goes wrong, causes things to go poorly? How do these things influence each other? And these questions are very complex and difficult to answer. So if I had to choose one person to align and compel properly on the margin, to nudge toward my point of view, I would definitely pick Sam Altman, right? Sam Altman is more important than Joe Biden in this sense. And then Demis Hassabis, Dario at Anthropic, potentially the CEOs of Google and Microsoft. Zuckerberg and others at Meta become prominent because they are the highest on the totem pole that clearly don't take this seriously. And it would be great if they did take this seriously. It would substantially make the situation better. And then, how important are they versus how important are the politicians? How much do politicians actually drive policy? How much are they willing to stick their necks out? How much do their opinions matter, and do they have real opinions? If we convinced Joe Biden, what would happen? I don't even know if it would be that big a deal or how likely he would be to understand the issues involved, because they're very complicated and technical, and he has infinite demands on his time and is very old, and I'm sure his amount of compute left is shrinking in some sense, certainly the amount available for this. And then we have a system with a lot of these veto points and negotiation points, and it's very unclear who causes what to happen. One of my disagreements with Tyler Cowen is Tyler puts a lot of weight on academic consensus and publishing papers to try and convince a bunch of academics, to then try and convince potentially a bunch of behind-the-scenes policy people, to then maybe convince a bunch of lawmakers and regulators to do the things that would be better. And I think it's possible that that's an important vector, but it's confusing to me and unclear whether that vector is actually open to us or how effective it would be. And I don't see how to run experiments to actually prove it one way or the other. Similarly for the public, right? The public currently doesn't really know what it thinks, hasn't really understood the situation very well, but tends to be pretty worried about AI when you actually show them an AI and ask them basic questions. And how much does that matter? How much does that drive legislation? And then another key question is, how much do various different paths lead to good interventions that are well selected and designed to actually ensure good outcomes, versus leading to misaligned actions? So one worry you would have is the population gets really scared about AI, they demand action, and then we prevent people from using AIs on a variety of currently productive activities in ways that make our lives worse but don't make us any safer from the real threats. And so that's a reason not to go down the public line, right? If we convince the public to be scared, we're not going to impart this very specific, complex message that we need to control the training of advanced frontier models because that is where the danger lies, that is what could go wrong. They are just going to latch onto whatever their minds can see, because they are not going to spend endless hours on this. They are not specialists, right? And I would do the same if you talked to me about something that I spend very little time on.
Nathan Labenz: 38:34 So how does that play into your writing? It sounds like, almost by definition, you're not writing for the general public. Anybody who's reading you, that's enough to qualify as being part of the public intellectual set, at least. But do you actively think about an audience of OpenAI leadership, Anthropic leadership, or do you just write what you think and hope that works its way to them?
Zvi Mowshowitz: 39:01 I think there's a certain amount of you write for yourself always, otherwise your writing's going to suck. And so I write the things that I would want to see if I hadn't put the effort into making this, the things that explain things to me. And then the question is, who else are you targeting? And I try to become more and more careful about who I am targeting and who matters. Because one of the things that I do think is the number of people who actually have a major influence on events, who sculpt what's going to happen, is relatively small. Not that other people collectively don't matter, but that mostly they are not individually worth trying to target in that sense. And certainly, I do have the case of, I'm going to target people who can handle very dense, very complex amounts of information, massive amounts of information, and actually want to process it and make sense of it, and can engage with it on the level it deserves to be engaged with. If you can't handle that, I'm counting on people who can handle that to then get the information to them in some number of steps by putting distillations into a form they can process or are willing to invest the time to process. It's not a knock on them. If you gave me a 10,000-word-per-week complex, dense thing on most topics, I wouldn't read it, even if it was very good, because I just don't have that kind of time. So in my mind, there's a handful of public intellectuals who are sticking their necks out and making impactful, meaningful arguments about this. They show up time and time again. I try to engage with them specifically. I talk to them, and I talk to people who are reading them, I talk to people who are paying attention to them. More specifically, I do hope that Altman and Hassabis and Amodei are reading what I have to say, but I think even anyone who works at one of those labs, who can then pass the information along, who can then spread it to other people, who can then pass the core concepts and the most important parts onto people like that and onto people who work around them, matters a lot. One of the things I worry about most at OpenAI is that I think Sam Altman understands the problem pretty well, but that it seems clear he hasn't cultivated a culture of safety amongst the people he hired, such that a lot of people who work at OpenAI don't understand safety the way that Sam understands it, and the culture of the people he hired drives the company's actions, to a large extent. And so he has much less ability to steer in these ways than Dario does at Anthropic, where he did, in fact, do this, even though I'm not convinced that Dario had a better understanding or a worse understanding of what it actually takes to be safe at the end of the day than Sam does. I think these are serious things to consider. I don't know as much about Meta, but I've had conversations with people who work at all three organizations as engineers. And there are definitely people who care deeply about ensuring good outcomes in all three. They have at least one reader in all three, and hopefully they forward things on. But yeah, it's also a combination of people who work in alignment, including outside of the labs, people who are in the discourse, who are out there making the arguments, public intellectuals who actually matter.
There might be a million public intellectuals, in some sense, out of 7 billion people, but the vast majority of them don't care about AI, and the vast majority of the ones who do care about AI still aren't really providing any inputs to the way AI goes. They're more of what I think of as a lesser tier with respect to AI. So I have this metaphor of, on level one, you have people who are generating new ideas, new ways of thinking about things, new synthesis. And on level two, you have people who are capable of processing things from people who are in level one. And at level three, you have people who are only capable of processing things from level two because, again, of the amount of time investment and expertise they have in this particular place, not necessarily because they couldn't do something else. You can definitely move up to a higher tier, a lower number, if you want to put in that effort in many cases. It doesn't mean it makes sense. And so then, basically, you want to go down the pyramid. So the question is, who is going to be able to explain this to somebody down the line, or who is going to be able to persuade other people on the same level as where I am? Those are the people I'm most talking to. But I do think there's a handful of people whose viewpoints matter a lot, and if they were to change their perspectives, change their way of looking at things and make arguments in the other direction, it makes a ton of difference. And we've seen a number of those people actually make this change. Not necessarily the people who were previously out there making lots of the most impactful arguments, but often people who just, by nature of their careers, by nature of their experiences, command tremendous respect. Like most recently, Douglas Hofstadter, right, author of Gödel, Escher, Bach, who pretty much everybody I know looked up to for that, for I Am a Strange Loop, all of that work. He thought that superintelligence, like full AGI, was very far off until super recently. And he has not flinched from being loud about, now that I think it's near, I look at what's about to happen and this horrifies me, this scares the hell out of me, I don't know what to do. And a few months ago, you have Yoshua Bengio, you have Geoffrey Hinton. These are a substantial portion of the voices out there. And on the other side, you have a handful of people who are cited the majority of the time when people cite reasons not to be worried, and sometimes their arguments are good and sometimes their arguments are quite bad. But if even one of those people, of the five or 10 people, were to come around, I think that would make a significant difference, from my point of view. And if a lot of them were to come around, it would be a complete game changer. I don't think that most people are doing anything other than looking to a very small group of people for guidance, either directly or indirectly.
Nathan Labenz: 44:49 Very interesting. So it almost sounds like you've mentioned you have private conversations, discourse with some of these, and then others I'm sure you're not in touch with. But it almost sounds like you have a list somewhat explicitly of, like, these are, say, 10 people who, if they were to change their minds, would make a huge difference to how things go. Would you be willing to venture actually outlining a list of some individuals that you think are most crux-y for the broader path of history?
Zvi Mowshowitz: 45:19 So I think there are also a lot of people in positions of governmental authority or power who would be tremendously impactful if they were to change their minds. Certainly, if any senator were to come out pretty strongly with good views, well explained, I think that would buy a tremendous amount of credibility. Certainly, if someone like Biden were to do so, it'd be huge. The Prime Minister of the United Kingdom came out as concerned about this, and I think that definitely made a significant difference. I think there's a broad category of many people who are not currently having a voice in the debate, who, if they were to suddenly make it clear that they cared about this pretty deeply and became loud, would matter quite a lot. Certainly hundreds, possibly thousands on that level. But in terms of people who are being very loud on that side of the debate, I hate to spend more time on him than we should, given his behaviors, but take Yann LeCun, for example. He's been very, very outspoken, very loud. He works at Meta. He is the third godfather of AI, and the other two have joined the cause. If the third one were to very profoundly and loudly say, oh, I realized I was being stupid, I was wrong, and we could all welcome that, and he started making relatively good arguments and making them for a more nuanced, balanced position, or even for a, this is dangerous, anything remotely accurate position, I think it'd be a complete overturning of the chessboard, just that one person on his own. And there's a number of other people who clearly make a big difference because, in many cases, they are one of the few people who are being very loud in a credible way, because of their previous reputations, about accelerationism and the need to actively not take safety precautions or the inability to take safety precautions. And that's why I pay attention to them and I try to address them as much as possible, to try to convince those particular specific people themselves and say, you are making a mistake. You're wrong. And I respect you enough that I think that if I were to point this out, you might change your mind. In some cases I've written them off; in cases like Marc Andreessen, I said, okay, that's not going to happen. This person is not interested in listening to arguments, in trying to figure things out for themselves. They have made their mind up for whatever reason, for better or worse, it is what it is. In other cases, I still hold out a lot of hope. I think you have to try, because it's a lot of how things can go well. Also, simply people at the labs being willing to stand up. I think Sam Altman did in fact do a lot of this in a very positive direction recently. He's come out in favor of very good regulatory principles and ideas, and has spoken to the dangers that are out there. There are other things he's doing that are not as helpful, but this alone was tremendously helpful. I think that Jack Clark and Dario at Anthropic could be making much stronger statements, ones I think they would endorse in private even now, about the situation, and that I think would help a lot. I'd like to see Demis Hassabis speak out more. I'd like to see the CEOs of Microsoft and Google speak out more. These are people who would, I think, be tremendous game changers if they emphasized the actual problems involved. But the list could go on and on. I could name any number of people who would be tremendously influential.
And that includes any of the standard people who are just above reproach, people who lots and lots of people listen to, who aren't currently weighing in on this. If they did, I think you'd see a lot of change.
Nathan Labenz: 48:39 Interesting. Yeah, I think one big question I've been very uncertain about over time is to what degree this is just going to play out how it's going to play out, versus whether there's a sort of more contingent model of history where the local decisions that individuals make and the very small-scale dynamics actually could make a huge difference. This one seems tough, because on the one hand, you do have these massive trends of web-scale data, web-scale compute, a whole field that is pushing on this. But then on the other hand, you do have these decision makers who do really seem to matter. It seems like you do think that, ultimately, the path of the river of history really matters to what outcome we get.
Zvi Mowshowitz: 49:28 Yeah. I think strongly that the path of the river will determine the outcome we get. I have a lot of muddled uncertainty about this. We might live in worlds where the paths of the river are almost entirely good. We might live in a world where the paths of the river are almost entirely extinction. We might live in a world in which it's entirely up for grabs, and it's very hard to tell. Certain aspects of the situation are very, very difficult to stop. We're not going to stop Moore's Law until the laws of physics stop Moore's Law. We're not going to stop algorithmic improvements and hardware improvements in some general sense. We're not going to be able to contain the things that have already been released. We're not going to contain the profit motives. We're not going to prevent there from being mass investments into AI, into AI applications, unless we're willing to take very strong actions. A lot of these ships have sailed. We have very tough game theoretic problems to solve, but certainly the path from there to here has been very path dependent in many ways. If we step back, a very small number of people realized the transformer would be the transformer. The people who founded all three major AI labs that are pushing things forward did so specifically because that's directly downstream of Eliezer Yudkowsky and his warnings about the dangers of AI, which resulted both in a substantial advancement in AI capabilities, faster than we would have otherwise had, and in people with a much better understanding of the problem than we would have otherwise expected being in positions of authority over all three labs. So there are good things about this and there are bad things about this. But if a very small number of people, potentially even one person, had made substantially different decisions about what to emphasize, the discourse would look very different today. The state of AI would look very different today. You'd have a different timeline with a different game board. And I think the arguments for "whatever's going to happen is going to happen" are something along the lines of: either we've already passed the points where we can make these important distinctions and it's kind of too late, which I think is not impossible in some sense; or there are various competitive dynamics you can't avoid, the pressures are too strong, the humans just don't have the coordination capacity, don't have the in-practice ability to make the sacrifices necessary to change the eventual outcome, so it's just fated. And all I can say is, well, not with that attitude. I think one of the biggest barriers to coordination is people assuming they can't coordinate and therefore not trying. You see this especially with China, where I think we definitely don't have strong evidence the Chinese are in any mood to cooperate with us on pretty much anything, on any level, AI or non-AI, certainly regarding world-transformational technologies. But we also do not seem to have taken the first step of asking nicely or putting out an indicator of interest. We need to talk to China. We need to engage with the Chinese. And we need to talk to everybody else as well, and we need to explore, could we coordinate in these ways? Which first requires us to understand that we need to coordinate in these ways, because there's no we to talk to them. There's no them and there's no we until we understand collectively, in some sense, what has to happen.
But certainly that's not determined. I see lots and lots of decisions made every day that raise or lower slightly the probability of things being steered in any particular direction. And the regulations on this and the ways people look at this are incredibly fluid. I would even say that if you look at the history of media surrounding AI, the decisions of individual scriptwriters and directors regarding a very small number of properties had a substantial impact on our debate. Imagine if Don't Look Up hadn't been made. It's very easy to imagine a world where that project wasn't greenlit. And yet it clearly had a substantial impact on the way people looked at things, even though it had nothing to do with AI on the surface.
Nathan Labenz: 53:42 Yeah, The Terminator also comes to mind there. If there had just been no Terminator series...
Zvi Mowshowitz: 53:48 Right. Well, what if The Terminator wasn't very good? What if it had been given to a bad director and they hadn't hired Arnold and it sucked? If it was just a normal action movie that happened to have this weird plot, then nobody thinks about it. Instead, we have lots of people going, "Well, this could be like Terminator." And other people are going, "Well, Terminator is a terrible metaphor." And other people are going, "Actually, the Terminator scenario itself is highly plausible if you've got time travel." And other similar related things. There's a lot of things that media actually gets remarkably right, remarkably often. And it's surprising to me on reflection. I'm watching Person of Interest right now, and they get a tremendous number of things correct about a technology that didn't exist and that they couldn't possibly have understood. They also get plenty of things wrong, of course. And there are some things that are intentionally wrong for television purposes. But it's amazing how much it's like, no, this is how LLMs perform. This is what big data is about. This is the kind of inferences you can draw. These are the kinds of capabilities that emerge when you don't expect them. These are the kinds of issues you have to think about when you're dealing with the emergence of these kinds of systems for the first time. What are the ethical considerations, and who has power over this thing? How do you limit it? How do you align it? What's the alignment tax you're paying? They don't use the words we use, but they ask a lot of these really hard questions. And they come up with highly plausible ways that something might play out, or ways that the characters might think about them and answer them, in a dramatized world.
Nathan Labenz: 55:20 You said a minute ago there is no us and them until we figure out what everybody thinks. China being another dimension of this entirely. Just zooming in on the local Western AI discourse, there definitely are some camps. So I wanted to get your survey of the camps, how you see them, which maybe can be merged or could at least work together versus which are sort of irreconcilable. I guess I would break it down for starters into four or five camps. On the "what me worry" end, you have the e/acc, the effective accelerationists. You can maybe try to steelman that for me. So that's nothing to worry about. Then there's the "current concerns are most important" camp. That's often called the AI ethics community. They often seem to be accusing people of existing in a category that I don't really think anyone necessarily exists in, but that would be the AI safety hype marketers. That's sort of the accusation that people are making bad faith arguments about risks because they want to drum up interest in their own products. So that, I would say, seems to be an accusation from the AI ethics folks against maybe the next camp, which is the AI safety folks. Basically, we should definitely take this seriously; it seems like it could be a real problem. As you mentioned, all the leaders of the key developers are seemingly in that camp today, with maybe Yann LeCun from Meta as a notable exception. And then finally, you've got your doomers. We're so screwed that, as one person memorably put it, it's time to start spending more time with your family. Is that how you would break it down as well? I mean, that's the typical summary that anybody trying to summarize this would give, but is that how you also see it, or do you conceive of those groups a bit differently?
Zvi Mowshowitz: 57:24 I see it a bit differently. There's a lot of similarities between our models. A lot of it is like when you go to Europe and there's 17 different parties, and mostly they all lie on a spectrum. There's just, like, here's the pirate party, who just believes in ending copyright, has nothing to do with anybody else, and whose position on economic redistribution is "I don't care, I'm just giving you my copyright position." Or they might actually care about that as well, but copyright is the thing they're really strong about. When I look at these groups, I think we start with the peak accelerationists; they call themselves e/acc. So I think these people are very strongly in favor of just full speed ahead, and then the question is, what's going on there? Why do these people have that belief? I mean, I'm effectively e/acc in most places, the same way that people talk about the famous "You're an atheist for every god but one; I just don't make an exception." They're just not making the exception. They don't understand what makes AI different from all other technologies and why this is something we shouldn't do, whereas everything else that had this kind of promise, we would do. Traditionally people say AI and bio risk, like creating bioengineered plagues and other risky biological research of particular types, and trying to create artificial general intelligence. Well, you might not want to do those two, but you want to do pretty much everything else. You want your fusion power, you want your new medicines, you want all your new tech, you want to build all the things, you want to...
Nathan Labenz: 59:02 You want your VR holidays.
Zvi Mowshowitz: 59:04 Yeah, you want to enter space, etcetera, etcetera. Do all the things, except for this particular thing, because we have concerns. And they're like, no, everyone always has reasons to suspect any given tech. There's always talk of it destroying jobs. There's always talk of "this will destroy the nature of the family" or "this will destroy our way of life" or "humans will somehow be killed by this horrible new thing." We've heard it all a thousand times before. And I think some of that is in fact a legitimate way of looking at the problem from one direction, like a way of grasping the elephant and a thing to be aware of when thinking about the problem. But basically what's going on is that people either don't understand why this is different, or choose not to understand, or don't want to understand why it is different, or they generalize so thoroughly that it becomes blinding optimism, or they see no alternative, and so they're presenting a front where they think we should do it anyway. So Roon on Twitter is, I think, a very good accelerationist thinker in the sense that he fully acknowledges that it's very easy for us to get ourselves killed and there's tremendous danger. But he advocates: eyes open, do it anyway. The spice must flow, as he puts it. There's no choice. There's no plenitude. And I think others are in fact of this rough opinion, but they don't say it out loud. This opinion of, well, if we don't build AGI, then our civilization has cut off all these other avenues for progress so thoroughly that we are doomed. So even if it might kill everyone, or it might cause various other catastrophes, that's just a risk we're going to have to take. These are problems we're going to have to solve. We must boldly go and make it work out. Boldly going is never safe. Intelligence is just the least safe technology in the universe.
Nathan Labenz: 1:01:06 Yeah, we're proof of that.
Zvi Mowshowitz: 1:01:08 Yeah, it's the most dangerous thing that's ever happened. Why should we think that creating more of it that isn't us isn't the same kind of thing to do? That's crazy. What are you thinking? But also, one could argue it would be crazy not to do it. There's so much available promise, so much value, so much wonder, so little alternative. It's the future, go do it. Or they just think that it's inevitable. I think some people do genuinely buy the China argument of better us than them, or various versions of better that we have it than somebody else, and that somebody else will be less responsible with it or have worse values or both. And so there's just no use in holding back.
And I think there are various people in this group at various levels of being genuine about it, of understanding or not understanding the danger, of internally facing down what is actually going on. And I try my best to give people the benefit of the doubt: to treat as many people as possible as actually understanding things as well as possible, while also being in as good faith as possible and wanting good things. And I do think most accelerationists definitely want a good future for the human race and want a good future in terms of what they see as utility. And I will stubbornly cling to this as much as possible.
I think there are a handful of people in a kind of distinct category that's beyond e/acc, which could be called the extinctionists. They're the machine-ists. They're the people who, in The Three-Body Problem, say this world belongs to Trisolaris, except with AI. They think that AI is the future. They call it speciesist if you say anything otherwise. They think the AIs have value: they'll be more patient, they'll have more complexity, they'll think more interesting thoughts. They're what matters, and we might die, and that's okay. And with them, I disagree. I just completely disagree. We have the right to have preferences and to not want that. Some of these people think you're not allowed to have preferences: objectively, in some sense, AI is better, so you're not allowed to say, I'm human and I have children, I'm going to have grandchildren, I want that to continue. Because, how do you justify that? And I'm like, I don't have to. You could argue that I'm wrong about that, but I don't have to. And so there are good accelerationists and there are less good accelerationists.
And the accelerationists I respect most acknowledge that there is a problem, advocate working hard on solving that problem, but think that slowing down doesn't help you solve the problem, introduces more other problems, or is just impractical, and so they think, well, we might as well be full speed ahead. So you can divide the accelerationists into the ones who are useful, good faith accelerationists who want to solve our problems together, and the ones who think that the way we solve all our problems is to just keep going forward without worrying about our problems, and our problems never materialize, as opposed to every other problem in human history. We've had tons of other problems in human history. We solved them by actually solving them. We solved them by realizing there's a problem, that if we don't do anything, it's going to be very bad, and then figuring out what to do about it. Some of these problems have been very hard. Some of these problems have been easy, but even an easy problem, if you literally just treat it like it doesn't exist, is going to be a real problem.
So then, the safety hype manufacturers: they don't exist. I'm going to flat out deny the existence of the safety hype marketer. I think that there is no motivation for this. It does not actually make sense. It doesn't benefit anybody. And the people who say that others are talking their book, I think it's a combination of things: they're talking their own book instead, trying to explain why the opposite outcome should happen, so they attribute bad motives to the other players because they have those motives themselves; or they're just so cynical and so stuck in a mindset where everybody's always talking their book, even if they aren't, that they can't imagine the idea that anybody could possibly say anything that wasn't profit motivated.
There was a post today, for example, on Pirate Wires, that was like, well, everyone is just talking their book, from all points of view. Everybody is just saying whatever their incentives say to say. Because I think this person just can't imagine that there are people on many sides of this who are saying what they believe, who are advocating for what they think is good for the world, which would also be good for them. I don't get to say, well, I think the world is going to die, but in the meantime, AI companies will be really good, so I should support accelerationism. I also don't get to say, well, I think it's all going to be fine and create a paradise, but I can get more subscribers if I say we're all doomed, so I'm going to sacrifice the future of humanity for readership. How would that make me better off? If the world's going to be wonderful, I want that wonderful world to live in far more than I care about anything else. I think it's all just silly. If you want to stop this amazing technology, you want to stop it because you're afraid of it. You want to stop it because you think it's a big deal, that it's a problem.
I don't think that means that everybody who comes to a doomer position is doing so for logically great reasons. Like with any other position, a lot of people come to that conclusion because they had a vibe, or the people around them were worried, or they've hallucinated or come up with a different problem that's actually kind of solvable, or they're imagining a highly unlikely scenario as almost inevitable. Just like with every other point of view, those people do exist.
So then, jumping ahead, I'll get to the middle afterwards: you have the doomers as such, but I think this is a wide variety. You sort of enter this space not when you say there's an X percent chance. There are people who are described as doomers, like Paul Christiano, for example, who, when you ask them what their chance of doom is, say 10%, 20%, something reasonably low. But they talk like doomers because they are saying, well, if we don't treat this problem as an impossibly hard problem, if we try to play it like we're still playing Guitar Hero on medium, we will definitely utterly die unless we get super lucky. This would be very bad. But they think, well, with an extraordinary effort, we can achieve the seemingly impossible. We can make this happen. But we understand that we need to shout from the rooftops just how dangerous the situation is, to get that to happen, so that we can then solve our problems. In the same way, if I didn't think that we were capable at all of coming together and making these extraordinary efforts, I wouldn't be at 15%, I'd be much higher. A lot of that is that we will in fact do things that are very impressive, will in some sense make an extraordinary effort, will figure things out that I don't know how to figure out, and we will solve our problems.
And then there are people who actually think the probability of doom is more than 90%, even more than 99%, and that we need to do whatever gives us any chance at all of success. Or, alternatively, the problems are so incredibly hard that if you don't have a mindset that we are super doomed, none of the things you do will be useful. If you can't solve the problem by treating it as a normal problem, you have to treat it as an impossibly hard problem to have any hope of your solution actually moving us towards the real solution. And if you try to solve an easy problem, you're just not helping, or maybe even hurting.
So then there are people who talk about AI ethics. I think that's a very strange middle ground. I think some people are like, well, there are problems with AI because I'm against AI, and I'm going to take every problem I can find with AI. Or they generally just see disaster at every turn because they're anti-new-technology, they're anti-growth, they're often anti-intelligence in general, in some sense, and they just see the downside of everything. And so they tend to emphasize these ethical or mundane concerns over the doomer concerns, but they would probably endorse the doomer concerns as well if they thought it was useful in stopping the technology. They just don't see it as especially useful in stopping the technology, or they don't see it as directly addressing the particular harm that they worry about, so they don't pay as much attention.
And then there are people who see the short-term harms as real and see the longer-term harms as irrelevant. And it's not clear to me, again, if they're even coherent about whether they're thinking about the long-term harms. I think a lot of them, if you ask them, is there in fact a danger of existential risk, will respond with no answer. Just no opinion whatsoever. "We can't think about that right now, that's a distraction" is their real answer. They actually haven't thought about it. And it's unclear if they just don't care, like, we'll cross that bridge when we come to it, if we come to it. I really think there are some people who really do think that that's what civilization should dedicate itself to: fighting bias, fighting misinformation, all of these little micro causes, compared to the existential risk. And one can argue about how important the things they care about are, but I think they're a lot less important than whether humans live or die at all. And I'm not sure they agree. I think some of these people think, well, if we can't solve these problems, we don't deserve to live, or something like that. I don't know. There are lots of different people all over the map that we're trying to characterize.
But I think of the AI ethics people as the people who put what they call ethics above what actually matters to me: above the survival of humanity, above generally getting good outcomes for people, above generally ensuring there will be things of value throughout the universe. These are things that I care about a lot. And I also care about most of their concerns. I think those are important concerns. But they see my concerns as a distraction from their concerns. I see their concerns as potentially a distraction from my concerns to the extent that they use that as a reason to dismiss my concerns. But we don't have to do that. We can say we have to solve both of these problems. And in fact, these problems have complementary solutions. Any solution that helps solve my problem, that helps us keep humanity alive, will help fight bias. Any solution that helps us get the AIs to do what we want them to do, that helps keep them under our control, will prevent the AI from spreading misinformation, prevent them from manipulating their users. It will help with all of that.
And when they do the work on their thing, it's at least not harmful. I welcome that work. I welcome all of that work. But when you do things like argue the distraction thing, that's when we get into a fight. We get into a fight because you choose to make it one. You choose to think of this as a zero-sum game, as opposed to: we all benefit when we do all of this work, which is the way I look at it. And if you have a proposal for mitigating near-term harms, I will compare the costs and benefits, and I will hopefully support it if it's a good idea, but it can't be our focus. And if we only have so much room to pay attention to these things, then we have to prioritize, and I know where my prioritization is, and I hope that it will also help with the other things. That's what we have to do.
And so, in general, I would say you have the accelerationists; you have the people I call the worried, people who are worried about AI killing everyone, because doomers is kind of pejorative and we don't want that. I don't call you e/accs; why are you calling me doomers? But yeah, I'm going to own it on occasion. We have people who are worried, which makes it clear that you don't have to be so worried to be worried; 10% is worried enough. People who think we need to take the kind of action that won't just happen on its own. We have the people who actively strive to distract from that, to complain about other AI risks instead. And then I would say the AI safety hype manufacturers aren't real. Instead, you have a pretty bad faith faction of the ethics people and a bad faith faction of the accelerationist people who are claiming this to try and win an argument, in ways that don't really make sense.
Nathan Labenz: 1:13:34 Cool. I think that's pretty much all in line with my perception as well. A couple of just point-level follow-up questions on a few of these camps. On the accelerationist front, is there any decent argument? What's the most compelling thing you've heard that would give somebody reason to believe that, actually, yeah, even if you don't think it's the majority case, there's some decent chance that this could just go really well and actually turn out to be easy and never really was anything to be worried about in the first place?
Zvi Mowshowitz: 1:14:05 Well, you're giving me the hard question, right? Because there's an easier question, which is: what's the best argument for why we should go boldly? Which is that it won't be any better if we wait. Basically, there are arguments that by waiting you create what are called overhangs, where we have the ability to do something much more powerful and we choose not to, and we encourage bad actors to be the ones who are first movers because the good actors restrain themselves. And we risk there being too many people who have the capabilities in question. And sooner or later, someone's going to do the thing we're worried about, and the only way to stop an irresponsible person with an AGI, or creating an AGI, is to create the responsible AGI first. Or that only by getting close to AGI can we possibly be able to do the proper research that will let us figure out how to make it safe, and so the time we spend sitting on our thumbs now just gives us less time later. We should instead accelerate right to the edge, because we can judge where the edge is, which I don't think we can, but we can much better judge where the edge is then. And then we can take our time, then we can put in our Manhattan Project, huge amount of effort, whatever you want to call it, then we can solve our problems and ensure a good outcome. I think these are not facetious arguments. I think these are reasonable arguments. I think they're overcome. I think they're not what carries the day, but I understand why someone would advocate for them. Then there's the argument, well, why should we think that everything is just going to be easy? People make various arguments of this type. A lot of them are content free, right? Andreessen literally said, well, the good guys always win, right? And "we're the good guys" is implied, right? So we win, which means the AI must be aligned. It's that simple. And obviously that's kind of nonsense. But a lot of people often say things like, well, if it was sufficiently intelligent, it would be smart enough to realize that its goal was stupid, or that morality is correct, and it would therefore do good things because it was smart. Scott Aaronson has expressed views like this, for example, I believe. I've seen it in a number of places. I think it's a bad argument. I think there is no reason to think that a more intelligent thing would be inherently moral or would deliberately and consciously reject its goals in favor of goals that we like. I think that what we'd be hoping for is something like: it turns out that if you kind of vibe on the things that are good, and you enshrine a bunch of messy principles that don't really have logical coherence, that's kind of how humans learn to act normal and nice to other humans, right, one could argue. And then they sort of generalize this thing, and they realize they're supposed to be, in a consistent, reflexive way, friendly and positive and cooperative and to care about other people. Then they actually internalize that conduct: having positive incremental goals is the best way to get positive final outcomes that will be approved of, that will allow them to achieve their goals in general. Therefore, you would have these minds that are pretty friendly in ways that correlate with intelligence, or that you could argue correlate with intelligence pretty well in humans, except you have something even smarter than us.
It can think harder for longer, process more data, get wiser, because again it has more experience and can process more data. This thing will then converge on some sort of place that humans would converge on if they had given it more thought, in terms of what is good in life, what should be done, or at least on a good and positive answer, as long as we give it nudges in the right direction. We don't have to be very specific; it's an attractor state, right? The good outcome is an attractor state. We don't have to hit a point of measure zero on a giant graph of exactly the thing that does the thing that we want. We don't have to specify the goal. We don't have to understand what we want either. You just let the AI figure it out. The AI will be vaguely well meaning, and the AI will cooperate, and then the dynamics amongst many of these AIs that are positive, or that were created in a positive way, will lead to a good outcome. And so all we have to do is not intentionally make this terrible. Obviously, if you create ChaosGPT and you set it loose, bad things happen. Most people would agree with that, but no one would be so stupid as to do that. So you just do the ordinary best thing, which is some combination of RLHF and constitutional AI and basic ingraining of good principles, and then you do various tests. And then when it starts to exhibit bad behaviors, you correct it. And no, it won't then learn how to do this via deception. It won't learn how to pretend it's thinking what it's not thinking. It won't secretly record data in places you're not looking so that it can survive various iterations. It won't pass things on to its successor states and do all these tricky things. It won't figure out things we never thought about and do those things. We just do the obvious basic thing, because that's what most intelligences that we've ever known have basically done. And that's my steelman case for why this is relatively easy: yeah, it's kind of what naturally happens. It'll just sort of happen, would be the optimistic case. And to be clear: no, right? That's not how this works. That's not how any of this works. But I don't think it's completely impossible.
Nathan Labenz: 1:19:49 Yeah, I was going to say, large language models do seem a heck of a lot closer to that than I would have expected five-plus years ago. Right? If you had just asked me, okay, we'll make some AI, and it's an AI that's breaking through and it's becoming really useful and all attention is turning to it, and you don't tell me anything about the nature of that AI, how it was trained or anything, and then you asked, how much do you think it will understand or be able to act in accordance with human values? I think we're way on the high end of my expectations with LLMs. Which is not to say that I disagree with you, I don't think, in the big picture, but just that, holy moly, there's some sense in which we're very lucky, I think, with the nature of LLMs. Right? A much more AlphaGo-type lineage probably doesn't produce anything along the lines of what these models do seem to learn, in some abstract sense, just from all the language.
Zvi Mowshowitz: 1:20:52 So I think of this as: we may have been lucky and we may have been unlucky, and it's really hard to tell which until crunch time. In the sense that you can tell a lucky story where these things are much better at grokking more or less what we mean and more or less what we care about, in these kind of fuzzy, messy ways, than we would have expected. They're not going to be ruthless optimizers, because it's just not the nature of the mind that we've created, in some broad sense. And so, sure, we're getting human-ish things, maybe, at the end of the day, and then if we scale them up, they'll still exhibit these characteristics, and then we're in a much better state. However, the flip side of that is that you get these things that are impossible to predict or understand, that are very squishy, very inscrutable, that are not being particularly coherent in many ways, and that, if they go off the rails, it'd be very hard to prove or detect that they went off the rails in a meaningful way. They're going to misunderstand us and what we actually care about, importantly, and we have no hope of actually injecting the exact thing if we figure out what it is, because they just can't take exact things at all in that sense. And then maybe that's just not good enough, right? Maybe even the best version of that just doesn't get us what we want and will start to break down pretty fast. One thing to note is that there are often these very rapid transformations, where the very nature of the best solution to a type of situation or problem changes, and you're best served by throwing out all your previous work and then putting in a lot more work at this more intelligent level to do a better job. And so, for example, deception, or telling people what they want to hear by directly modeling exactly how their biases work and how they're going to respond when they see something, is one of those things, right? And there are many things like that. I've been trying to write out this argument in more detail the last few weeks and trying to get it right. But essentially, right now, with its current level of understanding, just trying to sort of messily represent human preferences and reflect them is the best the AI can possibly do with its current level of capabilities, right? Its deception wouldn't work, because it doesn't really understand what we care about very well. Whereas if it understood very well, it's going to tell you what you want to hear. Absolutely, it will tell you what you want to hear. There's no particular truth optimization going on here, except insofar as we punish it when we find out that it's lying. And if you had an AlphaGo-style problem where it's just optimizing, if you knew exactly what you wanted and you optimized for exactly the right thing, maybe you could win, right? Outright win, get something great that would then do the things you wanted. So it's kind of: if you get a very precisely defined optimization target and you hit it correctly, you win, and if you miss, you lose. And there's not that much middle ground, and there's not much room for you to correct an error. To the extent that you made a mistake, that's the mistake. And if that means you still get some value out of the world, you get some value out of the world, but you can't get back the part you lost. What's lost is lost, dead is dead. Whereas, with these messy LLMs, you don't lose that way.
You don't have this really scary thing where it suddenly decides to perform some really rigid optimization that you would never approve of. But you also don't get the good thing here. You don't get this precise ensuring that things go well. You just have this kind of messy situation. And this leads back to my concern over competitive dynamics, where if you unleash a bunch of messy human-ish things that have messy human-ish alignment, but they're also better optimizers than us, they're more capable, they're more efficient, and they have all these advantages: yeah, they don't immediately turn around and kill us on purpose or anything, but they don't actually do us that much good, necessarily. That's not a sufficient condition for this to turn out well. And that's assuming that our current methods don't break down. So if you tell the story that I told, which is: actually, you give them a bunch of feedback as to vaguely what we want, and they kind of generate some processes that kind of internalize the type of things that we want, and that generalizes, and they turn into something genuinely about as aligned to us as humans are to each other. Well, if you suddenly created a bunch of things that were able to quickly outcompete us, with numerous advantages, that had about our level of morality, you wouldn't survive very long. Certainly, you wouldn't ever meet your grandchildren. So this idea that there is much hope in that level of approach, I haven't really understood why, even if it succeeds. And also I just don't think it happens automatically, right? I think that what happens is the AI suddenly does a phase change thing, where when it goes from 10^x to 10^(x+1) or 10^(x+2) cycles, it suddenly realizes that it can do a completely different thing to get the outcome it wants, the outcome it's being rewarded for, and does that other thing instead. And then we find out that all the things we did before are suddenly useless. I think that's by far the more likely outcome.
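[Note: for readers who want a concrete picture of the phase-change dynamic Zvi describes here, and of the grokking phenomenon Nathan names next, below is a minimal sketch in the spirit of the standard grokking experiments: a small network trained on modular addition with heavy weight decay. The model shape, hyperparameters, and step counts are illustrative assumptions rather than a tuned reproduction, and the code is an editorial addition, not something from the conversation.]

```python
# Minimal sketch of a grokking-style experiment: train a small MLP on
# modular addition (a + b) mod p with heavy weight decay, and watch
# train accuracy saturate long before test accuracy (memorization
# followed, much later, by sudden generalization).
# Hyperparameters are illustrative assumptions, not a tuned reproduction.

import torch
import torch.nn as nn

P = 97                      # modulus
EMBED, HIDDEN = 32, 256     # small model, assumed sizes

# Build the full dataset of (a, b) -> (a + b) mod P, then split it in half.
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, test_idx = perm[:split], perm[split:]

class ModAddMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(P, EMBED)
        self.net = nn.Sequential(
            nn.Linear(2 * EMBED, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, P)
        )

    def forward(self, ab):
        e = self.embed(ab)            # (batch, 2, EMBED)
        return self.net(e.flatten(1)) # logits over the P residues

def accuracy(model, idx):
    with torch.no_grad():
        preds = model(pairs[idx]).argmax(dim=-1)
    return (preds == labels[idx]).float().mean().item()

model = ModAddMLP()
# Weight decay is the ingredient usually credited with eventual generalization.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):            # real grokking runs often need many more steps
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(model, train_idx):.3f}"
              f"  test acc {accuracy(model, test_idx):.3f}")
```

[The signature of the effect is that train accuracy reaches 100% early while test accuracy sits near chance for a long stretch and then jumps, which is the memorization-to-generalization transition discussed next.]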
Nathan Labenz: 1:26:34 Yeah. Grokking as a phenomenon, I think, is one of the most underappreciated, and, I mean, people don't even broadly know about it. But that paradigm of moving from memorization to generalization, and the unpredictability of that, has shaped my thinking probably more than any other over the last couple of years; it seems like such a critical point. Going back up the stack, then: I had one question on the accelerationists, but on the AI ethics and safety camps, it seems, from my perspective, probably really dumb that they are so at odds with each other so often. And I would say there's blame on both sides of that, not to blame present company at all. But both sides kind of say, your concern is a distraction from my concern, and the battle lines have kind of hardened along those lines. Is there any hope that that could change? I mean, we're starting to see a little bit of a thaw. And I guess my understanding of why there's been maybe some thaw, and could be hope for more working together in the future, is that as such focus and attention and resources come to this area in general, maybe there's sort of an omnibus spending kind of thing that everybody can kind of say yes to. Much like we have with these big tent, big budget bills that get passed. Right? I don't care about what you're doing, you don't necessarily care about what I'm doing, but we're all doing something that this is going to fund, and so that could be good. Maybe that kind of brings the camps together into a: hey, what matters is we invest in this, and we invest a lot. And if all of a sudden there's going to be huge governmental funds poured into it, then there's kind of enough for everybody to get funded and do their thing. You don't look too pleased with that scenario or hopeful for it.
Zvi Mowshowitz: 1:28:35 I wish the primary intervention that made sense was to pour a bunch of funds into this. That would make things so much easier, because then yeah, absolutely, we can all agree: 100 billion here, 100 billion there. I get my money, you get...
Nathan Labenz: 1:28:47 ...yours. Pretty soon you're talking AI safety.
Zvi Mowshowitz: 1:28:50 Well, pretty soon you're talking real money, hopefully. I don't know about AI safety. I think it helps, but I don't think this is a case of throw money at the problem and then hide the cost of getting that money back from the taxpayer by just having it be an amorphous spending bill. I think you have to impose actual costly restrictions on what companies can do, on what models can be developed and deployed and used in various ways. And some of those restrictions solve one problem, some of them solve the other problem, and you can require these people to pass certain tests and to go through certain hoops. And again, they're competing for a kind of limited resource here in a much clearer way, and this makes it harder. I do agree that there's no reason for these camps to be distinct. To the extent they are, there's no reason for them to be at each other's throats. I think a lot of the reason why the AI doomer side of this is so against the AI ethics people is, one, they know the AI ethics people are going to try and completely sideline the doomer people. They're going to try and ignore all these concerns, get everybody to ignore all these concerns. They see that in advance, right? It's the, well, Alphaville is going to attack Betatown, and Betatown doesn't like Alphaville, right? That's just how it is, because you see the conflict coming. You're fighting over limited resources. And I'd be fine with that. But part of it is that their arguments are incoherent, because they imply our problem and then they just deny it, to a large extent. We see this and we get really frustrated. It's like, well, these people will say both that the AI will never be able to fool people enough to take over the world, and that misinformation from AI-generated content next year is a serious problem for our election. It's like, really? You can have either of these positions, and it's fine and it makes some sense. It's really hard to have both, in some important sense. You think that these much less capable systems, that we fully understand and that we can manage in real time, recovering from our mistakes, are these dire threats that require these responses, and yet you can't understand why scaling them up a lot will present that same threat. At one point, someone put it like this: it's like being on a bus, and the bus is about to go over a cliff, and someone worries you're going to be late for work. Well, yes, you're going to be late for work. But it's very hard to reach that conclusion without noticing the bus is about to go over the cliff, right? And if you know the bus is going to go over the cliff, maybe you have other problems that you should worry about more. But at the same time, once we steer this bus away from the ledge, we have to get you to work on time. It's important. It matters. And I do think that's an example of a metaphor that's too dismissive and pisses people off for no reason. We have to move on to things that are more positive; I don't want that to be the go-to. And so I cited it as an example of both how it's easy to interpret the problem from the side of someone worried, and how it's easy to be somewhat unfair to the people who are on the other side at the same time. But there really is this inherent logical contradiction going on a lot of the time, where they just don't want to see it because they can't imagine it. They can imagine things in front of their face, not the things in the future.
And also, we've seen this time and again, where these types of concerns hijack the political process, the regulatory process. We used to call our concern AI safety. And these people came in and just started using AI safety to mean don't say bad words. And now we have to find new words for what we're saying. And they pretend they're us. They pretend that they're doing the thing that we're doing, and they're just not. And it doesn't mean their thing is bad, but their thing is not our thing, which is incredibly frustrating. And so, yeah, we end up at each other's throats, and I wish we didn't. As I said, I think there's room for us to all cooperate on this, because we all want the same things. But also, I just think that they're objectively wrong about the near-term concerns, the AI ethics people. And also that they're wrong about their preferences, in many cases. And so we have this disagreement over what is specifically going to happen. If you unleash this AI generation, are people going to be fooled by these deepfakes? Are they going to think that their girlfriend cheated on them because they saw an AI-generated picture? Are they going to think that Donald Trump was in fact killing Godzilla and that Joe Biden was impaling innocent young virgins or something? Like, I don't know. And then they'll be fooled and they'll vote for Trump and it'll all be terrible. I'm like, no, it's not going to happen. We're going to be able to deal with it, is my opinion. And that's one disagreement. Another disagreement is just that they think certain things are really, really important that I don't think are as important relative to the potential utility gains involved. We have this ongoing disagreement in general about progress and about technology and about growth and about flourishing humanity, many of us, right? And the ethicists, the people who call themselves ethicists these days, right? Medical ethicists are like this, bioethicists are like this, and so on. Their attitude is that harm is what matters. The specific harms that I am worried about trump your, they call it greed or desire or wants; they trump the good that can be done. I think that's just not true. I think these harms should be considered, they should be mitigated, we should do our best to avoid them. But when we don't... These are the people who wouldn't do challenge trials for COVID vaccines, right? We could have saved so many people if we had gone to a few people who would have gladly volunteered. We didn't do it because the ethicists told us no. And the ethicists are constantly, in the medical context, telling us no about things that would save lives, where everybody involved in the agreement is consenting. They don't care. They think it's not okay. And this is usually less egregious than that, but it's the same idea. They think that there are deontological rules that matter more than the utility involved, or than other rules and other principles. And I think they have the wrong rules. They just follow a set of preferences that's very different.
Nathan Labenz: 1:35:46 I'm sure you have a reconciliation for this, but I think one place where people might be a little confused about your views is that, obviously, in that last moment, you sound quite consequentialist, utilitarian, whatever. Earlier, when talking about AI descendants as a path and how much value you would ascribe to something like that, it sounded like you were not willing to go so far on utilitarian thinking as to say, well, if the robots have a lot of utility, then that's good enough for me. So I wonder, could you unpack those underlying principles a little bit more? Is it a certain kind of utility that you care about the most, or how do you think about that?
Zvi Mowshowitz: 1:36:28 So I'm a virtue ethicist, first of all, just flat out, if I have to pick one of the big three to describe how I think. I do think that if you can't express your view in all three at the same time, your view is ill-conceived and you haven't actually figured out what you want. So I should be able to define a utility function; it's just going to be really complex. And with a human amount of data and compute, virtue ethics is the correct way to express what I want. Similarly, deontological rules are very, very useful, again because we have limited compute, limited data, and they're very useful for ourselves. These calculations become intensely complicated and involve lots of terms we just don't know. We have radical uncertainty about many aspects of the universe, and we lack the data to do the calculation properly. And the utility of the deontological implementation is higher, right? So you should use that. But the reconciliation is very simple. I care about myself and my descendants and my fellow humans, and I don't care on the same level about these AIs. It doesn't mean that I think they could have no value, but I don't think you can just do math. I notice that you can't just do that. Certainly, there are people I know who would say things like, well, a chicken has this much value, each worm has this much value, and if you simulate this many worms on this many GPUs, it might generate this much value; you have them all have orgasms all the time and it would have this much value. And I'm like, no, you've obviously done something wrong. You're taking your metric and you're treating it like it's real, when it was supposed to be an approximation, a way of figuring out how to handle situations, and you're taking it out of distribution. This is exactly what we're worried the AI would do in many cases, right? We teach it what is good by giving examples in a distribution, and it learns in that distribution how to optimize for the right outcomes. And then we take it out of that distribution, by either presenting it with problems that weren't in the training set or aren't reflected in the training set, or just giving it capabilities beyond what anyone in the situations we were modeling had at the time, which inherently makes every situation radically different, or a number of other things. And then we say, okay, what happens now? And the answer is you get something weird. You get something that doesn't make any sense reasonably often, or that we wouldn't prefer, because the data that you have doesn't actually tell you the answer. Utilitarianism is a good way of correcting for the fact that you have to do math, and two of something is more important than one of something, and 20 of something is actually sometimes 10 times as important as two, but not always. It's not that simple. And also, human preferences are not inherently very coherent or very consistent. You can try to make them as consistent and coherent as possible; you'll figure out what you actually prefer on reflection. But it's well known that if you take any system of math and you start with a contradiction, you can prove anything you want. So if you're a utilitarian and you write down an inexact equation as an approximation, and you start manipulating those inexact equations a lot, you're going to get nonsense results if you go out of distribution with enough steps.
And one of those nonsense results is, well, maybe we should simulate a bunch of worms having orgasms. This is dumb. We all know this is dumb. And I can accept that if you build a bunch of AIs and they wipe us out, I think most of those universes, the majority of them, are pretty close to zero utility. I evaluate them as: the AI is not good, the AI is not bad, it doesn't love me, it doesn't hate me, I don't love or hate it; it's just using my atoms for something else. And that's probably unfortunate, but whatever it is I care about is not its default thing. I don't really care about something that's optimizing for a relatively simple metric. I don't really care about something that's optimizing for things that might be somewhat technically complex but which aren't particularly interesting. Effectively pure power seeking is not very interesting to me, is not very valuable to me. And also, I really don't think that you have to take this sort of disembodied, non-positional utility perspective and say that's what you have to prefer. I think that's not okay. If we live on our world, the Vulcans live on their world, and the Klingons live on their world, we're allowed to prefer that the humans survive and the Klingons die, and the Klingons are allowed to prefer that the Klingons live and the humans die, even if under some abstract utility there's no real difference. We're allowed to care about ourselves. In fact, I would say it is a key part of my philosophy that you need to care about yourself and those around you more than random other people, or the entire thing falls apart and doesn't work, and nobody gets anything. If parents don't care for their children more than they care for other people's children, it does not go well. And so in some sense you can realize that there's a contradiction and there's a conflict and there's a kind of mistake, but it has to be that way, in some important sense. I don't do philosophy on my blog, because I know that philosophers who have spent lots and lots of time on this could rip anything I think to shreds. I understand this. But at the same time, I have thought a lot about these things, and I believe strongly that you are allowed to care about yourself and care about the things you care about, and define your own utility function and what you think are the important things that exist. And I've made my choices. And again, these people who call themselves ethicists have made very different choices. And I'm not saying they're not allowed to have those preferences legitimately, if they have those preferences. I can say I think they're wrong, in the sense that I actually think they do not in fact prefer, on reflection, the world they're claiming to prefer to the world that they're claiming to disprefer, and that if they actually reflected and reconciled their positions, they would have to change their minds. But maybe they really do. I don't know. It's a different kind of disagreement, fundamentally.
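[Note: the logical claim Zvi leans on above, that a contradiction lets you prove anything, is the classical principle of explosion. A standard derivation is sketched below as an editorial reference, not as something from the conversation.]

```latex
% Principle of explosion (ex falso quodlibet): from P and not-P, any Q follows.
\begin{align*}
&1.\quad P \land \neg P &&\text{assumed contradiction} \\
&2.\quad P              &&\text{from 1, conjunction elimination} \\
&3.\quad P \lor Q       &&\text{from 2, disjunction introduction, for any } Q \\
&4.\quad \neg P         &&\text{from 1, conjunction elimination} \\
&5.\quad Q              &&\text{from 3 and 4, disjunctive syllogism}
\end{align*}
```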
Nathan Labenz: 1:43:09 Let's imagine all the humans are gone, and then there's some AIs or no AIs. Within some AIs, you could have unfeeling, no qualia, no subjective experience AIs that are just paperclip maximizers or whatever. And then on the other hand, you might have some AIs that have some sense of feeling and subjective experience, and maybe they can even do some of the things that we could do. They could appreciate some art or whatever, but they're obviously still quite different from us, presumably. It does sound like you distinguish between those. You would rather have the art appreciators than the non-art appreciators, and you'd rather have more than fewer. But then at some point where people start to push on, well, what if there was enough of them? Or what if there were a trillion trillion of them? Then at some point, you just kind of get off that train and say, well, I'm not going to allow you to push the multiplication problem to that extent because I just feel like that's out of distribution entirely. And so I kind of fall back to a more brute force fact of my own ethics, which is I care about myself. And so you can't push me that far. Fair?
Zvi Mowshowitz: 1:44:23 I certainly think I have a preference order over the nonhuman universes. I am confused about how much I prefer those universes, and how much probability of survival I would trade off to get the better universe, if I fail, versus the worse universe. I think that's unclear stuff. But I do think that, yeah, it's not that multiplication can never do it. It's not that you couldn't convince me, with a sufficiently convincing case on reflection, that there's a situation in which I prefer certain long-distance futures to other long-distance futures. But I also have a sense in which it's not my role to prefer that. The system doesn't work if people do that. We're supposed to prefer that the things that we care about survive. And there's no reason why, in theory, we couldn't spread those things out throughout the universe in some sense. But I think this is an area where no one can figure this out well, right? I don't think that anybody really knows what does and doesn't have certain types of properties, or what they should think about these completely out of distribution universes, with these completely out of distribution things, or how they should prefer different configurations to other configurations. And this is one of the dangers: we're potentially setting the configuration of the universe into motion relatively soon with no idea what we actually prefer and what matters to us for real at this scale. And that's really scary, because we don't know how to give this to an AI. I don't just mean that we haven't figured out what we think; we just legitimately are deeply confused, and our feedback is not going to reflect anything that we want here. And then we're going to sort of iterate on that somehow with some sort of system. It's a giant disaster waiting to happen. If I had a long reflection, I would take it, right? If I had the opportunity to suddenly not age and live for 100,000 years and have perfect memories and talk about this stuff for a long time and just do philosophy, that sounds great, but we don't got it, right? We have to solve all our other problems first before we can do that. And so we have to have these squishy, messy views on how to do these things and muddle through the best we can. But yeah, I notice that I prefer not to die, and I prefer my children not to die, and I prefer them to have children of their own, and I prefer that the human race continues in general. And I think I am very much permitted that preference, and it is good that I have that preference. You're welcome to try and talk me out of it, but I don't expect anyone to succeed.
Nathan Labenz: 1:47:16 Yeah. I won't attempt that here. Certainly, I'm not the best person to try as I think I again mostly have a very similar position, including even just weighting virtue ethics more heavily in my own thinking than a pure utilitarian calculus. Popping it back up then. I was asking my point level questions on these different camps, and we're getting to the doomers, aka the worried. I think people who are coming to this relatively recently and who are encountering this line of thought relatively recently, first of all, have no sense for just how long it has been around and how deep it has run. I wonder if you could give a little bit of a short story from your perspective of what the AI safety evolution has been over the last 15 years since you first got interested in it. And I think it's especially interesting to help people understand how much care, and I sometimes call it AI safety respectability politics, has gone into just trying to be, and not always succeeding, but trying to be the sort of people who can be trusted or can be respected or who don't seem crazy on other dimensions. So I'd love to just hear that story from your perspective a little bit.
Zvi Mowshowitz: 1:48:36 From my perspective, I hadn't given AI any thought for a while. And then I read this series of blog posts by Eliezer Yudkowsky and Robin Hanson, and they were engaging in this debate over what it would look like in our distant future when there was artificial intelligence. And as these arguments sunk in, as I thought about the implications, I realized that if this technology were to exist, everything would change, and that Eliezer was basically correct: by default, if you built something smarter than a human, then it would be able to build something smarter than itself still, because humans were able to build that thing, and it would be able to scale up its resources and capabilities pretty quickly, again, not necessarily automatically, but by default. And if you didn't have this thing able to preserve some property of what it was optimizing for in a known, provable way, that thing would change in a weird direction, and it was also remarkably hard to specify what we actually wanted anyway, and this thing would take control of the future. We didn't use the terms that we use now. It's almost hard to remember exactly what terms I was using back then, but it was a long time ago, 15 years or so. And at the time, people like Eliezer and MIRI were trying to solve this problem by creating mathematical systems and stringing various problems together that would allow a system, while it was self-improving, to provably preserve its utility function, such that whatever you put into it as the thing it was trying to accomplish, it would still accomplish that thing when it was done. Because we didn't think it would be an LLM. We were thinking GOFAI, good old fashioned AI. It was basically a logical enterprise. And if you could preserve that, well, that was not an easy problem. And then you had to actually make it so that preserving it meant you had a thing that was worth preserving, that would then lead to a good outcome as well. And so, at one point, there was a meeting that I was a part of where I was with a bunch of people from what was then called the Singularity Institute, now called MIRI, and we outlined, okay, here are the seven things that you need to solve. Solve these seven things, then you can solve the alignment problem on GOFAI, and also you've kind of built the AI as well, incidentally, along the way by doing so, or you're pretty close, and you understand a lot of the principles involved. And then I was told at the end of the meeting, right after being told all of this: forget what they are. It's an infohazard for you to know what they are, because you might tell somebody else, you might act in a way that reflects what they are, and we don't want this stuff getting out because it helps to build the AI. I actually did forget. I remember one of the seven things, which is a pretty non-dangerous one to remember, but forgot the others. And throughout the years, they did really hard work to try and figure out how to make progress on this type of agenda, this agent foundations agenda: trying to make agents that could act predictably, act in ways that you could prove things about, rely upon, that would work under iteration and improvement. And this was deep, hard mathematics, right? Hard enough math that when I asked if I should work on it, I was told no.
And they worked on this, it was a very small number of people, and this effort basically didn't make much progress. We got a number of things written, we figured some things out, some very interesting advanced things, but we never got close to a solution. And then, over time, as we better understood all the ways we couldn't align an AI, all the things that made the problem harder, we didn't make progress in that sense. We discovered all these different reasons why an AI was really hard to align, and we explained them really well, and we developed a better decision theory than anybody else, which I'll come back to. And all this time, people often said, why don't you go into academia? Why don't you, not me specifically, but Eliezer, or people like that, why don't you go get PhDs? Why don't you write formal papers? Why don't you submit them for review? And the answer to this was that those people are not inclined to listen to, understand, and fairly judge these arguments; that the field moves too slowly, because it takes decades to have any influence and by the time you do it's too late; and that the arguments would be treated as merely theoretical, treated as "you don't have any evidence, we're not listening." And the one place where we did engage the academic literature was decision theory. Decision theory is an academic discipline which asks, okay, how do we decide what's the best thing to do? What's the best decision theory? And academic decision theory basically has two decision theories: evidential decision theory and causal decision theory. Causal decision theory is the common-sense decision theory, which is: calculate the expected value of doing X, calculate the expected value of doing Y; if X is higher, do X; if Y is higher, do Y. That makes perfect sense. It's a sensible thing to try. The problem is that it's vulnerable. It has no way to make credible precommitments or commit to anything without external enforcement like physical punishment. It will just defect when it's time to go into the room and either cooperate with the police or not, because defecting benefits it directly. So CDT agents have this set of weaknesses. They won't pay in Parfit's hitchhiker, they won't cooperate in the prisoner's dilemma, they can't coordinate with each other very well, and they also give in to threats. And there are some other problems as well. But basically, they have lots of well-known, very bad flaws. There's also evidential decision theory. Evidential decision theory says: do the thing that would make you happiest to have learned that you just did it. Which is a very strange-sounding proposal for how to make decisions, and there are some obvious places where it does really dumb stuff, where you look at it and ask, what the hell are you doing? It also solves some very weird problems pretty well. And so there are basically a lot of arguments back and forth about how this problem is handled better here, that problem is handled better there, and how this is obviously wrong and that is obviously wrong. They're both obviously wrong. And so a bunch of people in the rationalist community worked out this thing that's called logical decision theory or functional decision theory.
It went through various iterations, but the core idea is: act as if you are choosing the output of the logical process that you are using to decide. Which basically means that for decisions that are correlated with your decision, decisions being made the same way you're making yours, you should act as if you are making those decisions as well, even if they happen in the past, in the future, or somewhere else in space where you obviously have no physical control over them. You can still do this because it will lead you to make better decisions. It will lead to better outcomes. And so this is just vastly better, once you understand it, once you work it out. And there have been papers, including a hundred-page, academic-length, academically formatted, properly written paper that is very good: it addresses the previous literature, addresses the concerns, goes step by step, does all the things, and then just gets ignored. And to me, this is the place where our contribution was very clearly making incremental progress on the existing scientific literature and providing a clearly superior solution to a problem, rather than warning about some weird, esoteric long-term danger. And it just got ignored. I guess we could have all pursued a bunch of PhDs and tried to establish credibility and spent half our lives trying to get people to listen, but in some important sense it makes sense that we didn't. We just relied on other systems of knowledge instead. And meanwhile, Eliezer builds up, together with the people who decided to work with him, this entire rationalist scene on the principle of: okay, I tried to warn people about AI, but they couldn't think well enough to understand my arguments. This is what he was thinking, and I'm not endorsing this line of thought entirely, but he thought: well, okay, we need to teach people to think, and once we teach people to think, they will be able to understand my arguments. He wrote the Sequences and generally educated the entire community, figuring out how to think better about things in general, in the hope that we'd think better about this one thing in particular. I think it basically worked to a large extent, but not to the extent that it needed to work in order for us to succeed. And so in some sense it's a success, in some sense it's a failure, but it greatly enhanced my life and the lives of many people around me and many other thinkers. So I'm very grateful for that. But basically, I'm thinking about this AI situation for a while, and I'm thinking to myself, okay, we have to solve this very hard problem, we're making progress on it, but also, how close is AI really? No reason to panic, go about your life, it's fine. Then DeepMind happens, and then OpenAI starts to happen. And for the origins of OpenAI, you can variously look at Sam Altman, Elon Musk, and the interaction with Demis Hassabis, and concerns about Google, and concerns about what Larry Page said about being speciesist, and other claims about exactly what happened. But basically, this company gets founded by people who are worried about the power of AI, the same way that Demis Hassabis founded DeepMind because he was worried about the power and dangers of AI because of Eliezer. This is what they say themselves. And so, this entire time, we had been talking about infohazards and holding back what we thought were ideas that might enhance AI capabilities. Well, it turned out the very fact that AI was dangerous was a dangerous idea all along.
All of our ideas about capabilities, yeah, it would have been somewhat accelerationist to share all of them right away, but this was so much worse than all of that. As it turns out, it got people excited: oh, look, this might kill us. I've got to get involved in that. That was really unfortunate, but that's what we saw. And then they started going down this line of building things that we didn't have any ability to control or understand, even in theory, just running more and more compute at them, and then we couldn't stop it. And we even had organizations and various people invest in the effort to try and get a voice, to try and be able to influence it for the better. Which I think didn't have zero effect, but mostly, I think, was a mistake. Very clearly a mistake. We should have been more outspoken about how much we didn't want to go down this road from the beginning. But, you know, it's water under the bridge now. Here we are. And the concerns I've been expressing for a very long time now, like 15 years in various forms, they're not mainstream exactly, but they no longer sound crazy. They're no longer something you can dismiss out of hand. And they now have a lot of very credible people, very much authorities and experts and key players in the industry, endorsing them. And that's a tremendous place to have gotten, in some ways. And now we're in a world where the alignment problem is much more fluid and there are real systems to be worked on where you can make what appears to be incremental progress. And a lot of the question is, does incremental progress do anything? So one of the debates in the discourse that we haven't discussed is the extent to which, if you solve how to make the current systems do what you want, if you figure out how to align them for practical purposes now, to what extent are you developing useful techniques and methods that then scale up and will work in the future when they're dangerous? And to what extent, even if the current techniques don't work, are you learning key information that will then allow you to develop the techniques of the future that will evolve from the current ones? Or is this solving an easy problem that does not actually help you with the hard problem, where the hard problem is completely separate? Or something in the middle, where it's a completely separate track, but doing the easy track first gives you insight into what the hard track might look like. You learn a thousand ways not to build a light bulb, which helps you build a light bulb. But you can't actually build a light bulb that way. And if you think you can, then you die. But if you know you can't, maybe it's very helpful. There are a number of ways to think about it, a lot of arguments about it; a lot of it is very technical, a lot of it is very detailed and has a lot of branches, and people think about it in very different ways.
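To make the CDT-versus-FDT contrast described above concrete, here is a minimal sketch in Python. This is my own toy illustration, not MIRI's formalism: a twin prisoner's dilemma in which your opponent runs the exact same decision procedure you do. The payoff numbers are the standard textbook values and are otherwise arbitrary.

```python
# Toy twin prisoner's dilemma: both players run the SAME decision procedure.
# Payoffs (mine, given my action and theirs): both cooperate -> 3,
# I defect while they cooperate -> 5, I cooperate while they defect -> 0,
# both defect -> 1.

PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def cdt_choice() -> str:
    # CDT holds the opponent's action fixed and picks the action that is
    # best against each fixed opponent action. Defecting dominates.
    best = {}
    for their in ("C", "D"):
        best[their] = max(("C", "D"), key=lambda mine: PAYOFF[(mine, their)])
    return "D" if all(a == "D" for a in best.values()) else "C"

def fdt_choice() -> str:
    # FDT-style reasoning: my twin runs the same function I do, so choosing
    # an output means choosing it for both of us. Compare the two possible
    # joint outcomes and pick the better one.
    return max(("C", "D"), key=lambda act: PAYOFF[(act, act)])

if __name__ == "__main__":
    for name, chooser in (("CDT", cdt_choice), ("FDT", fdt_choice)):
        mine = theirs = chooser()  # the twin makes the identical choice
        print(f"{name}: both play {mine}, payoff each = {PAYOFF[(mine, theirs)]}")
```

Run as written, the CDT agent finds that defection dominates against any fixed opponent and both twins collect the mutual-defection payoff of 1, while the FDT-style agent recognizes it is effectively choosing for both copies and lands on mutual cooperation with a payoff of 3.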
Nathan Labenz: 2:01:53 If I understand you correctly, I think one of the things I maybe see most differently or interpret most differently is to what degree has the Eliezer school of thought won, versus lost by shooting itself in the foot by becoming
Zvi Mowshowitz: 2:02:13 Sort Zvi Mowshowitz: 2:02:13 Sort of.
Nathan Labenz: 2:02:13 Accidentally accelerationist? I'm guessing that maybe hinges on a core question around to what degree are AI systems of roughly the capability that we have now kind of inevitable given the data and compute inputs already coming to exist? Because I guess my theory right now is it sure didn't seem like it took all that long from the time we got webscale data and webscale compute to get some pretty good working architectures, namely the transformer and derivatives. And it's hard for me to imagine a counterfactual history where if deep learning kicked off in 2012, and that's 10 years into Google or something, and then 2017 we get the transformer, it's hard for me to imagine a 2023 where we're not pretty far down that path, even if Eliezer had never been born or whatever. And so I guess I look at it as a pretty big win that those that are doing the development at least have a certain healthy fear of what they're creating. Because it seems awfully easy for me to imagine a counterfactual world where the technology continues to progress and the leaders are just totally uninterested in any sort of concerns. They just don't want to hear about it. But it sounds like you see it more the other way, if I understand, more that if there were no Eliezer or Eliezers inspiring these people to get involved, then what would your expectation be? That we just wouldn't have this technology at this point in time, or something different?
Zvi Mowshowitz: 2:03:53 I think it's unclear, but the way to think of it is: if there's no DeepMind, there's no OpenAI, there's no Anthropic. If DeepMind didn't have OpenAI, they would be moving quieter and probably slower, and in fundamentally safer ways, right? Google is kind of a system that doesn't like to just go out and deploy powerful systems to do real things as quickly as possible, as widely as possible. There wouldn't be this orders-of-magnitude expansion of money. Now, I don't think you keep the lid on this forever, right? As the amount of money required to do interesting things continuously goes down, at some point some open source person or some relatively less funded lab makes a GPT-2 style thing, which doesn't cost that much money. People see the potential, they start scaling up, it happens some number of years later in my baseline scenario, and then we're off to the races again. You could also argue that some amount of safety concern is inevitable, the same way that some amount of capability is inevitable, because these arguments are hard to avoid once you start seeing things. A lot of people who woke up to the risks of AI in the last year didn't wake up to them because Eliezer said something.
Nathan Labenz: 2:05:08 ChatGPT was enough.
Zvi Mowshowitz: 2:05:11 Douglas Hofstadter and Geoffrey Hinton and a bunch of other people, they didn't know we would get here this fast. They knew 30 years ago that if this happened, humanity was in deep danger, because it's just obvious to them. You see all the writing back in the 50s, Alan Turing, about how this is obviously a danger. But what they did say was, this is just not going to happen anytime soon. It's like: I'm chasing the mailman, but I'm not going to catch him. They didn't think about what would happen if they caught him, or were about to catch him, until they were like, oh my God, I might catch him. Now what? Uh-oh. This is actually really bad. I don't like this. Now what? And Eliezer gave them a better framework; it made it easier to think about, but how much easier? I'm just not clear. Whereas the key concern is that just because you think safety is a thing, just because you understand that you have to solve a problem, doesn't mean that you appreciate the problem in a way that allows you to actually try and solve it for real. And this, I think, is where it is. If we had full, MIRI-style, security-mindset-enabled awareness of the true level of danger that I think we're in; if they understood just how impossible the problems in front of them are, in terms of game-level difficulty, not in the sense that they literally can't be beaten; and understood how important it was, now that we're entering this realm where these things are potentially dangerous rather than merely useless, to proceed with the proper amount of caution; and understood things like, it might kill you during the training run, not just after you deploy it; understood the ways in which you had to structure these systems to minimize the chances that bad things would happen; understood that your solutions had to be really, really robust and thought out carefully and couldn't just be scaled-up, kind of vague pushes in the right directions, because the chance of that working is very low, and it's very hard to test whether it's going to work without doing something dangerous, and they haven't thought hard about how to do that either; if they had just generally taken a much, much more radical approach to safety, then I would say, yeah, the trade was worth it. But I don't think we're seeing that, particularly, right? I think Anthropic was the one that claimed they were doing that pretty consciously. But if you look at their investor pitch deck, and you look at what Jack Clark and Dario were saying and not saying, it doesn't reflect that level of paranoia. It doesn't reflect that level of caution. And they may feel that, with the competition, they don't have a choice, they're not going to win. But still, they're not doing the thing that makes me feel better about it, right? And there's a question of whether someone who's marginally better, by making it a three-horse race instead of a two-horse race, is making things better or worse, even if they are in fact better in that way. It's not entirely obvious. It's always the question: is a little bit of safety just enough to fool yourself into thinking that you're not going to get yourself killed, and getting yourself killed, or is a little bit of safety good? You could also look at it as: if it weren't for our obsession with safety, these things wouldn't be useful, right?
Because any work on alignment is likely to have an immediate utilitarian impact on the mundane usefulness of the current models. If we didn't have RLHF operating really well on current GPT, we wouldn't have ChatGPT. We wouldn't have our current explosion of investment. What we would have is this kind of pretty useless thing that gives not very useful outputs and that will often say things that are pretty racist, right? And then, what do you do with that? Google has had this problem for years, right? They were able to make pretty strong chatbots with pretty strong intelligence, pretty strong generative AIs, but they couldn't align them well enough to make them a consumer product they were willing to release. And so you had inaction, right, for those cycles. And so you could argue that by focusing on these alignment questions, we enabled the capabilities that enabled the explosion, enabled these orders of magnitude of investment. Otherwise, we would be sitting here with nothing. So I think it's all very complicated. I try not to think too hard about the counterfactuals from the past because we can't unwind time. That part is not realistic. We have to move forward from the game board that we have. And so you hope that these people are sufficiently aligned, sufficiently concerned, can be made so, can be worked with, and you hope that the actual structure of the problem is such that it can be solved.
Nathan Labenz: 2:10:13 So what do we do now? There are many parts to that question, obviously, but do you think that the Overton window is now open wide enough that everybody can just be honest and say what they think? Do you think that things like Eliezer's "we should be willing to call in air strikes on rogue data centers" are a good thing to put out there, or still too inflammatory and hard for people to hear, such that it's counterproductive in its own way?
Zvi Mowshowitz: 2:10:40 So I don't want to be too dismissive of concerns about that type of statement. But, you know, it was the Fourth of July yesterday, right? And one of the things people were celebrating was free speech. One of the ways they celebrated it was to point out that there was this woman in England who got sentenced to 13 months in prison, I think it was, just over a year, for being a troll on the internet, right? She just said mean things on the internet, on social media. Nothing particularly bad happened. She was just being an asshole on the internet. And she apologized for it in court, with her lawyer, for being an asshole on the internet, and it's over a year in prison. So, for being an asshole on the internet, men with guns came to her house, put handcuffs on her, dragged her to prison, and locked her up for a year, right? We enforce all sorts of laws, and all of our laws are enforced at the barrel of a gun. This idea that there is some sort of radical transformation because it's this different thing that we're going to enforce is kind of silly. It's kind of unreasonable. What's going on is that Eliezer is owning up to the implications of what it means to legislate, what it means to prohibit, right? What it means to prohibit something from happening is that you will use force, if necessary. You will start by asking nicely, and you may use economic persuasion, but if necessary, you will, in fact, use men with guns, or if you are facing opposing men with guns, you will use men with tanks or bombers or whatever it takes, if necessary. If there's an armed revolt against the United States, you have to bring out the equipment. Every time people defy the law and won't back down and start shooting at us, we escalate, because we have to. You need to have laws. You need to enforce things. And so, what we're proposing is law. We're proposing regulation. We're proposing these rules. And these are rules and regulations that are much milder than we would apply to many things that are far less dangerous. This is not an outlier in terms of the degree to which we are impinging on people's freedom of action, certainly not, you know, freedom of speech or freedom of privacy or anything like that. You're talking about restricting very specific uses of a very specific type of very technical hardware, manufactured under the most difficult conditions of anything that man has ever created. It really wouldn't be very difficult to track. We have export controls for all of it already. We're basically prohibiting this thing from going to China as it is. And what we're proposing is that if you want to use that hardware to try to create a more powerful intelligence, a more powerful optimizer than a human, to go into the realm where that's possible, you need to get permission to do that. And maybe we don't want to grant that permission under any circumstances anytime soon, right? And I think that is hardly out of the Overton window, hardly unreasonable. And yes, it implies that somehow, in theory, if Russia were to smuggle enough A100s, or whatever their successor a couple of generations down the line is, into a data center in Russia, and they were thought to be training AI that might kill us all, we would do what you would do if you thought that was an existential threat, and take it out. But that's not going to happen, right?
It's an absurd scenario, because the whole point of what I was trying to say is that if you make clear what you are willing to do, you don't have to do it, right? If you make it clear that under no circumstances will you tolerate this very specific action, that you will take whatever countermeasures are necessary, and they believe you, they don't do it. It would be crazy to do it, right? Why would you go down that path? Nobody wants that. Whereas if you don't commit to such actions, then they do do it, right? If you just voluntarily say, oh, I'm not going to build the thing, but, you know, if you build it, I'm not going to stop you, it doesn't work, right? So, you know, the hope is that simple economic rules and regulations and trackers and other such influences are able to act on this very, very complex supply chain and this very, very advanced set of systems such that this is not something anybody can just do. And in fact, if you follow the logic of "the Chinese can't have these chips, and they can't be allowed to train advanced AI systems," well, then why are we letting anybody in the world train them on our data centers and rent our compute whenever they want to? It doesn't make any sense. Why are we encouraging open source models that are powerful enough that China can literally just copy and run them? It doesn't make any sense. At some point in capabilities, you have to draw that line. You can't simultaneously think these things are as important as we think they are and not put restrictions on them. Even if you don't think they're existential, there are many other good reasons to try and do this anyway. But certainly, if you think there's an existential risk, everybody understands, I think, at this point, that the only way to prevent these things from coming into existence that doesn't impede tremendously on our liberty and on our economic value and our economic success, and that might actually work, is to target chips: target concentrations of chips, uses of chips, target intense training runs, large compute runs, and frontier models. Exactly where we draw that line, you know, 10^23 FLOPs, 10^25 FLOPs, 10^26 FLOPs, and how we try to define exactly what capabilities they're allowed to have; we try to use evaluations to see what we can do in a gray zone where it's permitted, but only under certain conditions. Does it make sense to have a gray zone? How do we address this over time as algorithmic improvements occur? These are detailed questions that we need to sort out, and I don't have the best answers to them. But I don't see an alternative proposal on the table that could possibly work, right? Putting some limit on how many FLOPs you can use in a training run, or some similar restriction, and enforcing that restriction absolutely unless you have specific permission, and only granting specific permission under very, very clear, detailed circumstances, is the only way that we have. Any other method of control will be circumvented, right? You could try to say, okay, don't turn them into agents, you can't use AutoGPT. That's 100 lines of Python code. That's not going to work. You can say, don't use them for this particular application. Okay, those are just words. How do you intend to do that? It's not going to work. There's nothing else that anybody has proposed that has any chance of working. And we are very fortunate in this sense, right?
We are very fortunate that the thing that enables AI training runs is in fact the most complex, hard to manufacture thing that has ever been created in the history of the world, and that basically friendly countries that value US economic dominance control the entire supply chain. It did not have to be true, but it is true. And given that, we have a unique opportunity to exert real control over what happens next, that gives us a chance.
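For readers who want a feel for the compute thresholds Zvi mentions (10^23 to 10^26 FLOPs), here is a minimal sketch using the common rule-of-thumb that training compute for a dense transformer is roughly 6 × parameters × tokens. The tier names and the example model sizes are hypothetical, chosen only to show where runs of different scales would land; nothing here is an actual regulatory proposal.

```python
# Rough sketch: estimate training compute and compare against hypothetical
# reporting/licensing thresholds like the ones discussed above.
# Uses the common approximation: training FLOPs ~= 6 * parameters * tokens.

THRESHOLDS = {
    "reporting": 1e23,    # hypothetical lower tier
    "licensing": 1e25,    # hypothetical middle tier
    "special_permission": 1e26,  # hypothetical upper tier
}

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate of dense-transformer training compute."""
    return 6.0 * n_params * n_tokens

def classify_run(n_params: float, n_tokens: float) -> str:
    flops = estimated_training_flops(n_params, n_tokens)
    tier = "unregulated"
    # Walk the tiers from lowest to highest; keep the highest one crossed.
    for name, limit in sorted(THRESHOLDS.items(), key=lambda kv: kv[1]):
        if flops >= limit:
            tier = name
    return f"{flops:.2e} FLOPs -> {tier}"

if __name__ == "__main__":
    # Hypothetical training runs, chosen only to span the thresholds.
    print(classify_run(7e9, 2e12))    # ~8.4e22 FLOPs, below the lowest tier
    print(classify_run(70e9, 2e12))   # ~8.4e23 FLOPs, crosses 1e23
    print(classify_run(1e12, 15e12))  # ~9.0e25 FLOPs, crosses 1e25
```

The point is only that a training-run threshold is a number you can actually estimate and audit, which is what makes it a plausible lever for the kind of regulation being discussed.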
Nathan Labenz: 2:18:50 We have to take it. I want to just quickly get your thoughts on any other promising paths to safety. I'm a big fan of mechanistic interpretability work on its own merits, even if it's not a great contributor to safety, although it seems promising to me in its own way. And then I also want to get your take on: so what do we do from here? If you want to see those kinds of rules put in place, you can hear fairly similar things from the likes of Sam Altman, as you noted earlier. But then what do you do about that? Do you follow Tyler Cowen's advice and start going to the journals? It doesn't sound like you're too bullish on that. Do you start writing more Eliezer-style fiction? Go back to the Terminator and try to scare the voters into electing the right people? Do you go lobby at some agency?
Zvi Mowshowitz: 2:19:38 I don't see anything that I love as an alignment approach, nothing that makes me super optimistic, but I see a number of things that we can try that would at least find us ways not to align an AI, that would alert us to dangers, that would help us understand the problem at a minimum, and maybe lead to something that would lead to something that would lead to a solution, and maybe we'll get really lucky and these things will work a lot better than I expect. Interpretability, I think, is one of these dual-use technologies, because interpretability also helps you utilize the AI and helps you make the AI more capable. So it's not entirely safe to work on, but I do think it's worth working on. I do think it's a good idea, and I do think that a significant chunk of our resources should probably go towards various interpretability efforts. So far, it hasn't gone that great. We've made some strides, but we are not very good at understanding GPT-2 using GPT-4. And the way I think of this is: interpretability starts helping you when GPT-5 can understand GPT-6, right, or GPT-5 and a half. The whole goal is for something to understand something smarter than itself, because otherwise you have a chicken-and-egg problem: you don't get to trust the thing that's telling you the information. So if I can interpret something that's more complex than me, that's amazing news. That's the target we have to hit. Right now, we haven't even hit a vastly easier target, and we're not anywhere close to hitting it, but we can try. And in general, when I talk about what the thing to do is, the regulatory route, this has to go hand in hand, obviously, with a ton of observability and other safety work. Because if we don't figure out the solution to the problem, eventually algorithmic improvements will catch up to us. We don't have infinite time, no matter what we do. We have to solve these problems as best we can; eventually we have to take our shot. All we're trying to do is buy time. We understand this. And maybe we'll get super lucky and the algorithmic improvements will level off, and then we'll be able to stay here indefinitely. But even then, every day we make the conscious choice not to break those limits, and that, again, won't last forever. And also, there are other things to worry about. We have to eventually do this. So we have to solve these problems. In terms of what we individually can do, I definitely think it's a case of some of us should do one thing and some of us should do the other. Some of us should work on safety directly, should try to solve these problems. Other people should work on lobbying and trying to talk to people who make decisions. Other people should write academic papers. Some people should write blog posts. Other people should try to orient, understand the world, communicate as best they can. Other people should work within the labs. It's not as if we know which of these things is the best approach. I am in fact going to make an effort to get some of the papers written. I don't have that much hope that it's going to work, but we will learn in the attempt, right, if we make a serious attempt. At least we will learn why it fails, or how it fails, and we'll be able to demonstrate: look, this failed, and here's it failing. And we will also probably think well in the course of doing that exercise, right?
It might not be as efficient as thinking well while working on something else, but it's pretty good. It's the reason people write papers and figure things out eventually. I do despair about the speed of the whole enterprise. Tyler is asking for economics-style papers because he's an economist, but that will only convince economists, right? And then you have to repeat this process 20 times, because different people have different perspectives, and a lot of them are going to be less amenable; Tyler's is a relatively good fit, right? You should be able to argue carefully, in his mode, for a lot of the conclusions. And a lot of people believe that the need to do something is vastly overstated. If you had to prove with a model in the journals that doom was going to happen, I think you'd be in a lot of trouble, because it's the kind of thing they just reject inherently for not being scientific or something, or not having the right data or evidence. But I think there are other claims that you can, in fact, get pretty deep into the mathematical modeling of if you really wanted to. I don't think it's clearly the natural exercise you would do if you were just trying to see the truth, but if you need to go through this process, you can do it. And maybe it's actually a good idea. And again, division of labor. I think there are a lot of people who want to help with this problem, who have a variety of skill sets, who are kind of throwing up their hands. The point where I'm most frustrated with Eliezer is that at the end of every podcast, people ask him: so what do I do about this? If I'm terrified, if I think we're all going to die if we don't do something, and I want to do something, what's the something that I should do? And he doesn't have anywhere to send money. He doesn't have anywhere to work on a concrete problem. I mean, he says to try to orient, try to think about the problem. These are good things to suggest. Try to figure it out for yourself. But we do have to do better. I do think there are now a number of AI organizations that are rapidly expanding, that have net positive impact in expectation, that have their hearts in the right place. I do think you have to worry about dumb money, right? If you go into a charitable ecosystem, or an ecosystem of work, and you start throwing around money without understanding it, and a lot of other people do the same thing, the incentives all get worse, and bad action drives out good. So you can't just throw money at this stuff without thinking about it carefully. You have to orient, you have to form your own independent opinion; you can't just trust whoever is telling you things, or that goes badly. But absolutely, there should be support for a lot more alignment efforts. And in fact, there was very good news today. I don't know if you saw it before the podcast, but OpenAI made an announcement that they're devoting 20% of the compute they have acquired so far, which is going to be a lot less than 20% of their compute over the next four years, because they'll be scaling up their compute as fast as possible and it doubles every year, but still, 20% of their existing compute, over the next four years, to working on alignment, to try and generate a more automated, more advanced alignment process. I have deep skepticism for the strategy they are pursuing.
I think that, as something that could possibly hope to scale to solve alignment for human-level or beyond-human-level intelligence, it's pretty hopeless. Or not necessarily hopeless, but none of the things they're announcing they're going to work on address the things that make that problem hard. The hope is that they might make it easier to try and address the hard problem, because you will be able to iterate faster on the problems when you build the tools, and the compute will potentially allow people to try things that aren't listed there. It's basically, as I said, build a human-level alignment researcher to help with their alignment work. And as I understand it right now, as of today, I'm still processing the announcement and the details aren't fully clear, this doesn't quite mean an actual human-level alignment researcher. Because if that were true, it would be what is commonly proposed as the worst idea, which is: get the AI to solve alignment for you, which basically means making the AI's first human-level problem the alignment problem. That is one of the hardest places to ensure that your AI is actually aligned, to test whether it's doing the thing you want, to see if it understands what you want. It's a great way to get absolutely killed. It's a great way to get a misaligned system, so much more likely than so many other ways. But what I think they actually meant was simply to have a human-level way of evaluating whether a given output seems aligned. And humans are pretty bad at this, so getting an AI that's as good as a human, a generic human, doesn't seem like a crazy thing to ask for. However, I still think it's a bad idea, and I'll be careful here because I haven't written about this and I don't fully understand it yet. I understand things by writing about them. But basically, the only way to align for real on human preferences is to get humans who are thinking deeply and well about their preferences and who deeply understand the questions involved. You can't learn from training data and human feedback things that aren't in the training data and human feedback, things that are more sophisticated than what you're being given, beyond the point where you can pull out the complexity that's there. So if you have a very large amount of data and a very large amount of feedback, you can draw inferences and correlations and build up a model of all of that, where this thing under gradient descent is more powerful than the thing that's driving the individual components. But there's still an upper limit if you're trying to decode what was going on: if it wasn't in the thing you were decoding, it won't be there to find. So you're going to need not just the equivalent of the random person you're paying a normal hourly wage to go chat with the chatbot or check for some offensive language or something. You need someone who is pretty intelligent, who is thinking carefully about all the implications of what they're saying, and who wants to give you pretty sophisticated feedback. And as a parent, this echoes very much what happens when you're observing and trying to give feedback to your children, right? You need to be very smart, very careful about exactly what feedback you give to your kids. And with humans, it's much, much worse, because they're operating with much less compute and orders of magnitude less data and fewer cycles than an AI. So every little bit is going to have so much power to reshape their world and their way of thinking.
But you have to think very carefully, on so many different levels, about: okay, what caused that output? What is this person thinking? What does this person understand and not understand? What are this person's incentives? What can I say in response that will cause their update to be the update that I want? Because one of the things that I increasingly discover is that thinking about AIs helps you think about humans, because we are neural networks too. We also gather data and use our compute to try and update our connections to produce the best outputs possible and achieve our goals. And a lot of it remains very true, although obviously a lot of it is also very different, but it can help you think about yourself and others and understand how to use the analog of these tools to get a better model and to get better outcomes. And so when you do it with a child, you have to think very carefully on multiple levels: okay, what are the important, salient things here? What do I have to do? How would he respond to various different things I might say? What would that lead to? What vibe would that cause in the conversation? The same way LLMs associate things with vibes, people associate things with vibes. What might this cause him to be curious about or not curious about, and things like that? You have to think five moves ahead, think ten moves ahead, try to figure these things out. And you are on a limited compute budget too: you have to do all of this instantaneously, you have to do all of it continuously, you're doing it on the fly at the same time, and you'd like to do it fast. So when you're dealing with an AI, you have some more leeway, because you have so many more individual data points; you can afford to be less precise in some sense. But if you're trying to train something that's going to be much smarter and more capable, then I really do think you have to be very, very precise and careful, because otherwise Goodhart's Law is going to bite you. It's going to optimize to the actual procedure that's being followed, not to the thing that we're trying to mimic, and that difference is going to be very, very bad for you. Partly because of manipulation, but partly just because you are going to be wrong about what you want. This means that if you develop an AI to help you with the AI in this sense, you have a serious problem, because you are going to Goodhart something that has already been subject to Goodhart's Law. And so you get meta-Goodharted, and meta-Goodharted is not good. It's a game of telephone, effectively, right? It's as if I tried to explain to Nathan what Zvi wants and cares about. And then Nathan watches Alice, and Alice tries a bunch of things, and Nathan says, no, no, no, he wouldn't want that because of this, he would want that because of that. And then Alice tries to teach Carol. And Carol is not going to know what the hell is going on. Carol is going to have no idea. This is not going to be preserved. It's not going to be distilled carefully and preserved. So I don't know that there's that much hope in doing this, but it does provide the ability to iterate on various techniques and try things and test things very quickly. You can learn a lot by doing that. So I don't know. We don't have a solution, so it's easy to pigeonhole and shoot down anything that anyone proposes: of course that will never work. But what will, right? The answer is, I don't know.
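As a toy illustration of the Goodhart's Law dynamic Zvi describes, here is a minimal sketch with made-up numbers: a proxy score that is correlated with the true objective but also rewards an exploitable "gaming" term. The harder you search over candidates, the further the proxy-optimal choice drifts from the best candidate by the true measure.

```python
# Toy Goodhart's Law demo: optimize a proxy that is correlated with the true
# objective but also rewards gaming the metric. More search pressure means
# more divergence between "best by proxy" and "best by true value".
import random

random.seed(0)

def true_value(quality: float) -> float:
    # The thing we actually care about.
    return quality

def proxy_score(quality: float, gaming: float) -> float:
    # Proxy mostly tracks quality, but over-rewards the gaming feature.
    return quality + 3.0 * gaming

for n_candidates in (10, 1_000, 100_000):
    # Each candidate is (quality, gaming). With more candidates, the
    # proxy-maximizing pick tends to be an extreme exploiter of the proxy.
    candidates = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_candidates)]
    proxy_pick = max(candidates, key=lambda c: proxy_score(*c))
    best_real = max(candidates, key=lambda c: true_value(c[0]))
    print(f"n={n_candidates:>6}: true value of proxy pick = {true_value(proxy_pick[0]):+.2f}, "
          f"best available = {true_value(best_real[0]):+.2f}")
```

This is of course a caricature of feedback-based training, but it shows why optimizing hard against an imperfect evaluator tends to select for whatever the evaluator over-rewards rather than for the thing you actually wanted.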
Nathan Labenz: 2:34:00 Well, that's probably an appropriately sobering point to wrap up for today. Your blog and the regular updates there, long-form but still only 1% as long as what it took you to read to create them, just run down all of the news and give, I think, some of the clearest, most concise analysis out there; it's one of the go-to places on the internet for keeping up with everything that's going on with AI. And I appreciate all the hard work that goes into it. Last question for today is just: what other things do you recommend that people look at? That could be other safety approaches that we haven't talked about, or particular individuals or groups that you think deserve more attention, respect, resources, support of any kind. And then what about just other sources that you think are particularly enlightening? If somebody wants to not just read you as a consumer, but get toward your level, what are the places that you would recommend they go, that you find value in?
Zvi Mowshowitz: 2:35:08 Yeah, so as always, you have to think carefully about what you care about and what makes sense for you. Don't just copy what someone else is doing. I cultivate my set of sources because that's the set of sources that's helpful to me. But if you want to get involved in orienting around these problems more seriously, first of all, you can ruthlessly steal from me. You could literally just follow my rationalist list, you could follow my AI list, you can look at the people I follow and follow them if they seem to be AI-oriented. There are also a bunch of people there who are not AI-oriented, because I have other things I follow as well. But if I were starting to orient, I would look at the people in AI who you think are processing information well or offering good ideas, look at what their information sources are, to the extent those are openly accessible, and then pick and choose, based on my own reading, which of these are helpful to me and which are not. Similarly, I look at various blogs. So if you're reading my weekly blog, whenever you see something, you can decide whether that source is something you want to systematically follow, the same way I do when I see a new source, or whether it's a one-off. And you build up a set of assets that way.
So the next question you have to ask yourself is: how technical do I want to be? There are several different types of AI developments you might want to follow. There's what I call mundane utility, the standard capabilities and use cases of "look what I can do this week." And for that there is a very different, often non-overlapping set of people you would follow, and there are any number of places you can get that at any level of detail. Then you have the people who are engaged in making arguments with each other, and my actual advice would be to think carefully about how much of that you actually need. I cover it, but if you aren't covering it, you should only follow it to the extent that either you want to evaluate the arguments or you want to make the arguments, get involved in them, and try to shape the discourse. That might or might not be why you want to get involved. If you want to work on technical stuff, you probably don't want to devote too much effort to the discourse, and vice versa. So you probably want to choose an area of specialization to a large extent. I've chosen to specialize in the discourse because someone sort of had to in this situation. A few weeks of that concentration is probably enlightening, but after a while it repeats.
So you've got your AI debate, the discourse, which is the nominal topic of today. And then you've got the actual work on the alignment problem. And then you've got the policy questions, which are associated strongly with the debate. So you should decide where you want to focus your time and effort, and to what extent. You might want to cover all of it, but I think you actually just want to focus on one of these things and not the others. And we definitely need more alignment research. We need more people who are trying to solve the actual problem. And so the other reason you might want to be paying attention is that you might want to contribute money, either in investment or contributions, or you might want to otherwise know which things deserve support or deserve vocal support, including anything from alignment efforts to regulatory efforts to academic papers, or places that make grants to them. There's a wide variety of efforts now for advanced AI safety in this sense, and there are some EA organizations and similar areas where you can find a number of projects. You can also often find links to these things from the Alignment Forum, people's Twitter posts, stuff like that. And there are formal places like Lightspeed Grants where you can try to get involved more explicitly and directly, if that's something you're interested in. But again, I would say don't try to jump into that. Don't just say, okay, I'm going to write a check tomorrow because it's in the right area. This is a lot more of a complex situation. There is no equivalent of GiveWell's top charities, no generic good people you can definitely give money to where maybe it's not optimal, but you don't have to worry about what you're doing. So you need to orient first, unfortunately, before you do any of that, and then see what gets you excited, understand what you're doing, and then proceed from there.
I would say if there is one limiting factor right now, more than any other, it is people willing to work on the problem directly, the technical problem itself, and that is in short supply. There are only a few hundred alignment researchers on the planet. That is orders of magnitude too few, and there are a number of people who are eager to help you pursue that goal if you are serious about it and you have the right skill set and you are thinking well about it; but thinking well about it is hard. So yeah, the key is to orient carefully. I wish I could be more specific. I wish I could be more useful. And I am still trying to figure these answers out despite spending most of my time on these types of problems, but it is a really hard problem. I would say that in terms of what I'm doing, there are quite a lot of people who are attempting to write in real-time about AI, and I would discourage trying to be one of those people, especially if you're not already deeply enmeshed in the situation, just because it's kind of a rock-star-y thing where there are a lot of people who want to be that person and where you're competing for attention with others. So write to help yourself understand. That doesn't mean don't write anything; I think writing is great. But don't try to write to help yourself or others keep up with the pace of developments. Write more to orient yourself around more fundamental, slower-moving things, write something of your own, and then maybe post it somewhere, maybe don't.
Nathan Labenz: 2:41:50 Zvi Mowshowitz, thank you for being part of the Cognitive Revolution.
Zvi Mowshowitz: 2:41:55 Thank you so much. Yes, in some ways, Cognitive Revolution is exactly what I'm worried about, but in other ways, exactly what we need. So thanks for having me.