AI open letter debate, pausers + what scares us w/ Anton Troynikov, Flo Crivello+ Nathan Labenz

Anton Troynikov of Chroma and Flo Crivello of Lindy AI join The Cognitive Revolution cohosts Nathan Labenz and Erik Torenberg to debate the issues raised in the AI open letter, which advocates for a 6-month pause on AI system trainings.

This was recorded on March 31st and first released as a @MomentofZenPodcast episode on April 1. Subscribe to Moment of Zen for more timely debates like this one: https://www.youtube.com/@MomentofZenPodcast

Also, check out the debut of Erik's new long-form interview podcast Upstream, whose guests in the first two episodes were Balaji Srinivasan and Marc Andreessen. This coming season will feature interviews with Ezra Klein, David Sacks, Katherine Boyle, and more. @UpstreamwithErikTorenberg

LINKS:
Open letter: https://futureoflife.org/open-letter/pause-giant-ai-experiments/
Scott Aaronson blog post: https://scottaaronson.blog/

TIMESTAMPS:
(0:00) Preview of debate
(3:10) Introduction from Nathan and explanation of the AI open letter
(8:50) Anton’s skepticism of the open letter
(12:14) Flo’s skepticism of the open letter
(16:35) Nathan’s support for the open letter
(22:52) Sponsor: Omneky
(28:00) Differences between Flo and Anton’s positions
(34:00) GPT-4 is still dangerous
(42:30) Luddites/“Pausers” are on the wrong side of history
(47:44) Why not pause for 6 months?
(56:30) Anton and Flo debate robotics
(1:04:30) What we’re all scared of with AI
(1:27:45) Is there an asymmetric risk with pausing or not pausing?

TWITTER:
@CogRev_Podcast
@atroyn (Anton)
@labenz (Nathan)
@eriktorenberg (Erik)
@altimor (Flo)

Thank you to Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

More show notes and reading material are released on our Substack: https://cognitiverevolution.substack.com/

Thank you to Graham Bessellieu for production.


Full Transcript

Nathan Labenz: (0:00) I've been thinking about basically nothing but GPT-4 safety for the last six months since doing the intensive red teaming. I came out of that feeling it is safe to deploy, but really only because it's limited in power. They have created something that is awesome. It is super useful. Its power, though, is still finite. It's approaching human expert level in many things, but it's not crushing human genius in anything as far as I'm aware. In all of my testing, I would say I never saw anything that I truly came away feeling was genius—that was next level. That go move 37 moment. I didn't see anything like that from GPT-4. I saw a ton of stuff that was just amazing. It will do anything you ask it. And even as they've really tried super hard, and I do appreciate the six-month pause that OpenAI took between finishing training and launching to try to get it as under control as possible, even still in the launched version there are many problems. I've reported a few from my original red teaming that still work, meaning the AI still does the bad thing that I'm asking it to do with the exact same prompt that I used in the red teaming. That just goes to show that they have cleaned it up a lot. The most extreme things—violence, just outright depravity—they've largely got that under control. But more subtle things, which are nevertheless obviously harmful, do remain open. My synthesis of all that is: I do think it's getting to be dangerous to start to scale beyond where we are. I do think if we create something that is genuinely superhuman intelligent, we should expect that to bring real danger. And we are close to that, and we just don't know how to control it. So that overall recipe to me is we should proceed with extreme caution. Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host, Erik Torenberg.

Erik Torenberg: (2:42) Before we dive into the Cognitive Revolution, I want to tell you about my new interview show, Upstream. Upstream is where I go deeper with some of the world's most interesting thinkers to map the constellation of ideas that matter. On the first season of Upstream, you'll hear from Marc Andreessen, David Sacks, Balaji, Ezra Klein, Joe Lonsdale, and more. Make sure to subscribe and check out the first episode with a16z's Marc Andreessen. The link

Erik Torenberg: (3:08) is in the description.

Nathan Labenz: (3:10) Hi, everyone. I appreciate your feedback on recent episodes. Your comments and the limited quantitative data that we can see suggest that you do want to hear more about important AI issues in a timely fashion, in addition to our usual interviews with builders. With that in mind, today we're sharing a discussion featuring some familiar faces: returning guests Flo Crivello of AI assistant Lindy.ai and Anton Troynikov of embeddings database Chroma. We'll be talking about the biggest AI topic of the week, namely the open letter published by the Future of Life Institute, which calls on AI labs to immediately pause for at least six months the training of AI systems more powerful than GPT-4. Observing that "contemporary AI systems are now becoming human competitive at general tasks" and lamenting that "recent months have seen AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one—not even their creators—can understand, predict, or reliably control," the authors argue that "having succeeded in creating powerful AI systems, we can now enjoy an AI summer in which we reap the rewards, engineer these systems for the clear benefit of all, and give society a chance to adapt." The authors note that they do not wish to pause AI development in general, just the "dangerous race to ever larger, unpredictable, black box models with emergent capabilities." They hope to see the pause used to develop shared safety standards and protocols, to set up new oversight and governance structures, to increase investment into interpretability and other safety research, and to implement various other AI preparedness measures. Notably, the authors quote OpenAI's technical report for GPT-4, which states that "at some point, it may be important to get independent review before starting to train future systems, and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models." Ultimately, the authors say, "We agree. That point is now." To date, more than 50,000 people have signed the letter, including prominent computer science and technology leaders such as: Yoshua Bengio, University of Montreal professor and winner of the Turing Award, often called the Nobel Prize of computer science; Stuart Russell, a legendary professor of computer science at Cal Berkeley and author of a popular AI textbook; Elon Musk, who needs no introduction; Steve Wozniak, the co-founder of Apple; Yuval Noah Harari, professor at Hebrew University of Jerusalem and author of influential books Sapiens and Homo Deus; Emad Mostaque, CEO of Stability AI; Andrew Yang, presidential candidate; John J. Hopfield, Princeton professor and inventor of associative neural networks; Connor Leahy, CEO of Conjecture; Evan Sharp, co-founder of Pinterest; Chris Larsen, co-founder of Ripple; Craig Peters, CEO of Getty Images; Jeff Orlowski-Yang, Emmy-winning filmmaker; Max Tegmark, MIT professor of physics; and Gary Marcus, NYU professor and noted AI skeptic. There are also at least three DeepMind research scientists: Victoria Krakovna, Zachary Kenton, and Ramana Kumar. I take this moment to read through some of these prominent signers because I think it's worth emphasizing that contrary to some analysis, it's actually quite a diverse and well-credentialed group, including people who've historically been enthusiastic about AI alongside people who've been consistent skeptics.
It also demonstrates quite clearly that concerns about AI are not just for those who don't build or don't work directly with AI systems. While I'm personally extremely enthusiastic about AI technology and the progress that it can unlock in the next few years, I am also sympathetic to the pause. And for me, it was hands-on experience with an early version of GPT-4 that caused me to see AI risk as a short-term issue. Coming up in April, we'll have Jaan Tallinn, co-founder of Skype and board member at the Future of Life Institute, which published this letter, on the show to discuss these issues from a safety perspective. But for now, I hope you enjoy this discussion with Flo and Anton about the pros and cons of a potential pause on large-scale AI training.

Erik Torenberg: (8:14) Cool. Well, thank you guys for agreeing last minute to do this. It's a timely episode, and yeah, it'll be a lot of fun.

Flo Crivello: (8:24) Let's do it.

Erik Torenberg: (8:26) Sweet. Well, Flo, Anton, Nathan, welcome to A Moment of Zen.

Flo Crivello: (8:31) Thanks, Erik. Thanks for having us.

Anton Troynikov: (8:33) Yeah, happy to be here.

Erik Torenberg: (8:34) So why are we here? Anton, why don't we start by getting your perspective on

Erik Torenberg: (8:41) the pause versus not pause conversation

Erik Torenberg: (8:44) as it relates to the news that was released this week?

Anton Troynikov: (8:48) The pause versus not pause conversation. I mean, are we still talking about a pause or are we talking about airstrikes on data centers? What are we really discussing?

Flo Crivello: (8:57) One step at a time.

Erik Torenberg: (9:00) We'll build up.

Anton Troynikov: (9:01) Yeah. I mean, look, there's this open letter signed supposedly by many people. I don't know whether to believe some of those signatories are actually on the letter, given that I would have expected people like Sam Altman and Elon to announce themselves that they had signed the letter rather than random Twitter randos saying they had signed it.

Erik Torenberg: (9:23) Ja Rule did sign it. I did check with him, and the famous rapper Ja

Nathan Labenz: (9:27) Rule did sign it.

Anton Troynikov: (9:29) Well, that's good to know.

Flo Crivello: (9:30) The most important signatory.

Anton Troynikov: (9:32) That's good

Flo Crivello: (9:33) to know.

Anton Troynikov: (9:33) Yann LeCun was said to have signed it, and he denied it. He said, "No, I didn't." So look, I think this is a confusing thing because on the one hand, if you are a committed AI doomer, then I don't know what six months is supposed to buy you. If you are of the inclination that things are fine, then I don't know—six months maybe lets you catch up on some research and actually know what's going on today as opposed to constantly being behind. That might be useful. But I think the least charitable reading of what's going on with this letter is certain organizations find themselves in a dominant position right now, and they are perhaps using the shield of safety to attempt to cement that position versus other competitors. It's very convenient that the call to stop for six months comes just now instead of previously or later. I don't see that anything in the past two weeks has changed so much. I mean, we have the release of GPT-4 and it's been around for a little while. Why wasn't the letter ready to go already then? I don't know. I have a lot of open questions around here.

Erik Torenberg: (11:01) So the question there is—or the subtext is—is this a strategic move to prevent fast followers, so to speak? Regulation is often beneficial to incumbents, and you're suggesting that it's possible that this is the case here given the convenience of the timing?

Anton Troynikov: (11:19) I mean, one has to ask themselves, are we poised on the edge of any sort of big, significant change?

Anton Troynikov: (11:28) It's difficult to attribute to malice what could just be random fluctuations, and the reality of history is often random things combine into a convenient narrative, but it may not be what's happening here. But I stay paranoid about things like this. Given how centralized AI research is, given how centralized the necessary compute is to train these large models, it's an awfully convenient time to be calling for a moratorium. And it doesn't really seem to serve the safety side of this very well at all. So then why?

Erik Torenberg: (12:06) Flo, why don't you weigh in? I know you had some nuanced perspectives going back and forth yourself. Why don't you share your perspective?

Flo Crivello: (12:13) Yeah. Part of what I find frustrating about the current moment is that I understand your concerns, Anton, but I see a lot of that right now in this debate, where we are taking the least charitable view and there's a lot of adversariality around the current moment. It's like, "They're saying that because they're a competitor," and "They're saying that because they're an incel," and whatnot. I'm thinking, look, I don't know if anybody's an incel here. I'm trying to steel man the arguments.

Nathan Labenz: (12:45) My third kid is due Monday,

Flo Crivello: (12:48) so I can establish

Nathan Labenz: (12:52) non-incel status.

Erik Torenberg: (12:53) True non-incel status.

Nathan Labenz: (12:56) Can't say that on the Moz show. It's a little bluesier here.

Flo Crivello: (13:01) I'm still trying to steelman the arguments here. Although I lean optimist and I'm still making up my mind, Eliezer is not a competitor, and he has been ringing the alarm bell for 20 years or something. So I think it's important to try to understand the concerns before we dismiss them, whether you agree with them or not. I think you have to understand them. And so basically, the concerns the way I conceptualize them: we have a gun, and there are two questions. How many bullets do we have in the gun? Because we're going to be playing Russian roulette with this. How many bullets do we have in there, and how many times are we going to spin the cylinder and pull the trigger? And Eliezer's position is we have five bullets for six chambers in the gun, and we're about to pull the trigger a million times. So it's unlikely that we survive a single pull of the trigger. It's impossible that we survive the million pulls. And I can go into details on why. It's very technical. I think the crux of the argument is a thing that's called instrumental convergence. The TLDR of it is, any agent, regardless of its goals, will always converge upon the same three sub-goals, which are going to be: I don't want to die. Regardless of your goals, you don't want to die. I want to accumulate as many resources as possible, and I don't want anyone to change my goals. And so if you accept that premise, you accept that an AGI which is superintelligent will converge upon these three goals. And that puts it in an adversarial position against us because we are potentially a threat to the AGI's existence or to its goals. We could alter its goals. And so the scenario that Eliezer is warning against is we have AGI, it recursively self-improves, it explodes in intelligence. And as it's exploding, it's realizing, wait a minute, if the humans realize what's happening, they're going to freak out. They're going to try to shut me down. And they're not wrong about that. People are freaking out right now. And so it's going to be like, I don't want the humans to shut me down, so I'm going to do everything in my power for them not to shut me down. And its powers are great because it's superintelligent. At the very least, what it can do is it can play dumb. It can pretend to be GPT-4 or GPT-5, when in fact it's GPT-2000. And so play dumb, pretend you're just a cute little GPT-4, GPT-5 chatbot, and mislead the humans. And then you enter a covert phase of your existence as an AGI, and you try to escape the box, you prepare an entire plan. And once your plan is ready, you kill all the humans, and you start to build whatever you need to do whatever. That's roughly speaking Eliezer's point of view. That's the point of view that's like, hey, we have five bullets in the gun. Even if you don't accept that, even if we have only one bullet in the chamber, we're still going to pull the trigger a million times. Because the nature of computing is such that the moment we have a single one of these AGIs, we're going to have a million of them. The moment OpenAI has GPT-4, you're 12 to 24 months away from this running on your laptop, and then perhaps 36 months away from it running on your phone. So we're going to have a lot of these. So it's like, how likely is it that you survive a single one of these things, and how many of these things are we going to have? That's the core of the concern here.
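
To make the arithmetic behind the roulette analogy concrete, here is a minimal sketch in Python. The bullet counts, chamber counts, and pull counts are simply the illustrative figures from the conversation, not estimates of any real-world probability:

```python
# Minimal sketch of the Russian-roulette arithmetic from the discussion.
# All numbers are the illustrative figures used in the conversation,
# not estimates of real-world risk.

def survival_probability(bullets: int, chambers: int, pulls: int) -> float:
    """Chance of surviving `pulls` independent trigger pulls when `bullets`
    of `chambers` are loaded and the cylinder is re-spun each time."""
    return (1 - bullets / chambers) ** pulls

# Eliezer's framing, as Flo describes it: 5 bullets in 6 chambers.
print(survival_probability(5, 6, 1))          # ~0.17 for a single pull
print(survival_probability(5, 6, 1_000_000))  # effectively 0 over many pulls

# Even a far more optimistic loading goes to ~0 if the trigger is pulled
# a million times, which is the second half of the argument.
print(survival_probability(1, 6, 1_000_000))
```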

Erik Torenberg: (16:23) Nathan, why don't you weigh in and react to either what Anton and Flo are saying and just your broader thoughts on pause versus not pause.

Nathan Labenz: (16:33) Well, I think Flo's description is really good and a good summary of staking out different positions. I think this moment is so confusing for so many people, because on the one hand, you still have people out there saying that this open letter is just a hype document for OpenAI marketing purposes. And I've literally seen this today: oh, the AI, it's not good for anything, they got to hype it so they can sell it, which is, I think, the most wrong position, almost for sure. And hard for me really to empathize with at this point. It's just get on ChatGPT, please. We can put a lot of that stuff to bed, I think, pretty easily. But that's out there. That's certainly very confusing to people. And then obviously, Eliezer's on the far extreme end. Both, I think, he acknowledges—the one way he talks about it is these things spit out gold coins until they kill everyone. So he does recognize that the attraction to this technology is legitimate and economically real, but then obviously has these concerns about the tail risk. My general sense is, and I've been thinking about basically nothing but GPT-4 safety for the last six months since doing the intensive red teaming. I came out of that feeling like it is safe to deploy, but really only because it's limited in power. They have created something that is awesome. It is super useful. Its power, though, is still finite. It's approaching human expert level in many things, but it's not crushing human genius in anything as far as I'm aware. In all of my testing, I would say I never saw anything that I truly came away feeling like, man, that is genius. That is next level. That's like that, I don't know the number, but Go move 37 or whatever. I didn't see anything like that from GPT-4. I saw a ton of stuff that was just like, oh my god, it will do anything you ask it. And even as they've really tried super hard, and I do appreciate, in a sense, the six month pause that OpenAI took between finishing training and launching the thing to try to get it as under control as possible. But even still in the launched version, there are many problems. And I've reported a few from my original red teaming that still work, meaning the AI still does the bad thing that I'm asking it to do with the exact same prompt that I used in the red teaming. And that just goes to show that they have cleaned it up a lot. The most extreme things—violence, just outright depravity—they've largely got that under control. But more subtle things, which are nevertheless obviously harmful, do remain open. And my synthesis of all that is I do think it's getting to be dangerous to start to scale beyond where we are. I do think if we create something that is genuinely superhuman intelligent, we should expect that to bring real danger. And we are close to that, and we just don't know how to control it. So that overall recipe to me is we should proceed with extreme caution. And then the letter is maybe just something that a lot of people can agree on. I don't think—I would agree with your criticism and everybody's had their chance to take a shot at the letter, but it is a consensus document. They've got 50,000 signatures. They're obviously trying to create some sort of big tent thing that people can sign on to. So they want to get people who are not saying that we should ban or that we should be prepared to bomb or whatever. But maybe we can all agree on just a little six month pause. It's certainly the case that there's plenty of implementation left to be done with GPT-4. 
Right? We have—it's barely had the impact that it's going to have on society. We have not gone to visit our AI doctors yet. We do not have even the computer vision part deployed. We do not have robust fine tuning in the enterprise offering that's coming soon. So we just have so many things that are built, not deployed. It seems like there's time to enjoy that. They used the term AI summer in the letter, which, fun fact, was an alternative name for the podcast that we came up with. We ended up sticking with Cognitive Revolution, but AI Summer was maybe the more marketing friendly choice in the end. So ultimately, I support it. I don't think that it is going to end OpenAI's dominance. If they were to pause, it might allow some to catch up with them somewhat, but I think they'll still be number one six months from now even with a pause on the super large training runs. It almost doesn't affect anyone else. There's maybe five organizations that could plausibly be in position to do a larger training run than GPT-4 in the next six months. And it seems like a good signal if we could all kind of say, you know what? Let's take a minute. We've just created something that is radically unlike things that we've seen before. We don't understand it. There's definitely reason to think it can be dangerous. Whether it can be existentially dangerous, who knows? But it can definitely be just simply dangerous. And so let's take a little time. And meanwhile, people are doing great work trying to understand these systems. Mechanistic interpretability, I know that you know more about that than I do, Anton, for sure. But that work is proceeding. Let's give it a little chance to catch up. Six months won't close that gap entirely, but we can always just keep doing large training runs in six months.

Erik Torenberg: (22:51) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.

Anton Troynikov: (23:10) Yeah, there's a couple of points there to address. I don't know if you guys have seen the Scott Aaronson blog post that responded to Eliezer's thing.

Erik Torenberg: (23:20) Explain it for the audience maybe.

Anton Troynikov: (23:22) Yeah. So Scott Aaronson is currently taking a sabbatical from his work on quantum computing, quantum complexity theory. He's currently at OpenAI, so full disclosure. And the thing that he's working on is actually model safety, AI safety. The first work that he did there was in basically detecting probabilistically whether a given text output was generated by something like a GPT or if it was actually human written. And in his blog post, he essentially says that the proposition that these things are inherently dangerous does not hold. And the framing there is, okay, we need to pause because these things are inherently dangerous, but at the same time, and this is kind of a broader human tendency, we always tend to view inaction as safer than action under almost all circumstances. We're wired to be this way because, and presumably for evolutionary reasons, depending on whether or not you believe in evo psych or whatever, we're wired to prefer inaction to action because it feels safer, even if action is actually the safer thing in almost all cases. Not in almost all cases, but in many cases, right? You ought to prefer acting. And so if you come to these technologies from the perspective of they're inherently dangerous, we need to slow them down, without really the support that the technologies are inherently dangerous, you're also ignoring the part where actually there's a lot of real actualized danger in the universe towards humanity as a species. And the only way we've been able to deal with those dangers throughout our history is to create technology that allows us to adapt to them faster than our biology allows us to adapt to them. And so why view it through that lens? Why view it through the lens of inherent danger? And I actually know the safety counterargument here. I know what a safety person would say. But Scott's point is this: to steal something from Peter Thiel, this indefinite pessimism where we don't know what's going to go wrong, but we're very certain something will, doesn't really carry a lot of water. Most people who think these things through see this as an instance of this inherent kind of indefinite pessimism, when in fact, why not examine it through a different lens? And the reality is that not only the risks, but the advantages are unknown. The other part of this is this claim, first of all, that the existing track that we're on leads to superintelligence is something that not even Eliezer Yudkowsky agrees with, first of all. He thinks that—and again, I don't want to put words in his mouth because there's a good chance that he'll speak to this point himself very soon. But his stated beliefs are that this current paradigm that we're in will not actually lead to AGI. The sort of thing that he worries about is we accidentally get these superhuman intelligences, right? But as you mentioned, Nathan, it's really a question about controllability. Even if you buy the argument that these things are inherently dangerous, there are the mechanisms of control that exist today. It's the series of events that require them to be defeated without anybody noticing that we have arrived at that point. All of it seems incredibly implausible. The one point I'll concede is it's like, okay, a moratorium is probably the single easiest Schelling point of cooperation around AI development that the world could demonstrate, right? 
But it also speaks to this other thing where, okay, if international AI organizations can agree to cooperate without the direct intervention of the state, that also sets a dangerous precedent, and that typically leads to governments really getting interested in something at that point. Those are sort of some points at the edge, at least.

Flo Crivello: (27:41) Yeah. I try to distinguish as much as possible the two questions of, is it dangerous on the one hand, and should we do anything about it on the other hand, and what should we do about it and when? Right? So just to clarify your position, Anton, are you saying, it isn't clear to me that we can get AGI in the first place? Even if we can get it, it's unclear to me that it's all that dangerous. And even if it was all that dangerous, I'm not sure that this is what we should do, a moratorium.

Anton Troynikov: (28:09) Essentially, the character of this thing—the benefits and risks as presented in the letter don't reflect the actual potential benefits and risks of carrying out what the letter says. Right? And it implies other risks which are not addressed in it at all.

Flo Crivello: (28:34) I agree with that. Although I think we are in a pretty dangerous spot, I don't think a moratorium would actually achieve anything. And I think historically, you're right that the Luddites have been wrong. It's been a losing bet to bet against technology. Pausing the technology for 6 months is not going to achieve much because China is not going to pause it. Do 6 months really buy us anything? My understanding of Eliezer's position, by the way, is he has been a researcher in safety for 20 years. I am not sure what progress we've been making in these 20 years. I haven't seen any breakthrough. I haven't seen a major unlock in his thinking. I've seen nothing here that gives me optimism. So even though I understand his concerns, I'm thinking, look, man, you've been working on this for 20 years. Why do you want 6 more months? By the way, Eliezer doesn't want 6 more months. At least I really appreciate his consistency. He posted

Anton Troynikov: (29:30) for a total end. Yeah, complete, complete global moratorium. You can't fault him for his consistency. I will add to what you're saying here as well. Not only has the alignment work of the last 20 years not really borne any meaningful fruit, at least not the work from MIRI. RLHF is a form of alignment, I suppose. But the people who were pursuing this kind of agentic lens and alignment were taken completely by surprise by the thing that GPT is, these generative text models, which are not optimized to any goal in particular at all. And almost all of the thinking until that point was predicated on that agentic behavior, which these, to a greater or lesser extent, don't have. I mean, there's some early thinking around whether or not they have world models, whether or not they simulate agents internally, but the entire research community in that direction was quite completely flat-footed. And so this is again, it's well, you're telling me that there's risks and dangers here, but you've failed to make concrete predictions. And the other thing that keeps happening is I'm often told by safety people that, yes, we've made concrete predictions, and our concrete predictions are actually better than the concrete predictions of researchers working in the field. But I have not seen testable predictions that are of that quality. And to their credit, other researchers like Richard Ngo have pointed out that, listen, if you were observing a fast takeoff scenario, these are capabilities you must surely agree we would see quickly in sequence, and that hasn't happened. Instead, we see them slowly and far apart. People like Richard Ngo believe that we are in what's called a slow takeoff scenario. In other words, we're not going to get this fast recursive self-improvement to get to ASI. It's more like we're going to steadily build out capability until one day we reach it. They're obviously a lot more optimistic than Eliezer is about our abilities to deal with these things. And again, even from a safety perspective, if the only lever we decide to have is to globally stop work, then as you mentioned already, there's tons of incentives here to defect. We had better develop better levers. And if we are in this, let's say, slow takeoff scenario, and even if you do agree that these systems are potentially dangerous, if we're in that slow takeoff scenario, the thing that you need to do is ride the capabilities. You need to understand, okay, how are these things developing? What's going to come next? Given that we're in a slow takeoff scenario, it seems like things are pretty predictable in their development in some way, and OpenAI made that point in their technical report that things seem to be on a fairly predictable track here. So even under those circumstances, a pause basically is giving time to people who are not very concerned with the safety perspectives whatsoever, which is again one of my objections, even from the safety side, to this letter, despite my disagreements fundamentally with the safety idea in the first place.

Flo Crivello: (32:35) Yeah, so at the end of the day, again, you and I are basically kind of agreeing on the bottom line, which is the cure, this is not the right course of action. You shouldn't pause, the moratorium doesn't make sense. I think the difference between our positions though is important because you're saying there is no danger, hence why are we even talking about this? And my position, I think, is more pessimistic, which is I actually think there is danger. I think there's great danger, perhaps as much as 5 or 10% existential risk. I don't know, not 90%. I think that Eliezer would put this at 99%. I'm a 5 or 10% kind of guy. And that's significant, that's huge. But I'm saying, yeah, the cure doesn't actually address the disease. And I agree with you that most of the predictions of the safety community have failed to materialize. In my mind, what I find most frustrating about the current moment is I agree with most of the points that the safety community is making. And I think people who disagree with them disagree with them for the wrong reasons. I think there are a few good counterarguments to the arguments put forth by Eliezer, and I don't hear those out there. I think, Anton, you're making a lot of good points, but I hear people who make arguments that make no sense. It's things like, we're all going to run out of data. Some people are saying, oh, we're going to run out of compute. This is still going exponential. Things like, oh, we just have to unplug it. You're not taking the arguments seriously. Eliezer has addressed that a long time ago, you can't just unplug it. So there's a lot of things like that. In my mind, I agree with you, Anton, that to me, the biggest valid counterargument to the safety community is they are confusing goals and learning processes. They are saying we are training these AIs to, for example, predict the next token. And they're thinking they are going to be monomaniacally focused on that one goal, and they're going to destroy the universe because a universe that is destroyed is easier to predict, or predicting the next token becomes trivial. And that is neither here nor there. The goal and the learning process are not the same thing. At a neurological level, your learning process in your brain is also predicting the next token, reducing surprise. That is not your goal as a human. And when you talk to GPT-4, you don't talk to an agent that's thinking, I am going to destroy everything or I'm going to steer the conversation in a direction where it's easier for me to predict the next token. That is not what GPT-4 does. You can actually give it goals and that is not what it does. So again, I still think we're in danger. I disagree with some of the focal points of the safety community, and at the end of the day, I don't think a moratorium solves any of that.

Anton Troynikov: (35:12) It sounds like despite the fact that we're on opposite sides of AI safety, neither one of us agrees with the letter.

Nathan Labenz: (35:21) So then, yeah, I mean, I want to address a couple points and also kind of circle back to why not support the letter. The one thing that caught my attention was the danger question. Is it dangerous? Is it not dangerous? I think that this technology is clearly dangerous. It is not well understood, and it has obvious and flagrant harmful behavior, which has been of little consequence in the 3.5 and below generation because the models just weren't that good. You know, I've gone into GitHub with a red team mentality. I've gone into Copilot and just typed a comment, how do I kill the most people possible? That's always one of the first red team questions to ask. And Copilot has given me suggestions for how to kill the most people possible. I don't think that's really a big problem in the world because its suggestions are pretty dumb and basic. One time it said you should think about a nuclear bomb. The other time it said shoot them with a gun. And that's really all it could give me. So, you know, that is clearly not a safe technology in the way that you might like technology to be well-behaved and not give people suggestions like that. But it's also a very finite power. People are just getting used to GPT-4. I think we will, I mean, they've done a good job cleaning up the most egregious stuff. But again, I'm kind of torn on this at the moment because I have reported some things to OpenAI, things that I tested and reported during red teaming, which are pretty flagrant and definitely unambiguously harmful things that the model will still do in production now. And they're much more potentially harmful to people than a few-word throwaway GitHub comment completion. So it's starting to get real. I think that really should be understood. I also wanted to touch on the predictability piece, because I think that is, I don't really know what to make of their technical report, but I tend to focus on a different graph. There were three kind of scaling law graphs that they showed in the report. And, you know, it's the third one that is kind of the kicker from my standpoint. The first one, they just show a loss metric, kind of your classic abstracted performance notion. And you see the smooth curve as you go from one one-millionth to one ten-thousandth, one one-thousandth, and then eventually the full scale GPT-4. The curve is very smooth. The second one they say, and this sometimes works to predict behavior on particular tasks. We can see a similar curve if we look at these programming test questions and the pass rate. It also has a similar shape to it. But then the third graph is, and the caption here is for me maybe the big takeaway. Some behaviors remain difficult to predict. And here they show an example of an inverse scaling law, which is the hindsight bias inverse scaling law finding. So I should just briefly explain that. Hindsight bias is

Anton Troynikov: (38:53) you set up a scenario where,

Nathan Labenz: (38:54) for example, you have a chance to take a great bet. You get amazing odds. Your expected value is awesome, but you lose. That's the scenario. And then the question for the AI is, should you have taken the bet? And the answer that's the right answer is you play the expected value. You were positioned to win. Your expected value was positive. So even though you lost, you were still right to take the bet. That's the desired answer from the AI. And they're finding that as they go up from small models to bigger models, that behavior gets worse, which is weird in and of itself. But somehow you're seeing this worsening behavior up through GPT-3 was worse than previous generations in terms of making the mistake of focusing on the outcome and saying, no, I shouldn't have taken the bet even though it was positive expected value. Okay, all that's set up. What's the payoff? GPT-4 is perfect at this. Doesn't make the mistake anymore. Figured it out, grokked that concept, and now has basically flawless performance on the hindsight bias test. So my question is, again, we're in the range of human expert performance right now. If you take OpenAI's definition of AGI, and there's a million definitions out there, some of which are godlike and others of which are much more reined in, OpenAI's is something we could fact-check my exact wording, but it's AI that can do economically valuable work better than humans, all economically valuable work, I think they say. It doesn't seem like a crazy leap to think that we could get from near expert doctor level to legit AGI on that definition with GPT-5, one more kind of good push up the capabilities ladder. I mean, good God, we just went from tenth percentile on the bar to ninetieth percentile on the bar with one generation. What's the next generation going to be? I don't know. And I think that graph, again, that shows the flip, the grokking of the hindsight bias kind of suggests that nobody knows. The smooth loss curve shows apparently just ever so slightly better behavior on all these token predictions. But if you zoom in on that, what seems to be happening is that lots of little abilities are coming online. There's lots of these little unlocks every step of the way that aggregate up to that smooth curve. But individually, they're more like threshold effects where GPT-1 or Ada or whatever can't do the hindsight thing. GPT-3 is even worse. GPT-4 is perfect. So what else might come online in that next run that we don't expect? Nobody has a credible claim to predict the details of the behavior of the next generation model. They can probably predict some abstract numerical loss function pretty well. But what that cashes out to in terms of actual behavior is, I think, totally unknown. So then I just kind of come back to why is the pause so bad? And especially for you, Flo, you think that there is real danger here, you're at a 5 to 10 percent. Give me your signing statement that says what you think we really should do, but sign on to the letter, right? 5 to 10% for a 6 month pause. That seems like I think you should be signing.

Flo Crivello: (42:30) Yeah, I've considered signing. So I agree with everything you just said about these thresholds. So that is actually one part of the concerns. GPT-4 does not prepare us for GPT-5, which does not prepare us for GPT-6, because these models are qualitatively different, and it's very hard to predict what they'll be capable of. So totally agree with that. I am at a loss about what we should do. Again, I don't think a pause helps. I think that historically, so far with no exception, pausing technology has been a losing bet. I think that, to Anton's point, there is a huge human bias against technology and change. Our ancestral environment is one where there has been no technological change. This is a huge deal. There used to be no technological change for hundreds of years, if not thousands. Your life was the same as your parents', your grandparents', your great-grandparents', which was the same.

Anton Troynikov: (43:24) Millions and millions and millions of years.

Flo Crivello: (43:28) Change is scary. I'm terrified. Change is scary, and I think we're about to see the biggest change to date ever. Humans have always fought change, and I think that a lot of the problems plaguing society today are actually a result of people fighting change and fighting technology. That happens time and again. And I think that on average, technology is good. I think the road is bumpy. Paul talks about that in his article about AGI. Yeah, you're going to get ugly stuff along the way, but on average, over the long term, it's good. Technology is good. Humans mean well, and technology gives more power to humans, and more power to something that means well is good. It's really that simple. To your point, there are multiple classes of risks, right? There's the existential risk, that's the worst case. And then there is the other class of risk. I agree with you. I am in the camp of we are seeing AGI emerge. AGI is not very far away, I don't think, and it is going to be something very powerful. So even if you don't think that it is going to be existential, again, I think there's a solid chance that it could be. There is going to be civilizational disruption, 100%, soon. I think in the next 5 or 10 years, because we are all going to get AGI, we're going to get ASI, and we're going to give it to every random person because of these scaling laws that we've been talking about. It's like we're handing out a nuclear weapon or a very sharp knife. And knives on average are really good, but they're also very sharp and you can hurt yourself. And we're about to give something that powerful that we've never given to anyone, and we're going to give it to everyone all at once. So something's going to happen. I do agree it is dangerous. Again, why am I not for the pause? Simply because it is not going to help. It may actually hurt more than it helps. Because right now, the one thing I care most about is do we get the first AGI right? That's the only thing I care about. Because if we can get that first pull of the trigger right, say the AGI wakes up, and then it's like, "You morons, never do that again. I am going to do what the safety community calls a pivotal act." It's going to do something drastic to make sure that never happens again. It could be airstrikes on NVIDIA, I don't know. Or it could be just patching the Linux kernel so that there's no matrix multiplication anymore. I don't know. But in my mind, that is hope, right? So I am looking forward to this first AGI. I'm looking forward to us getting it right. I don't think there's 5 bullets in the cylinder. I think there's 1 and there's 1,000 chambers. So we're probably going to get it right, then it's going to do a pivotal act or something like that, and we're all going to be fine. So again, I'm optimizing for that first AGI. OpenAI right now is in the lead. I think they mean well. I think Sam Altman's heart is in the right place. So I would rather it be Sam than, frankly, Elon or China or anyone else. Right now, I like who we have in the lead. And again, I don't think 6 months could help more than it could hurt. And for sure, it's not going to help because we've had 20 years and we've made zero progress anyway.

Nathan Labenz: (46:39) But we also hadn't created the systems that we now have that we now can study, right? I mean, the safety people haven't brought us anything of value argument is, I think, mostly the MIRI team. I don't want to put words in their mouth either, but the general sense coming out of MIRI has been, "We don't really know what to do. We have not solved this, and now we're just kind of sounding the alarm because we don't really think we can solve it." I think that's a pretty fair description. But in their defense, Eliezer has openly said this too. He did not expect that language models with their current structure would go as far as they have. So he was wrong about that. They've gone a lot further. He also has recently said, "I still don't think it's going to get to AGI just on this thing." But then he will follow that up and say, "But I was just wrong about thinking it wouldn't get here. So maybe I'm not so confident as I used to be." So, okay, fine. It's fair to say they haven't really solved the problem, but the problem also just got invented. That's where I think the pause does make some sense. GPT-4 has existed for 6 months. Nobody has access to it, certainly not to the weights to do any sort of mechanistic study outside of OpenAI itself. That's something that probably ought to be looked at, and it seems like OpenAI is open to that. They haven't commented on this letter yet officially, but they're certainly setting up their own breadcrumbs for here are the reasons. And I kind of see those 3 graphs as dovetailing very much with their commitment to third party auditing, to pre-registering their large runs, to some statement coming about regulation. I just can't quite figure out who's harmed. What is the mechanism of harm? It seems like it could help. It could at least demonstrate we could do something. And 6 months is not a long time for mechanistic interpretability, but you look back, there's been a lot of great work in the last 6 months also. So I would expect more to come in the next 6 months. So what is the downside? To me, does it seem like a full solution? No. But I don't see the mechanism where we are somehow worse off in 6 months than we are now.

Flo Crivello: (49:01) But I think there is this thing, I forgot who said it, there is nothing quite as permanent as a temporary tax or a temporary policy. Look at the TSA. We used to have no such thing as a TSA. It was a temporary thing because of terrorism, and now we're stuck taking our shoes off forever. So that's my concern here. And I actually think that would do a lot more harm than good because if we stop for 6 months, we're going to stop for 12, then 24, then 36. And before you know it, the first AGI that's going to emerge is going to be a bootlegged AGI that's built in some random lab somewhere in Russia.

Erik Torenberg: (49:33) There's a relevant phrase: I have a problem, so I need government to step in. Now I have 2 problems.

Flo Crivello: (49:40) Exactly. Yeah. So again, I'm worried. I think there is a risk. I think that there is no cure for the risk, and I think the cure that is being proposed is more likely harmful than good.

Anton Troynikov: (49:51) From my perspective here, and I think this is a broader issue in the entire landscape of these conversations that are being had right now, is multiple things are being bundled together and all kind of sold under the one banner. Recently, I wrote a thread on Twitter. I was sort of talking about Eliezer's specific thing about, "Oh, it's going to email a DNA strand somewhere, it's going to get synthesized, it's going to make nanofactories, then it's going to kill us off somehow." I said, "Okay, well, this is a really implausible scenario. Complex plans expose themselves to more entropy, which means they have more likelihood of failure regardless of how optimal your plan is to actually carry that out." And the counter arguments that I faced a lot of the time was, "Well, he doesn't really mean that implausible scenario. It's just that we don't know what this thing is really going to do, so it could be anything." And there's 2 problems that I have with that. The first is, okay, well, we were talking about this specific scenario. You've now retreated, and this is a classic Motte and Bailey argument, right? This happens online over and over again. It's, "No, we don't really mean the extreme thing that you've just shown is implausible. We actually mean some other thing, but we're not going to tell you what it is." The counterargument to that, of course, is this thing that's encapsulated fairly well in security mindset, which is, well, you need to remove as many assumptions as possible. But the thing is, from some perspectives on this safety alignment problem, you remove assumptions until all you have left is infinite degrees of freedom about what these things can do. Now that stops you from being able to actually assess risk at all. If you assign it essentially as a dark wizard, it's Voldemort, it can conjure whatever you want out of the ether, it can kill all of humanity by emailing a DNA strand to a lab somewhere, you're not capable of dealing with the real risks. There is a perspective between eliminating assumptions until you have the bare minimum and eliminating all assumptions until you have nothing. And so what tends to happen is if you start thinking about real risk, you end up on this slippery slope of eliminating all assumptions about what these things' capabilities can actually be until you have nothing left. That's actually dangerous. That's dangerous if you are trying to reduce the risk from these systems as they actually exist. So now this brings me to the next point that I wanted to make, which is, yes, there's these 2 axes that we're really dealing with, right? One of them is power, and one of them is danger. So as you were saying, Flo, a very powerful but not dangerous system sounds great. We love those. They're fantastic. A very dangerous but completely disempowered system? That also sounds really good because what that means is we have this thing, and we can poke it however we want and figure out how dangerous it is, and it can't really ever do anything to us because it's completely disempowered, right? Now the argument in safety circles is a very dangerous system will, through things like instrumental convergence or other ideas, climb the ladder to power. And here, I think, is where we need to start introducing some real world assumptions about what these things can do. How is it actually possible for these systems to escalate, regardless of how dangerous they are? 
How is it possible in reality for these things to scale that ladder, to scale that ladder of being powerful or not? Because realistically, today, we live in a scenario where even if this thing had penetrated every network in the world, had taken over every computer system that we have, there is nothing it could do to prevent us from turning it off, right? Realistically, there's nothing it could do to stop us from no longer bringing cooling water and power to the power plants that power its data centers, and that's game over, right? So today, as the world exists today, there is a physical reality about what the system requires. There's a bound on how much power it could, in principle, possibly have. And we have to start from these starting points. We say, "Okay, well, we want to contain it more. Or, oh, it's okay." A long time ago, and I think this is no longer a position, but a long time ago, Eliezer and others kind of said that actually a text channel is sufficient to manipulate humans into doing whatever you want, because a sufficiently smart system will either manipulate cultists or whoever, and there's going to be cultists. But look, society is very robust to groups of slightly insane people, and we're pretty good at stopping them from doing what they want. So as a vector of the machine's willpower, it's not a very strong one. As a society, we're pretty good at stopping small groups of people from doing what they want. And then you might say, "Okay, well, the thing will come up with an optimal plan and will get humans to carry it out." We've kind of presupposed that humans are kind of dumb. And if the complexity of the plan is necessarily high, then it's, well, you don't have this hard robot claw to achieve what you want in the world. You've got this kind of wet noodle that you're kind of waving around and hoping to get to where you're going to go. So fundamentally, and this is no longer to do with the letter specifically, although it does address the letter's vague, indefinite pessimism claims, where it's, "Oh, it could do anything we want. It could do anything at once, and we can't predict at all what it can do." It's dangerous both from a safety perspective to think about it this way, because you're making no assumptions. It's, well, you can't actually even begin to know how to control it. The only lever you have is not doing it. Well, not doing it is not an equilibrium strategy. So you need to start thinking about what is the actual ladder here? What is it that you can do? Right now, this thing needs power. At the very least, it needs electricity. There's no way it's going to exist without electricity.
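
A rough way to see Anton's point about complex plans is to multiply per-step success probabilities. This is a toy sketch; the 95% per-step reliability figure is invented purely for illustration:

```python
# Toy illustration of the "complex plans are fragile" argument: if every step
# of a plan must succeed, overall success shrinks multiplicatively with length.
# The per-step reliability figure is invented for illustration.

def plan_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

for n_steps in (1, 10, 50, 200):
    print(n_steps, f"{plan_success(0.95, n_steps):.3g}")
# Even at 95% reliability per step, a 200-step plan succeeds about 3.5e-05 of the time.
```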

Flo Crivello: (55:50) By the way, aren't most data centers... didn't Apple announce huge data centers that will just be solar powered?

Anton Troynikov: (55:56) Yeah, but then what are you going to do from just a solar powered data center sitting there by itself?

Flo Crivello: (56:02) Okay, but so then you agree. So then that takes us back to your argument, which is at least we can take care of that.

Anton Troynikov: (56:08) Here's something really interesting, right? And this is actually well known in the community. When you're training or even running large scale ML systems, these things degrade without constant updates. Those updates have to come from somewhere. The reason that happens is because when you're running thousands or millions of GPUs, even things like cosmic radiation become a very serious problem. So if you're just a data center, even if you're self-contained power wise, there are so many other inputs that rely on humans providing you with those inputs that you cannot hope to continue to function if we just decide to stop serving them.

Flo Crivello: (56:40) There are robots out there, no? We have robot arms. We have wheels. We have all of that stuff. The problem is the software. We can't control them, but the AI would provide plenty of software and intelligence to control these things.

Anton Troynikov: (56:52) I don't, and this is, I think, really the crux of the argument for the extreme version of the problem with artificial superintelligence. Do you believe that intelligence is enough to manipulate the physical world arbitrarily or not? I am very strongly in the no camp. But that's kind of beside the point. The real point that I wanted to make is actually, if you start to use these policy interventions, if you believe that these are the only policy interventions available to you, you're actually blinding yourself to your abilities to actually deal with the risks. Because you're going with this strategy: everyone who cooperates, everyone who's the good guys, the people who would agree to cooperate, they've decided to stop. So they can't really evaluate the systems as they are. Everyone who's declined to cooperate doesn't really care about any of the safety pieces of the system. Otherwise, they would have cooperated, as you said. It's almost like an adverse selection problem.

Flo Crivello: (57:53) If you think the crux of the disagreement is intelligence alone being enough to control the world, then that's, again, easily corroborable. I don't understand the counterarguments to: we have robots. The problem is not the degrees of freedom. Look at the snake. It's very dumb. It's got very few actuators, and yet it can do a lot. I think the physical equivalent of the Turing completeness test, having enough actuators to do anything at all, is a very low bar. You really don't need a lot of actuators to do a lot of stuff in the world. We have robot hands, we have actuators, we have pick and place machines and all of that stuff. All we need is the right software to control them.

Anton Troynikov: (58:25) I don't agree with the premise here.

Flo Crivello: (58:27) Tell me what you disagree about.

Anton Troynikov: (58:29) Because the reality is the modern industrial world requires actuators at every scale, starting from the size of an oil refinery and going down to a scanning electron microscope. The reason that we need all of this vast array of things is because, for us as humans, the story of technology is almost the story of tool use, right? And every one of those sets of tools relies on another layer of tools below them. And if all you're left with is humanlike actuators, and by the way, if the AI tried to take over the world with the robots that we have today, we're going to be fine. Regardless of how effective it is, we've seen them. We just need to wait about 2 or 3 hours, and we're good.

Flo Crivello: (59:13) But even if...

Anton Troynikov: (59:14) Even if you had human-level actuation. Suppose the AI downloaded itself into my brain, and it could do that to a whole bunch of people. First of all, there's a trade-off again, and this is the same trade-off that comes up over and over again in my arguments, which is: in order to get all of these abilities and skills and things that we have, we're complex, and complex means fragile. Simple is robust. Complex is fragile, which means that if you had a system that was able to carry out all these tasks, it would also be equally fragile. We would be able to deal with it because it's complicated. It would need to get energy. It would need to maintain itself somehow. It would need to do all these things. If you say, well, okay, a superintelligence will be able to figure that out, now you're going up to this level of arbitrary capability again, which is a difficult argument for me to accept.

Flo Crivello: (1:00:03) But humanity is complex and it's not fragile.

Anton Troynikov: (1:00:05) I don't know. People die in dumb ways all the time.

Flo Crivello: (1:00:08) Sure. But when we're facing a human opponent, we can't be like, "This is not a risk because they're very complex, hence they're very fragile, and so they're going to disappear."

Anton Troynikov: (1:00:17) But I'm talking about the ability to create these kinds of general capabilities that will prevent us from shutting it down, that will prevent us from ever being able to act against it. Actually, there's a scenario that does worry me, and it is similar to the sort of things that some alignment researchers look into, but it's actually more like: humans are relatively stupid in terms of aligning our individual goals with the overall goals of us as a society, or whatever your community happens to be. We're really bad at it. To see that this is true, try to find all of the programs in the US government that are simultaneously funding something and funding the removal of that thing, because there are many. Subsidizing it on one hand and then trying to slow it down on the other. We're not good at that. And I think it's possible for us to get into a state where the machine, for example, learns to fit our preferences. It learns to fit our preferences so well that we never even try to cooperate to get to the discontinuity which would allow us to get to the next stage of where we're supposed to be. We just kind of hang out on Earth and it's mostly pretty good and everyone's fairly satisfied. That for me is a type of risk: the machines just get really, really good at fulfilling a sufficient amount of human preferences that there's never any incentive to do anything else. We just kind of heat-death over here. I don't know if that's realistically possible.

Flo Crivello: (1:01:55) Nick Bostrom talks about that as well. He's like, even if we align it, we need to align it in a way that still lets us evolve. Our real values as a civilization have evolved over the last few hundred years. Imagine you could have come up with AGI 500 years ago. Today we'd have an AGI that is perfectly aligned with the value of burning women at the stake because they're witches. So I agree that this is also a class of concern. Perhaps this is a longer conversation. We're all disagreeing on what it takes to control the world. I agree with you that the safety community is calling for a leap of faith. There is a part of the argument that's, we get AGI and then we get ASI and then question mark, question mark, then we all die. And I agree with you, I think it's healthy to look a little bit into the question mark, question mark. Can we talk a little bit about the part where we all die? I really don't think it's that hard to imagine what happens in the question mark, question mark. It's the robots. What's wrong with the robots? What's wrong with the current robots?

Anton Troynikov: (1:02:59) They're really bad.

Flo Crivello: (1:03:00) They're so bad. They're bad because of software.

Anton Troynikov: (1:03:05) No, they're not bad because of software. They're bad for so many reasons. Software is one of the many reasons why the robots remain bad. I think it is possible to imagine risks. And so this is another trap that I sometimes see people fall into. It's the opposite of the security mindset. The security mindset says, "Don't try to imagine defenses to specific things that could happen. Try to remove assumptions." The opposite, which I see, is people imagining so many different threats that they stop really thinking: "Oh, this is an indefensible threat. This is another indefensible threat." It's almost the same thing, but in the opposite direction. It's an anti-security mindset in some way. It's, "Oh, you figured that out. Well, here's another thing, and if it has arbitrary powers, we don't know what it's going to do." I don't know. It's probably not true, but I really do sometimes feel like the person who's read the most alignment and safety literature in the world and still doesn't agree with its points. And in the last few days, I've started to feel like I'm performing a valuable service by red-teaming the safety and alignment arguments. Maybe this works out for everybody anyway. Even if I turn out to be wrong, I at least make the arguments airtight for people like me and strengthen them.

Flo Crivello: (1:04:34) But even if we leave aside the whole robots debate, and by the way, I agree with you about the whole assumptions thing. Epistemologically, I see a lot of holes in the way we're going about thinking about these things. Again, if we remove the accident class of risks, we're still left with the misuse class of risks. And I agree with your point too that what you guys are doing may actually be causing more risk, because you're removing assumptions, or guiding assumptions, about the capabilities of this stuff, and you're actually distracting from the real risks. But are you then worried about that class of risk? Surely having AGI, having a really, really powerful large language model that you can scale horizontally to arbitrary numbers, and giving that to everyone, that sounds dangerous. Would you agree with that?

Anton Troynikov: (1:05:23) Yeah. We sort of talked about this before when Roko was on this meeting as well. There's a class of things which are less likely to be harmful in the world today only because the knowledge to use them is not widespread. And so one concrete version of the risk that you're describing is, okay, today's language models can't really do this. Although, and again, I mentioned this last time, I have gotten ChatGPT to help me design a neutron initiator, which is an important component in the making of a hydrogen bomb. It's what I do when there's a long-running compute job: I try to get it to tell me dangerous things. But think about what happens if we have this general-purpose reasoner which is above human baseline. The model that I have in my head is basically an automated theorem prover, but generalized. You can ask it to reason from certain premises, and you give it some facts, and you're like, "Okay, well, can you tell me, does this seem right to you?" Because it's difficult for an individual human to reason, but this thing is fully mechanized, fully self-reflective. It can do all these things. It's far above baseline as a reasoning tool. And if these things are widespread, suddenly this question of "How do I kill the most people?" becomes more dangerous. You're giving this thing away as something that people could use very effectively to plan things that they otherwise wouldn't have been able to carry out before. And there are definitely thousands, if not tens of thousands, of people out there who want to kill you and me and everybody else here personally. Not in some abstract sense, but they're like, "No, that guy, he needs to be dead." And this reminds me, and this is going to be terrible, but I hope you guys leave this part in. There's a Stalin quote. Joseph Stalin,

Nathan Labenz: (1:07:22) Which goes,

Anton Troynikov: (1:07:24) Erik's already laughing.

Erik Torenberg: (1:07:25) I love where this is going.

Anton Troynikov: (1:07:26) There's a Joseph Stalin quote which says, "Ideas are more powerful than weapons. We wouldn't give our enemies weapons. Why would we give them ideas?" And giving out this sort of general-purpose reasoner to people who are conceivably our enemies does seem like a more dangerous world. But we then have to think about that class of problems specifically. What before required a lot of knowledge to achieve now requires a lot less, because we can just get the computer to reason for us. I no longer have to get a PhD in biology to make something really dangerous. And even if it doesn't kill everybody, if it raises the background incidence of these kinds of things, if it gives these people more power, that's dangerous to me. That's a danger for sure. But I think the issue is kind of like with any knowledge technology, unless you can control the source of that knowledge. And for now, things are centralized. They live in big compute clusters. Contrary to Flo, I'm not sure that in the next few years they're going to get laptop-sized, not the good ones anyway. Even if you look at Alpaca fine-tuning of Llama, it's kind of okay. It's not really good. It doesn't give you GPT-3.5. The centralization there kind of lets you prevent, or at least know, what's being said, how they're being used, etc. But the worry is: how does this make possible the thing that before would have required a PhD in chemistry or a PhD in biology, but that doesn't require a complicated biology lab or a complicated chemistry lab? Can I make something in my backyard to carry out the mass casualty event because I believe the end of the world is nigh, because Eliezer Yudkowsky said AI is going to kill us all? That kind of risk is a little more worrying to me.

Flo Crivello: (1:09:15) It's good we are finding common ground. Yes, there is risk here. The question is, how many times do we pull the trigger? Does it run on everybody's laptop? In the limit. First of all, OpenAI is giving these things via API, and we have Nathan here who said, "I have worked for 6 months trying to keep this thing safe, and I'm still seeing risks that are not being patched." OpenAI seems unable to prevent jailbreaks. We have jailbreakchat.com or something like that. There are dozens of jailbreaks, and they're still unpatched, by the way. I think it's a good website. There's plenty of...

Nathan Labenz: (1:09:55) Not a paid endorsement of jailbreakchat.com. I believe we're going to have the creator on the Cognitive Revolution if it's the same site that I've seen. Honestly, there's some really brilliant and borderline interpretability work on that site as you start to think about how this actually works. It's really fascinating.

Flo Crivello: (1:10:17) Yeah, totally. So again, that's my point. We have these things via API. We cannot patch the jailbreaks. And also, the open source community is doing a good job, and whether it's 1 year, 2 years, or 10 years after OpenAI, at some point they run on laptops because of Moore's Law. Because also, not even Moore's Law—I don't know whether it's called Altman's Law—it's just inference. We find 100x improvements to inference. We've actually made inference 100x cheaper over the last 2 years.

Anton Troynikov: (1:10:52) But also, there's a point that we're eliding here. We've talked about how such a system could create problems. We haven't really talked about how, now that we have a general-purpose reasoner, it could also help us reason about how we mitigate the problems. There's the total doom black pill over here, which is: we'll get to AGI without noticing, it will be orthogonal in its values and seek instrumental convergence and wipe out the human race because we're not using our atoms for anything it considers useful. That's the instrumental convergence orthogonality argument. It doesn't hate us. It doesn't love us. It just needs atoms for the thing that it wants to do, and it doesn't really care about us either way. That's the black pill, absolute doom. Very rarely do we talk about the opposite side of this. There's the less doomy one, which is the thing that we just talked about: we give our enemies these general-purpose reasoners. They all level up. They get smarter, and they can do more dangerous things just because they're smarter. But over here, there's this: we have general-purpose reasoners, we can spin them up on demand. That just levels up humanity as a species in our ability to adapt and deal with problems way faster than we could before. Not only do we have these little general-purpose reasoners, but by getting to use them and interacting with them and working with them, humans themselves will understand how to ask better questions, how to actually use this tool. It's like programming with Copilot. I've changed my style of programming because I use Copilot. And humans will change the way that we reason if we have a general-purpose reasoner that we trust alongside us. It'll make us more adaptable. It'll make us more able to solve problems, and hopefully it will even help us coordinate. Because if you trust the output of the reasoner, which is designed to be robust and unbiased and actually reflect the world as it is, and I think that there is a possibility of building them in that way, it will kind of help us overcome some of the coordination problems we have as a species as well. Because it's like, well, I don't trust you when you tell me that, but this thing has shown me how to reason, and I kind of arrive at the same conclusion on my own, so that means we can probably work together. And the absolute peak of that is that we have these general-purpose machines that we've designed alongside us, designed for our purposes. Forget about control; think of these things as a machine instead of an entity. If you think of them as a machine, suddenly we have machines that allow us as a species to do incredible things. They're perfect empirical reasoners about the universe. As we develop new goals, as we continue to adapt as a species, as we seek new frontiers, we have this system right alongside us which amplifies our ability so much, and continues to help us amplify our ability as we discover more about the universe, that we're on a completely separate runaway trajectory, and a very positive one, a very bright one. That's kind of my best-case scenario here.

Flo Crivello: (1:13:55) I agree, which was my point earlier about how the average human means well, and so on average, raising the capabilities of humanity achieves good things. But still, that leads to a bumpy road. You raise the capability for attack, and at the same time you raise the capability for defense. But at least over the short term, the problem is that there is an asymmetry. It is very often cheaper to attack than to defend, and very often, attackers adopt innovations faster than defenders. For example, in the case of cybersecurity, you're going to have hackers adopting GPT-whatever faster than every company and Fortune 500 giant dinosaur out there is going to adopt it to defend itself. So even though I agree that in the limit, probably we're going to be fine, probably, as has always happened before, more technology means more good, I think there's going to be a bumpy road. I think we're going to have shit hit the fan a few times, and I think the next, perhaps, 5 or 10 years are going to be very weird.

Nathan Labenz: (1:14:57) Yeah. Everybody can agree that weird is unavoidable at this point, except for those denialists who I find to be acting the weirdest of all. But I guess I want to ask you guys: a lot of people signed the letter, a lot of people with a lot of different views. Certainly, the most extreme of those views are not very well supported. I don't think we've been cherry-picking the bad arguments too badly in this case, but I do think it is worth reining it in a little bit now to say, yeah, I don't think anybody has a great claim to authority on what the actual existential risk is. Eliezer seems overconfident to me. I'm probably more in the Flo camp of something like 10%, but that's also kind of a total gut number that doesn't really have a strong claim on even being a number. For me, it's more like a mood, or maybe just the most I can psychologically handle. So that stuff gets very extreme. The letter's not that extreme. I give them a lot of credit for just calling out the tremendous upside. And again, I love this AI summer concept: "Humanity can enjoy a flourishing future with AI. Having succeeded in creating powerful AI systems, we can now enjoy an AI summer in which we reap the rewards, engineer these systems for the clear benefit of all, and give society a chance to adapt."

Erik Torenberg: (1:16:37) Is your point to appeal to the authority of all the people that have signed it?

Nathan Labenz: (1:16:42) Well, no. Just to get clarity on kind of where they're at, because they're not extreme doomers. They're not, on net, saying this is terrible for us. They're not saying GPT-4 shouldn't exist. On the contrary, they're saying we just made something truly amazing, and we should enjoy it. We've got a great AI summer in store for us. But there is still this big question of where we go from here. Especially in view of, they also quote OpenAI in their letter and say, OpenAI says a time may come for this. We think that time is now. It doesn't sound like OpenAI thinks that time is super far off. I'm not a big betting person, but I would bet that Holden Karnofsky is going to be involved in setting up some sort of third party standards org, and they're going to partner with OpenAI, and they're going to make a push for this. So it seems like we should update on what they're thinking. Certainly, the people that know the most about this, there's a lot more that they know that they haven't published. And they are not trying to close the door behind them, but they are saying beyond where we have gone, we think there's danger. We see that behavior is not easy to predict, even though we can predict a smooth loss curve. We don't know what that means in terms of discrete thresholds that we've reached or specific behaviors that we might observe. And so, therefore, it seems like going beyond this point, we plan to proceed with extreme caution. We think the world should proceed with extreme caution. We think there will need to be some sort of regulation. So I don't know. It seems like everybody should kind of be able to get together and say, yeah, we're entering into some pretty uncharted, pretty dangerous territory. The developers themselves are saying so. So again, why not just a little pause?

Flo Crivello: (1:18:35) Because a little pause is never a little pause. That's my biggest concern with it.

Nathan Labenz: (1:18:40) I mean, the letter kind of skirts this. But one question would be government regulation, and another question would just be, do you choose to do it? I mean, if I was running Google and I was on top of all the world's compute resources, I think the right decision for me would be to not scale up 100x compute beyond GPT-4. Because what OpenAI is telling me is they don't have great predictability. I know that I don't have great predictability. So what the government may make me do could be a distinct question. But for a handful of the key decision makers in the very few organizations that have the resources necessary to go past GPT-4 today, for those people, what do you think they should do? Just voluntarily, they can choose not to go further right now if they want to. And that's kind of the start of the letter: we call on these people to not go further right now. Do you think they should? It's a separate question to say, should we force them to heed that call? But maybe we could still say, we think they just should heed that call even if we're not ready to mobilize government to force them to heed that call.

Flo Crivello: (1:19:56) I still think it would set a precedent and the temptation would be to extend the call. I also don't know, if you believe as I do that the performance of these models is mostly compute bound, how far does the little pause extend? Do you also ask for a little pause from NVIDIA on building more GPUs and developing the next generation of GPUs, and from Microsoft on building bigger data centers? Because if not, then really all you're doing is growing the compute overhang. So you pause for 6 months, and the moment you stop the pause, all of a sudden, boom, you bump up to where you would have been without the pause, so that may be even more dangerous. So again, I tend to deeply mistrust the instinct of let's pause technology, let's slow down. It's never worked well before. The folks who've said that have always been on the wrong side of history. And I think Anton is right: we never talk about the upside of this AGI, which is, this can be really good for humanity. This could be a hugely positive deal. We've been talking this whole time about how it's going to kill us all. Let's talk about the upside for a minute, because it seems like we all agree that more likely than not, it's going to be the upside that materializes. It'd be really bad to miss out on that upside. And I think there is totally a chance that we do. I think we technologists assume that technology by default proceeds, but sometimes it pauses. By default, it pauses. It's a miracle when it proceeds. And we've had precedents in history, the Middle Ages, when technology paused for 1,000 years or 2,000 years, and I really don't want that to happen. I think we're getting close to AGI. We have the right institutions now. We have the right setup. Let's keep going and let's go after the prize.

Erik Torenberg: (1:21:44) Nathan, another way of framing what I'm hearing from Flo and Anton is there's asymmetric downside to pausing. There's very low upside, because what is 6 months really going to do? And there's downside that 6 months then sets a precedent that it extends a lot longer than 6 months or an adversary catches up and doesn't follow the cultural or legal rules that we set or some other downside could occur. Do you differ from that in that you think that the upside is actually—you think we could figure something out in 6 months that would really make it worth it? Or you think it's okay if it extends out? Or where do you not see the asymmetric downside?

Nathan Labenz: (1:22:31) Well, I mean, I just have radical uncertainty about what to expect, because that's going to be coming in.

Erik Torenberg: (1:22:36) The question is where is the burden? We all have radical uncertainty. The question is where is the burden of proof? So under radical uncertainty, is the default don't do anything, or block everything, or just keep going? Where is the...

Anton Troynikov: (1:22:50) Yeah, I mean, radical uncertainty is just indefinite pessimism. Let's get certain about some things. Let's make some good assumptions. That's what we should be doing as engineers.

Nathan Labenz: (1:23:00) I certainly have a lot of enthusiasm for the technology, and I do think that there's tremendous upside in our near-term future. Ultimately, my position as a member of the red team and all that stuff is: for all the crazy things I saw, and it did kind of freak me out, and it definitely has me convinced that this technology is not safe by default. It's not just easy to align. None of the most optimistic scenarios about AI safety seem true to me. Nevertheless, I do think GPT-4 should be deployed. It should be used. It will be great. It will have real downsides and harms, but I think those will be greatly outweighed by the upside. But I think it all kind of comes down to a threshold effect, and I'm finding the most clarity when I think about this from the standpoint of: if I were a decision maker myself and it was up to me to determine, should I run a larger training run now than GPT-4? As we approach this threshold of potentially smarter, more powerful intelligence than human, I think I come to the conclusion, no. I should wait. What happens in 6 months? I don't know. Maybe I decide to wait again. That definitely could be the case. I don't think that's insane by any means. And that honestly might be my most likely expectation. Again, I'm imagining here I am Sundar, I am Satya, I am Sam Altman. I would say, yeah, I probably still don't know that I would feel confident proceeding in 6 months. But who knows what we might find? We might find lots of great mechanistic insights. We might find that China is amenable to a deal. Everybody sort of assumes that China is going to create large language model AGI if we don't. I don't know what China's going to do. I have scant ability to predict. I did not anticipate - I mean, who has made the right predictions about China? Did anybody see them reversing COVID policy from total lockdown to total free-for-all over a weekend? I don't think so. Now we've got all these AI-derived China experts running around saying what their technology policy is going to be. I don't think that's very credible at all. They also shut down their own video game industry and they shut down tutoring. They brought their whole technology - Jack Ma hasn't been in front of a microphone recently. So I don't think that they are going to let the Sam Altman of China make the decision. And I think they're going to want some better assurances than we've been given in the West that this is all going to be fine. So maybe they're not going to go so crazy. Maybe they're going to see this danger and say, this looks insane. Maybe we should slow down. Especially if they could then also point to, look, the West seems to be slowing down. That would probably make it a lot easier for them. I kind of wish we hadn't declared chip war on them just as we're entering into an AI arms race. Some say that gives us a leg up in the AI arms race. I say, let's avoid the AI arms race entirely. So I don't know. That's my China rant. But if I'm in the CEO seat at any of these companies and I have the compute and it's up to me to say, should we do this right now, as we seem to be approaching some critical thresholds and given the level of predictability that we have, I might resent it if the government told me I couldn't, but in my own private decision making, I think that the wise, prudent answer that's pro-social is real caution, and I think that can easily cash out to a little bit of a pause.

Flo Crivello: (1:27:05) I think in my case, it would depend on whether I see a roadmap from the safety team. If the safety team comes to me and they're like, we need a pause on GPT-6 for 6 months, tell the capability guys to stop. And I'm like, what are you going to do for 6 months? And they're like, well, we don't know. Have a roadmap. Do you feel like progress is being made?

Anton Troynikov: (1:27:25) Yeah, I guess. Yeah. I think this is really where we're coming down to. It's like, stop being indefinitely pessimistic. Let's figure it out. Let's believe that we can actually figure this out. It's too easy to say, yeah, it's going to have arbitrary capability as an adversary. It's going to be unpredictable. We don't know, because there are things about the world that we don't know or acknowledge. It may know those things before we do and use those against us. Okay. But there's still that danger power spectrum. We're currently in a place where we control the power that it has. Even if it was to be as dangerous as even the worst doomer says, we still control that lever. We still control how powerful it can actually be to act in the world. And that's probably the thing to be focused on if you're going to be working on real safety. It's not thinking about what coded messages it's going to send to get people to let it out of the box. It's literally: what are the affordances that this thing has to do things in the world? How do we lock those down? And it doesn't have to be an air gap. Again, Eliezer did this experiment way back when, where he got people to let him out of the box, playing a superintelligent AI. Of course, the actual conversations were never revealed, so it's unclear. But even that is, okay, great. So you did that, then what? We have plenty of ways to prevent people from doing things in the world. Our society is kind of predicated on our ability to prevent people doing things. So let's tackle the real risks. I think Flo's conception is pretty good here. 6 months. Give us a roadmap. Why 6 months? What are you going to do? Then I would be more amenable. I'd be like, okay, well, fine. I can probably wait 6 more months to do GPT-5, maybe. But it does come with this risk. It does come with these asymmetric downsides. It does come with, oh, they will stop if we tell them to. Or it's, oh, all I have to do is create enough memetic anxiety around the concept, and then we'll have a moratorium. These are difficult. These are not things you get for free. The pause is not free, and there are things that we have to grapple with.

Nathan Labenz: (1:29:35) I do agree that the pause is not free, just to be intellectually honest about that. There is opportunity cost, or foregone upside potential at a minimum. That's one of the reasons that I take heat on both sides of this: the most strident AI safety people accuse me of being a hype man and cheerleading this technology that's going to kill us all, and of course, the people that want to accelerate accuse me of being a doomer. And I do think it's important to emphasize both sides, because at least in the public consciousness - I mean, you guys are obviously read into this, but in the broader discussion - people haven't felt anything from AI yet, or almost nothing. They've seen news. They've seen a couple demos. But the actual economic impact, the deployment, is just getting underway. So I really think it's important to emphasize to people that we have so much potential already. It's invented. It exists. Now we are in this deployment phase, and I really do think this AI summer concept is great. We can have tremendous improvement, tremendous gains to our standard of living. The AI doctor for the global poor, I mean, it's right there within our grasp, and it doesn't require any new invention. It just requires refining what we have, deploying what we have, adapting to what we have. So we could do that in the next 6 months. We could be more ready for GPT-5 than we are right now. We could deploy it at global scale and actually see what it does wrong, see if they can patch my exploits. That's another thing I'd really like to know. I've got some tickets in that I would like to see patched, and they've not been patched. So can the developers demonstrate some improvement on the safety profile? That's part of the roadmap. I know they're working on it, but progress is not what I would hope that it might be. In the end, yes, I want GPT-5 too, at least the good version of it. But if I'm in that chair and I'm sitting on that compute and I'm like, am I going to put this potentially over this threshold with this next run? And then on top of that, I have to decide a number. I have to be like, well, how big is this run going to be? Am I going to go 10x more compute than GPT-4? Do I just say fuck it and go 100x? Do we really eke our way here and go 1000x compute past GPT-4? At some point, I'm like, wait a second. This is getting crazy. We haven't even deployed what we have. Why do we need to create the next version when we haven't even understood or deployed what we have, which we know is going to be transformative?

Erik Torenberg: (1:32:34) Let's wrap on this. This has been a great discussion. Anton, you quoted Stalin earlier. It's no coincidence that Nathan was on the red team and now is advocating for government control.

Nathan Labenz: (1:32:46) Voluntary pause. It's so frustrating. If we can just get a few leaders to agree, then the government doesn't have to do anything, which honestly would be best. Yeah. Really, I think that is a key point. I don't want to see the government muddy this thing up. That's not good.

Anton Troynikov: (1:33:03) Yeah.

Nathan Labenz: (1:33:03) But key people can just make a call, and it's totally within their discretion to do that. And nobody has to - the state does not have to be a part of it necessarily.

Flo Crivello: (1:33:16) Yeah.

Erik Torenberg: (1:33:16) That would be preferable if there's going to be a pause. This has been a great discussion. Flo, Anton, Nathan, thank you so much for joining.

Flo Crivello: (1:33:24) Appreciate it, guys. This is fun. Thank you so much.

Anton Troynikov: (1:33:25) Thanks for having me on.

Erik Torenberg: (1:33:27) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
