OpenAI's Safety Team Exodus: Ilya Departs, Leike Speaks Out, Altman Responds - Zvi Analyzes Fallout
Dive into the intricacies of AI ethics and safety concerns as we dissect the recent resignations from OpenAI's safety team. In this stimulating conversation, we unpack the challenges of aligning leadership vision with safety culture, explore the legal implications of non-disparagement clauses, and discuss the future of AI alignment and superintelligence. Tune in for a thought-provoking analysis of the responsible path forward in AI development.
SPONSORS:
Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds; offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive
The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR
Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/
CHAPTERS:
(00:00:00) Introduction
(00:04:46) Compute resources
(00:08:15) The straw that broke the camel's back
(00:13:35) Sponsors: Oracle | Brave
(00:15:42) Dwarkesh interview with John Schulman
(00:19:14) What should we do?
(00:22:47) Strengthening the bill
(00:25:11) Non-Disparagement Clauses
(00:30:48) Sponsors: Squad | Omneky
(00:32:33) Safety agendas
(00:43:22) AI movie concept
(00:47:24) Forking Paths
(00:49:44) Simulation Hypothesis
(00:53:56) Doomer
Full Transcript
Nathan Labenz: (0:00) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Zvi Mowshowitz, welcome back to a special bonus session of the Cognitive Revolution.
Zvi Mowshowitz: (0:28) Yep. There's always more to learn.
Nathan Labenz: (0:30) It's happening quickly today. So by the time we got off the recording, Jan Leike had posted his tweet thread statement about his reasons for leaving OpenAI in which he puts it pretty plainly that he's had pretty fundamental disagreements with leadership and has had trouble getting the resources that he needed to do the work, including compute resources, and certainly had some nice, fond things to say to his teammates, but basically was like, I don't think we're on the right track and seems to be resigning pretty much in protest. So that doesn't seem like a good thing. Doesn't seem like a good situation. What else have you learned, and what do we make of it?
Zvi Mowshowitz: (1:15) We also got coverage from Vox, from Bloomberg, from TechCrunch. We got Sam Altman's response to Leike, which was extremely graceful, essentially saying: yes, we have a lot of work to do and we're going to do it, and promising a longer response later. It's basically the best possible thing you can say there, but then you're on the hook for doing it. We have Kelsey Piper confirming the nature of the draconian non-disparagement clauses, which apparently have lifetime duration and include an NDA whose existence you can't reveal without violating the NDA. And she claims that when employees are onboarded for the first time and given equity-heavy compensation, they are not told that they will be required to sign this non-disparagement clause or have their existing vested equity confiscated upon departure. So that seems like a really bad equilibrium and way to run a company. I'm honestly confused as to why that's legal.
Nathan Labenz: (2:12) Well, it maybe shouldn't be.
Zvi Mowshowitz: (2:14) Yeah. I think you should have to acknowledge the non-disparagement rules very clearly up front if you're going to confiscate something of immense value from people who won't sign them. It doesn't seem reasonable at all. It also doesn't speak well of the company's openness in the good sense, right? If you're forcing every employee to never disparage you, for life, no matter what, or else, that's just not a reasonable position to take if you want people to not assume the worst. And so we look at Leike's statement essentially saying that for years, shiny new products have become the priority and there has been a move away from safety culture, that the culture is not amenable to or compatible with safety. I'm paraphrasing here a bit; I'm not looking at the exact words. And that he had trouble getting compute for the last few months, despite the explicit commitment from OpenAI of 20% of existing compute, which should have been sufficient for current purposes. And indeed, TechCrunch confirms that they have not been honoring their commitments: the team asked for a fraction of the 20% commitment and repeatedly did not get what they asked for. And compute is part and parcel of running anything in AI these days; you need your compute to do your thing. He reported that this specifically became a substantial barrier to doing their work. So this is not only a philosophical disagreement, because he said they should be spending vastly more, not just a little bit more, of their resources on preparing for our AGI future. But also, if you notice what he actually said about just the next generation: essentially, he doesn't think they're ready for GPT-5. He doesn't think they are on pace to have the tools they need for GPT-5 to be safe in a pedestrian, mundane-utility sense, not even in an existential sense. And then later on, there's the problem of superalignment, there's the problem of AGI, which many people at OpenAI have said they expect within several years; it's a very short timeline. And now the superalignment team has been dissolved and its people have been dispersed throughout the company. They claim they will still continue the work on that level, but having dishonored their commitment, and having dissolved the team, and having lost its leadership, it doesn't look like the kind of effort they promised us, that they said they were going to do.
Nathan Labenz: (4:46) Yeah. Does the timeline still hold, Sam? We're now, what, nine months into the four years?
Zvi Mowshowitz: (4:51) If he still expects to need superalignment to succeed within four years, then these actions do not reflect somebody who understands that, and who understands, as he's repeatedly told us he understands, what is at stake with superalignment. And it's clear that Jan's breaking point was Ilya departing, according to the Bloomberg article. That makes perfect sense. But we have a series of departures, and we have obvious justifications for them, and this all makes sense. There was all this speculation about what did Ilya see? That was the whole thing. What did Jan see? That became the thing after Jan quit. And the answer is: what they saw was a company that's not committed to safety, that's unwilling to put its money where its mouth is, that has a culture that is hostile to safety efforts, and that is pivoting towards the shiny new product. It's a product company, it's a scaling startup, it's devoted to making money. Nothing wrong with making money. But in this case, they're building smarter-than-human intelligence as their explicit company mission and goal. And as Leike points out, and as has been repeatedly acknowledged, this is not a safe thing to do. This is not something that will go well by default, by accident, right? We can debate how likely it is to go badly even with good, real efforts to make it go well. But I think any reasonable person can see that there's a very good chance that if we do not put in the effort, if we do not do the work, things will then go badly. And it doesn't necessarily have to involve some sort of specific AI catastrophe or existential risk scenario; it simply means this could go very badly for people, as experienced by people on planet Earth. And we have to think carefully about these questions. But what's clear is that OpenAI's leadership, as embodied by Sam Altman, increasingly is not doing this, and in the last few months has simply not delivered on their commitments.
Nathan Labenz: (6:50) Yeah. That compute one is pretty troubling for me. In the absence of that, it would be a lot harder to parse, I feel like. Obviously, people can have all sorts of disagreements, and you can imagine the Sam Altman defense being: you're doing your thing over here on the superalignment team, we're doing our thing over here on the product team, why is that inherently a problem? But yeah, if you can't get the compute to do the alignment work
Zvi Mowshowitz: (7:18) Yeah, it's not just a promise of X where we got Y; he's specifically saying, you promised us X, we asked for X over N, where N is a lot more than one, to do our work, and we couldn't even get that. And the superalignment commitment sounded like a lot, 20% of currently available compute, but that's not a lot over four years, because two years from now OpenAI is going to have ten times as much compute, like, for sure, unless something very strange happens. So over the extended period, this is a reasonably small, very modest commitment, as opposed to certain similar industries where safety is paramount, where most of research costs, most of development costs, can become safety, right? And again, that's for the mundane level, the just-make-sure-the-plant-doesn't-melt-down level of safety. So I don't understand it. I also don't understand why it's
Nathan Labenz: (8:10) not simply good business to give Jan Leike and people like that their compute. Certainly relative to this outcome. It seems like if we are indeed talking about something like 5% of compute, then that can only allow you to move, you know, 5% faster, right, than you could without that compute. It does seem like a very strange decision to let people walk and let this become this big of a story over a couple percentage points of compute availability.
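To make the compute arithmetic above concrete, here is a minimal back-of-the-envelope sketch. The 20% pledge, the roughly 10x compute growth every two years, and the four-year horizon come from the conversation; everything else is an illustrative assumption, not OpenAI's actual accounting.

```python
# Rough illustration only: the numbers below are assumptions taken from the
# conversation (20% of *today's* compute pledged, ~10x fleet growth every
# 2 years, 4-year horizon), not OpenAI's actual figures.
committed_fraction = 0.20   # pledge, as a fraction of today's fleet
growth_per_2_years = 10     # assumed compute growth factor every 2 years
years = 4

total_compute = 0.0
pledged_compute = 0.0
for year in range(years):
    fleet = growth_per_2_years ** (year / 2)   # fleet size relative to today
    total_compute += fleet                     # compute available that year
    pledged_compute += committed_fraction      # pledge stays pinned to today's fleet

print(f"Pledge as a share of all compute over the period: {pledged_compute / total_compute:.1%}")
# Roughly 2% under these assumptions, which is why "20% of current compute"
# becomes a modest commitment once the fleet keeps growing.
```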
Zvi Mowshowitz: (8:47) Yeah. It's almost always the straw that breaks the camel's back, right? Like, clearly Ilya was being shut out of decisions and couldn't play the ambassador role he had previously played inside the company, if you're reading between, but not even between, the lines, just reading the lines of various articles and reports; people were turning against the very idea. The board battles embodied this struggle. The way they played out in practice, whether or not this was at all fair or deserved by anyone or any philosophy or any approach or anything, turned people against them, and every person that leaves turns people further against them. The firings create a cascade: with every person that leaves, you lose more trust. Why did they leave? What caused them to leave? How was that handled? And yeah, he presumably was just like, I'm fed up, we're not getting what we need, you need a wake-up call, I can't just sit here and pretend that I have the resources I need, because I don't. And the way he talks about it, they're just not investing what a company in their position needs to invest in what I call mundane safety either. And in fact, I haven't investigated this further, but you see this in reports that GPT-4o has much less tendency to refuse inappropriate requests, like building a bomb, than GPT-4 Turbo or its rivals. Somehow this new model that was shipped is just not very robust in the jailbreak slash mundane-harm zone; it will just go along with your request. And I haven't tried to stress test it or red team it in this sense, because why would I? I don't think it's particularly dangerous, for the most part, to have it be jailbroken; it was already jailbroken for those who care enough. But what's very clear is that the GPT-4o development process did not involve a robust attempt to make it safe in a conventional sense. They mostly just decided that the abilities this model possessed were not so dangerous, and they weren't going to give it much care.
Nathan Labenz: (10:42) So do we chain ourselves to the OpenAI fence, or what do we do? What do we do here?
Zvi Mowshowitz: (10:49) I think that's how we have to treat it going forward, until we see something more. Altman could respond with an amazing set of commitments. He could respond with a new set of hires of people he's bringing in. He could do a number of things. So, you know, we're always hopeful. And as much as AI moves fast, I don't think we need to move this week or anything like that, right? And part of it is that it's very clear from Jan's testimony that we don't have to move this week. If Jan and Ilya and others were concerned about something imminently happening, they would have said so. It is very clear from his statement that these are long-term concerns, driven by the fact that ships get steered slowly no matter what you do. So they have time to fix it. But barring evidence to the contrary, given everything that we've learned, I think we just have to assume that OpenAI is functionally a fully for-profit business, fully a move-fast-and-break-things, hockey-stick-graph startup run by Sam Altman, who is running it on that basis, and that they are not taking the safety problem seriously. That their culture internally is hostile to the idea of safety, to the idea of worrying about things, and that therefore we should expect them to handle this future badly. That we should expect them not to be prepared for what is to come, to be extremely cavalier. And there's the possibility that for a while, cavalier works out. A cavalier GPT-5 might well be the best thing to happen to the world, if these things are just not that dangerous. We talked about not-safe-for-work stuff, we talked about sex and gore and other capabilities, and yeah, it might just be good to let that stuff happen. It might be completely net positive, right? I actually expect that. And Pliny broke all the major models. He broke all of them fully, and he broke GPT-4o in about two minutes during the announcement, right? The moment he got access to it, his first hunch just worked, because why wouldn't it? But at some point, that's going to stop being an acceptable situation. At some point, these things are going to be highly capable, and we're going to have to worry about societal implications, and then catastrophic risk, and then existential risk. And all of that is probabilistic, because you don't know when it's going to happen, right? When GPT-5 comes out, you're probably not going to be in an existential situation at all, but how many nines are you going to put on that statement? And if your answer is more than two, you're crazy, right? And I would probably put one, right? So if you call me crazy, and you said it was only 97%, and it turned out not to happen, I'm like, well, chalk it up to slightly worse calibration than you on this one, and let's keep betting on sporting events. But I don't know what else to say.
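For readers who don't think in "nines": a nine of confidence is an order of magnitude of certainty, so one nine is 90% and two nines is 99%. A tiny illustrative snippet (the probabilities are just examples; the 97% figure is the one mentioned above):

```python
# "Nines" of confidence: one nine = 90%, two nines = 99%, and so on.
import math

def nines(p: float) -> float:
    """How many nines a probability p corresponds to (0.99 -> 2.0)."""
    return -math.log10(1.0 - p)

for p in (0.90, 0.97, 0.99, 0.999):
    print(f"p = {p:.3f} -> {nines(p):.2f} nines")
# 0.90 is one nine, 0.99 is two; the 97% example from the conversation sits
# at roughly 1.5 nines.
```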
Nathan Labenz: (13:35) Hey. We'll continue our interview in a moment after a word from our sponsors.
Nathan Labenz: (13:35) I assume you have seen the Dwarkesh interview with John Schulman that came out this week. It's reported now that he's taking on the responsibility for safety writ large. I guess he was already responsible, as Dwarkesh presented him in the interview, for post-training the models. As described in some of these articles today, he's responsible for making today's models safe. Now he's gonna have this kind of additional responsibility rolling up to him of, like, big-picture, long-term safety. In that context, that interview, I think, is going to give you serious pause. Obviously, you can evaluate it for yourself, but there were multiple
Zvi Mowshowitz: (14:21) Just hearing what he was doing previously. This is: you took the mundane safety guy, who is doing a mundane safety job that's not necessarily better or worse than the rivals', at least until... well, I haven't evaluated GPT-4o, right? And again, there are reports that there's potentially a problem there, but that's just a completely different problem from the problem he is now being asked to solve. And if he was doing that job properly, then, as was mentioned, these concerns about not being prepared for the next generation of models should be part of that job as well. But yeah, the idea that you're going to use post-training on your AGI, or even your ASI, to render it, like, HHH, or net useful, or any of those terms, I think that's basically a pipe dream, to rely only on that. In fact, you have to worry even during the training regimen of some of these things. There might be a way, but the strategies he's been using so far are the kind of things that Leike himself acknowledged explicitly, both online and literally five feet away from me during a talk, would not work, right? He did an 80,000 Hours podcast where he explained why RLHF is not a solution to this problem, and he has other approaches that he wants to try, where I don't think they'll work. And I tried to debate him and explain to him why they wouldn't work, and I was unable to make my case sufficiently convincingly, and he didn't buy it. I think it was reasonable for him not to buy it, in the sense that I was making some very bold claims that were very different from his world view, and I didn't back them up specifically enough; it was hard for me to do without technical training. But Leike was thinking about the problem on a different level than someone who is thinking about post-training. So I haven't seen the interview yet. Again, I want to be careful with it. But if you think it's gonna give me pause, I'm confident you're right. It's gonna give me pause.
Nathan Labenz: (16:14) A couple... well, just for folks who might be listening to this and might not have the time or inclination to go through it, it is definitely worth it. It's very interesting in multiple ways. One of the things he says is that he's expecting, quote unquote, AGI on kind of a two-to-three-year time frame. Dwarkesh asks him, could it be as soon as next year? And he's like, I don't think so, that would be surprising. But two to three he was pretty much willing to cosign. And then Dwarkesh asked a number of questions that were pretty fundamental: what happens when this happens? How are we gonna deal with it? At one point he said, that doesn't sound like a super robust plan that you're outlining here. And I mean, I appreciate the candor, but at multiple points he was like, yeah, I don't really have a great account of how that's gonna go, or, hopefully we'll be able to work together with the other leading developers in that situation. And, yeah, we don't really have a robust plan for that at this point.
Zvi Mowshowitz: (17:12) It's the second best plan, right? The best plan is to actually figure out what you're going to do and how we're going to handle this. And the second best plan is to admit that, no, you don't know, right? To start with a blank, beginner's mind. So if he comes in on day one and says: we don't have a plan, we don't have a solution, we don't know how to align this thing, we don't know what to do with it if we did align it, we don't know how society can handle this, we don't know how to make the transition from AGI to ASI even if we handle alignment, even if we handle the interim, we don't know any of these things, we're lost, we need to figure this out, and he starts from scratch, I'm perfectly happy with that as the answer for a person who's highly capable. He co-founded OpenAI, he's done a lot of impressive things. With Ilya, I thought Ilya cares deeply about these issues and appreciates the depth and scope and stakes of the problem, but I always thought, and still think, going along with Ilya's publicly stated remarks, that none of the things he was thinking about are anywhere near what you need to be thinking or what would work. Right? Leike's stuff wouldn't work either, but I thought he was reasonably grounded and trying hard. Ilya's stuff often felt like, okay, that just seems like a misconception; it feels like you're thinking about this problem in a kind of fuzzy way, in a way that just wouldn't survive an encounter with the enemy. Right? But when he and Leike and the other people on the superalignment team write these papers, run these experiments, and actually try to make these things work, they'll understand that their plans aren't working. And they'll either find ways to modify them so they work, or throw them out and try new ones. Because one of the things Ilya is famous for is the attitude of: I'm going to try this as many ways as I have to, and throw out everything I think I know, until I make this thing work, and I'm committed to making this thing work, right? And that's what you love to see. And that's why I had a lot of faith that Ilya and Jan would find a way. Not every time, because I don't think this problem is even necessarily theoretically solvable by humans in a reasonable timeframe with any attitude. But I thought they had a reasonably good shot, because with a lot of resources and a lot of time, and four years is in some sense not a lot of time, they would figure out at least some ways not to align an AI, and then we get to try again. I think their first approach definitely won't work in my model, and that's fine. They had a mix of ideas, some that I thought were reasonable and promising and some that were hopeless, but nobody knows how to solve these problems. So I can't really get that mad at you for being excited by ideas that I think won't solve these problems. Do I have a better suggestion? You try something, you learn something, you try again. And I think there's a decent chance that looking at these generalization questions, looking at these supervision questions, will inform your approach to finding alternative solutions that might themselves be more promising.
If that entire class of solutions is entirely hopeless, if the worst-case scenario holds and it is as bad as it looks to me on first glance and there's no fixing it, then those are pretty bleak worlds in many ways, because it cuts off a lot of people's plans. So I'm leaving room for there being a solution somewhere in there.
Nathan Labenz: (20:33) So with this clarity, and while we await a proper response from Altman and leadership, what else do you think people should be doing? I'm joking, but maybe not entirely joking, about chaining oneself to the OpenAI fence; for reference, there was a person who did that this week. There is, you know, mundane consumer protest: now that ChatGPT is free, it's gonna be hard to cancel our accounts, but we could rally app developers to boycott and switch to Claude or something. We're obviously on the record as being at least generally positively disposed to SB 1047. We could think about supporting that even more forcefully, or suggesting possible amendments to strengthen it. Something like the non-disparagement clause being made illegal could be an interesting one.
Zvi Mowshowitz: (21:30) When I looked at SB 1047, I wrote it up. I was mostly looking for ways in which it was too strong. I identified a number of ways in which this bill might go too far, in the sense that it has serious downsides, and unless we're getting a lot in exchange for those downsides, that makes it politically hard to pass, makes it harder to get buy-in, makes it harder to get cooperation. And nobody actually wants to tank the economy. Nobody wants to actually slow down the mundane utility. Well, not nobody, but I don't. So in terms of strengthening the bill, the exception was the derivative versus non-derivative definition clause, where I thought there was just a bug. There's literally a major definitional mistake in this bill, maybe just making it worse on every level. We need to fix it, because if I can pass off all of my blame to you, that makes your situation terrible. But it also makes all the safety guarantees something I can skirt; I can then ignore anything. And that's not good either, right? We can't let anybody cheat; we have to stop this. Some of my other suggestions were about, how do we weaken this? How do we, I don't know if weaken is the right word, but how do we clarify this bill? How do we prevent potential overreach or misinterpretation of this bill in a stronger sense? And other people pointed out ways in which the bill was potentially too weak. Everyone was complaining about the criminal liability; it's only for perjury, and maybe that's not enough, right? One could ask, and some people did. I think for now it is where we want to be. And I think the reaction to even the perjury provision showed us that this is just a third rail; people get so scared of such things that it might have very bad dynamics, so better not to touch it, or they panic, and we don't want that to happen. But yeah, there is a whistleblower clause in the bill. And think about how we don't enforce non-compete agreements, right? This seems so much worse than a non-compete agreement. An ordinary non-compete agreement, which not only has California declined to enforce forever, but which is now going to be illegal for all but the most highly paid, most prominent employees across the entire country because of the FTC rule, unless that doesn't go through. But yeah, I have a hard time believing that it is in the interest of the public of the United States to allow a company to hold most of somebody's wealth hostage to signing a lifetime, full non-disparagement clause covering the company they are leaving, in an area in which knowing about things that are wrong is in the vital national interest, right? If there is something wrong with the safety at OpenAI, that's something we need to know. If there's anything wrong with the culture at OpenAI in other ways, that's something we need to know. If you have a whistleblower provision, well, how do you blow the whistle if you're not legally allowed to blow the whistle? Or if the price is millions and millions of dollars in equity that you can't sell, so they can just confiscate it? They don't even have to sue you; they can just confiscate it in that situation. And then who knows what else they might be threatening or holding over people, but you don't know, again, because we can't talk about it. So I can't know these things. We have to assume the worst in some senses, because we can't talk about it.
And, yeah, I think it would be very reasonable to say that AI companies should not be able to require non-disparagement clauses as pertains to certain aspects of the company, certainly, and potentially universally. Like, why is it good for the two of us to reach an agreement where we agree that no matter what happens, I can never say anything bad about you and you can never say anything bad about me? I understand why it's better for us in some sense. But people not being able to talk in that way just doesn't seem great. And if we're in the business of not enforcing contracts that are against the public interest, this seems like a prime place to look, right? Even though I have libertarian instincts that say voiding contracts is bad, this seems to be a reasonable place to consider it, obviously. But beyond that, well, if you're not going to do the safety work yourself, someone has to make sure that you do, right? Someone has to be doing the checking. And this is a reason to doubt, right? If I'm going to trust a company, I have to be able to trust their commitments, I have to be able to trust their statements, their attestations, and I need that to be backed by some sort of punishment if you lie, right? That's the whole idea. You don't have to do anything in particular; you have to tell us what you are doing, and you have to be held responsible if you lied your ass off. That is what the key provisions of SB 1047 are about. They're about saying what you're doing and being responsible if you didn't do it, and saying what your logic is and being responsible if your logic is a lie, if it just makes no sense, if it's willful disregard, beyond the pale. That's what it's about. And also just having a mechanism where, if you discover that there is actually catastrophic risk in the room, you can get the model shut down: both that they have the ability to shut it down, at least locally, and that you can order that. These are the fundamental things this bill is about. So these things seem important now more than ever, in light of this information. But fundamentally speaking, yeah, I think we have to understand that, until proven otherwise, OpenAI is much less confusing than it was a week ago, or six months ago. There were reasonable arguments to be made when they announced superalignment, when they put out their reasonably good
Nathan Labenz: (26:54) made some of them myself.
Zvi Mowshowitz: (26:55) Yeah. They put out a reasonably good preparedness framework, right? They've done some good things. They've hired a bunch of good people. They had a bunch of people who moved in circles where they're credibly spending a lot of their time talking in the right ways, asking the right questions, even if I don't agree with their specific beliefs. And a lot of that's just gone now. Their credibility is shot from a safety perspective. And so I think it's a lot less confusing now. And yeah, I think that if you have a choice in whose technology to use, in some sense, at this point, and you choose to go with OpenAI's in a way that matters, well, this is part of what you are considering, part of what you're doing. And I think it also means that you are taking on a concrete risk yourself, because I don't think you should necessarily trust their mundane safety going forward. Again, there's just not much that can go wrong with GPT-4o that I am that scared of, but they don't have a culture of safety. You need a security mindset to build AIs, to make these AIs do the things you want them to do and have it not blow up in your face, even in an ordinary, normal, mundane way. You need to be thinking about these problems and working on these problems and giving them the respect they are due. And so the Anthropic pitch, that we deeply care about this and we have a culture of this, which they clearly do amongst their employees, where they encourage this concern, where they have this concern, and we're gonna make sure that when you use our product you get what you're trying to get and not something else you did not expect and did not want, becomes a lot more interesting, right? And where Google lies on that spectrum, you can evaluate for yourself. I'm not saying there are any angels in this room. I'm not saying there's anybody that I trust. But there are levels. And again, we'll see what Altman says next week.
Nathan Labenz: (28:45) Hey. We'll continue our interview in a moment after a word from our sponsors.
Nathan Labenz: (28:45) What do you think about revisiting the notion of third-party testing as well? This was something we chatted a little bit about prior to my interview with state senator Scott Wiener, who, of course, is sponsoring the bill. The current bill just says including third-party auditors as appropriate. And now it does seem like, okay, maybe we step that up a little bit. But we've got the UK AI Safety Institute not quite getting the access. Now we have the resignation clearly in protest. You're a game design guy. How would you think about designing that game so that the right people get the right kind of access to do the right kind of testing, and it doesn't collapse into something
Zvi Mowshowitz: (29:38) The thing that I emphasized when I analyzed both the preparedness framework and the responsible scaling policy for Anthropic was: these are potentially very good policies if the spirit of the rules is being honored, if the people at these companies care about safety, if they have a culture of safety and they don't just look at this as a bunch of checkboxes to get through so they can clear compliance, release their thing, and satisfy the scolds, but genuinely care about the result. So that when the answer comes back, technically you passed, but that's funny, the response is: no, wait, stop, think about what's going on. That wasn't okay. If something's going on, we need to investigate this. We need to stop this. Right? And you react accordingly. And certainly you can't do it if people are potentially gaming or sabotaging benchmarks to come in under the relevant thresholds. If you're worried about them targeting the test, where they know what questions are gonna be asked, they know exactly what attack surface is gonna be checked, and they strengthen that particular attack surface, these things are death, right? If all you have to do is verify that the AI doesn't fail or cause a problem in specific ways, I think you're just toast. And I don't think there's any set of tests, at any reasonable cost, that could possibly satisfy that. And that's why you need third-party testing, unless you deeply trust the people who are doing this, right? Do you trust them not to teach to the test? Do you trust them to look for anything at all, not just the things that are specified? Do you trust them to isolate these groups? I can imagine a world in which I would have that trust, but after what we just saw, do you have that trust for OpenAI? You can guess that I don't, right? I definitely don't. That said, I trust their benchmarks, right? When OpenAI reports a benchmark, I trust that they're not gaming the benchmark, they're not trying to do it in that way. I trusted them to do that previously; now, maybe not as much, right? I don't know what direction they're going to take these things in. But as I think Colin Fraser was the first person to say, let's not assume they invented the operating system from Her until we get our hands on it. Let's not jump to conclusions in any way, not just on safety, but also on capabilities. Because if you are a hype machine that is trying to hype, that is a very different world from what ChatGPT was. ChatGPT was just: here is our very sterilely named, very simply presented, very clean thing. I still give major props to them for not having glammed it up in various ways; there are things to love about OpenAI. They just presented this thing, like, hey, here's a cool thing, let's see what you do with it. We're not going to tell you how awesome it is. We're just going to put it out there. And GPT-4 went the same way, mostly. And now this is the third time, I think, they've gone out there and said: here are amazing abilities which you don't have access to yet, hype hype hype. And that's very different. And then Sam Altman went on Twitter the day after Google's I/O to gloat about how his hype had so much better vibes than Google's hype, but he hasn't addressed, to my knowledge, any of the concrete things that Google announced, or made any comparisons as to who's building the better product, or even just congratulated them on a great set of offerings or anything like that.
He's just like, oh, get a load of these nerds, basically. Like, they're trying so hard. He's in that lane. That's not a good sign either.
Nathan Labenz: (33:05) Yeah. It's not great. It's not great all the way around. And I would say I definitely noticed the shift from the earlier releases to the current releases. I would say last fall's dev day was really the first time, and even that was more buttoned up than this one, but it was the first time it felt like the GPTs weren't really ready. They didn't really work that well. It was the first time where it felt like, man, you guys shipped this even though it wasn't really in shape to ship.
Zvi Mowshowitz: (33:37) The GPTs, there's no safety concern there, but they felt like vaporware, right? Like
Nathan Labenz: (33:43) They didn't work. Yeah. The retrieval was not good. I think that has been improved, but it was a few months later. And I need to do a little more testing with this myself, but what I've been hearing in my app development circles is that the Assistants API has rounded into form now, where the retrieval actually does work much better than it originally did, and it's more like what they described in that first release. But that's relatively recent, and it's been a number of months. And this one just seems even messier, where they weren't even really clear on what we were supposed to be getting, and people are just all confused at the moment. I was confused. I was trying to get ChatGPT to modify a picture of me and my son, and it was not doing it right or whatever. And I was like, what's going on with this thing? And then I go on Twitter and I see, oh, okay, I'm still using the old system, and it's just not clear to me what I'm even dealing with.
Zvi Mowshowitz: (34:33) They're not updating a lot of the angles and they're not being clear and experienced by a lot of people complaining, oh, I just realized I'm not using the new version of this. I understood it from what they were doing. Like I was being a journalist and being paying very close attention, but like to a normal person who's paying ordinary attention, it was very unfair.
Nathan Labenz: (34:55) Well, any other thoughts? I guess one other question I had was: are there any new safety-related agendas or developments that you think are worth extra attention, or that we should maybe be increasing our bets on at the moment? I'm always looking out for something that seems like it really could work, and I'm always struck by the fact that
Zvi Mowshowitz: (35:17) The most interesting thing to happen that I think is getting no attention is on the alignment side. Did you hear about Sophon?
Nathan Labenz: (35:24) I don't think so.
Zvi Mowshowitz: (35:25) So I mentioned this a few weeks ago, I forget how many because time blurs. The Chinese have proposed a technique they call a Sophon, named, you know, after the device from the dystopian story where the aliens use sophons against us. You read the book? Anyone? Seen the show? But the idea is you can trap a model, open source or closed, but in particular open source, in a local maximum with respect to certain specified topics, such that if you attempt to fine-tune it to get it out of that local maximum, it won't work; ordinary fine-tuning techniques to escape won't work. So the proposal is that you would actually teach it not to learn, not just decline to teach it. Because if you just don't teach biology to Llama 3 or Llama 4, let's say, so Llama 4 doesn't learn biology, well, there's only so much biology it takes; feed it a bunch of textbooks and suddenly it knows biology, even if you somehow managed to not have it learn it by implication. But the Sophon proposal is, you can specifically teach it to not understand biology and to be really dense about it. The way certain people are like, I can't do math, and just refuse to learn no matter how much you teach them and how many examples you give, right? Because they had trauma. You actually give the model this trauma, right? So the idea is, if you can make sure the thing can't learn biology, now you've got an open model that can't give you a bioweapon. Potentially. It's very early. We haven't run it through its paces, we haven't tried it at scale, but it's an idea. It's the beginning of the first proposal I have ever seen where, in theory, maybe we could do something that raises the cost, above epsilon relative to training the model yourself, of taking somebody's general open model and turning it to whatever specific end we have in mind. Maybe this will start to require enough work that we're not just making it easy. And if we can do that, we still have the problem that you have to enumerate all the specific things you want it not to know; you have to figure out all the things you want to stop it from knowing and block them. Again, very similar to what we saw in the book, not a spoiler, but you see this in sci-fi all the time, right? You see the villain or the oppressors or whoever, and they say, oh, all that matters to us is that you don't do X, Y, Z, because if you can't do X, Y, Z, we're fine. And someone finds a way to do W, right? Someone finds a way to do something that they don't detect, that doesn't count to them, they just don't understand what's going on, and their response is to ignore it. Constantly happens, right? People start doing weird shit, and the aliens that have taken over the Enterprise or whatever it is go, I don't know what weird shit is going on, but whatever. Whereas the correct answer, of course, is: I don't know what weird shit is going on, so stop what you're doing until I do. That's not okay. But, you know, if you can only have it kind of scripted, like, you can't do bioweapons, you can't do nuclear bombs, you can't do chemical weapons, you can't do cyberattacks, blah, blah, blah, well, that's fine for now. That's incredibly helpful for Llama 4, but it's not that helpful further down the line, even if it works, because it's no longer going to be the threats that you knew were coming. You're not going to be able to enumerate what a smarter thing than you comes up with, right?
So it works up to a point, but it's still incredibly helpful, and it potentially raises the bar quite substantially, to the point where maybe we can all reach an agreement, if this works. Great. I don't know. But, you know, if you want a moment of hope or something to end this with, there are at least some proposals.
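To make the "trap it in a local maximum" idea more concrete, here is a minimal, hypothetical sketch of one way non-fine-tunable training can be set up. This is not the actual Sophon method from the paper; it is a MAML-style illustration under my own assumptions: simulate a few fine-tuning steps on restricted-topic data inside the training loop, penalize any improvement those steps would buy, and preserve the loss on data the model should keep handling well. The tiny model, the random "data", and the loss weighting are all placeholders.

```python
# Hypothetical sketch only: a meta-learning loop that tries to make fine-tuning
# on a restricted topic stop working. Model, data, and constants are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def loss_after_simulated_finetune(params, x, y, inner_lr=0.1, inner_steps=3):
    """Differentiably simulate a few SGD fine-tuning steps on (x, y), then
    return the loss an attacker would see afterwards."""
    for _ in range(inner_steps):
        inner_loss = F.cross_entropy(functional_call(model, params, (x,)), y)
        grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
        params = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    return F.cross_entropy(functional_call(model, params, (x,)), y)

for step in range(1000):
    # Toy stand-ins for a "restricted topic" batch and an "everything else" batch.
    x_restricted, y_restricted = torch.randn(16, 32), torch.randint(0, 2, (16,))
    x_retain, y_retain = torch.randn(16, 32), torch.randint(0, 2, (16,))

    params = dict(model.named_parameters())
    # Keep restricted-topic loss HIGH even after simulated fine-tuning
    # (clamped so the objective does not chase the loss to infinity)...
    suppress = -torch.clamp(
        loss_after_simulated_finetune(params, x_restricted, y_restricted), max=4.0
    )
    # ...while ordinary capability is preserved on the retain batch.
    retain = F.cross_entropy(model(x_retain), y_retain)

    outer_opt.zero_grad()
    (retain + 0.5 * suppress).backward()
    outer_opt.step()
```

The design choice this illustrates is the one from the conversation: instead of merely withholding the restricted knowledge, the training objective actively shapes the loss landscape so that a downstream fine-tuner's gradient steps buy very little, raising the cost of repurposing an open model.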
Nathan Labenz: (39:11) Yeah. That's good. I hadn't heard of that, and it definitely sounds like something I need to go do a little more homework on. Anything else? Because you're one for one in terms of new and very interesting pointers there.
Zvi Mowshowitz: (39:23) Yeah. I haven't seen that much in the alignment sphere lately, unfortunately. It's not even new evidence that things won't work, particularly; it's just been relatively quiet, I would say, on that level. And I guess something has to be quiet, right? You can't have everything happening all the time.
Nathan Labenz: (39:39) If you were to pitch a movie concept that you think would be most influential right now, in the way that Her seems to be inspiring the current moment of technology, mine might be The Social Network meets The Lord of the Rings, where the central figure would be the Sam Altman type who is on this meteoric rise with the technology, but is also being corrupted by it in the way that the Ring corrupts.
Zvi Mowshowitz: (40:12) It's a natural example, because the Ring has been a metaphor for the actual AGI for decades.
Nathan Labenz: (40:18) It just seems like the story that we all might need to hear.
Zvi Mowshowitz: (40:22) You could talk about the fellowship taking the Ring to Mordor as a sort of metaphor for some of the things that might happen in some scenarios. But yeah, certainly you could tell that story. My instincts tell me that's not the most interesting approach to it. I think if I was going to do it, I would maybe just do a very straightforward AI takeover scenario without an especially smart intelligence anywhere. Just show the humans slowly giving up control. Show that because everyone is following their individual interests, no one can stop it. Just show things spiraling out of control, one thing leading to another. There aren't even any idiots, and there are no villains; things just go wrong, and that's that. You could also have a Law & Order: Artificial Intelligence set in 2035. That'd be fun and interesting. The idea being that, well, these open models have given everybody these extra capabilities, and we have to be very proactive about hunting down people who try to, like, implement catastrophic threats. And then obviously the police have all their AIs, and everyone's acting on a high level, but it's one of these things where you notice that, like, the world almost blows up every other week. That, I think, we can
Nathan Labenz: (41:39) Write that? That's one of the challenges that I have with this in general: I feel like the leap from here to there is tough. Like, what does that actually look like? What does the procedural look like? Can you imagine that getting concrete enough to be shown on TV in a way that
Zvi Mowshowitz: (41:57) A procedural, right? Is Star Trek not a procedural in its own way? We explore a strange new world, we find an ethical dilemma, we find a technological problem, we have our debates, then we encounter a setback, then we implement our solution, we solve the dilemma, and we go on our way, right? It's not that simple, but it also is, right? And so you find a way to do a version of that, naturally. But yeah, a fun game when watching any sci-fi show is to note how often things almost go horribly wrong. Just watch a season of any Star Trek and watch how often the ship almost blows up, right? Or the Federation is somehow in dire danger, right? And how often this happens because someone was being a complete idiot, and how often it happens naturally, from all these different kinds of problems. And then ask yourself, well, if you just looked at the 45-minute mark of every episode and had to assign probabilities, as if this wasn't a narrative someone wrote, what's the chance that humanity would have survived from here? Or what's the chance the ship would have survived? What's the chance the Enterprise actually makes it through seven seasons? The answer is zero, right? It's so challenged in so many different ways. The Federation probably doesn't make it either; the Federation is in a lot of trouble reasonably often. Like, we get out of it because we wrote it. But this is not a utopia in the sense that, if you actually were there, the safeguards aren't there. There's no robustness in this world. The Star Trek universe is so fragile. And so we get lucky a lot, but why are we getting lucky, unless it's Q protecting us, or the Traveler, in a way that we don't understand? And so you carry that forward. The other game is: how often does somebody come within five minutes of building an ASI, right? Like, how often would things just go completely differently here if you didn't have the rule of narrative driving developments? And then there are the scenarios. So you could have an ordinary sci-fi show that just runs in a normal sci-fi world, except that every now and then, and by every now and then I mean one episode in three or something, someone accidentally follows through on the logic and superintelligence emerges and everybody dies, or someone takes over the world, or some new regime takes over, or who knows? I haven't thought this through; I'm brainstorming with you. But the idea being that, imagine if... and then of course the world just, you see the rewind where everything goes back in reverse, and then the person just chooses not to do it. But once every episode or two, somebody almost ends the world, and they just decide not to. And there's really no explanation for why they don't, probably.
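As a side note, the compounding arithmetic behind "the answer is zero" is easy to check. A toy calculation with made-up numbers (90% per-episode survival, roughly seven 25-episode seasons):

```python
# Made-up numbers: if the ship survives each near-disaster episode with 90%
# probability, surviving ~175 episodes in a row is essentially impossible.
per_episode_survival = 0.90
episodes = 7 * 25
print(f"P(survive the whole run) = {per_episode_survival ** episodes:.1e}")
# ~1e-8, i.e. effectively zero, which is the point being made above.
```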
Nathan Labenz: (44:41) Yeah. The sort of garden of forking paths is a pretty interesting idea. I'm reminded of The Three-Body Problem too; part of its story kind of goes that way, where the civilization is being restarted and rerun over and over again, and it just ends at various times and then gets booted up again. The runs last different lengths of time; some of them are short and others are longer, but they all kind of end and get rerun. And I do think that would also be a pretty interesting way to present the future: that some branches of this tree are terminal.
Zvi Mowshowitz: (45:18) Yeah. The Three-Body Problem is such a weird... I don't wanna spoil anything, and I'm gonna do my best not to, but it's such a weird mix of this fully cynical, hard realism, beyond what I think is even accurate, where the universe is this cold place that wants to kill you so badly, and you can afford not the slightest bit of kindness and decency if you want to survive. At the same time, you don't just all die; I mean, there's a book two and a book three, which wouldn't be the case otherwise. And the book isn't about what happens to Trisolaris after we get wiped out; the book is about people, in some sense.
Nathan Labenz: (46:00) Okay. Last question. Do you find yourself shifting at all in terms of your sense of whether or not we may be in some form of simulation?
Zvi Mowshowitz: (46:10) My view on the simulation hypothesis has always been essentially that all of the value lies in the worlds where we're not in one. If you make nine simulated copies of me, and there's me, and you put us in ten copies of the situation, but one is real and the other nine are just, like, a recording on a videotape to be viewed back later or something, well, shouldn't I just act as if I'm the real one? Isn't that just the correct strategy? Even if there are 9,999 of them, maybe that's still the correct strategy. Or rather, if this is an ancestor simulation, so 20,000 years later we try to run a bunch of sims of the ancestors: well, the ancestors who, when faced with the ancestral situation, figured out that they were probably in an ancestor simulation, and therefore didn't need to actually make sure that their civilization progressed to the point where it could run the future ancestor simulations, those guys don't get simulated, right? Because those civilizations don't make it. There's a real sense in which the simulation hypothesis is only valid if you treat it as invalid, or something like that. You have to take the situation seriously. And also, what's the point of a simulation where a person figures it out and acts like it's true? There are a bunch of movies like that, right? No spoilers, not even naming them. But in general, the point is to treat it as real. The whole goal is to treat it as real. And I don't really see any... I don't know, right? Obviously there is some probability that this is a sim of some kind; I can't rule it out. I will sometimes jokingly refer to things in that kind of way. "The writers were a bit on the nose today" is one of the things I'll sometimes say. But you can't take it seriously in the sense of changing your behavior.
Nathan Labenz: (47:58) What about moving to the Caribbean and unplugging? I just saw an interesting tweet from Amanda Askell from Anthropic the other day where she said: I don't think AI is definitely gonna kill us all. I'm not a doomer; if I were, I wouldn't be working on this. And she was also kinda like, I do think it's a real risk, but I think it's something that we can shape and, hopefully, that I can have an impact on. But if I really were a doomer, I would just head to the Caribbean and spend the rest of my days there. I do have a close friend who basically has that attitude; he's like, I just wanna enjoy the good times that we have and not worry about it too much. And then she also said the downside, or flip side, of this is that if I ever do get burned out and decide to take some time off in the Caribbean, people will take it as a sign of doom.
Zvi Mowshowitz: (48:42) I saw that too. Yeah. I think it was Geoffrey Miller, I'm not sure exactly who it was though, who said it wouldn't work, that the food would turn to ash in your mouth. You would get no joy, because you would know. And I think that's largely true for me. If I had just walked away from this thing and was ignoring it, it just wouldn't sit well with me, and I wouldn't be able to just go hedonist. It just wouldn't work, right? If you want, like Mr. Reagan, to plug back into the Matrix, well, you need your memory kind of wiped in some sense. You need to really not know. That works for some people. Whereas I kind of enjoy fighting, I enjoy struggle, I enjoy the striving to do better and to solve problems; that's my thing. I would never go to the Caribbean, because what am I doing in the Caribbean? I'd just be bored within a week anyway, unless I'm just, like, posting, making sports betting and bookmaking decisions again; that's the only way I've ever gone to the Caribbean and had a good time. So there you go. But I think it highlights, by the way, the fact that the word doomer is just a slur and has been completely misappropriated and misallocated, right? Because who is a real doomer? The doomer is the person who says that there's nothing to be done, that what we do doesn't matter. The doomer is the person whose p(doom) is 0.9 repeating, or who otherwise thinks it's all over, that there's nothing you can do. And we see those people around climate change, doomers who think that humanity is doomed and there is nothing you can do about it, that your decision doesn't matter. Whereas if you think your decision matters, as Amanda points out, that makes you not a doomer. So what if it's: we could lose, we could win, and you can help fight? This is an ancient Judaic idea, right? The universe hangs in the balance. The scales oscillate, and it could be up to you which way they tip, between good and evil, in God's judgment. You can decide which way this goes. And obviously this isn't a God's-judgment thing; there isn't a moral tone to this. It's about solving a problem. But if you take it from a 34.007% to a 34.008% chance of victory, that's a great life, right? In some sense. And there's so much else; look at all the utility, including just for you. Imagine what happens. But yeah, if you don't see a way to contribute to the problem, and you just decide to go off and do something else, that makes perfect sense to me. And you can't let yourself burn out, right? If what Amanda needs to do once a year is sip mai tais on a beach in the Caribbean for two weeks, so that she can regain her mental health and go back and resume, then she should do that. If she should do it for an entire year after five years of working, because otherwise she won't be able to keep up her career or make good decisions, then she should do that. There's nothing wrong with understanding your limitations. Life is not all about the fight, right? And I make an effort to work on other things, think about other things, spend time with family, try to have fun. I'm going to Madison Square Garden tonight with my old Magic friends, and we're going to have a watch party, where thousands of us Knicks fans are going to watch game 6 on a video screen because the team is in Indiana. It's gonna be fun as hell, right? And I don't claim that I'm saving the world there. I'm not. I just wanna enjoy it.
Nathan Labenz: (51:52) Yeah. I feel you. I think it's probably a good place to end it. Any other closing thoughts?
Zvi Mowshowitz: (51:57) Indeed. May many amazing and wonderful things and weird things come to pass. And best of luck to everyone. And I look forward to, you know, Altman's response, and seeing where all these people land next. This is some great talent; someone's gonna snap them up, or they're gonna do something. So we'll see what happens.
Nathan Labenz: (52:16) No doubt about that. Well, the saga will continue, but for now, I appreciate the extra time today. Zvi Mowshowitz, thank you for being part again of the Cognitive Revolution.
Zvi Mowshowitz: (52:27) Absolutely. It's been a pleasure.
Nathan Labenz: (52:29) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.