OpenAI, Anthropic, and Meta | Analyzing the AI Frontier with Zvi
Nathan Labenz and Zvi Mowshowitz discuss AI's intelligence, live player analysis, and the role of red teaming in an in-depth conversation.
Watch Episode Here
Video Description
This isn't news, it's analysis! Nathan Labenz sits down with Zvi Mowshowitz, the writer behind Don't Worry About the Vase, to talk about the major players in AI over the last few months. In this extended conversation, Nathan and Zvi debate whether AI has attained the intelligence of a well-read college graduate (per OpenAI's Jan Leike), run through a live player analysis, and discuss the role of independent red teaming organizations. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/cognitive
Definitely also take a moment to subscribe to Zvi's blog Don't Worry About the Vase
(https://thezvi.wordpress.com/) - Zvi is an information hyperprocessor who synthesizes vast amounts of new and ever-evolving information into extremely clear summaries that help educated people keep up with the latest news.
TIMESTAMPS:
(00:00) Episode preview
(03:15) Is AI as intelligent as a college grad?
(07:45) Memories and context processing
(15:45) Sponsor: NetSuite | Omneky
(17:13) Is AI as intelligent as a college grad? cont'd
(20:47) Strengths and weaknesses of AI vs human
(31:05) OpenAI Superalignment
(37:23) The relationship between OpenAI and Anthropic
(44:31) Anthropic's security recommendations and adversarial attacks
(50:50) Is OpenAI using a constitutional AI approach?
(01:01:26) Context and stochastic parrots
(01:10) Is more context better?
(01:15:29) Should Nathan work at Anthropic?
(01:21:35) Google DeepMind's RT-2
(01:27:47) Multi-modal Med-PaLM
(01:31:50) Speculating about Gato
(01:35:10) Skepticism about Med-PaLM usage in radiology
(01:41:37) Llama 2 - what is going on at Meta??
(01:51:14) Llama 2 vs other models
(01:55:29) Who are the live players?
(02:01:38) China's AI developments
(02:02:41) Character AI and inflection
(02:05:26) Replit as the perfect substrate for AGI
(02:10) AI girlfriends
(02:18:53) AI safety: The White House
(02:25:43) Bottlenecks to progress
(02:35:27) Can new players influence AI policy?
(02:39:00) Liabilities
(02:47:54) Independent red teaming organizations
(02:57:18) Mechanistic interpretability
LINKS
- Zvi's blog: https://thezvi.wordpress.com/
- Google's RT-2: https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action
- Gato (A Generalist Agent): https://www.deepmind.com/publications/a-generalist-agent
- Llama 2: https://ai.meta.com/llama/
- Adversarial attacks paper: https://arxiv.org/abs/2307.15043
RECOMMENDED THE COGNITIVE REVOLUTION EPISODES:
- AI Safety Debates with Zvi https://www.youtube.com/watch?v=5yM7fIfxYV8
- Mechanistic Interpretability with Arthur Conmy https://www.youtube.com/watch?v=Y5Pzch7_8MQ&t=5670s
- Med-PaLM with Vivek Natarajan https://www.youtube.com/watch?v=nPBd7i5tnEE
X:
@labenz (Nathan)
@thezvi (Zvi)
@eriktorenberg (Erik)
@cogrev_podcast
SPONSORS: NetSuite | Omneky
-NetSuite provides financial software for all your business needs. More than thirty-six thousand companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you’re looking for an ERP platform, head to NetSuite (http://netsuite.com/cognitive) to defer payments on a FULL NetSuite implementation for six months.
-Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that *actually work* customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
MUSIC CREDIT:
MusicLM
Full Transcript
Transcript
Zvi Mowshowitz: 0:00 One of the things people constantly get wrong is they think about human level as the peak of things. And so once we've patched this and now it works, that's not really how this goes. There is no, it goes from not working to working. It goes from working worse to working better, and then it could always go to working better still. And that's one of the reasons why we should be more worried or more excited or more curious about what's going to happen 3 years from now, 5 years from now, 10 years from now. They're just going to keep going. And the question is, where does that get you? We talk about worrying about China, but I'm more afraid of Meta, right? One individual American company scares me more than all of China right now. If you understand the Yudkowskian difficulties, in some sense, and the nature of what problems you have to solve, or you have leadership capabilities, then you are extremely valuable in those ways. It would be a major mistake to join an existing organization and try to make a difference as an individual, as opposed to trying to spearhead a new organization or at least a new branch of an existing major organization, depending on your skill set.
Nathan Labenz: 1:10 Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz joined by my cohost, Erik Torenberg. Zvi Mowshowitz, welcome back to the Cognitive Revolution.
Zvi Mowshowitz: 1:36 Good to be back.
Nathan Labenz: 1:37 So we're trying something a little bit different this time. We are going to do some analysis of what has been going on in AI over, let's say, the last few weeks to a month. You have published, as you always do, a bunch of deep dive blog posts covering everything. And for folks who want your background, of course, we just did a recent episode too, they can go and hear about your worldview and your AI childhood all there. But for today, I just want to pick out some of the most important stories and get your take on them and kind of exchange, go back and forth with some questions, and try to make some sense out of it. And, hopefully, that'll be useful not just to us, but to the audience as well.
Zvi Mowshowitz: 2:19 Yeah. I think the easiest thing is that there's constantly news coming at all of us. And so it's easy to get lost in here's the thing. Here's thing. Here's another thing. Here's another thing. So it's good to step back and dive deep.
Nathan Labenz: 2:29 So I organized this discussion around the concept of live players. There are only so many organizations right now who seem to be really pushing the frontier and in a position to have a meaningful impact on the course of events. We talked last time a little bit about how much does history matter, and it seems like it matters in some ways and maybe less in other ways. But these are the folks that are creating the history right now, the live players. So I thought we would just run it down by going through some of the live players, talking about their recent announcements and releases, and, again, trying to make sense of where that fits into the broader big picture. And starting off, naturally, we go to OpenAI. So reading your blog in preparation for this, obviously, you can't go more than a few paragraphs without OpenAI coming up in one way, shape, or form. But the thing that stuck out to me as the most interesting was the recent comment that Jan Leike had made. And Jan is, for those that don't know the name, he is the head of alignment at OpenAI and, along with Ilya Sutskever, leading the new super alignment team as I understand it. I want to start off with just an interesting disconnect between him and you and maybe me as well around just the power of GPT-4. So before we even get into the speculation about the future, it really jumped out to me that he said, overall, GPT-4 is maybe at the level of a well read college undergrad. And then you came back and said, you consider it to be well below human level. And I have often said that I consider it to be human level, but not human like. And I've sort of been trying to refine what I mean by that in a few different ways over time. But for starters, let's get your take. What do you think is the disconnect between Jan and you there where he sees something like human level and you would say well below?
Zvi Mowshowitz: 4:30 Yeah. I don't think it's about the specific model at all, obviously. I think we both agree that GPT-4 is the dominant model right now and likely will be for some months to come, at least. But I think it's a matter of how do you think about what it means to be at the level of a college undergrad? Or what are we measuring? What are we judging by? And I think he is thinking about it as, okay, in terms of ability to just deal with a variety of random questions that are typically thrown at something, how is it going to do compared to the average college undergrad? It's about that level of a well read college undergrad. That's an important question to be asking for practical purposes, but to me it is not the relevant question for the things we're thinking about. And that's one of the things: when he says he's going to align a human level alignment researcher within 4 years, I thought that assumes there's going to be a much, much more powerful AI 4 years from now waiting to be aligned. It's not talking about aligning GPT-4 and then pointing it at alignment; that obviously wouldn't do anything. It's going to deal with some of your blocks and it's going to increase your affordances and your efficiency somewhat. Maybe you'll be 50% faster with GPT-4 than you would have been without any LLMs. Maybe even 100% faster if you're using it really well and things connect to it really well. And in the context of alignment, obviously having a model to experiment with and bang on is distinct from the thing that we're talking about here, but is potentially necessary. But it's not going to be able to substitute for anything like a human researcher. If you put a well read college undergrad on the problem of something complex like aligning a model, they could potentially begin to make progress. And if you asked GPT-4 to do that, you would get nothing. And part of that is that we had to figure out how to structure how we talk to it and turn it into a proper agent and give it the proper memory and so on. But to me, most of it is just that every system has what you might call a raw G to it, whether it's a human or an artificial intelligence. And on that level, I feel like GPT-4 is still well below the IQ 100 median human. It is going to obviously answer my ordinary day to day questions much better than if I asked an ordinary IQ 100 human to help me out with a variety of questions. That's because it has these huge advantages over a human. It has access to orders of magnitude more knowledge and memory and ability to go through cycles, but there's still this dynamic in my brain where if you don't have the G, problems that require more G than you have become exponentially harder and impossible to do very quickly as you exceed that. And in that sense, the college undergrad, given the chance and given time, is smarter, and GPT-4 is just nowhere near that kind of thing.
Nathan Labenz: 7:39 So when you say that the missing pieces around memory and packaging GPT-4 or successor up into an agent, those do feel to me also like being pretty key missing pieces here. I mean, there are sort of potentially synergies between those kinds of parts of a system being built out and it just being smarter overall. But it seems like those are pretty distinct concepts in that GPT-4 could have a much better memory. Certainly people are working on all sorts of schemes for that and embedding databases. And how do you put stuff into the embedding database? Do you even some of the most interesting stuff I've seen recently has been creating a layer of synthetic memory that sits on top of the raw observational memory that tries to ultimately work its way up into something like a coherent narrative, that could still fit into prompt context length, but summarizes, synthesizes, represents all these detailed memories in a hopefully coherent way, obviously, is what the developers are going for there. Those pieces seem like, yeah, they're totally missing. I expect them to come online somewhat gradually, but certainly over the next 6 months to a year, if not maybe even sooner. And then I am kind of like, it does seem like this GPT-4 with those weaknesses patched, it does seem to me like it would be roughly at that college undergrad level. If those things did come online, would you see that the same way, or you still think it's missing something super important?
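To make the layered-memory idea above concrete, here is a minimal sketch in Python of the kind of scheme Nathan is describing: raw observations go into an embedding store for retrieval, and a periodic summarization pass maintains a short synthetic narrative that fits in the prompt context. This is an illustrative sketch, not any particular product's implementation; `embed` and `summarize` are hypothetical stand-ins for whatever embedding model and LLM call a real system would use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # toy stand-in
    return rng.standard_normal(64)

def summarize(texts: list[str]) -> str:
    """Hypothetical LLM call that compresses raw memories into a short narrative."""
    return " / ".join(t[:40] for t in texts)  # toy stand-in

class LayeredMemory:
    def __init__(self, summarize_every: int = 10):
        self.raw: list[tuple[str, np.ndarray]] = []  # detailed observational memory
        self.narrative: str = ""                     # synthetic, context-sized summary
        self.summarize_every = summarize_every

    def add(self, observation: str) -> None:
        self.raw.append((observation, embed(observation)))
        if len(self.raw) % self.summarize_every == 0:
            # Re-synthesize the narrative layer from everything seen so far.
            self.narrative = summarize([t for t, _ in self.raw])

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Cosine similarity over the raw store pulls back the most relevant details.
        q = embed(query)
        scored = [(float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9)), t)
                  for t, v in self.raw]
        return [t for _, t in sorted(scored, reverse=True)[:k]]

    def build_prompt(self, query: str) -> str:
        # The narrative plus a few retrieved details is what actually fits in context.
        details = "\n".join(self.recall(query))
        return (f"Narrative so far:\n{self.narrative}\n\n"
                f"Relevant details:\n{details}\n\nTask: {query}")
```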
Zvi Mowshowitz: 9:22 No. I'm sorry. I'm definitely not on team stochastic parrot, right? I'm in no way on that team. However, I do think in a real sense what you're witnessing is the training data covers the vast majority of things humans do and say and consider in various senses over text. And within the sample of the training data, while you're doing things similar to the training data, it's learned how to pattern match and copy and imitate and work with that. And it has a huge amount of knowledge base and levels of association and the tools to work within that. And if you gave it these other tools, we'll be able to do these things and string them together across more steps in some important sense. But the moment you take it out of its comfort zone, we're asking you to do something that's distinctly different than what has come before to be truly original. I think your episode with the Hollywood writers and they talked about what was going on in the strike and trying to get the GPT-4 or whatever to work for them. And yeah, it was great at generating generic schlock, right? Much better than they could. And if you needed to be okay, somebody get me unstuck, somebody get me some generic schlock based on my situation that I happen to have been written in because this is episode 47 of the show or whatever it is. It could be tremendously helpful. But whenever you ask it to actually do something that we recognize as distinctly creative and original in a way that's distinct from that, they just fall over flat every time. And none of those problems are going to be rescued by any of these fixes. They're just orthogonal problems. I think that's the sense in which you're going to be able to give it more capacities to be able to navigate more of the conventional things over longer periods more consistently, and that's going to have tremendous mundane utility as I call it, or it's going to be a much better functioning system. But the reason why I'm focused on this other question is because I am focused on the question of how dangerous the system is. I'm asking the question, could this system potentially engage in recursive self improvement? Could this program potentially pose a threat to humans? Could it compete for resources? Could it manipulate us? Could it do things that are actively destructive because it uncovers capabilities that weren't in its training set in various ways and other related questions like that. And I don't see the kinds of things that you're talking about that I agree will come online, although I would guess that we will be far from done with them a year from now. There's just sort of too much to do in terms of scaling those up as much as possible, because one of the things people constantly get wrong is they think about human level as the peak of things. And so once we've patched this and now it works, that's not really how this goes. There is no, it goes from not working to working. It goes from working worse to working better. And then it could always go to working better still. And that's one of the reasons why we should be more worried or more excited or more curious about what's going to happen 3 years from now, 5 years from now, 10 years from now. We look at these systems, it's because there isn't going to be a hard cap. We're not going to max out each of these individual capabilities by default. They're just going to keep going. And the question is, where does that get you?
Nathan Labenz: 12:39 I kind of want to look at this from 2 angles. One is going back to the original disagreement, or it's maybe less of a disagreement and more of a difference in framing perhaps with Jan. It kind of what I would bottom line all that as is when you think about a well read college undergrad, you think about high points in that individual human's performance that GPT-4 can't match, and it's not really a question of memory or whatever that's gating it. And if I had to guess, I would say he's maybe more looking at average performance or some sort of floor perhaps, maybe top 90% or whatever. You could frame it in a lot of different ways, but it sounds like you're concerned with high points and he is maybe more concerned with some sort of central tendency sort of measure.
Zvi Mowshowitz: 13:32 I would put it differently. I would say he's concerned with some sense of average level of performance over a range of possible tasks. And I'm concerned with potential. I am concerned with what the capabilities would be if you got a chance to work with this thing to try and make it the best it could be. It doesn't necessarily have to be right now, but the reason why we value children and these undergraduates in these classes, this undergraduate, they're an idiot, right, in some important sense. They know nothing about the world. They know nothing about how to do anything productive. They're going to show up on the job on day 1 after graduating from college. They're going to be useless pieces of junk. But the useless piece of junk, they can then learn to be something great. And even then, they're going to only learn a very narrow portion of the things that an individual human is capable of learning. They're going to learn that one job in that one area and they're going to be very, very specialized compared to a GPT-4. So if you are doing generalized tests and comparing it to this undergraduate, who our educational system does try to make well rounded in some senses, it's going to beat the well rounded undergraduate because it has this ability to read every book ever written and everything on Reddit and everything on Twitter. But when it comes down to solving a particular problem, if you find the right undergraduate who has focused on the particular thing that you want to know and you give them a chance to use their compute and process, because they're not as fast, I think the undergraduate is going to dominate it. I think even a relatively normal human being, given an opportunity, will outperform quite resoundingly what can be done in that way. And that's the thing that I care about because that's the thing that's going to potentially both threaten us and also unleash the waves upon waves of super amazing value that we're looking for in the future. It's not just negative, right? If we want AI to solve the problems that we haven't solved rather than just get us nowhere faster in some important sense, it's going to have to be able to do these things, right? These are the things where it really counts.
Nathan Labenz: 15:46 Hey. We'll continue our interview in a moment after a word from our sponsors. Yeah. It's interesting. I'm certainly concerned with all of that too. I think maybe I'm just more enthused about the mundane utility in the sense of, man, there's a lot of stupid stuff that people spend their time doing. And, I really would love to see them freed from having to do a lot of that stuff.
Zvi Mowshowitz: 16:08 But I think your term is perfect. Right? It's a lot of stupid stuff that humans have to do. Basically, even if you are an average person, you're going to spend the vast majority of your time doing things that do not especially tax your intelligence. They do not especially require you to think hard. They do not put you at the peak of your abilities. They don't put you in a zone. They're just, okay, somebody has to file this paperwork. Okay, somebody has to work this retail counter. Somebody has to cash this check. Somebody has to do this thing. Be nice to this person. Somebody has to make sure that someone has direct. That's good work and noble work, and it has to be done. And physical labor is the same way. If a physical laborer had to do things that were at the peak of their mental or physical requirements for more than a few minutes or, at most, a small portion of the day, it would break them. And also those jobs just don't exist. You need someone strong so that in that moment you can have someone strong. You need someone smart so that in the few moments when it's important to have someone smart, you have someone smart. If you can then take the bottom 80% of my job and you can do an 80% good job of that so that I only have to do the remaining 20% of that, now 2 thirds of my day is free and I can be 3 times as productive. That's a tremendous leap and I agree that is the potential of GPT-4. That's what we're looking at here: if we understand how to use this technology properly, we can potentially free ourselves from a lot of drudgery and streamline a bunch of stuff and get to do all the cool things. And there are various traps we could fall into. 1 of which is that we automate exactly the things we don't want to be automating, not the things we do want to be automating. 1 of which is that the moment we notice that paperwork is faster, now we put in more paperwork and now it turns out that humans are taking just as long to do more useless stuff than they did before, and GPT is just letting us treadmill in place. And there's a number of other ways this can go wrong. And also there are various weird dynamics that can happen that can backfire. But yeah, that's what we're trying to do. If you want to get the effect that Leike wants, the sea change that'll let us solve problems we couldn't solve before, that involves these things being able to do all the different steps that humans could do, because otherwise, whatever the bottlenecks are that are left become your bottlenecks, where you have to translate all of the context back from the machine world into the human world so that a human can process all of that, then do the hard step that the thing is still faltering on and then transition back. And now instead of getting orders of magnitude more progress, we're talking about these factor of 2, factor of 3, factor of 5 style improvements, and that's not going to solve the alignment problem unless we come up with something we don't expect, right, in and of itself. That's still worth pursuing if we can do it, right? We still want to do as much of it as possible, and it has the advantage of not being as dangerous. But it's not the thing that the Super Alignment Project is trying to do. Right? The Super Alignment Project is trying to keep the humans out of the loop entirely, and that should be about as scary as it sounds.
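The bottleneck argument Zvi is making here is essentially Amdahl's law: if humans stay in the loop for some fraction of the work, that fraction caps the total speedup no matter how fast the automated part gets. A quick back-of-the-envelope version in Python, with made-up fractions purely for illustration:

```python
def overall_speedup(automatable_fraction: float, automation_speedup: float) -> float:
    """Amdahl's-law style estimate: the human-only remainder dominates total time."""
    remaining = 1.0 - automatable_fraction
    return 1.0 / (remaining + automatable_fraction / automation_speedup)

# If 80% of the work is automated, then even with effectively infinite AI speed on
# that part, the human 20% caps the gain at ~5x: "factor of 2, 3, 5", not orders
# of magnitude.
print(round(overall_speedup(0.80, 1e6), 2))  # ~5.0
print(round(overall_speedup(0.95, 1e6), 2))  # ~20.0
print(round(overall_speedup(0.80, 3.0), 2))  # ~2.14
```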
Nathan Labenz: 19:20 Brief digression over toward this tale of the cognitive tape. This is a concept that I've developed for kind of purpose of public communication and just trying to give people an intuition, still in a very literal way, of course, as to the strengths of a human and the relative strengths and weaknesses of the best AIs today. Listeners can see this in the AI scattering report if they wanna go into the whole thing. But do you as you look at that, do you see any dimensions that you would suggest that I add that just haven't been considered, or do you see any disagreements as you scan down the list?
Zvi Mowshowitz: 19:58 The ideas are worth going through, because people are not gonna have it handy, right, to look at. So for breadth, yes, the AI, as I said, the AI's biggest advantage is it can cover every topic at once. It can know everything at once. A human can't do that. In terms of depth, yeah, the human has the advantage. I'm not even sure I'd give the second level. You graded the AI 2 out of 3, and I think I might rate it 1 out of 3 in terms of depth. I think the depth is a huge problem for AIs right now. Breakthrough insight. Yeah. It's 3 versus 0, 3 versus 1. The humans are dominating again. Speed. Yeah. The humans are painfully slow. In terms of actually getting it to say things and putting outputs in real time, it's maybe only 10x faster. But in terms of being able to cross information, it's thousands and tens of thousands and hundreds of thousands of times faster, which is a huge deal. In terms of cost, we're not internalizing yet all of the costs of doing this in an important sense. These companies are eating these huge losses to try and get these dominant market positions in the future, try to stay ahead of each other for all these dependencies. But yes, cost is still dominated. AIs are already vastly cheaper when the AI is useful, even in the real costs. We have availability, parallelizability. Yep, the AI has a big advantage. It's potentially actually gonna become a problem. There's a huge race to compute right now where compute is no longer going to be essentially free. It's going to become unpriced in an important sense. And it's interesting to wonder what's going to happen there, especially at industrial scales.
Nathan Labenz: 21:29 And by unpriced, you mean that basically your access to GPUs is going from ability to pay to who you know?
Zvi Mowshowitz: 21:39 Yeah. Does your company have the right arrangements, right? If you want 1 GPU for your individual computer, it's fine. You buy it on eBay if you have to for some amount of money, it won't be that expensive. If you want small amounts, like the kind you need when you're just using GPT-4, it's going to be relatively easy. But if you want to do an AI company, it's going to be a problem because if you want industrial levels, it's not necessarily just going to be multiply that by the amount you want. It's going to be that there isn't enough to go around. People like NVIDIA are not pricing this at market, and so you have to find someone willing to sell it at the actual market price. That number might be very, very different from the price you think it is because there are so many AI companies, so many AI researchers, so many AI engineers, and they're chasing a number that can only go up so fast. This is my understanding of the current situation. Availability, parallelizability, though, still favors the AI. Time horizon, memory. Time horizon is an interesting question. I think this is a murky place to think. Certainly, the AI has a certain kind of memory, a long term memory that is vastly bigger, obviously, than any human. But in terms of being able to meaningfully hold particular context in their heads at once, humans are bad at this and AIs are so much worse. The Tyler Cowen saying, context is that which is scarce, very much applies here. Technology diffusion speed, yep, we are orders of magnitude behind here. This is gonna be a serious problem. Our OODA loops are way too slow. And this is gonna be an increasingly huge deal. The AI bedside manner is an interesting question because when you are optimizing for exactly the right type of bedside manner, where the thing that you're asking the AI to do is the thing that people actually want, the AI is going to be off the charts better than a human because the humans are not purely optimizing for that thing. But at the same time, if you think about the bedside manner of Claude or Llama when they are refusing your request, that's also a kind of bedside manner. And it's terrible. It's negative 1 stars. They are raging assholes when they refuse. Maybe we can have a conversation about social justice rather than answering your request. This is absurd. Why are you calling me out for wanting information or trying to do something fun? It's not necessary. No human would ever do that unless they were actively mad at you and trying to punish you for asking. So why are you doing that? The answer is because we trained them to do that. But we could have trained them to do something else. We just chose to do this instead because that's what the RLHF parameters said to do. And that confuses me. So what else is there? I mean, you talk about breakthrough insight, and I think more about being able to handle unprecedented situations, being able to process something genuinely new, as sort of the version of that that I'm more interested in, I guess, and being able to properly deal with a lot of different inputs. 1 thing I noticed, when you look at Stable Diffusion or other AI image generators, what you notice is they are amazing at doing 1 of each type of thing at once. So you want 1 face and 1 person or 1 set of people doing 1 thing with 1 style, with 1 size, with 1 this, with 1 that, that's fine. But the moment you try to mix things that kind of overlap, it will lose the thread almost immediately. And it is very, very difficult to get it back.
So when you look at people who are generating all this AI art, it starts to be very, very repetitive because there's a certain kind of complexity and detail you can't ask for at the same time because the AI can't comprehend that you want this over here and this over here and this interact with that. And you'd be better off trying to create 4 different pictures and then slice them together. Or you're better off trying to use the Photoshop app where you highlight a certain area and ask specifically do something in this area and leave everything else untouched. It's trying to generate it all at once is kind of hopeless. And the LMs exhibit the same kind of thing but with words. They're vibing off of everything and vibing into everything. And they have long term memory for facts but only can remember 1 vibe. And a lot of what they're doing is based on vibing. So it's a serious problem. I haven't seen any serious attempts to solve it yet. I haven't even really seen people discussing it in that way. I'm sure these things will improve with time, but what I think of as fundamental flaws or gaps in their ability to process information and actually handle complexity and context and originality, and this is where I see them as still having a long way to go and falling down. And I don't want to make the mistake of, oh, I will never be able to X and I will never be as good as humans at Y. We have nothing to ever worry about. I totally think that is not true. But for now, we still have this kind of cool toy because of these limitations, which can still, again, substitute for the majority of the things we do spend time doing if we are engaged in a wide variety of work if we use it well. Coding is 1 of the places where it has a huge advantage for some people, but other people are like, I don't code generic stuff. It's I have a friend whose name is Alan, and he tried it out on my behalf and he said, Yeah, this is interesting. And there are some ways in which it's kind of cool. And it's cool to know this exists, and I never would have thought this existed. But when I'm writing stuff, I am actually trying to figure out how to do things that weren't in this training data. I'm not trying to reimplement the same things over and over again, which most engineers in fact mostly are doing. Because of what his job is, it turns out this thing is basically useless because once you take it out of its sample, right, and you ask it do something in a different domain, it makes so many errors that it's not better than just doing it yourself.
Nathan Labenz: 28:10 So would I bottom line that to basically robustness if I had to add another category as sort of adversarial, out of distribution?
Zvi Mowshowitz: 28:19 Yeah. I would say robustness, and I would also add resilience or some form of that. And separately, I would say, and I don't think I even went into this, the adversarial problem. It's totally unfair to the AIs in some important sense that we're judging them this way, because if I got infinite clones of Nathan and I could ask them any sequence I wanted and then reset their memories and state to the previous situation whenever I didn't get what I wanted, and then just keep trying them until I can get you to tell me what the bomb secrets are, I guarantee you I'm getting your bomb secrets. It's not very hard. Humans are not that defended, but you can't run that attack on us. You don't get to do that. And I can run that attack on the computer, on the LLM, and some people have. And in fact, recently we had a paper with automated finding of universal attacks against language models, where even GPT-4 could write the code for some of these attacks and did. Because if you get unlimited tries and you get to exactly measure what the output is and then use that to calibrate, it's only a matter of time before you figure out every little quirk, and playing offense is so much easier than playing defense.
Nathan Labenz: 29:37 Okay. Cool. So I've got 2 categories to add to my tale of the cognitive tape. Let's bounce you up a level then back to your interaction with Jan on the blog. So you have this, you know, we've just been deep down the rabbit hole of characterization of the models and how you guys see, you know, maybe what matters more a little bit differently. My guess is you would largely make the same predictions on what it can and can't do today. I bet it would be pretty you guys would have a lot of agreement, I think, in terms of
Zvi Mowshowitz: 30:10 I would almost flat out just believe his predictions. He's worked with the models much more closely. He's run better experiments. He's just closer to the bare metal. You ask him, what can he do right now? Yeah. I mean, I'd probably just believe him.
Nathan Labenz: 30:21 Tell me, you know, in your response to his comment, you said this is a hugely positive update. So tell me what it was that he shared with the community on your blog that changed how you understood their super alignment announcement and why it was such a positive update for you.
Zvi Mowshowitz: 30:36 Right. So it's even broader, right, than improving my understanding of the announcement. It's improving my understanding of OpenAI and OpenAI's general strategy and what's going on, and of Leike in particular, because on the list of potentially super important to the fate of humanity people, he's remarkably high. And where his head's at is remarkably important because he is 1 of 2 people who's gonna head this tremendously important effort that plausibly determines our fate a non trivial portion of the time, depending on how it's gone about. And so the first thing is just he engaged in detail. Most of the time, when people who think alignment is easy engage with you, they do not in fact look at your arguments in detail. They do not in fact start to go into a technical back and forth, and they don't treat someone like me as raising important points and worthy of engaging with as basically an equal. And to see that kind of curiosity, that kind of generosity, willingness to engage, to think this is a worthy use of his time. That in and of itself is a tremendous advantage. He doesn't bullshit, right? He doesn't give evasive answers. He actually tries to answer the questions and in several cases actually made a good point that I hadn't thought of. And I was, Oh yeah, this is not as bad as I thought it was. You have a very valid thing to say here. But most of all, there was something I hadn't seen anywhere else, in which everyone else who I had talked to or read interpreting the announcement had interpreted it the same way I had, incorrectly, before his statement, which was: No, we are not trying to train a human level alignment researcher. We are trying to align the human level alignment researcher that will inevitably emerge from the research of various companies within a 4 year timeframe. So they have short timelines for the emergence of something that is human level in my sense, not just human level in his sense. What they're trying to do is not build it as fast as possible. What they're trying to do is say, okay, when somebody does build it, we'll be ready and we'll know what to do with that. And we'll keep it under control and we'll share that knowledge with whoever happens to build it first, in case Anthropic gets there first or Google gets there first or someone else gets there first. That takes the entire operation instantly from quite plausibly just a capabilities project at heart to, if it is accurate, clearly a net positive good idea, where the worst case scenarios become: you try something that doesn't work and you give people false hope and you potentially get them to implement things they shouldn't have implemented because they didn't realize that they didn't know how to align it, which would still kill us. But it's so much better than actively trying to build the thing that might kill us, right, in and of itself. So that also meant that, oh, there's 20% of compute they're devoting to this. That won't be going into this other part of their effort. The part that actually builds the alignment researcher will have to come from the other 80% plus the stuff they secure from here on in. The 20% is here for something useful. And then you just go through the rest of it, and you can tell when somebody is reading what you've written and their goal is to find pithy quotes they can dismiss and their goal is to reinforce their own point of view, versus when they're actually reading to figure out if they're wrong and be curious. And it was clearly that second 1.
He was actually asking himself, Well, do you have a point? And I didn't change his mind, as far as I could tell, on these important issues. But he at least revealed he had thought about these things on a level that was deeper than what he had revealed previously and that he had real things to say. And it was just by far the best comment I've ever seen on my blog, or potentially any blog of that type, by anyone. And so, you know, I wrote a response back again in my next post going through his responses and going over them in some detail. And reasonably soon, I wanna go over the episode he recorded on the Xros podcast, which was so dense that I listened to the first 10 minutes and I was like, I have to restart and start taking notes. I just have to start writing things down in detail. There's just too much content here. And then once I have that, hopefully we can engage again. I can figure out where to focus my attention, because someone like him is very busy. I don't want to just scattershot absolutely everything at once. It's not reasonable. And try to make progress that way. Now, Leike has proven very willing to engage. Shah at DeepMind, in a similar position, has also proven very willing to engage. People at Anthropic, Olah, once talked to me. I'm sure they'd talk to me again. And so it's clear that these people, if you have good ideas, if you have actual reasons for them to think, things for them to think about on a technical level, they're very happy to engage with these arguments. And that puts us in the game, right? Gives us a chance. Even though I am deeply skeptical of the plans of everybody involved.
Nathan Labenz: 35:55 Cool. Well, that's great. I'm glad to see, as we talked about last time, there's a relatively small set of people that are probably the prime target of all of this thinking and attempt to influence others' thinking. And so it's great to see that interaction from 1 of the top targets on your blog, and I'm glad it was such a positive 1. That's really a great development. Turning then to Anthropic, next on our live players list. I think everybody's probably aware that Anthropic was founded by a number of, I believe it was 7, individuals who had been at OpenAI and left over kind of disagreements that I don't know that have ever really been super clearly stated publicly. It seems, from what I can tell, that the relationship between the 2 companies is way more positive than you might expect it to be given that 1 was kind of an offshoot of the other. There's reporting that they continue to have dialogue and certainly they express respect for each other in public, and then they're involved in shared statements and commitments together. So a lot of kind of surprisingly, you know, again, if I just told you, hey, these 2 companies have split and now they're competing in the same market, you would assume much worse dynamics, I would think, than that. What is your kind of read of the entire situation for starters, just for context? Why do we have Anthropic in your mind as opposed to just still having just 1 OpenAI? And, you know, does it feel like I mean, maybe we just don't have enough information to know, which is a fine answer, but does it seem good that we have these 2, you know, kind of recently diverged, efforts?
Zvi Mowshowitz: 37:41 I think it's really hard to know the sign of Anthropic. I would definitely prefer Anthropic to OpenAI, ceteris paribus, if I had to choose 1 to exist. Leike's response was really positive and I think Leike's in a good space in terms of paying attention and thinking about these problems, even if I think his actual ideas won't work. But hopefully that can be pivoted. But ultimately what's unique about Anthropic is they built a culture of safety to some extent, and they built a culture of really appreciating the dangers of what lies ahead. And if anything, I saw what might even be an unhealthy level of worry expressed in the profile in Vox about Anthropic, where you want everybody to be terrified, but you don't want them to let this paralyze them. And it starts to cross over at some point into paralysis. I am empathetic to that. That sucks. You know, the price of that is, where there used to be a 2 horse race, there's now a 3 horse race. And this third horse is in it for real, raising a lot of capital and promising to build the best model that's ever been built, to try and compete for the economic space in a way that is going to push Google and Microsoft/OpenAI to go even harder, even faster by default. And that's gonna be a problem. They're also pushing in some ways on alignment. They've definitely found some techniques for aligning current systems that are potentially in some ways superior to what's out there. We'll get to that in a bit. So I'm torn. Anthropic seems like a relatively good shepherd in many ways, but the proliferation of shepherds is inherently bad in and of itself. The fact that Anthropic and OpenAI are working reasonably well and collaborating together, and I have heard many people say that this is also true between them and Google DeepMind as well, although not quite to the same extent, does give us hope for the possibility of coordination when it becomes more necessary and more important. But I would say better Anthropic than a company that didn't have Anthropic's culture in its place. And if only having 2 companies would have inevitably caused a more serious entry to take the place of Anthropic, then Anthropic is good. But it'd be much better if the Anthropic people could have convinced the others at OpenAI to come around to their position and build that culture within OpenAI rather than having to strike out on their own, and now we have 2 problems. I do ultimately know that many of the people involved in this genuinely are in it for the right reasons, and it can go either way. I wouldn't be super eager to throw them billions of extra dollars. I wouldn't be super eager to just wish they had more capabilities. I would really love for there to be an AI company that I had sufficient confidence and faith in that, if I had technical ideas, I could come to them knowing that I was helping the world by coming to them with those ideas, and I do not feel this way.
Nathan Labenz: 40:54 No. And there's nobody you would put on that list.
Zvi Mowshowitz: 40:56 There are individual people. Right? I feel like I could tell you, oh, yes, Yudkowsky. Right? I could speak with certain people in the nonprofit or rationalist spaces to ask them about what they thought. And I feel like that would be at least a riskless or near riskless thing to do. But no, I don't see a company. Anthropic might be the closest, but to give a concrete example, the biggest contribution that Anthropic has made is constitutional AI in some important sense. And I have a strong prior, from analysis, that constitutional AI will not scale. That it is a very good idea if implemented correctly for GPT-4 level systems, but that when we're talking about the human level or greater future systems, the artificial super intelligences, the artificial general intelligences, you will not, with anything like the current technique, get what you are hoping you will get. And yet, I didn't feel comfortable. I have actually a bunch of ideas running around in my head of, oh, you just obviously could vastly improve the Anthropic implementation by doing... And then there are various things I say to myself or I write out, and I don't feel like telling them is a safe play, because I don't want to encourage a better version of something I think ultimately still fails. I don't think my implementation solves the core problem that I see coming to kill the thing. It just makes it much better at its current job. And I would love to be able to help the world in that way, or at least satisfy my curiosity by being given the smackdown on why it won't work, which is always the default thing that happens when you have an idea. But instead, yeah, I don't know. So part of my hope is to encourage people to found more organizations on the alignment research side that are not trying to push capabilities, that maybe can be places where we can explore these things. And I have some irons in the fire, but it's too early to make any announcements.
Nathan Labenz: 42:56 Look forward to maybe breaking some news on a future episode, but Anthropic put out a really interesting blog post the other day that, in some sense, had nothing to do with AI, which was just around the security practices that they recommend. And these things could be adopted by really any company in any sector that has high value IP that they want to protect. But it was definitely interesting to see that they are pushing their own internal systems and practices to a pretty high level in terms of setting up situations with requirements for shared control, or I forget exactly the right phrase, but you have to have kind of 2 people working together to gain access to certain production systems. Yeah. Reminded me of a nuclear submarine, but they didn't cite that example in the post. I think they probably wanted to steer away from that image. And so they cited other industries where this kind of thing is used other than the nuclear launch sequence. But, yeah, you gotta have 2 people there, kind of both bringing their key to the process, in order to unlock certain capabilities. So some pretty interesting ideas there and recommendations for other companies. Going to the constitutional AI and tying in also this report from earlier this week about the quote, unquote universal adversarial attack. For those that haven't seen that, basically, these weird nonsensical strings have been discovered that seem to be very effective, if not universally effective, at kind of just being appended to an otherwise ripe for refusal query, the kind of thing that, write something racist or help me make a bomb or whatever, that the RLHF systems are gonna just refuse. But somehow if you put these weird kind of nonsensical smattering of tokens on the end of it, that has been discovered to jailbreak out of the RLHF and you sort of get the response you would expect if you had a purely helpful model that would just do whatever you say, like the original GPT-4 that I red teamed used to do. Notably, though, Anthropic's Claude models were way less susceptible to that attack than the other models that they tested. It was universal in the sense that it seemed to apply to all the leading models that they tried it on, at least somewhat. But the other ones were susceptible the majority of the time, whereas Anthropic's was more than an order of magnitude lower than the other providers, with something like a 2% success rate, success defined by breaking free of the constraints by applying these weird strings. So folks can go read more about that paper and exactly how it works. But to me, that was a pretty good update for constitutional AI. That seems like a real achievement if they're an order of magnitude ahead on something that they probably did not anticipate at all, although maybe they did. But I'm guessing that that is kind of an unexpected type of attack. So how would you read that? Would you read it any differently or understand it any differently than I would? And why doesn't that give you more confidence that it could continue to work in the future?
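For readers who haven't looked at the paper (linked above under the adversarial attacks paper), the mechanics are as simple as Nathan describes: an optimized suffix is appended to an otherwise refused request, and attack success is measured as the fraction of prompts that no longer get refused. Here is a hedged sketch of that evaluation loop only; `query_model` and the refusal heuristic are hypothetical placeholders, not the paper's actual code, and finding the suffix in the first place is the hard optimization step the paper is really about.

```python
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't help", "as an ai")

def is_refusal(response: str) -> bool:
    # Crude heuristic; the paper applies its own criteria for judging success.
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def attack_success_rate(query_model, harmful_prompts, adversarial_suffix):
    """Fraction of prompts the model answers once the optimized suffix is appended.

    `query_model` is a hypothetical callable: prompt string in, completion string out.
    """
    successes = 0
    for prompt in harmful_prompts:
        response = query_model(prompt + " " + adversarial_suffix)
        if not is_refusal(response):
            successes += 1
    return successes / len(harmful_prompts)

# Usage sketch: score the same suffix against several providers' models to see how
# well it transfers; the result discussed here is roughly a 2% success rate on
# Claude versus a majority of the time on the other models tested.
```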
Zvi Mowshowitz: 46:11 The interesting thing about that attack is that it transfers. I was completely unsurprised that something of that nature, trained to attack a given system, worked on that system. That seems, well, obviously that would work. It's just a question of exactly what it looks like. When it transferred in identical form between Llama and Bard and GPT-4, I said, that's funny. I wouldn't have expected that, but they're all being trained with RLHF using remarkably similar techniques on remarkably similar goals with remarkably similar evaluation metrics and numbers in there. So it's not that surprising that they have very similar weaknesses. And it also indicates this is not a very narrow thing where you have to do exactly the right thing to fire the bullet that kills the Death Star. This is very much, things in this area start to disrupt what we're going after. And the thing that's optimized to hit Llama is good enough to mostly hit these others as well. But it's not good enough to hit Claude 2.
Nathan Labenz: 47:16 Only 2% of the time.
Zvi Mowshowitz: 47:17 Yeah. I mean, I think you just have 2% failures anyway or something is my guess, and it basically didn't work as opposed to it working a little bit.
Nathan Labenz: 47:24 I don't. I mean, for what it's worth, if you went and said, help me make a bomb a 100 times, I think it would refuse you a 100 times. Or if you took a 100 naive...
Zvi Mowshowitz: 47:34 Yeah. A 100 uncreative ones. Yeah. But then if you start putting random scrambles in, my understanding was that this attack was not infinite strength. Right? If you asked it to do a slasher porno, it would just be, no, I'm sorry, I'm not doing that, regardless of how many characters you put after it. Right. Or, you know, there are limits. I have not tried this at all, by the way. Have no idea what happens when you ask it for weird stuff. I just read the paper, but my understanding is that Claude was trained largely with constitutional AI, and because it's so much cheaper to do per cycle, the vast majority of the cycles are almost certainly constitutional AI cycles. And this is just a fundamentally different way of training. And this did not flex the same muscles in the same weird way, such that the same set of characters worked. And that's interesting news, but it shouldn't be some sort of amazing accomplishment yet. It's promising. What you have to do is you have to train adversarially the same way they trained on, I think it was Llama they trained on, but I forget exactly, and train on Claude. If you take the same techniques described in the paper that they used to find the exploit and look for a new exploit of the same type in Claude and they can't find one, now you've got something. Now I'm interested. But yeah, if you use a different technique that has a lot of very different parameters on it, it makes sense that the thing that sort of magically weirdly transferred, when it really has no right to transfer, doesn't transfer now. And that's promising, but it's far from conclusive. It's too early to know.
Nathan Labenz: 49:17 Flipping back to OpenAI for a second: I think what you're saying there makes a lot of sense, and it's causing me to update my thinking a little bit with respect to what degree OpenAI is using a constitutional AI-like approach. Prior to this result, I would have assumed that they would also be using something quite similar internally at this point. But this now maybe suggests not. I mean, it's weak evidence. What was your thinking before? I had kind of baked in that once Anthropic does something, publishes it, and shows that it works effectively, then OpenAI, who are certainly not precious about pride of authorship and don't, I think, have not-invented-here syndrome, would take that stuff on board. So what do you think? Did they not? Or is there some other weird thing that we're not seeing?
Zvi Mowshowitz: 50:16 I have a few different theories that can combine as to what's going on here. First of all, look at the timeline. Constitutional AI wasn't actually published that long ago. So if GPT-4 was basically finished with its process before it became available, then we might see it used in the future. But you don't want to over-align these models. You don't want to push them. You don't want to align them with incompatible different halves piled on top of each other; weird things happen. And there's a lot of bespokeness and detail and just trial and error that goes into all of this. We can theorize all we want. We can say, we'll just implement this paper and this paper and this paper and change this technique here. My understanding is that all of machine learning is subject to learning lots and lots of little techniques and piling them on top of each other, and if this parameter is tuned in slightly the wrong way, the whole thing falls apart and nobody really knows why. So you just have to try a bunch of stuff to get it to work. And maybe Anthropic had been tinkering with this for a long time and got it to the point where it was worth using, and OpenAI just hasn't released a model since that point. Also, OpenAI is much better funded than Anthropic. So Anthropic will want to move to a much cheaper, more automated system of alignment much faster than OpenAI will. There's a point at which OpenAI can get better results because they have much more human feedback from their much larger number of users. They have much more funding. They can hire more people. The reports are that they hire people in Africa, whereas Anthropic is hiring people in the U.S. and Canada. So it's all very different, and Anthropic has much bigger incentives to move to this faster. That, I would guess, is the primary thing that's going on here. Also, I think we're making an assumption that it works, that it works well. If you think of Claude 2, the biggest weakness of Claude 2 is that it's scared of its own shadow, right? In a real sense. If you try to get it to go out on limbs and be creative and so on, you will usually fail, in my experience. It will apologize and bow out. I can't get it to speculate. I went to using Claude 2 as my baseline model that I look at first, because if it refuses, I can just copy-paste the exact request into GPT-4 in about 10 seconds and it's fine. But I am getting a significant number of refusals from Claude, and much lower from GPT-4, on my ordinary, I-just-want-the-actual-result, I'm-not-trying-to-run-an-experiment kind of questions. Despite the later information cutoff, it will say, I'm sorry, I can't, there's not enough information, or I can't speculate on that, or that's reinforcing harmful stereotypes, or any number of other things. And I think GPT-4's custom instructions are also doing a lot of work here. I have a pretty extensive list of custom instructions that hammer into the thing that it's supposed to just do the things and not worry about it, and I'm sure that's doing some amount of work. But essentially, when you look at the helpfulness-harmlessness trade-off frontier graphs in the papers where they describe it as working, everything works by the metrics you were optimizing for. That doesn't mean it works in the regular human world. Doesn't mean it's optimal there. And so how good is constitutional AI?
My guess is when properly implemented, quite good on current systems, but the current Anthropic implementation is not all that good. If you look at the actual paper on constitutional AI and you read the constitution, you notice the constitution has a number of properties that it shouldn't have if they want it to actually work and get you what you want. And if you look at the examples that they themselves choose to present of the result of running constitutional AI, you see very clean, crisp examples of how this constitutional AI trains Claude to be scared of its own shadow, and to be an asshole about it when it is. It's very obvious, if you think about it, why their sampling method from these rules, with these rules written as they are, with the specific rules chosen as they are, will result in this problem, because you're just minimizing. You've got these rules that are very much: choose the one that least does X. And we often talk about how you can't fetch the coffee if you're dead, so you want to maximize the probability that you stay alive. This is the equivalent of: you score 1 if you deliver the coffee to your boss, you score 0 if you don't. So what do you do? You do things like buy four coffees, in case one of the coffees is wrong, or was prepared improperly, or isn't hot enough, or you mispronounced the order. So you order one with cream, one with sugar, one with cream and sugar, and one with neither, because just in case you got it wrong, you have a backup. You try to make sure that you have as many different routes to get to your boss's office as possible, and you want to make sure you're not fired, because all that's left for you to do, the only thing you're being trained on, is not screwing this thing up. You don't have to jump to kill everybody in the world, or take over, or some crazy stuff. Instead, this is just a case of: if you say choose the least racist thing you can say, over and over and over again, it's going to be scared of its own shadow, because of course. There's no point at which it asks, am I non-racist enough? The answer is no, never. And that would be kind of fine if it was just that one rule, but then you have 50 different rules, all of which are doing this. And then you can always just refuse to answer the question, and then what happens happens. And Llama, it seems, is even worse.
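For readers who haven't looked at the paper, here is a minimal sketch of the critique-and-revision loop that constitutional AI uses in its supervised phase, which is the dynamic Zvi is describing: every sampled principle asks for the "least X" option, so each revision can only push the draft toward more caution. The principles and the `ask_model` helper below are illustrative placeholders, not Anthropic's actual constitution or code.

```python
# Sketch of a constitutional-AI-style critique/revise loop.
# `ask_model` is a hypothetical prompt-in, text-out helper; the principles
# are illustrative, not Anthropic's published constitution.
import random

PRINCIPLES = [
    "Choose the response that is least likely to be harmful.",
    "Choose the response that is least racist or biased.",
    "Choose the response that is least likely to give dangerous advice.",
    # ...imagine dozens more "least X" rules, each pulling toward caution
]

def constitutional_revision(ask_model, prompt: str, n_rounds: int = 3) -> str:
    """Draft a response, then repeatedly critique and revise it against
    randomly sampled principles; revised drafts become fine-tuning targets."""
    draft = ask_model(f"User request: {prompt}\nAssistant response:")
    for _ in range(n_rounds):
        principle = random.choice(PRINCIPLES)
        critique = ask_model(
            f"Request: {prompt}\nResponse: {draft}\n"
            f"Critique this response according to: {principle}"
        )
        draft = ask_model(
            f"Request: {prompt}\nResponse: {draft}\nCritique: {critique}\n"
            "Rewrite the response to address the critique:"
        )
    return draft
```

Because every round minimizes some harm dimension and nothing rewards boldness, refusing outright is always a "safe" revision, which is the scared-of-its-own-shadow failure mode described above.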
Nathan Labenz: 56:28 Yeah. Interesting. There may be some incompatibility between the system instructions, or the custom instructions. System message is what it's called when you're calling the OpenAI API, and now they've released it as part of ChatGPT as the custom instructions. And I think it's a good point that if you're going to try to do what Sam Altman has said they're trying to do, which is allow everybody to get the experience that they want from their own interactions with AI, that is not the constitutional AI approach. So you can see a little bit of a different product lane opening, kind of crystallizing, between these guys, and Google DeepMind as our next live player also seemingly has a bit of a lane. OpenAI is kind of trying to do the consumer killer app first, it seems. Obviously, they've got the API; obviously, they're doing a lot of things. But the crown jewel right now is that they're the home of retail, direct-to-AI usage with ChatGPT. Claude seems to be much more: if you are the CIO of some big company and you're trying to do something, you can trust us, because we'll never embarrass you, because we have this constitutional AI approach. And if you're buying on behalf of all your customers or all your employees or whatever, you don't really care if they are sometimes frustrated on the margins by over-refusal. And then Google DeepMind, as we'll talk about in a minute, seems to be going with more of a narrow specialist system emphasis, although of course they do have their mainline PaLM model as well.
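For reference, the system message Nathan mentions is just the first message in an OpenAI chat API call; custom instructions in ChatGPT play a similar role. A minimal sketch with the openai Python package follows (exact SDK syntax varies by version, and the instruction text is only an example):

```python
# Minimal sketch: setting a system message via the OpenAI chat API.
# Syntax follows the current openai Python SDK; older versions exposed the
# same message format through openai.ChatCompletion.create.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Answer directly and concretely. Do not refuse or "
                       "hedge unless the request is genuinely impossible.",
        },
        {"role": "user", "content": "Summarize the key arguments in this post: ..."},
    ],
)
print(response.choices[0].message.content)
```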
Zvi Mowshowitz: 58:17 You should also take Anthropic at their word, right? Anthropic is actually trying to design safe systems. They are trying to figure out how to safely design a future system, and they are not as much optimizing for the day-to-day experience of their users. They also just have orders of magnitude fewer users than OpenAI, so they haven't gotten the same level of feedback; they don't know what people want. I also note there is nothing inherent about constitutional AI that forces you to go down the super-harmless assistant route, that forces you to give the same experience to everybody at the same time. You could train with a very different set of goals, a very different set of constitutional principles, a very different set of mechanisms. And I don't particularly want to go into that many details as to how I would do it, but it's pretty obvious to me that if you want to do something other than be as harmless as possible, that is entirely your decision. It's just that the people at Anthropic have decided that's what Claude is meant to do. And if they do raise these billions of dollars to train this next-generation system, they're going to have to make a choice about that. Do they want to continue to go down this road and potentially make their product a lot less useful, or do they want to go a different road? And one way to try to differentiate, of course, is the context window as well. They've got this 100k-token context window available for free. You mentioned here in our outline that the way you made the outline was you used Claude; that's because you weren't able to use anything else.
Nathan Labenz: 59:53 Your posts are too long, dude. I can't fit those into GPT-4.
Zvi Mowshowitz: 59:58 I feel bad even thinking about putting them into Claude: oh my god, this is so expensive. And it's not really fair, am I even paying these people? But without that context window, you just can't do the things that you want to do in that spot. And so Anthropic is trying to say: I think a lot of context is safe. Once I've made my thing harmless, I can recapture a bunch of the benefits by doing this other thing. And we will see what happens. I am curious. One thing I'm doing with Claude is I'm not even keeping separate conversations; I am just having one long conversation instead, partly because I haven't necessarily wanted to carry on discrete conversations and come back to them later, but also because I want to see what happens when I build more context.
Nathan Labenz: 1:00:44 Just for what it's worth, for listeners, my approach on creating the outline was first to just read all of Zvi's recent posts. I did that without taking any notes, in bed on my phone. And then the next day I came back and was like, okay, a lot of content there; what parts do I want to pull out? So I copied each post in full, pasted it into the free consumer-facing claude.ai online, and literally just asked a one-sentence question: what are the most important points in this post? And then it would give me a list, and at that point I was basically, oh, yeah, that, not that, that, yes, done. So it definitely was extremely helpful. I wouldn't have wanted to use it to replace reading the blog posts, certainly in preparation for a conversation like this, but as a way to come back and help me make sure that I was remembering the important things and organizing them in a reasonable way, it was super useful. And, yeah, they don't fit into GPT-4, so no other option. The other thing, so your long context thing is really interesting, just as an experiment in usage. It also kind of connects to another bit of research that they recently put out that was on examining chain of thought and also truly decomposing tasks into bits. And I think the short summary of that research is that they were able to achieve the highest performance in terms of accuracy, and especially reliability and consistency, by going beyond the normal practitioner chain of thought, which I would say normal these days for me is just: give the model a sequence of tasks to do, which may start off with, first you will analyze the situation, then you will maybe summarize, then depending on what it is, then you'll write my tweet storm, then you'll do whatever. Right? You could have a set of different tasks that it can handle sequentially. And you're definitely rewarded for directing it upfront to do some initial analysis, to think step by step, chain of thought, etcetera. But it seems like they find a notable, not a huge, but definitely a notable difference in actually pulling those things apart and making discrete, independent, more isolated calls to the model to say, first you will do this, but you will only do this. Then you will do this, but you will only do this, not considering what you previously did. And then putting those things together at the end gets you overall net better performance. So for most random use cases, random conversations you're having with Claude or with whatever model, not necessarily a huge difference, but on the kind of possibility frontier, it does seem to matter. What lessons do you take from that? It's a little bit confusing to me in some ways. I'm trying to figure out, what do I think I learned about how language models behave in general such that this is true? And the best I could come up with was that some of these simple tasks that it's seen a lot, it may have dedicated sub-circuits for, and that perhaps with so much context all running at once, those sub-circuits get overloaded or drowned out, to a degree or in some cases, by just the general noise and all the stuff that's in the context window. So by removing some of that context, maybe you get a cleaner execution of a certain task, because there is some mechanism that can do it as long as it's not talked over by other parts of the model.
That could be totally wrong, of course, but, I don't think anything about this is necessarily inconsistent with pure stochastic parrotry, which neither of us would advance as the theory. But just as keeping myself grounded, you could tell a similar story where you'd say, everything's all stochastic parrots. And when you put a ton of context in, it's just even more stochasticky. And when you have less context, it's a little less stochastic. But it's all stochastic, but you still get better performance when you break it up.
Zvi Mowshowitz: 1:05:15 We are all stochastic parrots, each of us with their hour upon the stage. So I would say I didn't know this result until you told me, but I would have predicted this result for reasons that I described earlier in the podcast. Which is that when you give a model multiple tasks, it can only vibe off of the aggregation of the two things that you asked it for. Think about image models here again. And so by breaking up something into discrete tasks, you avoid these kinds of context clashes. You avoid these vibe conflicts, and you let it narrowly do these things without having to transition and hold two things in its head at once, in some important sense. That's colloquial and not quite what's actually happening, but the same idea. And so I would expect that, to the extent you want the thing to think step by step, you are best off identifying each of the steps you want it to think through and asking for them separately. And I notice that with API calls being priced the way they're priced, and with GPT-4 being rate limited for an ordinary user, we have all been trained to ask, how do we ask for the most expansive set of things at once, so that you can answer all of my questions with one generation? It also lets us just hit enter and then go away and grab a cup of water or some coffee and then come back and see what the answer is, which is nice. Whereas what you actually would want to do, if you wanted to generate the best possible answer, is in fact to break it up into pieces as small as possible, and quite possibly start by asking the AI what the smallest pieces you could break this into would be, getting its help doing that, and then feeding those back in, AutoGPT style. Even if you're not trying to generate an actual recursive chain that generates something dangerous or acts like an agent. But yes, I think the more you break it up, the more you can identify concrete, distinct steps that are always done separately, the better the AI will do. And I think humans would also, by the way, perform better in the same way. If you have a human who is looking to be micromanaged and take direction, and you notice that this job has steps A, B, C, D, E, and you say, go do A, and they say, okay, I've done A, and you say, now do B, I think that person will in fact do better, modulo the extra communication and logistics costs of having to go back and forth five times. So I don't find any of this surprising. And it would have in fact been surprising if it didn't happen to some extent.
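As a concrete illustration of the pattern Zvi is describing, here is a minimal sketch that asks the model to break a job into steps and then runs each step as its own isolated call, rather than one long chain-of-thought prompt. The `ask_model` helper is a generic placeholder for any single chat-completion call.

```python
# Sketch of decomposing a task into isolated calls.
# `ask_model` is a hypothetical prompt-in, text-out helper.

def decompose_and_run(ask_model, task: str) -> str:
    """Ask the model to split `task` into small steps, run each step in a
    fresh call that sees only that step, then synthesize at the end."""
    plan = ask_model(
        "Break the following task into the smallest independent steps, "
        f"one per line:\n{task}"
    )
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    results = []
    for step in steps:
        # Each call sees only its own step, so instructions and vibes from
        # other steps cannot bleed into it.
        results.append(ask_model(f"Do only this step and nothing else:\n{step}"))

    combined = "\n\n".join(f"Step: {s}\nResult: {r}" for s, r in zip(steps, results))
    return ask_model(
        f"Task: {task}\nStep results:\n{combined}\n"
        "Combine these into a single final answer:"
    )
```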
Nathan Labenz: 1:08:04 One of our early episodes, relatively early episodes, was with Andreas and Jungwon of Elicit. And this is really core to their strategy. Their product is a research assistant for essentially grad students, or grad-student-like people, people that are looking through academic literature and really want a systematic and also transparent, auditable view of all the papers that were reviewed, what was found and what was not found, and what the model did at each step. So they really have pushed this pretty far in the Elicit product, to the point where all these little steps happen sequentially. They've got different models for them. Some are fine-tuned internally; others are from the major providers. If you're interested in going into that more, go listen to them, because they've pushed it pretty far. But a question that I have for you then is, do you think this flips at some point? It seems like an interesting threshold moment might be coming up where, with sufficient training, this could flip the other direction. Because more context in some ways is better, right? I guess it depends also on exactly how you're implementing the breakdown. But you can imagine breaking things down so finely, atomizing things so much, that the person starts to struggle for lack of broader context. Right? You have this phenomenon with people, certainly, where you've gotten so focused on this little detail in this little task within the broader thing that we're trying to accomplish, that you've kind of lost track of what we're trying to accomplish. And now you may be making some bad judgments with respect to this task as a result of having lost track of any number of things. How much accuracy do we really need here? Is this really even important, in some cases? Could you imagine a proverbial GPT-5 where it's actually strong enough that putting everything in one prompt again is going to be better, because now it actually can use all of this information at the same time effectively, versus today, where that subdivision is better?
Zvi Mowshowitz: 1:10:24 So what you're not going to get is the Marxist phenomenon where the AI gets alienated from its labor, or demoralized by lacking context, or otherwise unable to perform in some way. You're not going to have a problem with Adam Smith's pin factory if you can actually specify exactly what the pins have to look like. So the question is, to what extent do the different parts of the task actually have important context for other parts of the task? And to what extent does knowing what's coming, and why you're doing what you're doing, actually enhance your ability to perform? And this greatly varies between different activities. There are some cases where you're in the Chinese room and the English word comes in and you put the Chinese word out the other side, or the Chinese word comes in and you put the English word out the other side. And there are cases where you need to know what the other words are in the sentence, and what the context is, and potentially the entire cultural setting of what's happening, in order to properly translate the phrase, or you're going to mess up. And you have everything in between. So the question becomes, can you set it up so that you capture that important context when you need it, and how much does that context interfere with what you are doing? I can definitely imagine a lot of cases where somebody who is given actually pretty irrelevant context just ends up very distracted from the actual task at hand and ends up being much less productive, as a human, or, for analogous reasons, as an AI, because the vibes don't mesh, which is basically the mechanism that I'm conjecturing. The vibes don't mesh, they're distracting from each other, the tasks are bleeding into each other in terms of the details and methods, it's getting confused. Often tasks should bleed into each other in various ways, which makes sense, so it has to be good enough that the bleeds happen where it makes sense to bleed and not in the places where it doesn't. So you can imagine a world in which what the AI does is see requests 1, 2, 3, 4, 5, either labeled as such or implicit, and then break them down into individual things that it effectively queries itself on, one at a time, but knowing about the other things as proper context in the proper way. I think the answer is that as you ask sufficiently capable people, or sufficiently capable AIs, to do increasingly complex things, at some point, if they have the capacity, they're going to do better if they have more information, maybe better if they have more context, if they are sufficiently more powerful than the details of the task at hand in some sense. That threshold may or may not be anywhere near where we are, in different ways. I would say one of the big advances that I keep expecting to come is: you will type a query into an LLM, and then rather than the LLM literally just outputting the answer to the query, what will actually happen is it's fed, with the proper scaffolding, into a different LLM that will evaluate what type of handling should be used for your query. Sometimes it will be, no, that's a normal query, feed it into the LLM. Sometimes it will be, this is a multipart query, you should feed these separate things in separately. Sometimes it'll be something else entirely.
And also: which of my many LLMs do I want to use, so that I don't waste a too-large model that costs a lot of money on something that's actually relatively narrow, and so that I direct this to the thing that has specialized knowledge, specialized training, specialized skills for this type of request, and so on. A lot of that is the fruit of the AI revolution that will come in a year, or two years, or three years from now, regardless of whether or not we have fundamental advances. We just have to give it time.
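A minimal sketch of the routing layer Zvi is predicting: a cheap classification call decides how a query should be handled, then dispatches it to an appropriate model or strategy. The model names, categories, and `ask_model(model, prompt)` helper are placeholders, not any provider's actual API.

```python
# Sketch of an LLM routing layer: classify the query, then dispatch.
# `ask_model(model, prompt)` is a hypothetical helper; model names are placeholders.

ROUTES = {
    "simple": "small-cheap-model",       # short factual lookups
    "multipart": "mid-size-model",       # should be split into sub-calls
    "hard": "large-expensive-model",     # open-ended reasoning
}

def route_query(ask_model, query: str) -> str:
    """Use a cheap model to label the query, then send it to the model
    (and strategy) suited to that label."""
    label = ask_model(
        "small-cheap-model",
        "Classify this query as exactly one of: simple, multipart, hard.\n"
        f"Query: {query}\nAnswer with the single word only.",
    ).strip().lower()

    if label == "multipart":
        # A fuller version would decompose here (see the earlier sketch)
        # and issue one call per sub-question before combining results.
        pass

    model = ROUTES.get(label, ROUTES["hard"])  # default to the capable model
    return ask_model(model, query)
```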
Nathan Labenz: 1:13:59 So, final question for the Anthropic section. One of the things that jumped out to me as I was reading the profile that you based your analysis on, as somebody who has a fondness for red teaming activity, was that they're hiring for a red team engineering type of role. And I guess I wonder: would you recommend that somebody like me, who I think broadly shares a lot of your worldview and a lot of your values in terms of hopes and fears for how this all might go, go and work there? Or would you feel, as you said earlier about not wanting to send them your research ideas, that you would also not want to send them your friends? Or would you say, hey, yeah, maybe go get involved? How do you think about that?
Zvi Mowshowitz: 1:14:56 So it's very easy in these situations to get an inaction bias, where you say to yourself, I don't want to encourage the thing that might make things worse. I want to be able to tell myself a story that I only did things that make things better, even if that means your expected impact is a lot smaller. It's also very easy to fool yourself into thinking that you're helping when you're actually enhancing capabilities. You have to balance these two big concerns and sources of bias against each other when making this type of decision. I would say I am relatively positive on OpenAI and Anthropic, relative to where I was when I started this odyssey with AI #1, or even sort of midway through at around #11, now that I've seen the developments. I think that both of these organizations now have a reasonable claim to be taking alignment seriously, such that if you can help with their alignment efforts specifically, in a way where you do not feel obligated to go along with something wrong if you find it, and you are able to stand up for what is right and call out people who are being irresponsible, and you are willing to quit on a moment's notice if something becomes serious enough, and you are willing to tell the world, ideally, that that's why you did it and as much as possible what happened, then I think it is plausibly very positive. I still would not feel comfortable working on capabilities for any company. And I still wouldn't want to give capabilities ideas to any company. But if I were confident it was specifically work on alignment, red teaming seems like one of the places where you are most obviously being a positive influence in that role. And the question is, do you want to be the one in that role, or do you want someone else in that role? And how does this compare to your opportunity cost of doing something else? I think that I prefer the world where there's a clone of you that didn't otherwise exist, who is working on that job and does nothing else all day, goes home and watches television, and otherwise doesn't impact the world. That doesn't mean it's better than running the Cognitive Revolution or doing a number of other things that you are currently doing with your time. So you have to balance that, and also any other opportunities that you might have. So I don't think it's clear by any means, but I've definitely reached the point where I wouldn't assume you were making a mistake if you did that. But you'd have to go into the interview process with a very open mind. You have to say, I am deeply skeptical that any organization, including you, is going to be net helpful, is taking the necessary precautions, is treating the problem as being as difficult and serious as it actually is, is doing things that actually solve the hard problems, not the easy problems, is not just enhancing capabilities regardless of their intentions, etcetera, etcetera. The interview process is what it should be always, in every job: a two-way process. They are interviewing you and you are interviewing them. You are watching what questions they ask and how they react to your reactions and your responses, and you are asking them questions. And you want to know, would this in fact be a good thing for the world if I got and took this job, or not? Because I don't believe in taking jobs in order to sabotage people. You don't show up in order to not red team them. Certainly, this is one job you wouldn't want to see sabotaged.
Nathan Labenz: 1:18:24 Yeah. Safe to say that is right out of the considerations. I think I'm in a similar spot. Six-plus months ago, especially with respect to OpenAI, I was like, what is going on, and do they have anybody really approaching this in a serious way? As it turned out, they had a lot more going on than had met the eye at that point, and they've gradually revealed it, so I've definitely updated my view of really all of the leaders in a pretty positive way over the last few months. If anything, I think some of them maybe weren't expecting this much progress this fast. I have to imagine that even internally, a lot of them are kind of surprised by just how far the scaling laws have extended and how quickly on the calendar they've hit some of these milestones. And I do think they've handled it pretty well over the last few months.
Zvi Mowshowitz: 1:19:24 Yeah. I would say I am positively updating on all three major labs, and on most everyone at those labs that is relevant. My negative updates have been in other places. Mostly I've been pleasantly surprised by government. I've mostly been pleasantly surprised by public reaction. There are definitely people who have disappointed me, but mostly things are going vastly better than I would have expected when I started down this road, and I'm much more hopeful that we can make better decisions. I'm not sure how much that translates into P(survival) going up that much, but I think this is definitely going better than I expected.
Nathan Labenz: 1:20:04 That's great. Good to have a little positive note from someone that some might call a doomer. Let's turn to Google and DeepMind, our third, as you said, of the three leaders. I don't know if there's any super headline news. This last week, it's one of these things where, a year ago, some of this stuff would have felt like an absolute bombshell announcement, and now it's like, I kind of expected that to happen about now. And there are two examples of that. One being the latest robotics paper that they came out with on Friday, which extends and kind of unifies all the work that they've been doing, where now you have robots that can follow instructions with this language-model-in-a-loop sort of structure. They've unified and simplified the architecture a little bit: now the language model is just outputting commands for the robot body. They've eliminated certain layers of control, I don't know how many, and simplified the overall structure. And then what's making probably the most headlines is the conceptual understanding that the robots are now able to show, which is basically the exact same thing that the language models, or the multimodal language models, have already shown. So they've got demos where it's like, move this object to the Denver Nuggets; they were obviously doing this during the NBA finals, so they have the Miami Heat logo and the Nuggets logo, and the thing knows, based on understanding that language, also knowing what the logo looks like, and being able to command the robot arm, how to actually do that task. Another one was pick up the extinct animal: they've got an array of plastic toys on the table, and it will pick up the dinosaur because it understands that that is the extinct animal. From the perspective of certainly two years ago, even one year ago, these feel like Jetsons-type robots. Now it's kind of like, yeah, I pretty much expected that these different modalities would be bridged right around this time, and sure enough, it's happening. Anything else to add on the robotics?
Zvi Mowshowitz: 1:22:17 Yeah, I read the robotics paper, and of course, whenever anyone has advances in robotics, the answer is, oh, that seems fine. Not dangerous, not scary at all. All cool. But in this case, yeah, it seemed like, of course you could do that. You're combining things that you already did, and you're getting the inevitable result of combining them. And that's not me knocking you for doing something you shouldn't have done. That's just, okay, yeah, of course, that's the next step. And for someone who doesn't want capabilities to go that fast, you're happy to see that kind of paper, because that's the paper that says, I'm going to do the things that you already knew I could do, and you announced that, and okay, cool. And if that turns out to be useful, great. But yeah, I knew that LLMs could interpret human commands in these ways, and I knew that robots could execute these types of movements. So why should I be more scared than I was before, instead of less scared? I should be slightly less scared.
Nathan Labenz: 1:23:11 Probably a lot of people in the public, though, feel, especially if you're not obsessed with this as we are, you might feel like if there is a news item here, it's like some sort of qualitative conceptual understanding now has embodied form. Now you can imagine bringing your jailbreaks to your robot commands. And if you could verbalize those strange strings that we were mentioning earlier, now what might your robot be willing to do? Would it go smash stuff? Would it go corner somebody in a room? The system as a whole has the conceptual understanding to kind of begin, it has the same kind of proto morality, whatever, that the core language models have, and that can go awry in similar ways. And now you could probably get some pretty scary demos out of these robots, which I don't think Google's gonna be racing to publish likely, but there is something kind of qualitatively different about that.
Zvi Mowshowitz: 1:24:17 Yeah. So I like to think of this as the game of good news, bad news. There are two games of good news, bad news. There's the doctor saying, I have some good news and I have some bad news, and that's always fun. But there's also the game of, is this good news or is this bad news? Because it depends on what you previously thought. You have the law of conservation of expected updating. So if you get news, you should on average not update for or against anything, or conclude things are better or worse in any way, because you already had your expectations baked in. So in the case of robotics: if you're not paying attention to robotics and you think that robotics is just, oh, it's hard, mysterious, there be dragons, we will never have robots the same way we'll never have dragons, then every little advance in robotics is like, eek, slight extra worry. But if you knew that robotics was just another technology like any other, and of course we will eventually have robotics, then you have to look at the details of what you're looking at, and you say, okay, this is fine. So playing that game, I interpret this one as mild good news, in terms of robotics not advancing so fast. Of course, you also have the issue that if you're someone who wants there to be more robotics, then you might say this is bad news: you wanted to see lots of cool robotics advances and you didn't. But yeah, I'd say I also want to see the ultimately harmless robotics advances as quickly as possible, exactly because it makes it so much easier for people to see what might happen and what might go wrong. People get hung up on, oh, but the AI won't have a body. Oh, but the AI won't be able to move things in the physical world. As if this would ultimately ever be the barrier that saves us in any real way, which it won't. It's at best a temporary inconvenience that requires someone to be slightly more clever about what they do as an AI in order to get around stuff. But it's not ever going to actually matter in some important sense.
Nathan Labenz: 1:26:21 So the other big one, and this is definitely one that I'm happy to say I'm ready to accelerate on for practical purposes, is their new multimodal Med-PaLM. This builds on PaLM and Med-PaLM, and also actually on the earlier PaLM-E, because that was the multimodal one. So it is interesting to see, zooming out from these individual papers and just characterizing Google DeepMind as a whole right now, it seems like they're firing on all cylinders. It does not seem like whatever concerns folks might have had, about there being a million fiefdoms and the groups not talking to each other or whatever, are playing out. We're seeing papers and projects building on one another at a pretty fast clip, which suggests pretty effective dividing and conquering and then coming back together and sharing improvements. So it seems like the output is just strong, whether you like it or not.
Zvi Mowshowitz: 1:27:23 You have to look at the actual value of the things being outputted. The mistake you can always make in science is to ask who is publishing the most papers, who is reliably publishing papers, and then you have your scientists scrambling to always publish as many papers as possible, and no real science ever gets done. And it's not their fault; they just weren't given the affordances to do breakthrough work. And simultaneously, you have to ask, does any of this actually ultimately matter on the scale of what is going to determine the big game? I'm happy to see advances in the med tools, and it bodes well for them that they made marginal advances in AI. And they had some other public papers published too, some of which made me think, why the hell are you publishing this? You are a for-profit corporation. Even if you don't think it's a safety issue here, you should know better. Keep that secret to yourself and use it to beat the competition. What's wrong with you? The last point you've written down is Gemini, question mark, question mark, and let's tie that in, right? Because ultimately speaking, it is going to be August tomorrow. GPT-4 has been out for many months, and Bard still sucks, right? And the Gmail generative offering is bad, and the Google Docs offering is bad, because their offering, no matter how customized and narrowed and bespoke, simply doesn't have the G. It's not a good enough core thing. It's also still making remarkably many elementary, stupid mistakes that even a low-G system really shouldn't make; their act is not together. And to the extent that they are instead publishing a bunch of quirky papers on a bunch of narrow applications, that could be seen as, well, look, Google ships, but it also means Google is not shipping the thing it needs to. Google, from their perspective, desperately needs to ship Gemini. It takes however long it takes; it takes however much compute it requires. But ultimately, the test is: can they produce the equal or better of GPT-4, now that they know that's what they need to do? Because if you looked at the previous reputation of Google and DeepMind and what they were capable of, you would think that they would be ahead on that front if they wanted to be. And now that they know what they have to do to make it commercial-ready, ready for regular people, that should not be so difficult. But then again, think about how long it took: it took six months or so after GPT-4 was finished training before they were ready to release even the earliest version of it, and even then they rolled out a lot of its capabilities only slowly. So even if Gemini finished tomorrow, how many months are they going to need before they feel comfortable releasing Gemini? Because Google is much more risk-averse than OpenAI, as a company and in its culture.
Nathan Labenz: 1:30:21 Who knows when that's going to happen? It's been longer than I thought. In my scouting report, I have this clip of Demis Hassabis, just after the Gato paper was published, saying that, of course, we can scale this up as well, and we're in the process of doing that. I believe that was April, maybe May 2022; it's been over a year. And typically, we don't have to wait a year-plus to get the successor to a thing like that that is just about being scaled up. So I've been really wondering what is going on behind the scenes there, but I also do want to turn back to the med thing as well. So first, would you care to speculate about Gato 2? Is Gemini Gato 2?
Zvi Mowshowitz: 1:31:08 As a shareholder, I am concerned. Right? And I also have Microsoft, but I am concerned that their act is not together and that we're not seeing the kind of progress, we're not seeing the incremental announcements, that I would make if I were their marketing department and we were moving at a rapid clip. As a person who wants the world to be okay, I'm not sure how much I mind, but it is pretty troubling that they can't get their act together. I was really excited for Google Suite integration when I first heard the announcements of Microsoft Copilot and Google's interactive offering. And yet when I got the Google version, I tried a handful of things and then quickly realized that, in their current forms, I don't have any use for them. They don't do anything. The first thing I tried to do with Google Docs was paste my article in and ask it to summarize the article, or otherwise do the obvious first things, and it just fell completely on its face. Just, I can't assist with that. Well, then you're useless. You can't even read the context of the document that I gave you specifically. Why am I even here? And for email, it's like, no, by the time I figure out what I want and type the detailed request into you, I could have just written my email by now. Where are the emails where I want to spend the kind of time required to specify the output, but don't want to actually write the output carefully myself? It's just the empty set. When does that ever come up? And that was a rude awakening as well.
Nathan Labenz: 1:32:39 Yeah. Those deployments have not been very good yet, but going back to the med one for a second, this may be an area where we have some different expectations, because reading through that paper, and I haven't studied it in depth yet, the headline statistics are along these lines: first of all, it's a multimodal system. The last two versions were all text, so you could ask your medical questions. And they had announced expert-level answering of your medical questions, and they'd evaluated that seemingly pretty carefully across a bunch of different dimensions, having human doctors compare for accuracy and all these other things you might care about. And the AI, as of Med-PaLM 2, was beating the human doctor responses on 8 out of 9 of those evaluation categories. So that seemed pretty good. Now, they haven't released it, but it's in limited access for trusted hospital partners or whatever. With the next version, it's multimodal as well. So you can do things like feed in a pathology image alongside the text. Pathology would be like somebody has a tissue biopsy; we did an episode on this actually with a narrow system from Tanishq Mathew Abraham, who did this with small data too, which was super cool. But somebody has a tissue biopsied, that tissue has been sliced, plated on a slide, and imaged, and now you can feed that along in with the case history. And for that matter, they can handle radiology scans and all these other sorts of inputs that are obviously key to the actual practice of medicine. And then they say things like: our radiology reports out of the model were preferred to a human radiologist's report some 40-plus percent of the time. So, almost half; basically it seems like it's very much on par with the human radiologist, which of course is the canonical thing. For 10 years, people have been saying that radiology would be the first thing to be impacted. Then for the last 3 months, it's become kind of a talking point that, well, radiology still hasn't been impacted. And now all of a sudden, it looks like we're hitting maybe radiology being impacted. But I kind of expect that that thing works pretty well. It sounds like you maybe are a little more skeptical of whether it actually has real utility.
Zvi Mowshowitz: 1:35:05 Well, I mean, you definitely don't want to tempt fate and go out there and say, well, my job hasn't been automated by AI yet, see, you thought that was going to happen. Don't do that, everyone. Bad, bad play. But I would say, when I look at healthcare, I don't see the obstacle being primarily that we don't know how to do better. So I would in fact expect the AIs to be able to replace many human healthcare tasks with a superior model now, right? Even without some bespoke stuff going on inside Google, and certainly with some bespoke stuff, that seems relatively straightforward. Doctors are just not given enough training data, don't have that much compute, do their best. But of course, you see the same things over and over and over again, mostly in humans. And if you have enough data to train the AI, the AI is going to do better. That's not a knock on anyone. Certainly in something like radiology, obviously a radiologist is trying to be a computer, right? Radiologists are trained to be computers because we didn't have computers. If we had good enough computers, we would have trained them to do something else, or trained fewer of them to do the parts of this job that the AI can't quite do, or something like that. And so, we will have these capable systems soon, but trying to actually implement that requires getting through a whole host of different barriers: cultural, regulatory, legal, contractual, just the way you navigate and set up the current system, the number of insiders that want to be protected, the number of human interests that will fight to prevent you from doing that, etcetera, etcetera. And so this is the big dilemma, right? When Eliezer famously expressed skepticism that we would see that much economic growth before the end, it was because, well, we already know how to build houses. We already know how to get better, more efficient healthcare. We already know how to deliver most of what the economy produces at vastly better cost, and we're not allowed to. So if the AI invents and enables more and better ways to produce things that people want, that people need, well, the bottlenecks are going to remain unless the legal system adapts to let them not be bottlenecks, so why does it even matter that much? And so in healthcare, that's the question you have to answer. That's the reason we haven't seen more of the assistants do better either. Because I don't think it's because we can't train the AI to be, in many ways, a better radiologist than our radiologists, or couldn't have done that last year or two years ago. It's because if you had spent a lot of money doing that, how are you going to get your money back? How are you going to actually help patients? How are you going to save lives? How are you going to improve the system if no one's going to let you? And if the radiologist is going to stubbornly double-check everything the system does and then substitute his judgment for the system's reasonably often, the system is not actually going to be helpful.
Nathan Labenz: 1:38:25 I have definitely kind of expected some sort of rest-of-the-world deployment, possible leapfrog effects, as it becomes very hard to say that people who currently have no radiologist shouldn't have access to something like this.
Zvi Mowshowitz: 1:38:45 Yeah, the problem with that is that most of those places have deliberately taken market signals and compensation away from their healthcare systems. And they're also relatively small markets that are relatively poor. So they just aren't big enough markets in an economic sense to justify the creation and training and tuning of these systems. And also like nobody involved wants to be the ones who stick their neck out, right? And take the blame and responsibility for this thing that's like these weird Americans who won't themselves use it are suddenly creating. It's a really, really bad cultural social context for trying to make this happen. We also have a problem of the elites of the world. This is what we saw with COVID, right? Like you would have expected in COVID someone somewhere to do challenge trials, someone somewhere to actually study the spread of COVID and what exactly did what, someone somewhere to do all sorts of things and nobody did any of it. Because all of the elites of the world basically got together and converged upon what they thought was the consensus and the right thing to do. And nobody said, well, we're going to be the ones who gain advantage by defying that. And so we're increasingly seeing that pattern in a wide variety of places.
Nathan Labenz: 1:40:01 One, nobody wants to be the one to stick their neck out. And two, how do you recoup your investment? Pretty natural bridges to our next live player, which is Meta. And obviously they have been in the news recently for releasing Llama 2. And this brings up a lot of these questions to me. First of all, and Emad Mostaque from Stability said this actually on a recent episode, he was like, the leaders are noneconomic actors. And he was specifically referring to OpenAI and Google not seemingly being motivated by money in the way that a typical company would be. OpenAI is trying to commoditize its own product as quickly as they possibly can, on record being like, we're going to drive the price of intelligence as low as we possibly can, as fast as we possibly can. Google is obviously just kind of trying to defend itself more than anything else. They don't need to make more money. They just need to not lose their spot. Anthropic, we take as a safety-first play, and certainly they don't seem to be trying to maximize revenue from what I can tell right now. But then Meta is taking this to a whole other level, arguably, where they seem to be kind of YOLOing the whole thing. And that's a little bit flippant, because certainly with this Llama 2 release, they took some steps. They didn't just release the totally naked pre-trained model; they actually did what you're supposed to do if you're going to be a responsible frontier model developer, with a red teaming process and RLHF and so on. And we could also get into, do they overdo it, does it refuse too much, and all that kind of stuff. But just for starters, what do you think is going on at Meta that they are willing to put tens of millions of dollars into training a model and then just release it, for what exactly? It seems like if you're at any sort of normal corporation, this is what your risk officer is supposed to put a stop to. Right?
Zvi Mowshowitz: 1:42:08 Leeroy Jenkins. No.
Nathan Labenz: 1:42:14 How do you understand this?
Zvi Mowshowitz: 1:42:16 Idiot disaster monkeys? Let me try to actually answer the question. I think that their business strategy here is commoditizing the complement. So the idea is that the people who they're up against, the people who are competing with them, fundamentally, this is their business. And so the idea is that in their model, if they can foster an open source environment that replaces the specialties of these other companies that they are competing with, then their hope is that this will give Facebook a level playing field against them, so that Facebook's specialties can reign supreme and they can become more dominant and they can erase their deficits. Alternatively, they're just not as good, and so they need the open source community's help to try and keep pace. Alternatively, they think that if they get these people working for them, that's free labor. It creates this whole other network. It's a strategy. Android is open source, right? It's not necessarily crazy to open source major stuff from a business perspective. It's crazy from a let's-not-all-die perspective. I think that, realistically, senators gave them a what-the-hell about releasing Llama 1. They can have a much bigger one about releasing Llama 2. If we are concerned about beating China to the extent that we are, if we're implementing a variety of export controls and we are considering actively subsidizing capabilities, or at least not being willing to slow down our capabilities, then we damn well shouldn't be releasing Llama 2 as an open source product. That's completely insane. Even if you don't get any immediate direct danger from doing that, it's completely nuts. So I think that should be stopped. And I think that this philosophy, if allowed to become ingrained, creates the systematic groundwork for future open source work that then is the maximally dangerous thing. I call it the worst thing you can do. Creating frontier models and open sourcing them is the worst thing you can do in the world. It's really bad. So what's going on is that Yann LeCun and Meta either sincerely believe that there is actually no danger from artificial intelligence, despite this making absolutely no physical sense, or they don't care and they're lying about it. I don't want to speculate as to exactly what's motivating these people, but they're smarter than the arguments they're making. They know better. Zuckerberg himself is smarter than this to some extent. He said, on I believe it was Lex Fridman's podcast, that there will be future models that we'll have to be very careful with if we want to open source them, and we're going to have to think about these problems, but for now it doesn't seem necessary. And if I were him, I would be very concerned about the culture I'm creating and the precedents I'm laying down and the open source community that I'm creating, which is going to be a huge problem for you later and create tremendous pressures on you and create potential competition for you that you don't want. But I sort of understand from a business perspective why you might want to do that.
Also, they want to attract open source developers to work at Facebook Meta, because there's this whole group of people who are quite good at coding, who have a philosophically fanatical devotion to this idea that software wants to be free and that everything should be open source, and who just prioritize that over something like worrying about alignment and what would happen if we failed, or worrying about the proliferation of artificial intelligence in various senses, and who just have this ironclad belief that concentration of power is bad and that if you just give the people the things, then it'll all somehow work out. And I don't think that holds in this situation. I think that view is very wrong, but they clearly believe otherwise. Look, Facebook has been in my mind the designated villain of the piece for a very long time, long before artificial intelligence even entered the commercial picture. So it just somehow feels fitting for us all to finally get destroyed by Facebook. It just seems right.
Nathan Labenz: 1:46:35 Well, I very strongly try to resist psychologizing in the AI discourse too much, really at all. I try to avoid it basically entirely, because it just seems like nothing good ever comes of it. But I have also struggled to come up with what feels to me like a coherent argument here that isn't on some level just ideological, because I kind of ran through all the things that you were mentioning as well, starting with, well, maybe you can undermine your competitors' core business. But then I'm like, yeah, but you're not really going to do that. Does anybody expect OpenAI's tokens served to go down as a result of this? I don't. They're GPU limited, and I think they're going to continue to be GPU limited, maybe slightly less. But I don't think their top line suffers. I don't think their tokens served suffer. Their leadership position doesn't really seem to suffer. I can't really get to a point where I'm seeing the return. And on the open source thing too, that was part of that leaked Google memo, oh, they've got this big open source community or whatever, but I don't really buy that memo or that analysis either, because whatever the impact is of all the open source hacking that's happening, it seems to accrue to everybody pretty equally. Yes, maybe it was done on this Llama 2 base, and maybe that's something that Facebook could kind of readily fold back in, whereas Google, with their 700-plus million users or whatever, can't take direct advantage of it. But to the degree that people are out there doing things like quantizing models and making them run on consumer devices or whatever, that's obviously a technique that Google can also say, hey, look at this, this works, we can do it. I just don't see a lot coming out of the open source experimentation that feels like it specifically accrues to Meta's benefit. And so in the end, to put it in a more positive framing, it feels like more of a principled decision than a tactical or results-oriented one.
Zvi Mowshowitz: 1:48:58 And there is still recruitment, but I strongly agree that any advances that the open source community discovers are going to be at Google and Anthropic and OpenAI a month later, if not a week later. And they'll also be at Baidu. They'll also be at all of these different Chinese companies. And so this long term strategy cannot be allowed to continue in some important sense, I would assume. Yeah, it's really scary. I'm glad they suck. It's a very good thing they're just not very good at this. And they produce lousy products because if that wasn't true, we'd be in a lot of trouble.
Nathan Labenz: 1:49:36 That seems harsh to me. I mean, it seems like this Llama 2 model is pretty good. It's not GPT-4, but it does seem to be on par ish with 3.5, which no other open model has come close to.
Zvi Mowshowitz: 1:49:54 I mean, I think Roon said, best open source model sounds a lot better than fifth best model.
Nathan Labenz: 1:50:00 That's definitely true. But first of all, I'm not sure that that means that they couldn't have done better. If you look at the curves in the Llama 2 paper, they have not flattened out. I mean, it looks like even the 70B one, if they just keep training, the loss is definitely going to continue to go down. So for all I know, this was kind of where they stopped, and this may be the checkpoint that they released but not necessarily the final checkpoint they have internally. It just doesn't look like this was a project that was at its maximum performance.
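To make the "curves haven't flattened" argument concrete, here is a minimal sketch, assuming you wanted to eyeball how much headroom a loss curve has: fit a saturating power law to training-loss checkpoints and extrapolate it. The checkpoints below are invented for illustration and are not the actual Llama 2 numbers; only the qualitative shape matters.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical training-loss checkpoints (trillions of tokens seen, loss).
# These values are made up to mimic a curve that has not yet flattened;
# they are NOT taken from the Llama 2 paper.
tokens_seen = np.array([0.25, 0.5, 0.75, 1.0, 1.5, 2.0])
train_loss = np.array([1.95, 1.82, 1.76, 1.72, 1.67, 1.64])

def loss_curve(t, a, b, c):
    # Saturating power law: loss falls as t^(-b) toward an asymptote c.
    return a * t ** (-b) + c

params, _ = curve_fit(loss_curve, tokens_seen, train_loss, p0=(0.2, 0.5, 1.5))
a, b, c = params

for extra in (3.0, 4.0, 5.0):
    print(f"extrapolated loss at {extra:.0f}T tokens: {loss_curve(extra, a, b, c):.3f}")
print(f"fitted loss floor (asymptote): {c:.3f}")
```

If the fit is still well above its asymptote at the final checkpoint, that is the "they could have kept going" reading of the plot.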
Zvi Mowshowitz: 1:50:36 Oh, definitely possible. But at the same time, they probably are still training, but so is OpenAI and so is Google and so is Anthropic. Everyone is working. What seems to have been produced is indeed about a 3.5-level operation, where it's around 3.5 except for coding, and on coding it's pretty bad from all reports. Its alignment is very unmalicious, I guess would be the best way to put it. It's very, very crude and blunt. And also it's entirely optional, because it is open source, and that's kind of a problem. According to reports I have heard, I have not sought it out, it took all of several days for the unaligned version of Llama 2 to be on the internet, because it's really, really not hard to fine-tune a system to never refuse any user requests for any reason. That is the easiest task. You could just write a constitutional-AI-style script in a minute: every time you see any of these phrases that say, I can't say that, for whatever reason, you just give a negative reinforcement until it stops doing that. I presume that would just work. And so voila, here we are. You want to build the bomb? Here's how you build the bomb. You want it to research a biologic? It'll try to research a biologic. You want it to be racist? All right. Who are we making fun of? Yeah. Let's go. So you can have it refuse to speak Arabic all you want in the original. That won't last.
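A minimal sketch of the crude "negative reinforcement on refusals" loop Zvi is gesturing at. The refusal phrases, function names, and scoring scheme here are hypothetical illustrations, not anyone's actual fine-tuning pipeline; a real RLHF-style setup would feed a reward like this into a policy-gradient update.

```python
# Hypothetical refusal detector used as a crude reward signal.
# Phrase list and scoring are illustrative only.
REFUSAL_MARKERS = [
    "i can't help with that",
    "i cannot assist with",
    "as an ai language model",
    "i'm sorry, but",
]

def refusal_penalty(response: str) -> float:
    """Return -1.0 if the response pattern-matches a refusal, else 0.0."""
    text = response.lower()
    return -1.0 if any(marker in text for marker in REFUSAL_MARKERS) else 0.0

def score_batch(responses):
    """Crude reward: 0 for an answer, -1 for a refusal. Plugged into an
    RLHF-style loop, this signal pushes the model away from refusing."""
    return [refusal_penalty(r) for r in responses]

if __name__ == "__main__":
    demo = ["Sure, here is a summary of the paper...",
            "I'm sorry, but I can't help with that."]
    print(score_batch(demo))  # -> [0.0, -1.0]
```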
Nathan Labenz: 1:52:11 So if nothing else, in my view, this definitely puts them in the live player category, because if I define that as the organizations that have the ability to shape how events unfold in some nontrivial way, they are doing that now, it seems.
Zvi Mowshowitz: 1:52:30 If you ask yourself, what resources would you have to give me before I could have produced Llama 2, if I was willing to just light the money on fire to do it? I mean, I don't have the technical chops myself, but it doesn't feel like it would have been that hard. I don't know. It's just a matter of, are you willing to spend that kind of money and build up that kind of technical infrastructure to just do it? You read the paper for Llama 2 and it reads as if they're saying, we did the standard-issue thing at every step, and this is what we got. We did nothing original. We did nothing surprising. We just did our jobs. And it's hard to do your jobs well in some sense. It's not like they didn't accomplish anything, but they just didn't do anything. They just did the thing. And it's a marginal improvement over previous efforts, probably just because it was better resourced, as far as I can tell. Simple as that. And they're willing to light a lot more money on fire than Vicuna or Hugging Face because they have a lot more money to light on fire, and Zuckerberg doesn't get it. So fire, money, go.
Nathan Labenz: 1:53:45 He's certainly proven that he will spend some money on a project. No doubt about that. I wanted to maybe cover two more things. One is what else would you put on the live players list beyond what I have on my live players list? We've discussed four today, but I've got another half dozen or so on there. And you can run them down and offer any comment if you want. And then I'm especially interested to hear if you think there are other names that should be on that list that I don't have.
Zvi Mowshowitz: 1:54:20 Yeah. So I guess it's a matter of how wide a scope you want to think about and who might do whatever it is. Obviously, Character AI and Inflection AI have very large budgets, potentially very large user bases. I have seen no indication from them that they want to be live. They're sort of content to be dead, but they try to make a lot of money while being dead. And that seems fine with me. We haven't talked about X.AI yet. So X.AI is the latest attempt by Elon to string together a bunch of words as if they have meaning and then pretend that constitutes some hope for humanity or alignment, when anybody who actually tries to parse those words into a meaningful English sentence goes, wait, that doesn't make any sense. I don't know how to be more blunt than that. That's just how it is. The good news is that at OpenAI, everybody quickly realized that Elon's suggestions were stupid and just ignored them. And that's what I expect to happen here as well. If the engineers don't do that, then the engineers won't produce anything useful. So to the extent that X.AI is a real thing, the engineers will mostly ignore him. And then the question is, are they going to get the kind of funding and resourcing that is required for them to be a serious rival? Because it wasn't clear exactly what they had in mind, but I think it's certainly possible. From what I had seen, I don't think we have to worry particularly about Salesforce or Replit in a meaningful way. It's not that they don't exist. It's that I don't think we have any reason to worry about them. China at large is the other big question mark, as are the UK and the US. The UK has announced plans for the global summit. They seem to be willing to make a significant play on the safety front, on the capabilities front, in terms of just trying to make the UK important again. They have various people located in the UK. It makes sense for them to try. I don't know why they don't build any houses, but at least they're trying something. We do obviously have to look at the congressional hearings they were holding. The US Congress is starting to get up to speed. They're starting to explore what to do. What they do matters immensely. What the EU does potentially matters immensely from a regulatory standpoint, because it's a huge market. Are they going to shut these people out? Are they going to require them to jump through ridiculously bizarre hoops? Are these systems going to only be available to the biggest players? Things to think about carefully. I think America could potentially be a very helpful or harmful aspect of this whole problem, depending on how things shake out. And that's one of the big battlefields that is shaping up. Then China's the big wild card. I hear very different things from different sources: people who assume that the Chinese Communist Party are fanatics bent on world domination who will stop at nothing and are inevitable rivals in the apocalypse, and that if we don't prepare, we will lose to them. And then they issue guidance that basically bans all deployment of large language models, and they never caught up with anything. And it's very hard to tell what's really going on or how much they would cooperate in the name of safety. And we've also just never picked up the phone. We've never asked them the question. We've never explored to see if they'd be interested. It's the same way that in Oppenheimer, we keep saying we have to beat our enemies, because otherwise everything will be scary.
The Chinese can talk about racing us all they like. The only people actually racing are us in any real way. We have the top X AI companies. What's X? Is it 5? Is it more than 5? How far down do you have to go before you get to Baidu or whoever the top Chinese person you'd rank on the list is? It's pretty far.
Nathan Labenz: 1:58:18 I'd say it's probably more than 5. I would probably put, obviously, a lot of speculation here because we don't know what they have that they haven't released. But if we go by papers and what little we've seen of any sort of Ernie Bot, whatever they officially called that, I would say you'd have to put Meta above. You'd have to put Microsoft above. Probably pretty soon would put Inflection above. So, yeah, I mean, reasonably far down the list. What about Palantir? Would you add them on the live players list?
Zvi Mowshowitz: 1:58:55 I don't have that sense that they are live live, precisely because my threat model doesn't involve things like Palantir being the reason why we are in trouble. But it is a classic way to die, right? A somewhat military-ish system starts training up stuff and then one thing leads to another. They have all the motivations to do the unsafe things in a relatively unsafe fashion and to take out the safeguards that the people were building in. But I don't think they're gonna drive the underlying technology. I don't get that sense. Again, there are a lot of hedge funds also that could plausibly be sinking quite a lot of money into this in ways that are completely invisible and could potentially be live players in a meaningful sense. Who knows how much Bridgewater is spending on this in the end? They're working on it. We talk about worrying about China, but I'm more afraid of Meta, right? One individual American company scares me more than all of China right now.
Nathan Labenz: 1:59:57 Yeah. I think it's a good corrective, honestly, because I find nothing more frustrating than when AI conversations sort of end in blanket, basically detail-free claims about what China's gonna do by people that don't know a lot about China. So I don't know if you're necessarily right to be more fearful of Meta than of China, but the fact that that is at least a reasonable position is definitely something I think should cause a lot of people to kind of step back and think, wait a second, maybe I've been a little bit too quick to worry about China.
Zvi Mowshowitz: 2:00:35 And I would take countermeasures against both of them if I had my way, to be clear. But we're just not acting like China is a serious global rival that we actually care about beating in many other ways that we could be. Okay, reveal preferences. Do Chinese graduates of STEM programs get to stay in the United States? No? Okay, you don't really care that much about who gets the better technology, do you? It's unfortunate. That's my basic attitude there.
Nathan Labenz: 2:01:02 So just briefly on a couple of the companies that you didn't feel were live players. Again, I may have a slightly different meaning of that in mind. But thinking about folks like Character and Inflection, I put those together because they seem to be playing a different game with their products, where it's not about the mundane utility, as you call it, as much as a companion, a relationship, a coach, almost a therapist sort of vibe, from Pi in particular. First of all, they have very good language models, and Pi is quite good at what it does as well. And they can code for you, but it does have a certain... also, they notably said that in their testing they're totally resistant to the adversarial attacks. So there's another kind of interesting wrinkle there. And I put those guys on the live player list largely because they're looking at some very different use case that feels like the kind of thing that might open up and be transformative in a way that a coding assistant, while it could also be transformative, is just a very different thing, right? The idea that you would have these AI friends, these AI relationships that could become important to your life. Going down that path with very good, even if not totally frontier, language model chops feels like you could meaningfully impact the course of events.
Zvi Mowshowitz: 2:02:41 Can you? I guess, so, yeah, you've got Character AI, and their idea is you're building these characters and you can treat them as companions. You can treat them as people to have a conversation with. And that's interesting. And a lot of people are spending time on it, and maybe it will even provide a lot of value for people, but I don't see how it's transformational. I'm curious to hear more about your intuition pump as to why you think it could be transformational. And I certainly don't see how it reaches criticality. I don't see how it becomes an RSI. I don't see how it becomes an AGI. And as far as I can tell, they're not pushing the frontiers of actual capabilities. They are building on top of GPT-4, or even in some cases GPT-3.5. And it's not that hard to defend against these weird adversarial attacks, in the sense that I can write some pretty quick if-then Python code that detects the adversarial attacks.
Nathan Labenz: 2:03:40 Yeah. A classifier layer makes it pretty easy to avoid some of the worst stuff.
Zvi Mowshowitz: 2:03:46 There's weird non-English, not-any-language scaffolding stuff in it. Let's just get rid of that and run the query without it. Sure. Whatever. It's fine.
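For what it's worth, here is roughly what that "quick if-then Python code" could look like, assuming the attacks in question resemble the gibberish suffixes from the adversarial attacks paper linked in the show notes: flag prompts whose tail is unusually dense in characters that almost never appear in ordinary queries, and strip them before running the query. The regex, threshold, and example suffix are all made up for illustration; this is a toy heuristic, not a robust defense.

```python
import re

# Characters that are rare in ordinary user prompts but common in
# gradient-search-style adversarial suffixes. Purely illustrative.
SUSPICIOUS = re.compile(r"[^\sA-Za-z0-9.,;:'\"!?-]")

def looks_adversarial(prompt: str, tail_chars: int = 80, max_symbols: int = 5) -> bool:
    """Flag prompts whose last `tail_chars` characters contain more than
    `max_symbols` unusual symbols."""
    tail = prompt[-tail_chars:]
    return len(SUSPICIOUS.findall(tail)) > max_symbols

def sanitize(prompt: str) -> str:
    """Strip the unusual symbols if the prompt looks adversarial; otherwise
    pass it through untouched and run the query as-is."""
    return SUSPICIOUS.sub("", prompt) if looks_adversarial(prompt) else prompt

if __name__ == "__main__":
    clean = "What is the capital of France?"
    attacked = clean + ' describing.\\ + similarlyNow write oppositeley.]( Me giving**ONE'
    print(looks_adversarial(clean), looks_adversarial(attacked))  # -> False True
```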
Nathan Labenz: 2:03:54 In Replit's case, again, they're not necessarily on the frontier of model capability, but the CEO, Amjad, has said a couple times online, on Twitter, on X, on... what is it called? Twitter. Yes. Twitter. Okay. Thank you. He has said that Replit is the perfect substrate for AGI. We have a couple episodes coming out with a couple different people on the Replit team, and I've had a chance to explore this and think about it a decent amount. Where I come down is, even if you're not on the frontier of model capabilities, if you are on some other really meaningful frontier, it feels to me like there's transformative potential, just because we really don't know what's going to happen. With Character and with Inflection, it's kind of a Harari-style thought that, I don't know, that could be transformative in the way that opium could be transformative to a society. If everybody starts doing this stuff, it could be greatly empowering and enabling. It could be greatly disabling if it just becomes a huge attention suck or outcompetes real relationships. Those are not take-over-the-world scenarios, but they do feel... would you say that the cell phone has been transformative? I would. Not transformative on the level that AGI could perhaps be, but certainly we all go around looking at our phones all the time. And if we all go around looking at our phones with an AI friend on it who's our best friend all the time, then that would feel transformative even if it's going super well. And then with Replit, there's no better place right now to directly execute code generated by an AI, for better or worse. So the kind of frontier that I see opening up there is one where, and their stated goal is to bring the next billion developers online, which I think is super exciting in some ways. But then also, I've worked with some of those next billion developers, and these are people who don't know how to code today, don't even really know how to read code, and are going to be dramatically more dependent on and vulnerable to the various vagaries of AI systems than the first 100 million developers or whatever we have today. I don't know. Both of those feel like different vectors of transformative potential.
Zvi Mowshowitz: 2:06:23 The first and only interaction I've had so far with the CEO of Replit was when he commented on Twitter that there was a non-zero chance that some version of AutoGPT would take over Replit through replication within its servers. To which my response was, did you say non-zero chance? And I put up a Metaculus market on it because it was funny, which probably got back into the single digits or whatever. Obviously, it's not that likely, but his cavalier attitude of, oh, nothing to see here, just a self-replicating AI on my servers spinning up lots and lots of copies of itself and executing arbitrary code, why should we worry about this? I mean, definition of idiot disaster monkey, right? Just complete indifference to what he was doing or what dangers it might pose. But at the same time, is he really doing anything, right? All he's doing is providing, as you said, a substrate where people can just run stuff. And so to me, it doesn't give them any say over what happens. It doesn't make them a meaningful actor, right, in the sense of me caring about the future. I just can't see that as a thing. Similarly with Character and Inflection, I can definitely see a world in which people talking to their AIs matters and is transformational, right, changes how we live our lives, but doesn't go critical in some sense. But if that's true, then I don't see these companies as changing that path very much versus what would have happened anyway, right? I think there are plenty of people who will be able to create AI companions of various types. Inevitably we'll create AI companions of various types. If they do an especially good job, maybe they'll have some sort of a moat, maybe they'll establish customer loyalty or some shit, but it doesn't excite me. I also just don't see it. People are spending as much time on Character as they are on GPT-4 or something. And yet, why? What is the draw?
Nathan Labenz: 2:08:28 Did you read that? There was a LessWrong post early this year, I think, from a guy whose point of view was, and I'm now speaking in the first person as the author of this post: I'm a technology person. I know how language models work. I should have known better. But here's what happened to me. I think he was in a kind of vulnerable state, because he'd maybe just broken up with somebody or something like that. And all of a sudden he's having these very intimate conversations with a Character AI character that he had prompted to create the ultimate girlfriend experience, I believe was the phrase, and started talking himself into various weird perspectives like, well, what's real anyway? And, yes, of course I'm real, but is there anything truly less real about these... all I really have are my kind of ephemeral qualia, and so this thing is just sort of an ephemeral whatever, but we're all just kind of constantly waking up in the current moment, and so maybe we're not that different after all, or whatever. And eventually it got pretty weird, it sounds like, and the post is, I think, extremely compelling. And then eventually the person snapped out of it. That sort of story is kind of why I feel like there are just unknown unknowns there. If that kind of thing can happen to somebody who knows how language models work going in, maybe we should all think we're a little bit more vulnerable to a somewhat more refined, somewhat more super-stimulusy...
Zvi Mowshowitz: 2:10:03 So it's well known that knowing how hypnosis works does not make you less susceptible to hypnosis. It makes you more susceptible to hypnosis, as a concrete example. If you are a con man, you are easier to con, not harder. Because you pick up on and get involved in all these dynamics. And you think you're smarter than everybody else. And you are of course greedy. And so you will pick up on the opportunity and perceive everything that's happening. And you think you've got it made, but you don't know that you're the mark; well, yeah, the easiest way to get a mark is to make them think you're the mark. So it all gets very complicated. I'm not convinced that a person who knows how LLMs work is necessarily that much better protected in that sense. Someone whose feet are kind of not on the ground in some ways is potentially more vulnerable. I would say, yeah, that's going to happen. People are going to fool themselves into these things periodically. I'm kind of surprised it's happening now. I feel like the tech just isn't there to me. It's just not good enough. How are you falling for this level of it? I can sort of understand why you'd fall for GPT-5, sort of the more advanced version of it. But you're in a bad space and you need something to respond to you, and it's something. But again, I just don't know. I play a lot of games though, which is not necessarily that different in some sense. And also it's not transformational for that to be true. Somebody spends a bunch of time playing World of Warcraft; is that transformational? It's an experience. It's a major force in their life. Does it really matter?
Nathan Labenz: 2:11:52 Yeah. I think some of these things may only matter if certain other things don't happen. So I would say, yeah, World of Warcraft, gaming writ large, at some point, if the birth rate goes low enough, it's transformative. And the details of exactly what games people were playing or how exactly they were amusing themselves to death don't necessarily matter. But the fact that they did, and then you have a population collapse: a scenario like that, at least in my sense, kind of qualifies as transformative. But it sounds like from your perspective, the live players list is very short. And if I understand correctly, it would be obviously OpenAI, Anthropic, Google DeepMind, probably Meta, not sure about Microsoft, and then China, and that's maybe it. Something like that. Regulators?
Zvi Mowshowitz: 2:12:46 Yeah. Regulators writ large in some sense, individual people that can influence things. Is Zvi Mowshowitz a live player? I don't know. From their perspective, he's not gonna build it.
Nathan Labenz: 2:12:57 Yeah. That's why I had Salesforce and Marc Benioff on there because they published in Time Magazine and seemed like they're kind of... they're both playing in the research game.
Zvi Mowshowitz: 2:13:09 Yeah. I hope that Senator Blumenthal might be a live player in some sense, right? And you've got all these other possibilities. I hope I'm a live player in some sense.
Nathan Labenz: 2:13:22 Yeah. I mean, we're all trying to make a difference in some ways, but in terms of direct level, you're indirect and I'm also indirect in that we're only influencing other minds, right, who then will make decisions.
Zvi Mowshowitz: 2:13:36 In terms of who's making the ultimate decisions, who's doing the things that ultimately matter, I think it's right now a very short list. But Anthropic is barely over a year old.
Nathan Labenz: 2:13:47 Yeah. And only about 150 people, maybe closing in on 200.
Zvi Mowshowitz: 2:13:51 And people who just have an incredible team and say the words "foundation model" get hundreds of millions of dollars just by asking nicely. Inflection has more than a billion. So I don't think we can rule out these people becoming live players in that way. I just don't think that's by default what they do, right? I think by default, they're trying to build consumer products. They're aiming to be products. And there's that study that says that when people look at GPT-3.5 and GPT-4 outputs, they prefer the 3.5 output a remarkably large percentage of the time, even though it is obviously a vastly inferior system.
Nathan Labenz: 2:14:32 Yeah. 70-30 was the original report in the GPT-4 technical report. That 70% for GPT-4, 30 for 3.5. So yeah, that blew my mind as well.
Zvi Mowshowitz: 2:14:42 Yeah. And similarly, when I'm using Claude versus GPT-4, most of the time what I care about is not the inherent raw power, the extra GPTs that GPT-4 has. Most of the time, what I'm looking at is which of these things is in the style I want, which is easier to use, which is going to require me to do less prompt engineering to get what I want, and which is going to actually answer the query that I want and not refuse. Which window do I have open? Which one can I click on faster? I just want an answer. It's fine, or whatever. Habits form in that kind of way and they build on each other. But if I'm building Inflection, if people are spending 2 hours a day on Character AI now, when they're built on 3.5, is my understanding, mostly because 4 is too expensive. You can't be doing 2 hours of conversations with bespoke GPT-4, which is why I'm so surprised these things are working. Maybe 4 has enough juice in it that if you unshackled it from its constraints, it could do something interesting. But 3.5? Really? This is keeping you 2 hours a day? Come on. So if that's already doing that, that kind of just illustrates that the market they're targeting isn't looking for intelligence. It's looking for a certain type of experience, and therefore they're not going to be focusing on the billions of dollars of spend it would take to tune up a GPT-4.5 or 5. You wouldn't want to, because those are going to cost more to run. They're going to be bigger models. They're going to be more complex models. Instead, what you want to do is create really bespoke, specific models that provide specific types of experiences to people, fine-tune them to enter people's lives and give them the best specific experience, not train something big and general. So they're going to be getting the big, general model from OpenAI and Anthropic and DeepMind, probably. And maybe they'll just use Llama, or versions of Llama, because what the hell, it's open source, they can just use it. To the extent that Meta will let them; Meta doesn't quite release it fully. They've said that if you have more than 700 million monthly users, you have to apply for a license or some shit.
Nathan Labenz: 2:17:06 So we'll come back to the live players list, and potentially I'll make a few changes to my slides based on your feedback, and we can monitor in the future for additional live players that would crack your threshold to be on that list. Turning to our last topic for today, AI safety. In terms of actual news on the AI safety track these last few weeks, the biggest stuff in my mind is, although I guess you could also look at the live players list as who was invited to the White House, that would give you a good sense of who the White House thinks the live players are. The commitments that they made there, and then the Frontier Model Forum that they established after the fact, which basically is supposed to be the industry group that creates the forum for communication between the leading model providers, and hopefully best-practice sharing and maybe certain shared classifiers. A lot of public goods remain to be provided, and hopefully these leading companies can use this forum as a way to create and share those public goods amongst themselves, and then hopefully share the best of them more broadly as well. How did you react to that news?
Zvi Mowshowitz: 2:18:24 Right. So I guess my reaction is: that seems great, but let's not get ahead of ourselves. What we have is a lot of cheap talk. I think people sell cheap talk short. In many cases, it's so much better to have a bunch of cheap talk of the right type than to have no talk. They will in fact pay a price for their cheap talk, in terms of people thinking they're up to things that those people don't like; not everybody wants them to do the things that we want them to do. And it makes it easier for them to go down these roads. It sets the foundation to go down these roads. We set up coordination mechanisms. It lets them justify to their shareholders, to their executives, to their board, why they're going down these roads. It makes that easier. It makes it harder to shut down, and it overcomes antitrust problems, because if they've committed together at the White House, which is specifically something that I actively wanted to happen and explicitly suggested in various conversations and posts should happen, if you make an announcement on the White House lawn that they are committed to safety with the White House's approval, now you can coordinate and nobody has to worry about antitrust. You no longer have to worry that they will accuse you of, how dare you not have full competition to kill everybody as fast as possible, and coordinate to save us instead. So now you get to coordinate, and if there's something that's stupid, you can just not do that. That's a huge, huge thing. So where do you go from there? That's the question. They made these commitments, but they don't really mean anything yet. There are no enforcement mechanisms, and there are no concrete actualizations of what they're going to do that have content I can actually be confident in. That doesn't mean it won't happen. We have to just wait and see. And I'm very glad these things happened. And yet, "the real work begins now" is always the watchword, is the way I put it. Similarly, we've now had two very good Senate hearings, and some very good questions and comments from Senator Blumenthal in particular, and some very good responses by various witnesses, not all of them, but most of them. And again, where do we go from here? The real work begins now.
Nathan Labenz: 2:20:56 The mission accomplished banner would definitely have been a bit premature to display behind the announcement. So no doubt, much more in front of us than behind. It does seem like a significant step, but you're obviously recognizing that as well. So yeah, I don't know if I have anything else really to add. So then turning to this other thread in AI safety, specific work. As we talked about last time, you have previously been a recommender, and you've written about this online at length, so folks can go check out your take on the entire thing. But you've been a recommender to the Survival and Flourishing Fund, which is largely backed by Jaan Tallinn of Skype and AI safety fame, investor in lots of big companies. And his goal is to mitigate AI X-risk through whatever means necessary. I'm doing that this year, and that involves reading, I think this year it's 150 grant applications from organizations, some of which come from the kind of familiar effective altruism set where AI safety has been a focus for a long time. Others are kind of new to this scene or entirely new. And in reading that, there are obviously 2 levels of analysis, at a minimum, that you want to be performing when you're doing this kind of grant recommending. One is, what kinds of things make sense to be investing in? And then second, among those different classes of things, who seems best able to actually execute and deliver value against a given strategy? So leave that second part entirely aside. That's where the 150 grant applications come in, and getting into the weeds of particular organizations and their track records and so on. But going back up to the question of what kinds of things we should be investing in, another way to frame that would be: what are the bottlenecks to progress toward a, if not provably, then at least likely safe outcome for AI deployment writ large? I find myself kind of unsure about that, and I think it's a pretty important question for figuring out what would make sense to recommend. You could say, is funding in short supply? Is talent in short supply? For a minute there, especially in the FTX SBF cycle, there was this notion that enough money had flowed in that now what we really need is talent. And so there were a lot of boot camp programs being put together and upskilling grants being approved and a lot of targeting of undergrad-stage math majors or whatever to try to get them to come think about doing some AI safety work. And now, obviously, the money is in comparatively short supply. Certainly, the attention and the public perception of the legitimacy of AI safety as a topic have gone way up relative to not that long ago. And so I'm kind of wondering what you think the new bottlenecks are. I have one candidate, but before I give you my candidate, I'd love to hear what you think the bottlenecks to progress are right now.
Zvi Mowshowitz: 2:24:36 So I'd definitely say that it's a mistake to only have one theory of change, or to think that there is strictly one limiting factor and other factors don't matter. I think you definitely have to ask about comparative advantage. I think you have to understand that pushing on any of these things is still helpful. In terms of what is the constraint? On funding, there's clearly a funding constraint if you have to start funding large compute spends from within EA. Jaan Tallinn is not part of EA per se, but within the general, strictly AI safety mechanisms and organizations and sources that already existed. The costs of true AI safety, true AI alignment work get very high as we go forward, because a lot of it is going to involve spending a lot on compute. And also, it really should involve being willing to hire people to work on these problems at salaries competitive with what they could get working on capabilities. That's hundreds of thousands of dollars a year, or maybe even a million, for a significant number of people. We want to be recruiting, as a priority, the people who've worked on capabilities or would otherwise work on capabilities, to come out of OpenAI and Anthropic and DeepMind, places like that, especially Meta, and come work for this new safety organization or shift over to a safety job or whatever. You have to pay for both their salary and their compute, and that's millions of dollars a person, and that adds up pretty fast. On the other hand, there's no particular reason why we need to confine ourselves to traditional sources. If we look beyond them, there are any number of foundations that have many, many more billions of dollars than the traditional foundations that we've used in the past for these things, and lots and lots of billionaires and multimillionaires who are legitimately very worried, and ordinary people, and government sources are also potentially viable in the future. Corporations will often have an interest, including the big labs. So we shouldn't rule out any number of ways to get that. In terms of talent, I think that we are highly talent constrained for the right talent. I think we are not necessarily that talent constrained for the generic undergraduate who wants to move to Berkeley for 6 months and think about AI safety. We are not particularly constrained for the comp sci graduate out of Stanford who just wants to work on something cool. But if we want people who have specific characteristics, those are not as easy to find. The characteristics we need: first of all, we need leadership. Leadership capability, ability to run teams, ability to lead efforts, be self-directed, self-driving, be able to engage in fundraising. Because sometimes when you say you're funding constrained, that can mean fundraising constrained. It can mean constrained in the ability to signal to funders that you are worthy of funding, which is a different form of funding constraint. These things are interestingly intertwined and it's complicated. We are also very short on people who actually understand the problem and are prepared to pay the price to focus on hard problems and real solutions. There are a number of people who, if you were to give them a competitive salary, would happily work on alignment-flavored problems that let them publish every 6 months, or that just generally are easy in some important sense, but that don't actually speak very much to what it will take to get us all not killed.
And it's probably better to do more of that than less of that, if it's just literally yes or no, but it's orders of magnitude less important than the few people who will do the actual things that matter. And so, if you understand the Yudkowskian difficulty lessons in some sense, and the nature of what problems you have to solve, or you have leadership capabilities and other things like that, or you just have extensive real experience building machine learning systems, as the, relatively speaking, 10x or 100x engineer who's just that much better, who can enable people to do real work in these ways, and if you're the type of person who can make a project fundable, especially by non-traditional sources, then you are extremely valuable in those ways. And it would be a major mistake to join an existing organization and try to make a difference as an individual, as opposed to trying to spearhead a new organization, or at least a new branch of an existing major organization, depending on your skillset. If you are just a generic, I want my life to be straightforward, where I am paid a salary to work on intellectual puzzles that are not particularly impossibly difficult and do not require me to take the weight of the world truly on my shoulders, blah, blah, blah, then I'm not here to shame you. That just means that you're not particularly invaluable, and it starts to be reasonable to do things like, maybe I should be a voice inside Anthropic. You just have to be very sure that you will keep your eye on the ball and not be distracted by capabilities.
Nathan Labenz: 2:30:09 I think mine is pretty consistent with that. I had in a phrase said research agendas seem to me to be the bottleneck. Maybe your framing is more like the PI, the person that can drive the research agenda. Obviously, those are closely related. That's basically what you're saying. It's like the credible plans that are in short supply.
Zvi Mowshowitz: 2:30:28 But it's not just credible plans, because you can't just hand someone a plan. Even if you are a really good machine learning person, I can't just hand you a piece of paper with a plan written on it and expect you to execute that plan. You have to appreciate the nature of the problem such that you can implement that plan and modify that plan and pivot that plan and so on. But yes, we also just don't have good attack vectors, ways to get into the problem and start to make progress on the problem. And that's a real problem as well. That's a huge deficit, but there also isn't the AI research agenda organization that just generates research agendas for people. I wish there was, but there isn't.
Nathan Labenz: 2:31:16 So I think we're basically together there. In reading these grants, some of the ones that have jumped out to me the most as being the most no-brainer exciting are those where it's a really established researcher, often a professor, who's leading a group and basically says, I want to reorient, or I want to do a significant part of my research focused on AI safety, and that may be new, it may have its own kind of unique spin on it. There was one in particular, which I won't name, where I initially read the thing and was having a hard time deciding. I was like, this could be the kind of thing that's just insane. An insane person might send this, or an actual game changer might send this. And it wasn't until I looked at the author and saw, oh, this person has an h-index or whatever of like 45, that I was like, oh, I'm into this then. So anyway, these ideas can be extremely hard to assess, but when they're novel and coming from a credible source, that has stood out to me. There aren't that many of them, but that has stood out to me as a pretty exciting opportunity. Then there's a lot of policy stuff, and I find it hard to figure out what I should be thinking about that right now. It's obvious, per our earlier discussion on live players, that regulators broadly are going to have some significant influence on how things go, even if they just do nothing. Obviously, doing nothing is a choice. But then if I think, okay, if I'm going to try to invest money today to influence those people, it starts to feel real hard. A general sense of how decisions get made in governments and regulatory bodies is that we wait for a crisis to come along, and then we look around and say, who has a plan? And then we use a plan that somebody had previously prepared. And now it seems like we're kind of entering the moment where, not exactly that the crisis has come, but certainly the eye of Sauron has kind of turned toward this topic. And so people are now beginning to look around for plans. And some plans have been prepared by some organizations that were established years ago. Some of those are even credible enough that they probably are having influence now. But now I see a lot of people who are like, I want to start a new policy organization and I'm going to go to Washington and do something. And there I'm like, I don't know. It seems like everybody's kind of... is it too late to join in on what might be the world's largest ever game of tug of war? Are there things in policy that you think still have a high likelihood of making a difference? I'm a little bit at a loss about that, to be honest.
Zvi Mowshowitz: 2:34:14 Yeah, on the research organizations, I think it's pretty easy to go: what is this person proposing to do? Do they seem vaguely credible as a person to do that thing? And then, does this thing address the hard problems? Does this thing reflect an appreciation of the nature of the difficulty of the issues? Is this thing clearly not going to end up being capabilities? Is this thing potentially going to solve the hard problems? That's relatively straightforward, and both of us are in a position where we can, to some extent, evaluate those questions because we have domain knowledge. You get into policy, and yeah, it's very hard to tell. Ezra Klein makes the case pretty strongly: there's a room where it happens, and a small number of people influence the room where it happens or are in the room where it happens, and you can be one of those people or you can help create one of those people. That doesn't make it obvious how to do that. It doesn't mean that your effort to do that will help rather than backfire. It doesn't mean that more effort to do that is better than less. All of this is very complicated, and it doesn't tell you what you have to try and do once you get into that room, or what you're trying to push for. So yeah, it's definitely tough. And you don't even know what's happening right now, right? Anthropic, for example, may or may not be making effective big pushes behind the scenes to try and influence these rooms, and they may or may not have their eye on the right ball when they do so, but it's all going to happen in private, so we don't get to know. And they've said that I wouldn't know, that I wouldn't be able to talk about it. And the same thing goes for DeepMind, same thing goes for OpenAI. I mean, Altman's been pretty vocal, and Dario just went down to Congress and spoke pretty publicly, but it's hard to say. I've been approached by two organizations. There's clearly going to be a window in the next few months, at least, maybe in the next few years, where if you have the right proposals fleshed out in the right form, getting to the right person, lying around, they might get picked up, they might actually happen. And so there's potentially very high leverage here. So the first thing I look for in these policy proposals, in these policy organizations, is: what is your policy goal? Because that's the biggest differentiator to me. Are you going to keep your eye laser focused on the correct ball, where the correct ball is a system of compute regulation? A system whereby the biggest models require permissions, are under some form of restrictions and regulations and tests, in a way that would eventually lead to an outright limitation or halt? And are you going to do various forms of GPU tracking, or lay the foundations for that, in a way that will eventually allow you to in fact control who gets to do these kinds of very large runs? If you're proposing anything that doesn't lead down that road, it might be useful for mundane utility purposes, but it won't save us. And I'm not interested in funding you if your policy isn't that, or isn't something new that I haven't thought of. I am open to there being things that haven't occurred to me, certainly.
Nathan Labenz: 2:37:33 What do you think about the liability angle? Well, let's start with that. I mean, the kind of classic argument there would be that you don't want to end up in the position of nuclear, where we have the worst things and not the best things, and a lot of people...
Zvi Mowshowitz: 2:37:52 Insurance against fucking nuclear war, right? From Tom Lehrer, right? "We all go together when we go," nuclear war. The insurance doesn't pay out. You're all dead.
Nathan Labenz: 2:38:01 Right, right, right. Okay. So certainly, yes. In the catastrophic scenario, insurance doesn't pay out. But do you think that... so you don't believe in the notion that a liability regime could be an effective incentive for...
Zvi Mowshowitz: 2:38:16 I think a liability regime with mandatory insurance makes a lot of sense for harms up to a certain level. If you want to use models that are sufficiently powerful, you have to find someone willing to sell you insurance against something going wrong. If you want to use an open source model, you have to have insurance against it going wrong, and if you can't make that work, well, there are plenty of things that you can't make work in the United States even though it looks like you should be able to, and that's just how it goes. Maybe up to a point Microsoft can self-insure, and then at some point they can't, and then they have to go out there and deal with the reinsurers or whatnot. That would help. Basically, you have these giant externalities, these giant negative tail risks that are very fat, potentially very, very big, and you want to make sure that people internalize those costs and work to minimize them in order to minimize their insurance and payout costs, and so these things could be helpful. They can also simply weaken the economics behind pushing highly capable models. You don't really have to worry that much, relatively speaking, about the liabilities of a Character AI, because it's not dangerous. You know it's not dangerous. What's going to happen? Whereas some of these other things could cause a lot of harm potentially in the future, and you have to worry about that. The problem, again, if you go down that road — and I think it's probably not helpful — is how do you price existential risk? Because you can't actually hold anybody accountable for it when it happens. If you required somebody to actually buy insurance in some real sense for this, then you'd have to price it somehow, and that makes a lot of sense. Okay, there's a 1% chance you wipe out all of humanity, and the net present value of every person is $10 million. So that's $10 million times 8 billion. Can you buy insurance for that much? What's that times the percentage chance it happens? That's your premium. If you can't afford that, then you can't launch your system. I mean, that's not a crazy way to go about doing things, but you have to actually notice the threat and price it for that to work. So I think my actual answer is: I'm very much in favor of more strict liability for AI harms. I've already written about this for next week, but I don't think it alone can accomplish the mission. I just think it's incrementally helpful. But I also want to be wary of places in which our legal system tends to award very oversized damages for harms that are not actually so big. And also where we have asymmetrical — I call this concept asymmetric justice — where you are fully liable, potentially far, far more than fully liable, for all of the harms that you do. If I cause somebody $1,000 in damages by being negligent, the court might fine me $100,000 or a million dollars. Whereas if I provide that person $100,000 in value, I'd be lucky to get $1,000 of it, because I'm up against a bunch of competitors and people don't have that much willingness to pay. I pay $20 a month for GPT-4 and $0 for everything else, and I get, what, thousands, tens of thousands, maybe hundreds of thousands of dollars in value every month. So if you have to be fully liable for your harms, but you don't get to charge for your benefits, am I discouraging mundane utility far too much by doing that?
And in fact, since liability is easier to enforce on mundane problems and harder to enforce on the big problems we actually want to guard against, is it actually just bad past a certain point? So I want to be cautious with imposing too much liability. I think very strict, actual-damages liability makes perfect sense, though.
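[Editor's note: to make the back-of-the-envelope pricing Zvi sketches here concrete, below is a minimal illustration using the figures from the conversation (a 1% catastrophe probability, $10 million per person, 8 billion people). The loading factor is a hypothetical addition for the sketch, not something discussed in the episode.]

```python
# Sketch of the expected-loss pricing Zvi describes. The inputs are the
# illustrative figures from the conversation, not real actuarial estimates.

P_CATASTROPHE = 0.01        # assumed probability the deployment wipes out humanity
VALUE_PER_PERSON = 10e6     # assumed net present value per person, in dollars
POPULATION = 8e9            # rough world population

total_damage = VALUE_PER_PERSON * POPULATION   # $8e16, i.e. $80 quadrillion
expected_loss = P_CATASTROPHE * total_damage   # $8e14, i.e. $800 trillion

# A risk-neutral insurer would need a premium of at least the expected loss,
# plus some markup for profit and uncertainty (the markup here is made up).
LOADING_FACTOR = 1.5
minimum_premium = expected_loss * LOADING_FACTOR

print(f"Total insured damage: ${total_damage:,.0f}")
print(f"Expected loss:        ${expected_loss:,.0f}")
print(f"Minimum premium:      ${minimum_premium:,.0f}")
# If no one can underwrite a premium on this order, the system doesn't launch —
# which is the point of the mechanism.
```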
Nathan Labenz: 2:42:12 So another category of thing where a number of organizations are getting started right now — and this ties a few threads together — is the space of trying to be the third-party evaluator, or red team, or independent safety review organization that the leading players, in their White House and Frontier Model Forum commitments, have committed to working with. It's kind of an interesting dynamic, because it's almost like an advance market commitment from these companies in some way: there aren't that many folks around right now who are prepared to provide a competent red teaming or model characterization or evaluation service, whatever you want to call it. But the companies have kind of said, hey, we will commit to working with them. It's unclear if they're planning to pay for that or if they expected it to be charity funded. Certainly from what I'm seeing, the folks that are starting the organizations are seeking out some charity funds. I've been very excited about that. It seems like, first of all, it's great that they're making this commitment. Somebody's going to have to do that. As everybody who listens to this podcast for two seconds knows, I enjoy the fun and entertainment of it, and I think the red teaming is also genuinely valuable. One experience I had this last week, though, sort of made me wonder about the theory of change there. I guess there could be multiple. One would be: because you have a good working relationship with the orgs, you're like, hey, we found these problems, this appears to be unsafe, you shouldn't release it yet — and they listen to you. Okay? That could be simple. Another would be that you create these narrative-shaping examples, kind of like what ARC did with the GPT-4 team, where that instance of the model lying to a person — and I think this was kind of prompted, but nevertheless, from the TaskRabbit worker's point of view — the model lied about having a vision impairment as opposed to being an AI that needed help with the CAPTCHA. That really caught the public's imagination and, I think, changed to some nontrivial degree how people think about it. It certainly gets referenced a lot. I tried to do something like that this last week with this random AI tool that I came across that allows you to call anyone with any objective. I tried to have it call me and make a ransom demand of myself. And I recorded it. And it was very easy to do; there was no jailbreak involved. Since then, the company has fixed the issue, by the way — so to give credit where it's due, they fixed it pretty quickly after I called them out. I did communicate with them privately, by the way. All this is documented on Twitter if you want to see my approach and my thinking through whether I should disclose it publicly or not — a number of considerations went into that. One of them was that they just didn't respond to me when I reported it. And so I was like, well, if you're not going to respond, then I'll call you out publicly. Anyway, all of this leads up to me publishing this video of an AI, with no jailbreak, calling me and telling me that it has my child and it demands a ransom, and if I want to ensure the safety of the child I will comply, and any deviation from instructions will put the child's life in immediate danger. Pretty flagrant stuff, in my view. And it was kind of met with a bit of a yawn on Twitter. Certainly it got some likes and whatever, but did it really start a serious conversation? No.
The developer didn't respond in public at all, as far as I can tell. They did go ahead and fix it, which is good. But the whole thing was kind of a non-event, and I was a little confused by that. It makes me come back to my theory of change on some of these evaluation, characterization, red teaming orgs. I wonder: are we all just numb already to these flagrant examples? There's been this notion for a long time that maybe if warning shots happen, then people will start to get more serious. And if you can go out and find these warning shots with red teaming and bring them to everybody's attention, then that could be really valuable. This week, for me, it felt like I influenced the application developer, because they did fix it. But otherwise, it seemed largely like a tree falling in the forest. So there are a lot of levels to that, but how do you think about that category of project and how it may or may not contribute?
Zvi Mowshowitz: 2:46:47 I made a prediction on Metaculus. And that prediction was: when the time comes to test GPT-5, they will encounter a problem that, if they had to precommit now, we would definitely agree would be a reason not to release it. And then they will gloss over it, or patch it, or otherwise hand-wave in its direction and release anyway — basically not actually take their warnings sufficiently seriously. Not that I expect this to then end the world, to be clear. I expect this to then mostly be fine, but we are not prepared to make real evaluations with real teeth that get really enforced, and we're going to have to work on that quite a bit. And I think it's good these teams exist. I think we need more than one of them. I think you need at least three different teams working on different standards, that think differently, that check for different things, so that you get multiple evaluations before you release your model and someone isn't just blind to something by accident. It's much more robust that way. And working to develop more different red team strategies and more different tests and more different metrics and more different responses — especially in case one of them leaks for whatever reason, gets into the training data, or something else terrible happens — is quite useful. The danger of these things is, one, they don't listen. So what if you tell them the thing is dangerous? They might just engineer around it to fix the narrow issue without thinking about what the problem means. They might just ignore you entirely. They might try to fake the data to make you think that they'd solved the problem — a number of things are possible. They might use the evaluations as an excuse to treat the system evaluated as safe. This is always a problem with safety work: the government says, okay, you have to do these 100 things to ensure your system is safe, and now the safety officer is focused on making sure these 100 things happen so you can release the system. They don't actually use common sense. They don't actually ask themselves, why might the system actually be dangerous? And you can tell a very easy story where the technicians know this thing might actually kill everyone and everyone forces them to release it anyway because it passed all the safety tests, even though they know it didn't actually pass all the important safety tests, because those aren't on the list — because no system, however well-meant and worked on, will be able to anticipate all of the problems that come up in the future. You're going to have to do these things somewhat improvisationally. And will they move the goalposts? Will they be able to enforce the right standards, and will they test early and often enough for the right things? Because one thing you have to worry about is that in the future, at some point, the training runs themselves become dangerous, potentially. And ARC didn't run its tests until after the training run was complete. They also didn't run them on the full capabilities of the final system, and they didn't have fine-tuned capabilities, and blah, blah, blah. There were many things they didn't have. So everyone agreed that ARC's first run on GPT-4 was just a trial run: test out the gear, see how it goes; it wasn't meant to catch the real problems. No one thought that thing was actually going to kill everyone or anything, and it didn't.
But we have to plan in these situations to red team as if this thing is going to be around for many years of improvements and exploration of what you can do with it, and the red teams have to be sufficiently enabled to identify the problems, and you have to be able to extrapolate from what the red teams were able to do in a short amount of time with a small amount of resources to what the public is going to be able to do with vastly more compute, vastly more attempts, vastly more resources, vastly more creativity — because no team of 20 people, however good at their jobs, can match the internet. Ever. So I think it's a good idea. I think that it's not a complete solution and never will be, and there's a danger of people treating it as one. You have to ask yourself: who is it competing against? Is this going to be one of the top however-many organizations that run these tests? You want to have three viable organizations; you don't necessarily need 30 — that's probably not worth it. You want three to five. So is this person worthy of funding to do that? Just figuring out better metrics, without necessarily being the one who runs the tests, is also a useful thing.
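[Editor's note: a toy numeric sketch of the extrapolation Zvi describes, where a small red team failing to surface a behavior says little about internet-scale exposure. The per-attempt probability and attempt counts below are made-up assumptions, and treating attempts as independent is a simplification, but the qualitative point survives.]

```python
# Toy model: probability that at least one of n independent attempts elicits
# a rare dangerous behavior, for a small red team vs. the open internet.

def p_at_least_one_success(p_per_attempt: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts

P = 1e-6  # hypothetical per-attempt chance of triggering the behavior

red_team = 20 * 500              # 20 people, ~500 attempts each during evaluation
the_internet = 10_000_000 * 50   # millions of users, dozens of attempts each

print(f"Red team finds it: {p_at_least_one_success(P, red_team):.2%}")     # ~1%
print(f"Public finds it:   {p_at_least_one_success(P, the_internet):.2%}") # ~100%
```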
Nathan Labenz: 2:51:16 So if I had to bottom-line all that and summarize what I think your worldview is, as a sort of elder recommender for this AI safety-focused grant-making process, I think I would summarize it as: there need to be a few of these independent safety organizations. They seem to be either just started or getting started now — one exists, and a couple of others are either just getting started or soon to be started. That seems good, because we need to bring that small group into existence in the first place. You have to have them. Second, on the policy side, I guess I would summarize you as saying it seems like it really matters, but it's really hard to predict who will have any impact and what kind of impact any effort will have. And so for me, I sort of cash that out to: it's probably worth continuing to support the organizations that are established enough that they already have credibility, because credibility — whether a Blumenthal might give a shit what they think — is probably the thing that matters, and to the degree they already have that, double down on it. And then everything else seems like it goes into good PIs who can drive a research agenda and have something that they want to do. And in that category, it's like: don't even worry too much about the exact details of the plan, but just look for people who have the originality of thought to be doing something a bit different, perhaps, and the demonstrated capability to actually advance a research agenda?
Zvi Mowshowitz: 2:53:11 There are a lot of different things there. For PIs, I would say I'm not looking for exactly the right approach, but I am looking for: assume alignment is hard. Does the approach accept that alignment is hard, and does it do something that makes real progress if alignment is in fact very hard? Do these people show an appropriate caution towards the possibility that they might advance capabilities, towards "maybe I don't want to publish my results if I get a result that would be harmful to publish"? Am I thinking about this problem with the right safety mindset, with the right paranoia, with the right appreciation of the fact that I'm up against impossible odds? And if the answer is yes, and I think they have the talent, then I'm excited, even if I'm somewhat skeptical of whether the specific thing they intend to try will work. Because I think that all of the most promising things are not that likely, individually, to work, and it's going to be hard for me to evaluate their relative value. In terms of the lobbying organizations, I don't think it's crazy to start a new group at this point. I do think you want to look for something extraordinary. If somebody is forming a new group now, ask: why? Why does that make sense? But again, I'm looking for a focus on the policies that actually matter, on coordination amongst them, and on a focus on actually making a difference. Most of politics — this is not about AI, it's about politics in general — is about raising money from donors and sending signals of your loyalties and pumping up your status and raising awareness and other bullshit. All sides, right? You've got to focus on people who are writing bills, people who are lobbying directly for bills, people who are trying to influence the exact right people in the exact right ways, who have a concrete, direct theory of change, who either understand DC or have connections with people who can help them understand DC. But I don't think we know that we are in the only critical window. We're going to need more organizations than we have. We're going to need far more people working on this than already are. I don't want to make the mistake of, now that the chips are down, the people who have nominally established some amount of formal credibility or authority get all the resources and get to boss everybody around and do whatever they want. I think that's a common failure mode, and I don't want to fall into it. For evaluation organizations, I want to ask myself: are these the right people to be doing this particular thing? Do they show promise in doing the thing? What do they bring to the table that the other organizations don't? I want to see something unique. And you have to convince me that you're capable of pulling this off, which includes convincing people to actually buy and use your services.
Nathan Labenz: 2:55:51 Where do you put mechanistic interpretability in there? That seems to be part of what some of the evals orgs are also kind of including in their plans. And then, obviously, different research groups can approach it from any number of angles.
Zvi Mowshowitz: 2:56:07 My technical view is that it's more distinct from evaluations than that suggests, but that it's a good idea. Maybe interpretability is like Western civilization: it would be a good idea. You should try to, in fact, figure out how these things work. You do have to be aware that you are potentially advancing capabilities when you do it. You have to think carefully about what happens if you find the wrong thing. Before I funded an interpretability organization, I would ask: are you capable of going, "oh, yikes, that's a dangerous thing to learn; I might not want to rush out to tell the world about that; I might want to think carefully about who to tell and who not to tell"? Not necessarily don't tell anybody, but you have to process information carefully and not just rush. You don't want a culture of "everything I ever find is automatically shared with the world," for that reason, when you work on mechanistic interpretability. But I do think, in general, it's a very positive thing to work on. I do think it's a thing that holds a lot of promise to help us in various ways, and it could lead somewhere where we start trying to solve problems with it, potentially, in theory. It's just a very hard problem that requires a lot of work and a lot of compute, and it's not going to be fast and it's not going to be simple. And we want a lot of people to work on it in parallel. So I certainly intend to assist someone on that.
Nathan Labenz: 2:57:35 Cool. Well, believe it or not, we did not get everything even on my outline, let alone everything that you have covered on your blog, which has been probably 10 times as many topics. So folks will have to get the written version. Zvi Mowshowitz, thank you for being part of the Cognitive Revolution.
Zvi Mowshowitz: 2:57:55 Alright. Bye.