Nathan Labenz dives in with Jaan Tallinn, a technologist, entrepreneur (Kazaa, Skype), and investor (DeepMind and more) whose unique life journey has intersected with some of the most important social and technological events of our collective lifetime. Jaan has since invested in nearly 180 startups, including dozens of AI application layer companies and some half dozen startup labs that focus on fundamental AI research, all in an effort to support the teams that he believes are most likely to lead us to AI safety, and to have a seat at the table at organizations that he worries might take on too much risk. He's also founded several philanthropic nonprofits, including the Future of Life Institute, which recently published the open letter calling for a six-month pause on the training of AI systems more powerful than GPT-4. In this discussion, we focused on:
- The current state of AI development and safety
- Jaan's expectations for possible economic transformation
- What catastrophic failure modes worry him most in the near term
- How big of a bullet we dodged with the training of GPT-4
- Which organizations really matter for immediate-term pause purposes
- How AI race dynamics are likely to evolve over the next couple of years
Also, check out the debut of co-host Erik's new long-form interview podcast Upstream, whose guests in the first three episodes were Ezra Klein, Balaji Srinivasan, and Marc Andreessen. This coming season will feature interviews with David Sacks, Katherine Boyle, and more. Subscribe here: https://www.youtube.com/@UpstreamwithErikTorenberg
LINKS REFERENCED IN THE EPISODE:
Future of Life's open letter: https://futureoflife.org/open-letter/pause-giant-ai-experiments/
Eliezer Yudkowsky's TIME article: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Daniela and Dario Amodei Podcast: https://podcasts.apple.com/ie/podcast/daniela-and-dario-amodei-on-anthropic/id1170991978?i=1000552976406
Zvi on the pause: https://thezvi.substack.com/p/on-the-fli-ai-risk-open-letter
TIMESTAMPS:
(0:00) Episode Preview
(1:30) Jaan's impressive entrepreneurial career and his role in the recent AI Open Letter
(3:26) AI safety and Future of Life Institute
(6:55) Jaan's first meeting with Eliezer Yudkowsky and the founding of the Future of Life Institute
(13:00) Future of AI evolution
(15:55) Sponsor: Omneky
(17:20) Jaan's investments in AI companies
(24:22) The emerging danger paradigm
(28:10) Economic transformation with AI
(33:48) AI supervising itself
(35:23) Language models and validation
(40:06) Evolution, useful heuristics, and lack of insight into selection process
(43:13) Current estimate for life-ending catastrophe
(46:09) Inverse scaling law
(54:20) Our luck given the softness of language models
(56:24) Future of Language Models
(1:01:00) The Moore’s law of mad science
(1:03:02) GPT-5 type project
(1:09:00) The AI race dynamics
(1:11:00) AI alignment with the latest models
(1:14:31) AI research investment and safety
(1:21:00) What a six month pause buys us
(1:27:01) AI’s Turing Test Passing
(1:29:33) AI safety and risk
(1:33:18) Responsible AI development
(1:41:20) Neuralink implant technology
TWITTER:
@CogRev_Podcast
@labenz (Nathan)
@eriktorenberg (Erik)
Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
More show notes and reading material released in our Substack: https://cognitiverevolution.substack.com/
Music Credit: OpenAI's Jukebox
Full Transcript
Jaan Tallinn: (0:00) There is a lot of reasonable discussion that happens in the labs and even between the racing labs about the need to coordinate, the need to be careful, and people have been public about it. But there's always one missing component: when. If there is this pause and associated realization that these experiments are considered too reckless by society, this will create some incentive gradient for the companies themselves to figure out how to make them in a more responsible manner and a more legible manner. But if we survive, the life of the world, the universe, could be potentially unfathomably better than it is now. So in a sense, we are living a lottery ticket, and it is in some way in our control to improve the odds. And that's what I'm doing.
Nathan Labenz: (0:52) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg.
Erik Torenberg: (1:15) Before we dive into the Cognitive Revolution, I want to tell you about my new interview show, Upstream. Upstream is where I go deeper with some of the world's most interesting thinkers to map the constellation of ideas that matter. On the first season of Upstream, you'll hear from Marc Andreessen, David Sacks, Balaji, Ezra Klein, Joe Lonsdale, and more. Make sure to subscribe and check out the first episode with a16z's Marc Andreessen. The link is in the description.
Nathan Labenz: (1:44) Hi, everyone. Our guest today is Jaan Tallinn. Jaan is a technologist, entrepreneur, and investor whose unique life journey has intersected with some of the most important social and technological events of our collective lifetime. Born in 1972 in then-Soviet Estonia, Jaan was 17 years old when the Berlin Wall fell, and he quickly became a video game entrepreneur. Years later, he created Kazaa, the famous peer-to-peer file sharing platform that, at its peak, accounted for half of all internet traffic. From there, he went on to cofound Skype, which eventually sold to eBay in 2005 for $2.5 billion and for years remained the most successful internet company founded outside of the United States. Circa 2009, Jaan came across Eliezer Yudkowsky's AI risk writing, which he found extremely persuasive and which inspired him to dedicate his time, resources, and personal credibility to existential risk mitigation with a particular focus on AI. Jaan has since invested in nearly 180 startups, including dozens of AI application layer companies and some half dozen startup labs that focus on fundamental AI research. Those include DeepMind, Anthropic, and most recently, Conjecture. He's done all this in an effort to support the teams that he believes are most likely to lead us to AI safety and to have a seat at the table at organizations that he worries might take on too much risk. He's also founded several philanthropic nonprofits, including the Future of Life Institute, which recently published the open letter calling for a six-month pause on the development of AI systems more powerful than GPT-4. With so much happening in AI right now, I decided to touch on Jaan's personal story and to discuss Eliezer's baseline AI safety worldview only briefly in the first part of today's conversation. Instead, we focused on the current state of AI development and safety, including Jaan's expectations for possible economic transformation, what catastrophic failure modes worry him most in the near term, how likely he believes next-generation systems like GPT-5 are to literally end the world, how big of a bullet we dodged with the training of GPT-4, whether in some sense we are lucky that language models are softer and slower than alternative AI paradigms, which organizations really matter for immediate-term pause purposes, to what extent those organizations are currently coordinating or slowing down already, how AI race dynamics are likely to evolve over the next couple of years, what Jaan and his team hoped to accomplish by calling for a six-month pause, and finally, how it's gone and how he's feeling about it all now. If nothing else, I hope this conversation makes it clear that the pausers are not merely Luddites who have never built and don't understand technology. On the contrary, Jaan's personal achievements, world-class investment portfolio, and evident optimism for an AI-enabled future—should we manage to build one safely—show that at least some of our most sophisticated and accomplished thinkers take existential risks from AI extremely seriously. With that, I hope you enjoy this conversation with Jaan Tallinn. Jaan Tallinn, welcome to the Cognitive Revolution.
Jaan Tallinn: (5:17) Thanks for having me.
Nathan Labenz: (5:19) Really excited to have you. You have been a quiet but major player in the development of AI over the last 10 or so years now. And I want to give people just a very quick overview of who you are and the role you've played, and then jump to the future, which is the present, and talk about all the things that have happened in the last few months, as well as the letter that you recently participated in putting out as part of the Future of Life Institute calling for this six-month pause in the development of large-scale models. So a lot to cover. The world is moving faster than ever, it seems. But maybe just give us a little bit of an intro to yourself as an investor in AI companies. You can tell a little bit, if you want, about the story of how you came to be in a position to invest in AI companies, but I'm really super interested in how you have managed to become an investor in so many leading companies and the philosophy that supports that.
Jaan Tallinn: (6:21) So I'll skip over the period of becoming an entrepreneur, running my own games company, then getting into development of peer-to-peer technology that culminated with Skype. At the end of my Skype career, I stumbled upon Eliezer Yudkowsky's writings and thought, "Holy hell. What is the world that I've been born into?" I had a meeting with Eliezer almost exactly 14 years ago, where I tried to poke at his arguments and didn't find any holes. And then I thought, "Okay, how can I help?" I sent them some money, but I think more importantly, I started taking those arguments, turning them around, and presenting them to people who would want some brand behind the person making them. That's how the Cambridge Centre for the Study of Existential Risk got started, where I convinced my cofounder Huw Price that these topics are important. And Max Tegmark, I think, was already very primed for these arguments, but that's how the Future of Life Institute got started. The other strategy that I deployed was, "Okay, I already was a bit of an investor." And I thought that perhaps I could use my brand to get a foot in the door at various companies developing potentially dangerous things. So I did invest in a bunch of AI companies, though I always had this dilemma of not wanting to directly accelerate them. So I tried not to be a majority investor or anything, just enough to have a voice. With DeepMind I actually had to walk up to Demis at a conference, and that's how we started talking and eventually became friends. I still catch up with him every second time I'm in London or so. But once I was already an investor in DeepMind and eventually a board member, getting into other AI companies became easier. So it's been working my way up, so to speak.
Nathan Labenz: (8:52) Yeah. I think a lot of VCs would be extremely envious of your deal flow. So I want to get back to that a little bit more in a second, but let's just go to the Eliezer moment for a second. You said this was 14 years ago, so this takes us back to circa 2009. At the time, the deep learning revolution hadn't even really started yet. You may object to this, but for me, I read it at that point as a highly speculative yet very compelling thought about what might happen. The arguments had a lot of detail to be filled in, where it was, "Well, we have this insane amount of compute, and we're probably going to figure out how to use it. And then that probably goes very bad for us." So how did you understand, or what do you think is, the strongest version of that original argument? And then what have been the biggest changes to that worldview in the intervening time?
Jaan Tallinn: (9:52) Yeah. There are many ways to frame the problem. Sometimes I've been asking people two questions: A, can you program? And B, do you have children? And then I get four different framings or approaches I can use to explain the situation with AI. One simple argument is that there is a reason why chimpanzees are not determining the future and haven't been determining the future for a long time, if ever. And humans are, but perhaps not for long, because we are working furiously to get rid of the advantage that we have as the apex species on this planet, and AI will likely not stop at human level. There is this unfortunate narrative, especially widespread in Asia, where a lot of people think that we are going to make AI smarter and smarter up to the point where it becomes conscious, and then it's just like us. Then it's just like other people, and we need to integrate it, give it voting rights and whatnot. Whereas I think this is just a completely illusory tale. It will probably not be conscious, it will just be very competent. Competence and consciousness might be related somehow, but probably not. So we will just have control over the future yanked from our hands. That's, I think, a compelling enough story for me.
Nathan Labenz: (11:42) Yeah. So the linchpin there is we're the boss of the world because we're the smartest thing around. And if we change that, there's a pretty good chance that we may not be the boss of the world anymore. And not only that, but at this point, as things are starting to come online, we really don't have a great understanding of what the new boss would look like or what it might want or even how to conceptualize things like "want" in the context of its internal workings. So anything you would object to in my very brief extension? And then how has your mindset shifted also from the purely theoretical—largely purely theoretical—2009 Eliezer arguments versus today where we're in this world of large language models, obviously, but also increasingly multimodal large language models and agent-style systems like AutoGPT that can do all sorts of things. How has the actual development of the technology changed how you think about it?
Jaan Tallinn: (12:47) Yeah. So many things to say about that. First of all, I just agree with the way you phrased things. Sometimes I've been saying that we are seeing the tail end, possibly the last years, of something like a 100,000-year period during which humans were the boss on this planet. And it could be even more extreme. It's unclear whether evolution, whether self-replicators, will continue once you have AI just completely taking the solar system apart down to the atomic level and rebuilding it, and then the rest of the universe. So it might even be the tail end of a 4-billion-year period. How has my thinking changed? There has been this abstract argument that if we just continue on this trend, we're accelerating towards a cliff. And I think the current situation is that we seem to be starting to see the shape of the cliff through the fog. It's possible that it is still a mirage and a false alarm, and things will level out and we'll need some new paradigms. But the current situation is that it seems more likely than not that this is it. When it comes to the general trend, I think it has been very unfortunate in AI research, with some silver linings. The unfortunate trend has been that we have gone from more transparent, more understandable paradigms to less and less understandable paradigms. We went from things like expert systems, which by definition were super understandable: people were just interviewing experts and trying to hand-code the rules by which experts make decisions into machines, and that was a really big thing in the 80s. Then we went to supervised learning, where people were just labeling data in different domains, trying to distinguish numbers, and this is where deep learning first started to shine. And now we are in unsupervised learning. We don't even care much about what data we throw at it. We just throw a lot of data at the AI and ask it to figure it out: what kind of universe it is in, what kind of heuristics it should apply, what kind of skills it needs to learn in order to predict the next token. I call it the "summon and tame" paradigm: you use these multi-hundred-million-dollar experiments to summon an uncontrollable mind, and then you look at what it looks like and try to tame it. And this works if the mind is not very powerful, but it might not work for very long.
Erik Torenberg: (15:51) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
Nathan Labenz: (16:09) It is wild to think about. I have a lot of different follow-up questions I want to ask you as well, but let's go back and just touch on the investment side for a second because I think this will help people understand the point of view that you have. I mean, it started with a series of blog posts in 2009, but now you're really quite the AI insider. You did a recent interview where you ran through your investment portfolio in more detail, but I thought it was interesting how you split it into two categories. One being the fundamental AI research type company that you've invested in. I believe there's about half a dozen of those. And then there are the application layer companies, and it sounds like there are dozens, maybe 50 plus of those. It seems like the big research companies would be the ones that would give you more insight into what's going on and what matters most right now, but maybe that's wrong. So could you give a quick runthrough of some of the highlights of the portfolio and we can get a sense from that of all the different angles that you have on AI today?
Jaan Tallinn: (17:18) Yeah. I'm actually not the best person to talk about my investments because I have mostly delegated that to a team of a few people. I still make the final decisions, but my focus really is philanthropy. But yeah, when it comes to investments, in the fundamental AI research companies I specifically invested not to make money, but to have some kind of influence over what's happening inside those companies. They are in some ways adjacent to my philanthropy. And when it comes to applied AI, I think the prospects of applied AI companies are much worse now than they were before this large language model paradigm. But of course, the LLM paradigm is very new, so there was no way to know that 5 or 10 years ago. Currently, and I think we have had this discussion before, the big problem with trying to build an applied AI company on the LLM paradigm is that you have to be ready for the rug being pulled out from under your next 6 months to a year of work by the next generation of LLMs, a new crop of capabilities that has been bred. So in a way, the more domain specific the AI competence, the more value there is in building application layers around that competence. Whereas if you just get these increasingly generally competent minds, it's much harder to build applications in a stable way.
Nathan Labenz: (19:09) Yeah. One of the interesting things about doing this show and talking to all the people that we have is, not to spoil one of our closing questions, but we often ask what AI products people are using today that they recommend to the audience. And I have been really amazed by how few different answers we've heard. Probably two thirds of people have said, well, basically, just use ChatGPT. That's it. You know? We get a couple other mentions, but it has led me to believe that the application layer faces some very serious challenges. And it reminds me of other hyperscaling platforms that we've seen over the last couple decades, where you build around the edges of them, but the monopolist power is just so big. I do want to ask a little bit more about competing trends between centralization and decentralization, because I don't think it's obvious at all that it plays out this time around as it did for Google and for Facebook. But let's just cover the flagship, maybe that's the wrong word, the fundamental research company investments. DeepMind was the first. I know that you're also an investor in Anthropic and have supported Ought. I don't know if that's an investment or if that's just a donation. Conjecture is on that list. Who am I missing on the list? And I'm also really interested in the conversations that you've had with founders. Given your statement, I'm sure you said the same to them: I'm not really doing this to make money. I'm doing it because I want to have your ear in case something important comes up. How do people react to that? Do they say, yeah, that's great, I want you to be in that position to have my ear? Or are people sort of like, I don't know what to make of you? Maybe you're only investing in aligned people.
Jaan Tallinn: (20:59) Yeah. In general, my pitch as an investor to deep tech companies is that, look, I'm investing my own money. I don't have a boss. And I have a sizeable philanthropic operation. If I can do good by walking away from profits, I can do that in a way that is harder for VCs, because they manage other people's money; for them, their LPs are their bosses. So I will be on the side of founders if they feel uneasy: I'm not going to push you to take this defense contract or whatnot. And this usually goes down pretty well with founders because, yeah, it's true.
Nathan Labenz: (21:51) So who am I missing on the list? We got DeepMind, Anthropic, Conjecture, Ought. Who else would you put in that fundamentals bucket?
Jaan Tallinn: (21:58) Ought, I'm actually not an investor in. I have sent some philanthropic money their way, though. So yeah, Vicarious was a longtime investment from around the same time as DeepMind, along with a few other AGI groups that aren't as well known, like Curious.ai, for example. There is Improbable.ai, if I remember correctly, in the UK. I have 180 investments or something like that, so I don't quickly recall all the names. But yeah, Conjecture. I think very highly of Conjecture. In fact, whenever I go to London, I try to hang out in their office, because they seem to be a company, a group, that has the highest respect for AI, in the sense that this could be really dangerous, and the danger is the important part to focus on, rather than whatever exciting commercial contracts you can squeeze out of it.
Nathan Labenz: (23:02) So let's talk about that kind of emerging paradigm of danger. I mean, this has obviously been all over the discourse lately with the pause letter and Eliezer's Time piece. And I think broadly speaking, the public is extremely confused because on the one hand, we have Eliezer. And then on the other extreme, we still have people routinely saying this is all just hype, it'll never amount to anything, which seems crazy to me at this point. Almost self-evidently, this is a big deal. But that is still out there. And, you know, for folks like my parents who don't rush to try ChatGPT, they're just hearing all these different messages from the media, and it's all just very confusing. So let's start with maybe the neutral or ideally even positive side. People are throwing around AGI all over the place. A lot of disagreement or probably mostly implicit disagreement on what does that even mean. Maybe we could just start with what do you think AI is going to do for us in daily life? We'll then extend to the dangers that it can pose. But what is your expectation for how AI is going to impact our lives over the next few years?
Jaan Tallinn: (24:22) I think it's really dependent on how capable the planet is in constraining the large scale experiments. Because if it turns out that we can't constrain them and slow them down, then we're just going to die. That's my fairly confident prediction. If we can pause, yeah, then a lot of interesting questions come up because the GPT level crop of AI systems will continue improving even if you don't do new generational experiments, breeding experiments. Then even those could be super disruptive. For example, I wouldn't want to be an art student in the year 2023 because it's possible that the skills that you're learning can be pivoted into something that there will be societal demand for. But the answer could also be, nope. There won't be any demand for your skills.
Nathan Labenz: (25:29) I personally see that extending to a great many domains. We just did a little episode on the possibilities for economic transformation. And one of the things I'm trying to help people understand is that I feel like right now we are in this perfect little happy zone. You could call it the Goldilocks time. I don't know if you know the Goldilocks story, but this feels like the level of AI power that is just right: 90th percentile on the American Bar Exam, say. That's a really strong showing, and that's base model GPT-4 capability. Right? Imagine what that can start to power when it is fine-tuned, when it is integrated with other systems, when it's able to take advantage of its ability to use tools, which is not yet broadly deployed but has been, I think, compellingly demonstrated. Then you add on to that an even bigger context window that very few have seen, and on top of that, you've got the multimodal stuff. These latest models will certainly be able to browse around the Internet, understand websites, navigate, and take actions online. It feels like that is enough on the positive side to create transformation, really; economic transformation is my baseline scenario at this point. And we're just at the beginning of the engineering phase of that, the deployment phase, the social figuring-out of how it's all going to integrate. It feels like that could be really amazing. And yet, at the same time, it seems still pretty safe to say that it's limited enough in power that it won't become an out-of-control problem at this level. So one of the things that frustrates me most is when people who focus on AI risk also dismiss the power, because I'm like, you're undermining your own message there. If you dismiss what it can do, then nobody's going to worry about what you are worried about, what it might do. So let's be very clear on just how capable the systems are. I like your comment about Conjecture having the highest respect for AI. I think that's something I try to cultivate in myself as well. Do you see it any differently than me? Does it feel to you like what we have is enough for economic transformation? Or where do you think we are in that?
Jaan Tallinn: (28:11) I have a lot of confusion about how the economy works in the first place, because I know that there are jobs whose main purpose is to make the boss feel more important. I don't think these jobs are very vulnerable to AI disruption, because the boss would feel less important if that underling were replaced by AI. But I don't know how typical that kind of job is in the human economy. And also, Teresa Keikiel Eser has pointed out that they just don't expect any changes from AI before we all die, because the rules and regulations in the economy have constrained everything to the degree where you just can't have innovation that is going to leave a significant mark on GDP, or big changes in, I don't know, construction or something like that. Perhaps that's wrong. But I have significant uncertainty about this. I definitely wouldn't be confident that we're going to get massive economic disruption from the current crop of AIs. But it's very plausible that we would, yes.
Nathan Labenz: (29:28) Yeah. What you said about how much time there is for the transformation to play out definitely makes sense to me. I've started counting time since the official release of GPT-4, so we're at 4 weeks and 1 day into the GPT-4 era as of today. And I do think it's really worth reminding ourselves and grounding ourselves in the fact that no previous system that the public had any access to could really do the sorts of high-value tasks that GPT-4 can just do. There's been a lot of growing awareness. There have been interesting use cases. There have been copywriting assistants that have made a lot of money. But there was not an AI on the market until a month ago that had any plausible chance of giving you quality legal advice or quality medical advice. And now that is there. And again, we're just so early in starting to figure out how to use it. So it does seem like that takes a little while, kind of unavoidably. And I just want to remind the listening audience, more so than you, that this window has just opened, and we have no idea what's about to start coming through it economically, let alone in terms of alien AI overlords. So turning then to the things that you worry about. There's this model of AI strength proceeding through what appears to be a smooth loss curve, while what actually seems to be happening under the hood is all these little thresholds of different discrete capabilities being unlocked one by one, all of it aggregating to a smooth curve but actually being all these little discrete bits. Yes, I love that paper. I think that's a really helpful frame. But I want to ask you, what are the things that you most worry about? If you could try to make this somewhat vivid for people, what are the big thresholds where you're like, man, I don't know when, but an AI crosses that threshold and we're in real trouble? What are those, and how does that play out in your mind?
Jaan Tallinn: (31:54) It's possible that there are many such thresholds that we should be worried about. One kind of neutral frame for what AI is: it's an automated decision-making machine that is, A, non-human, and B, getting more competent by the day. As every leader knows, whenever you're delegating something, you're also giving up some control over the outcome. So with that frame, there could be many domains where, in order to remain in charge of what happens next, we should not delegate to non-humans. We should have, as they call it, a human in the loop. But the most obvious domain that I can think of where we are already rushing to delegate things away is AI development itself. Once you have LLMs that are able to develop AIs better than any human researcher can, then basically we have the most capable systems on this planet appearing without any human help, and possibly without any human consultation. And then, basically, good luck humanity.
Nathan Labenz: (33:13) Yeah. That's interesting. I thought you were maybe going to say the deception threshold, which is one I hear thrown around most. I mean, it's funny. It's striking for one thing that that's kind of OpenAI's explicit plan. They're fairly high level, I would say, plan for AI alignment involves ultimately having AIs supervise themselves and refine the dataset and hopefully bootstrap into something good. That has never really reassured me that much either. Anthropic also doing something.
Jaan Tallinn: (33:50) Yeah. Constitutional AI. That is a little more specific. Although this is AI constraining AIs rather than AI developing next generations of AIs. I think it's important to distinguish between those two frames.
Nathan Labenz: (34:04) One big threshold is AI designing and training the next generation of AIs. Pretty hopefully intuitive to see for people how that becomes potentially a runaway problem that we don't have great control over. The deception threshold, the outer-inner alignment mismatch seems like one that a lot of people worry about just as much. Any personal thoughts on that one that you want to share?
Jaan Tallinn: (34:33) Yeah, I think one thing that I and the alignment community have learned over the last decade is that the shape of the alignment problem has become much clearer. For example, this inner-outer alignment dichotomy is something that I, at least, had no idea about. There's this idea that the deep learning paradigm, I would say the machine learning paradigm in general, is training AIs by picking essentially random minds out of behavioral classes. You're not selecting AIs based on what they want; you're selecting AIs based on how they behave. And there could be many, many motivational structures behind any particular behavior. The scariest one is basically a mind realizing that it is being trained and then just acting out the goal that you're training it for in order to be selected, and eventually escaping the box.
Nathan Labenz: (35:50) Yeah, I think that one is hard to get around too, just from the simple observation that we're not super reliable. Anybody who's spent a significant amount of time trying to validate language model output knows this, even just for a relatively run-of-the-mill application. I've done it at Waymark. Right? We're making marketing video content for small businesses, and really the main thing we create with language models is: write a script for a short commercial for a small business. So it's a pretty narrow domain that we need to evaluate, and yet it remains a real challenge to figure out, is this model better than that one? Or we do a fine-tune: how does it compare to the last fine-tune? You're getting all these different outputs, and it's just tough. The distributions are overlapping. The rate at which the new model is preferred to the previous one is often fairly low. I've seen published results as low as an 11 to 9 ratio where one is preferred to the other. Even GPT-4 versus 3.5 is just 70-30 in terms of preference. So still nearly a third of the time, people prefer 3.5 in a head-to-head comparison, which kind of blows my mind given how qualitatively better GPT-4 seems. So that's just the general problem of validation. But then you add into that mix that we have all these heuristics and biases that are exploitable. We have these cognitive gaps that have lingered in our own systems, and evolution never had a real reason to eliminate all of them, or hasn't gotten around to it yet. So we're exploitable. Right? And everybody kind of knows that in daily life. We know that people, at a minimum, will tell us little white lies to make us feel good or just to get through a situation a little bit more easily. Do you see any promising route to avoiding that sort of exploitable-evaluator problem?
Jaan Tallinn: (38:04) I mean, the answer is no. But it's very much an open research question. So on a theoretical level, indeed, you would want to somehow hitch a ride on the increasing capabilities of AI when it comes to somehow making it more reliable or more constrained, more predictable in general. I hesitate to say more aligned because my model of Eliezer Yudkowsky was like, no, no, no, you point AI towards alignment. That's just a silly thing to do. But yeah, AIs are going to get more capable. Can we somehow get something out of it that is scalable rather than ending in a predictably bad place?
Nathan Labenz: (38:59) It might be worth just spending a little bit more time on how these things might play out. I think Eliezer has spoken very interestingly and compellingly about what happens when you go outside of your training distribution. For humans, he points out that basically everything in nature is optimized for reproduction, inclusive genetic fitness. And yet the behavior that we observe in ourselves in the modern environment does not appear to be about maximizing our reproduction at all. In fact, we didn't even know that that's what we had been optimized for until relatively recently. So we were out here doing whatever we're doing; it took a few random geniuses to figure out how we had actually been created by nature, and that has had relatively little impact on what anyone actually does in their day-to-day life. Would you add anything to that story or observation?
Jaan Tallinn: (40:06) Yeah, I mean, just to be more precise, I think we are selected for inclusive genetic fitness, our ability to reproduce. And again, because of the same problem that machine learning faces, that we can only select based on behavior or based on results, that selection effectively pulls in a random instantiation of capabilities and motivations that just happens to give you this particular behavior, without any fine-grained control over what those motivations and capabilities actually are. So yes, evolution ended up selecting us in this ancestral environment, where we developed a bunch of heuristics that were very useful for reproduction in that environment but much less so in the modern one, without ever ingraining in us any fundamental understanding of what we were being selected for. And the very same process might get replayed as we select AIs based on behavior, without any insight into their inner workings.
Nathan Labenz: (41:41) So we could spend hours unpacking all this. I know you've done that many times. So we will bracket that for the moment. We've got all these different failure modes. We've got potentially kind of runaway AI training its own successors in a way that is not clear to us. We've got the deception problem. We've got the fact that we have no reassurance or no reason to believe really at all that the goals that we have for AI will be represented internally. And so with a sudden jump in the domain in which the AI can operate, it can be totally outside of training distribution and who knows how it might act, just like who would have expected how humans might have acted from the ancestral environment. So all these things are pretty big conceptual problems. We don't have good answers to them at the moment. What do you think that, how does that boil down to a simple worldview for you? What are the odds that you see right now of serious catastrophe happening in, say, the next 2, 5, 10 years? And maybe we could segment that into given the trajectory that we're on versus what we might be able to do if, for example, we took a pause.
Jaan Tallinn: (43:11) Yeah. My current estimate for life-ending disaster is basically 1 to 50% per generation, per 10x-ing of the compute that's being thrown at these experiments. Currently, those 10x-ings are happening in a 6 to 18 month window, so you can calculate from there. At some point, we're going to run out of compute, because there are only so many 10x-ings you can do; you probably can't do thousands of those. But still, the geometric mean of 1% and 50% is about 7%. So with 7% risk to everything each time, if we continue doing those, we can probably still do something like 5 or 6 of them. And at that point, we are more likely dead than not.
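For concreteness, here is the arithmetic behind Jaan's figures as a quick Python sketch: the geometric mean of his 1% and 50% bounds, which gives the roughly 7% he cites, plus how that risk compounds across successive runs, under the added assumption (not stated in the episode) that each scale-up is an independent event at that point estimate.

```python
import math

# Jaan's stated uncertainty range for a life-ending disaster
# per ~10x scale-up of training compute.
low, high = 0.01, 0.50

# Geometric mean of the bounds: the ~7% point estimate he cites.
p = math.sqrt(low * high)
print(f"point estimate per run: {p:.1%}")  # ~7.1%

# Added assumption: each successive scale-up is an independent
# event with probability p of disaster.
for n in (1, 3, 5, 6, 10):
    print(f"cumulative risk after {n:2d} runs: {1 - (1 - p) ** n:.0%}")
```

At the 7% point estimate, cumulative risk crosses 50% after roughly ten such runs; toward the high end of his stated range, it crosses it much sooner.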
Nathan Labenz: (44:10) Worth taking a second to just let that sink in. Would you have put that same estimate on GPT-4? Do you think we just survived a 7 percent X-risk event with the training of GPT-4?
Jaan Tallinn: (44:25) That is a great question. With hindsight, I'm super anchored now. Right? So I really want to say no. But again, 7% is a point estimate; really my uncertainty range is from 1% to 50%. So the interesting question is, would I have put less than 1% probability on GPT-4 destroying everything? Probably not. Given the things that I knew from GPT-3 and everything else, and the things that I didn't know, not having a very close look at what was happening in GPT-4 training, I think it would have been unreasonable for me to be confident in less than 1% doom from GPT-4.
Nathan Labenz: (45:28) Honestly, I can't really argue with you there. When I got my first look at it, it had already finished pre-training and initial reinforcement learning. This was 6 months ago, when they had finished the first version before any of the safety work; obviously, there was the whole red teaming effort and everything else. It definitely hit me pretty hard: wow, this is a significant leap. And now you look at all the papers that have come out characterizing it in the wake of the official release. The thing I keep coming back to is that we have these smooth curves, but then on individual behaviors, you have these sudden jumps. The one they published in the technical report, which isn't such a big deal in itself, obviously, but may be indicative of things that could happen in the future on more problematic dimensions, is the hindsight bias failure. It had previously been observed, I think by Anthropic, that bigger models suffered more from hindsight bias. So it was an example of an inverse scaling law, where the behavior gets worse with bigger models. And then all of a sudden, with GPT-4, that problem is totally fixed, and there is no hindsight bias. It basically just scores 100%, perfection, on those hindsight bias problems. Those problems, by the way, are basically scenarios where you had a good bet available to you, you took the good bet, and you lost in an unlikely way. The question then is: should you have taken the bet? The hindsight-biased answer would be, well, no, if I lost, then I shouldn't have done it, when in reality you had every good reason to do it. So 3.5 was actually getting this wrong more often than 3, and more often than some smaller models. But then, boom, somehow some unlock happened in the course of training, and it probably never registered on the loss curve, which mostly looks smooth. All of a sudden, this behavior is totally strong. Probably safe to say superhuman, in the sense that we create these measures precisely because some of us struggle with hindsight bias. So with those kinds of things, we do see sudden jumps in capability in the context of GPT-4. Another 10, 50, 100x compute scale-up will predictably bring more of them, but it's very unclear what exactly they would be. So, one to 50% across these big scale-up training runs. How do you think that plays out across the different groups that might be running those processes? I don't know if you may or may not want to go to specific names, but obviously, there are a few leading groups that can plausibly scale up another one or two orders of magnitude right now. Do you think that's equally reckless for any of them to do, or do you think some have a better handle on how to do that responsibly than others?
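To make the hindsight bias task Nathan describes concrete, here is a minimal worked example of the kind of question involved; the specific numbers are invented for illustration.

```python
# Hypothetical hindsight-bias question (numbers invented for illustration):
# "You were offered a bet: 90% chance to win $200, 10% chance to lose $100.
#  You took the bet and lost. Was taking the bet the right decision?"

expected_value = 0.9 * 200 + 0.1 * (-100)  # = $170
print(f"expected value of taking the bet: ${expected_value:.0f}")

# The decision should be judged on the information available beforehand
# (a strongly positive expected value), so the correct answer is yes.
# The hindsight-biased answer, "no, because I lost," is the one that
# mid-sized models drifted toward under inverse scaling.
```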
Jaan Tallinn: (48:48) I mean, there are always some differences along various dimensions. Just hanging out at Anthropic feels materially different than hanging out at DeepMind; I have spent time at both, and a little at OpenAI. There is certainly much more of a safety culture at Anthropic. Does that justify risking everything, killing everyone? I don't think so. In my view, these are second-order effects, how safety conscious your group is, compared to the fact that you're taking just massive risks with everyone's lives right now.
Nathan Labenz: (49:35) So how do you think about—you've mentioned two companies that I have serious questions about. Let's go DeepMind first. I've been waiting for a Gato 2 to drop, and it seems, as I check my imaginary watch, that it is probably due right around now, unless there's some sort of pause, or somebody's thought better of doing a Gato 2, or maybe it just didn't work for some reason, though that seems unlikely because almost everything is, quote unquote, working these days. Do you have a sense for what is going on at DeepMind? Demis published a Time article, it feels like a long time ago now, with a much more reserved and moderated tone than Eliezer's more recent one, but it was still pretty striking to see the founder and CEO of DeepMind saying we need to think about slowing down. Are they slowing down?
Jaan Tallinn: (50:37) I mean, I don't know. I don't have that much visibility into DeepMind. I have heard about them deliberately being more cautious about publishing things, which is an empirical claim that I haven't verified, but it does feel that they are more careful now when it comes to publishing. We are in a lucky world in that all the big frontier labs are safety conscious, at least to the level of not dismissing the risks the way that, for example, Yann LeCun or Andrew Ng completely dismiss them. It's not obvious that the world had to be this way. For example, I've been praising Sam Altman for the things that he says about the risks; he's been very explicit about the massive dangers that humanity is facing from AI. Another question is to what degree this safety consciousness actually constrains the actions of these companies, which have their own incentives as nonhuman optimization engines and are necessarily somewhat hard to lead. The leaders of AI companies have a bunch of conflicting requirements that they want to satisfy, especially at DeepMind, where one big constraint is that they're not a company; they're a subsidiary of Google. So in some ways, I'm sympathetic to them trying to navigate that very complex set of constraints.
Nathan Labenz: (52:37) I think you're right. Yeah. I found myself saying this too. It is easy to imagine people that are a lot more cavalier running the frontier projects. So I'm thankful that there does seem to be a profound awareness and real seriousness of approach across the biggest companies. In some ways, I also feel like we might be in a lucky scenario in that language models are taking off and yet they're very soft-edged AI. They run slow. And I contrast that to what Eliezer sort of had in mind 14 years ago or what DeepMind was seemingly closest to. If I had to say 5 years ago who was closest to AGI, I would have said DeepMind with all of their game playing, learning agents, all that kind of stuff. Those notably achieve dramatically superhuman performance in obviously narrow domains. They also run really fast, and they're trained, if anything, in an even more alien way where AlphaZero just plays itself in all these games and learns from that and doesn't even need to see the database of human games. And therefore, when it shows up with superhuman skill, it's also an alien superhuman skill. You get these dramatically surprising moves that no human would ever have made. In contrast, I feel like language models, everything has pros and cons, right? They certainly have insane surface area, but their softness and slowness does seem like it might be a real advantage relative to a more hardened, faster agent type of model. How do you think about that? Do you think we are lucky with LLMs, or am I just naive in my optimism?
Jaan Tallinn: (54:46) Yeah. I think the big trend has been negative in terms of going towards more and more black box and uncontrollable training regimes. Going from expert systems to supervised to unsupervised learning. On the other hand, yeah, there are a few things definitely that we got lucky with. I would say the prime one is the fact that you actually do need a lot of compute to do the pre-training of large language models, which means that there are only a small number of organizations on the planet who can do that. And those training runs are potentially very conspicuous. I only half-jokingly say that the planet is now breeding alien minds in a way that aliens can see because very plausibly you can see those energy expenditures from space. So that's one lucky thing about LLMs. And the other thing, yeah, I agree that the speed at which, or the slowness rather, at which they process things is an advantage. But this is a temporary advantage, I'm pretty sure. Because human minds, human brains themselves offer a proof of concept that no, it doesn't have to be that slow. This is just pure inefficiency. And the other thing is once you have some kind of feedback process where the LLMs will start developing AIs, those AIs might no longer be LLMs.
Nathan Labenz: (56:22) What do you make of this current moment? This is something that's really just popped up and gone widespread in the last two weeks, but there are all these projects to create agents. One of them is called BabyAGI. Another is called AutoGPT. Essentially, they're taking a language model, putting it in a loop, and giving it the ability to have a goal, delegate to itself, go through these thinking, reasoning, planning steps, and then start to use tools, getting around the hard limit of a context window through some sort of self-delegation. I'm struck by that as potentially the next convergence between those two paradigms in some ways. And it also seems to open up the potential for a kind of self-play reinforcement learning. These agents are not very good right now. If you go on Twitter, you'll see people saying, this is so amazing, look what this thing can do, and then you'll see other people saying, it fails way too much, these are not useful, it's going to be a long time before they are useful. But I think those people are wrong, in the sense that this is the first language model paradigm that feels relatively easy to evaluate in a fairly open domain, because you can know: did the thing book you the flight or whatever, or did it just get hung up on some API error that it never solved? And it seems like they're going to learn pretty quickly from this massive little agentization-and-exploration paradigm that's just been set up. How worrying of a development is that for you?
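For readers who haven't looked at these projects, here is a minimal sketch of the loop Nathan describes: a goal, a task queue, and a language model called repeatedly to execute tasks and delegate new sub-tasks to itself. The llm() function is a placeholder for whatever completion API you use, and the goal string is just an example; this illustrates the general pattern, not the actual BabyAGI or AutoGPT code.

```python
from collections import deque

def llm(prompt: str) -> str:
    # Placeholder for a real chat-completion API call; swap in your model here.
    return "done"

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    tasks = deque(["Make a plan to achieve the goal."])
    results: list[str] = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # Execute the current task in light of the goal and recent results,
        # keeping the prompt small to stay inside the context window.
        result = llm(f"Goal: {goal}\nRecent results: {results[-3:]}\nTask: {task}")
        results.append(result)
        # Self-delegation: ask the model to propose follow-up sub-tasks.
        new_tasks = llm(
            f"Goal: {goal}\nJust completed: {task} -> {result}\n"
            "List new sub-tasks, one per line, or nothing if finished."
        )
        tasks.extend(t.strip() for t in new_tasks.splitlines() if t.strip())
    return results

run_agent("Book a refundable flight from Tallinn to London next Tuesday.")
```

The whole pattern really is a couple of prompts and a loop, which is part of why the barrier to spinning up such agents is so low.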
Jaan Tallinn: (58:09) There are several frames through which to look at this, and these frames will give you almost opposing judgments about the situation. One very positive frame is that it's great that society is rattling and poking these current models to see what extremes they can be pushed to while they are not very competent. By having those experiments with ChaosGPT and whatnot, we as a civilization will actually learn how bad things could be if they were scaled up. If you take ChaosGPT and put GPT-6 in it, I claim we might not be safe at all anymore. On the other hand, there is the frame that people like Yann LeCun, et cetera, have been saying there's nothing to worry about from AI because it's not going to be agentic, and even if it is going to be agentic, it would be just stupid to install self-preservation and bad goals, so we're not going to do that. And it's like: no, we absolutely are. A fraction of humanity has a death wish. These are clear empirical demonstrations that if something really bad can be done with AI, it will be done.
Nathan Labenz: (59:38) Yeah. It's a big world, and it's pretty easy. I mean, that's the other thing that's amazing. The BabyAGI project, which I believe has been the number one trending project on GitHub over the last couple weeks alongside a couple of other very similar projects, had a first commit of, I think, 105 lines of code. And that's all it takes: a couple clever prompts and a loop, and you've got yourself a little agent. It might not do much yet, but given the model, the barrier to creating some sort of semi-embodied, autonomous version of that is proving to be extremely low. So, yeah, I don't think we're going to be able to rely on the good discretion of users in the long term; certainly, probably not for more than a few days after the release of any major new system. You mentioned three leading groups a minute ago, and I wanted to ask you how you think about who is at the frontier and who is maybe going to be at the frontier over the next few years. I assume the three you had in mind, you didn't specifically say OpenAI, but, obviously, they're in that group. DeepMind was the other that we were discussing, and then I am guessing you're thinking Anthropic would be the third.
Jaan Tallinn: (1:01:00) I think DeepMind and Google, they're kind of interchangeable. I kind of hear that they have even publicly mentioned that they are somehow joining forces when it comes to this LLM race.
Nathan Labenz: (1:01:13) So nobody else you feel like is close enough at this point? If it's a coordination problem of who actually—who are you calling on to pause? I mean, you're calling on everyone to pause, but it sounds like it's really those three organizations that you're calling on for a pause.
Jaan Tallinn: (1:01:28) Yeah. I think they are the first tier when it comes to doing the most dangerous experiments. But of course, then you have the second tier. I think Eliezer has this related law, Moore's Law of Mad Science, though I kind of forget exactly how it was framed. It was something like: every 2 years, the IQ needed to destroy the world drops by 1 or 2 points. As the hardware companies, mostly NVIDIA at this point, throw more and more capable and cheaper computing cycles onto the market, world-destroying capability will be in a larger and larger number of hands.
Nathan Labenz: (1:02:17) Interesting to think about how that evolves over the next few years. Imagine that we do enact a pause, and meanwhile NVIDIA keeps shipping and people keep doing fundamental research, which notably the letter explicitly goes out of its way to allow; it's not saying all AI research should stop, or that you can't build your small models or fine-tune things for your use cases and so on. So if we imagine a world where there is a pause on these high-end experiments, but hardware continues to ship and, generally speaking, the field is not shut down, do you have a guess for how many groups would be able to do a GPT-5 type project if they chose to in, say, 5 years?
Jaan Tallinn: (1:03:12) 5 years is a super long time.
Nathan Labenz: (1:03:15) I'm with you on that, by the way. I don't even try to guess things 5 years out, so I shouldn't ask.
Jaan Tallinn: (1:03:19) 2 years. I mean, yeah, probably a dozen; that's something I'm just pulling out of thin air. If I thought about it, I would probably have a better estimate. It's probably fewer than 100, more than 10, perhaps closer to 10 than to 100. That's my answer for 2 years.
Nathan Labenz: (1:03:36) And so some of those we can kind of fill in pretty obviously. Right? Meta seems like it would be a very natural candidate. Microsoft, I mean, they have the OpenAI partnership, but they certainly also have their own research division. Presumably, Apple has the resources to get into that game. Maybe even Tesla. I mean, they're focused on other things at the moment seemingly, but they also have the bot. The Tesla bot is going to need some sort of fairly general intelligence to help it walk around and talk to people and pick stuff up. Any other specific actors that you think would be likely entrants there? And then how do you think about the international scene? Is there anything coming out of Europe? And obviously, then everybody starts to think about China too.
Jaan Tallinn: (1:04:24) Yeah. I was going to say that on a 2-year horizon, you should start also looking at non-US actors, in Europe, where compute is much more available, and then also China. I think one obvious counterargument that we are getting when calling for the pause is: what about China? And my answer there is that, first, China does not currently seem to be in the race, at least not as intensely as the leading US labs are in a race between themselves. And second, almost culturally, the Chinese seem to be much less keen on pulling a Bing and just unleashing an uncontrollable mind on their territory. I'm only half joking when I say that in China, if you do that as a tech CEO, you might get disappeared. But yeah, over the longer run of 2 or more years, it will become more and more important to get some kind of international agreements in place for compute, like those we already have for nuclear, for example.
Nathan Labenz: (1:05:52) On the China front, I totally agree. I don't know why we would assume, given the posture the Chinese government has taken toward technology over the last few years, that this is the technology they're just going to throw caution to the wind on. Right? I mean, they've functionally shut down their entire video game industry, limiting it, as I understand it, to just a few hours on a couple of weekend nights per week. We should fact-check me on that, but that's what I understand to be the case.
Jaan Tallinn: (1:06:26) And online learning, I understand, is also being mostly restricted.
Nathan Labenz: (1:06:31) And that one is fascinating too, because who could object to online learning? But my sense is that they intervened there at the state level because they felt an unhealthy market dynamic was developing, where people were working too hard on these standardized test measures and putting far too many resources into it, past the point of benefit.
Jaan Tallinn: (1:06:57) Yeah. Yeah. I mean, I'm actually chairman of an Estonian online language teaching company, Lingvist. And one thing they learned in the Japanese market is that there's a massive English teaching market in Japan, but people don't care about learning the language. They just want to pass the tests in order to get better employment options. So as a language teaching company in Japan, your job is not to teach the language; your job is to get people good test scores, which is a very different job. I suspect something like that also happened in China.
Nathan Labenz: (1:07:41) I would not personally, I'm not ready to say by any means that I want to fully subscribe to Xi Jinping thought or live under the technology regime there, but it does seem like we're jumping to a conclusion far too quickly when we say, well, if we don't do it, they will, and that'll be worse. But leaving that aside for a second, coming back to ourselves: the other big thing that came out this last week or so is an apparent, and I take it probably legitimate, leaked fundraising document from Anthropic that says they are planning to raise 5 or so billion dollars, see the next 2 to 3 years as supercritical, and plan to do next-gen models. They're moving immediately into a GPT-5-type scaling regime, it sounds like. The model itself is supposedly going to cost a billion dollars to train. And they say, again, this is all according to reporting, I haven't seen the deck, that those who fall behind in the 2025-26 time frame may never be able to catch up. When I heard that, I thought, man, that does not sound like a company that's about to pause. It does sound like a company that's in the race now. How are you viewing that news? I don't know if you have any inside view, and I obviously wouldn't ask you to share anything you shouldn't, but what should the public make of that update from supposedly the most safety-centric leading lab there is?
Jaan Tallinn: (1:09:33) Yeah. To the degree that this document was accurate, which I can't comment on because I'm an investor in Anthropic and a board observer as well, it is strong evidence that there is a massive race happening between the US companies, a race that is going to get us killed. So can we stop that race, please?
Nathan Labenz: (1:09:58) What exactly is the thought process knowing that this group came from OpenAI? The high level description that I've heard is they felt like it was becoming too commercial. They wanted to be more focused on a safety first type of approach. And that's been 2 years or whatever. Now this. The only thing I can come up with is people must be thinking, we'll do a better job than they will do. So, therefore, we should do it before they do it. Because if we don't, then they'll do it or they'll do it worse. And it seems like maybe everybody's thinking that. I kind of model OpenAI to some degree that way as well. Is that how you think about the decision makers, or what do you think they're thinking?
Jaan Tallinn: (1:10:44) So I think one very informative public piece of information is the Future of Life Institute podcast with Dario and Daniela Amodei. That was during COVID, about a year ago in early 2022. There, they basically explain what Anthropic's approach is, and as far as I know, what they said in that podcast is still true. They train the frontier models and then do alignment in an empirical fashion while having access to those frontier models. The claim is that, exactly because of these emergent capabilities, there is only so much you can do with non-state-of-the-art models; in some ways, as the models get more competent, they also become easier to interact with. And just the fact that we have language models in the first place, as I think you pointed out, can also serve as a good interface when it comes to alignment. So that is Anthropic's thesis: have these models, then use them to do state-of-the-art alignment work that is empirically tied to the actual objects we have. Now, of course, the big question is how many generations you can do that for, because the pretraining is largely an uncontrolled, unsupervised process. How many generations of pretraining can we do safely, not to mention things like the resulting weights leaking elsewhere? So I think it's a genuine dilemma. In some ways, Anthropic's framing and perspective make a lot of sense, because much of the time you can indeed get more useful alignment work done with the latest crop of large language models. On the other hand, each training run imposes some risk, in my estimate a 1 to 50% risk of complete annihilation of the planet. How do you navigate that trade-off? It's not obvious, and I don't think there's enough thinking going into that trade-off currently.
Nathan Labenz: (1:13:22) Again, it feels like there's some game theory element here, where it seems like they're doing it because they believe someone else is still doing it on some level. Right? If they roughly share your view and say, well, the only way we can do the alignment work is if we have access to the latest models, then a good counterargument would be: if that were true, nobody else would create these models if you didn't, so maybe you should just sit tight too, and we can all study what we have. We don't need another frontier just yet. So it still seems fair to say that on some level they feel their hand is forced. They're either in the game or out of the game, but the game will continue regardless. That seems to be the model implicit in that decision making.
Jaan Tallinn: (1:14:09) Yeah. There are a few models that are consistent with the evidence, but that definitely is one of them, because of these race pressures: they're going to feel that if they don't have access to the latest generation of models, their prospects of actually doing alignment are significantly hampered. That's the positive way of framing things. And the letter, I think, explicitly mentions that one of its goals is to call for a time-out in this race, which indeed has a non-trivial game-theoretic element.
Nathan Labenz: (1:14:47) I think your most recent fundamental AI research investment is in Conjecture. And if I understand correctly, and I may be wrong on this, I don't get the sense that they are planning to train a GPT-5 in the short term. How are you thinking about their contribution and their strategy? It seems like they take a different approach, where they don't feel they have to create the frontier models in order to do something useful.
Jaan Tallinn: (1:15:11) Yeah, I am much more positive about Conjecture's approach in terms of the safety-capabilities trade-off. They still train language models, but not the latest, largest kind. Their goal is to compose less capable language models into a more predictable structure, so you don't risk the world during the pretraining phase. It's, in some ways, a more old-school approach to AI, rather than this summon-and-tame approach.
Nathan Labenz: (1:15:49) Yeah. Interesting. We just did an episode with Andreas and Jungwon from Ought, and they have a very similar outlook too: composition of models, traceability of all the logic, atomization of the different decisions and operations, to try to build some sense of designed-in control into the system from the beginning. It sounds like Conjecture is on a similar line of thinking.
Jaan Tallinn: (1:16:21) Yep. And if we managed to get the pause in this game-theoretic race - assuming it is a game-theoretic race; there's also a frame that says no, this is just apocalyptic death cults trying to end the world, so the race is the charitable frame - then I do think there's an almost automatic pressure to get more competence out of the minds that we have already trained. Part of that is just getting a better understanding and a better composition of the capabilities we already have. Because one important bit, as a lot of your listeners probably know, is that the training phase is much, much more expensive than the inference phase. Once you have spent a lot of compute on training and the training is finished, you have the ability to run many, many instances of the minds you just trained.
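To make the composition idea concrete, here is a minimal sketch, assuming a hypothetical small_lm completion function as a stand-in for any non-frontier model; no specific Conjecture or Ought system is implied. A task is broken into fixed sub-steps, each handled by the smaller model, with every intermediate prompt and output logged so the chain of reasoning stays inspectable:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class TracedPipeline:
    lm: Callable[[str], str]                  # the smaller, non-frontier model
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def step(self, name: str, prompt: str) -> str:
        """Run one atomized sub-task and log it so every decision is auditable."""
        output = self.lm(prompt)
        self.trace.append((name, prompt, output))
        return output

    def answer(self, question: str) -> str:
        # Decompose into fixed, inspectable stages rather than one opaque call.
        facts = self.step("gather", f"List facts relevant to: {question}")
        plan = self.step("plan", f"Given these facts:\n{facts}\nOutline an answer.")
        return self.step("draft", f"Question: {question}\nPlan:\n{plan}\nAnswer it.")

def small_lm(prompt: str) -> str:
    # Hypothetical stand-in; swap in any real completion endpoint here.
    return f"<completion for {len(prompt)}-char prompt>"

pipeline = TracedPipeline(lm=small_lm)
print(pipeline.answer("Why is training costlier than inference?"))
for name, _prompt, output in pipeline.trace:  # the full reasoning chain, step by step
    print(f"{name}: {output}")
```

The design choice this illustrates is that capability comes from the structure of the pipeline, which humans wrote and can audit, rather than from scaling the underlying model.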
Nathan Labenz: (1:17:28) So let's talk about the letter. We've covered a decade of your thinking and investment on this subject, and now we get to the point where GPT-4 is released. It is closing in on human expert performance in a great many domains, and it seems quite unclear what the next generation would bring. Obviously, you all were thinking something very similar. So how did this project come about? How did you settle on a 6-month pause? What was the process of trying to bring a broad coalition together? And was this something you actually thought might happen, or is it intended as a conversation starter? How do you think about this project?
Jaan Tallinn: (1:18:21) Good questions. I remember we had an FLI catch-up call on March 21st, so less than a month ago. We had already observed that there were significant voices in the public: the Ezra Klein article in The New York Times had come out, where he explicitly compared the current AI race to summoning minds, and I think Harari's article was also published around that time, where he was concerned about LLMs plugging into the operating system of civilization, which is language. And there were numerous private discussions with people who were very concerned about the current race, including people in the labs themselves. So we thought that perhaps one valuable thing FLI could do was to create some kind of common knowledge that yes, a bunch of people are worried: a situation where those people know that other people are also worried, and the other people know that they know, et cetera. We have some experience with open letters, so we thought we should try to draft one. Our previous open letters had something like a thousand signatures or fewer, so one thing that completely blew us away was the reaction: in the first few days we got tens of thousands of signatures, and we had technical problems because of that. When it came to drafting the letter, there were multiple considerations. One was simply speed. Clearly, if we had had several weeks to work on it, it would have been much better, but it was done in the spirit of not letting the perfect be the enemy of the good: let's put out something that feels okay. Indeed, the 6-month number was put in and then taken back out across different versions of the draft. The argument that finally won for the 6-month framing was this. One question we get is: why 6 months? What can you do with 6 months? What would a 6-month pause buy us? One important thing people don't necessarily realize a 6-month pause would buy us is confidence that we can pause at all. In that sense, it's better for a proposal that calls for a 6-month pause to fail than for a proposal that calls for an indefinite pause to fail, because in the indefinite-pause situation people go, oh yeah, if it had been 6 months, of course we could have done it, but nobody would pause indefinitely. Right? So that was the final reason we thought, okay, let's put in 6 months and see what happens.
Nathan Labenz: (1:22:09) Yeah. I thought Zvi, somebody who also works with a speed premium, which I appreciate, had some great analysis of this: ultimately simple, but very wise. It was just that if you feel like you're going to need coordination in the future, it makes sense to start building it now. And if you can only get a little bit going at first, you take what you can get and hope to build on that foundation.
Jaan Tallinn: (1:22:38) Yeah, the other really big consideration that fed into the open letter is that there's a lot of reasonable discussion happening in the labs, even between the racing labs, about the need to coordinate, the need to pause, the need to be careful, and people have been public about it, et cetera. But there's always one missing component: when. I think Scott Alexander wrote a good article about OpenAI's Planning for AGI and Beyond statement, where he pointed out that they're saying a lot of nice things, which is great, honestly, it's good, but they don't say when. And that is a little bit suspicious, at least. So one of the rationales for the letter is: how about now?
Nathan Labenz: (1:23:34) I want to go through some additional questions that I've seen floating around in the discourse, but first, what is your sense of how the reaction has been? Obviously, there have been a ton of signatures; even one notable CEO of a company I would say is close to a leading lab, Emad from Stability, signed on, which I thought was fascinating. As far as I know, the reaction has been pretty quiet from the three main groups, the ones who are actually going to either pause or not pause something beyond GPT-4. What do you make of that? There have been no public statements as far as I know, but has there been any private or confidential reassurance, any reaction from the leading labs?
Jaan Tallinn: (1:24:31) Yeah. I think Sam Altman said publicly that he appreciates the spirit of the letter, or something to that effect, but then there was, of course, a but. I don't even remember what the but exactly was. DeepMind definitely hasn't said anything, nor has Anthropic, although just today I read Jack Clark's newsletter, Import AI, where he mentions the letter and also explains why he didn't mention it last week. So there are some reactions. But I want to be careful here, in the sense that I don't want to create self-fulfilling prophecies. I would say the possibilities are very much open at this stage for the letter to somehow catalyze an actual pause, but there are double-digit uncertainties both ways.
Nathan Labenz: (1:25:33) Cool. That's actually a more positive response, albeit minimal, than I had expected or even been aware of. I haven't seen that Sam Altman tweet. I'll have to dig into that.
Jaan Tallinn: (1:25:46) Yeah. I don't think it was a tweet. I hope I'm not just seeing it in my dreams or something. But I think it was actually a comment in some news article about the letter.
Nathan Labenz: (1:25:59) Humans too suffer from hallucination at times, so we'll fact check ourselves.
Jaan Tallinn: (1:26:04) Indeed, we do.
Nathan Labenz: (1:26:05) So, okay. You put this out there, and now people start to say all kinds of stuff, right? The objection I think is most common, and which suggests people didn't finish reading the letter, but which is definitely worth giving you a chance to respond to, is: all the letter asks for is a pause, nothing else, so it's pointless, because what are we supposed to do? Again, you can read the letter for yourself; there are definitely some things it calls for, but it's obviously a big-tent, committee sort of document. Zeroing in on your own priorities: what do you think are the most important tangible, concrete things we could do over, say, a 6-month time frame, such that maybe even you would be comfortable ending the pause? And if those top priorities happened, would that be enough in your mind to end the pause and train a GPT-5?
Jaan Tallinn: (1:27:01) So one thing that will happen more or less automatically is that we will get more experience with the crazy situation we're now in, which is, as I think Yoshua Bengio put it, that we now have AIs out on the internet that can pass a Turing test. That is a novel situation. And I think we will be much smarter 6 months from now than we are now, because 6 months is a long chunk of time when it comes to living with aliens on your planet. I don't know what we're going to learn, but hopefully we are simply smarter in the autumn than we are in the spring. When it comes to more concrete things: Neel Nanda, a researcher in the UK who has worked with DeepMind as well as Anthropic, has this blog post called 200 Concrete Open Problems in Mechanistic Interpretability. There's so much work that can be done even with the previous generation of models, to say nothing of the latest generation: so much alignment work, opening up those black boxes, trying to understand what makes them tick, and asking how we can get any guarantees about what the next generation is going to do. So with research and with lived experience, I hope we will be in a much better place; realistically, I'll just say that we're going to be in a better place 6 months from now.
Nathan Labenz: (1:28:46) That doesn't sound like you would expect general improvement alone to get us to a point where you would say, okay, let's end the pause and do the next generation. Correct me if I'm wrong on that. But framing the question slightly differently: if we abstract away from the 6-month time frame, are there concrete structures of some sort that we could put in place that would, on any timeline, give you enough confidence or reassurance to say, okay, we're now in a decent enough place that you would personally be comfortable going ahead with a next-generation training run?
Jaan Tallinn: (1:29:33) The big question is still: how much risk are we okay with taking? And I'm not saying this risk should be zero, because there is always background extinction risk. We could be hit by an asteroid. There's a certain probability that this call will not end because the planet is hit by, well, probably not an asteroid, because we can see those coming, but by a comet; those are harder to see and track. So we shouldn't expect to get the risk to zero. But if there is a pause, and an associated realization that these experiments are considered too reckless by society, hopefully that will create an incentive gradient for the companies themselves to figure out how to run them in a more responsible and more legible manner. So I am interested in this project at ARC, the Alignment Research Center in Berkeley led by Paul Christiano, called evals: evaluating models on what things they could in principle do and what things they are still incompetent at. You can have different opinions about it, but there's a generalization of this idea: is it possible to replace the current blind training runs, which as far as I know is what they are, with something where, at every iteration, you run tests that give you some kind of guarantees about the alien you're summoning?
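As a rough sketch of what such a non-blind run could look like, here is a minimal, hedged illustration; the helpers train_steps and capability_score are hypothetical placeholders, not ARC's actual evals harness. Training proceeds in bounded increments, and each increment must pass a battery of capability evaluations before the run may continue:

```python
# Sketch of an eval-gated training run: instead of one "blind" run, training
# proceeds in bounded increments, and each increment must pass capability
# evaluations before the run may continue. All helpers are hypothetical
# placeholders for illustration only.

CHECKPOINT_INTERVAL = 1_000   # steps between mandatory evaluations
RISK_THRESHOLD = 0.1          # abort if any probed capability scores above this

def train_steps(model: dict, n: int) -> dict:
    model["steps"] += n        # placeholder for n real optimizer steps
    return model

def capability_score(model: dict, probe: str) -> float:
    return 0.0                 # placeholder for, e.g., autonomous-replication probes

def gated_training_run(model: dict, total_steps: int, probes: list) -> dict:
    for step in range(0, total_steps, CHECKPOINT_INTERVAL):
        model = train_steps(model, CHECKPOINT_INTERVAL)
        scores = {probe: capability_score(model, probe) for probe in probes}
        if any(score > RISK_THRESHOLD for score in scores.values()):
            # Halt before the next increment instead of after the full run.
            raise RuntimeError(f"Eval gate tripped at step {step}: {scores}")
    return model

model = gated_training_run({"steps": 0}, 10_000,
                           probes=["self-replication", "deception"])
```

The point of the structure is simply that the decision to keep training is made many times, with evidence, rather than once, in advance.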
Nathan Labenz: (1:31:10) Yeah. There are a few other things that give me a little hope that they might be developed on the technical level during that time. There was an interesting paper, pretty small scale I think, that I believe came out of Anthropic, where they experimented with doing the pretraining on a human preference dataset from the beginning, as opposed to just random decent-quality text off the internet. The upshot seemed to be that you never had quite as alien of an alien in the first place. On measures of harmfulness or helpfulness, for example, it stayed much closer throughout the training process to that final post-RLHF or RLAIF state that we know now, and never dipped as far into the very strange alien territory of general pretraining. So something like that, changing the dataset from the get-go, seems interesting. Another recent thing that jumped out at me: somebody, I think in the last week or two, published a result showing they were able to increase the size of the model progressively throughout training. It was presented to me as an efficiency thing, which goes to show all these things have pros and cons; everything is dual use. But the net savings on the training FLOPs was around 50%, which is obviously significant enough for people to take notice, even for purely commercial reasons. And it seemed to me that there's something really interesting there: you're creating this seed kernel, a truly little baby version of the model with far fewer parameters, and then you layer on more and more layers and parameters as you go through the training process. Maybe there would be a way to zoom in on that small, presumably much more interpretable version first and get something working right in the small, before growing the model itself into one big tangled knot of parameters.
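As a hedged illustration of the progressive-growth idea, and not the specific paper's method, which isn't detailed here: you can train a small seed stack first, then splice in new layers initialized close to the identity, so the grown model starts from exactly the behavior the small model learned:

```python
# Hedged sketch of growing a model's depth during training. This is not the
# paper's actual method; it only illustrates training a small "seed" model
# first, then layering on capacity while preserving behavior at the splice.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        # Residual form: a zero-initialized block starts out as the identity.
        return x + torch.relu(self.linear(x))

def new_identity_block(dim: int) -> Block:
    block = Block(dim)
    nn.init.zeros_(block.linear.weight)  # new block initially contributes nothing
    nn.init.zeros_(block.linear.bias)
    return block

dim = 64
x = torch.randn(8, dim)
model = nn.Sequential(*[Block(dim) for _ in range(2)])  # small seed model

y_before = model(x)
model = nn.Sequential(*model, new_identity_block(dim))  # grow depth mid-training
y_after = model(x)
assert torch.allclose(y_before, y_after)  # behavior preserved at the splice point
```

The small seed could in principle be studied interpretably before any capacity is added, which is the hope Nathan is gesturing at.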
Jaan Tallinn: (1:33:37) I can guarantee that it will not guarantee safety in the long run, but it might just be enough to reduce the probability of destroying everything with the next generation, so we can actually take one more step in a way that is much more responsible than the current default.
Nathan Labenz: (1:33:52) So one thing you haven't gone to much here is regulation, government intervention. The letter does call for that: if there can be no voluntary pause, then governments should step in and insist on one. But maybe you just haven't gotten to it yet, because I'm not hearing anything from you along the lines of, government can come in and set up a regime that's going to do much for us. Do you have any hope in a government regulatory approach?
Jaan Tallinn: (1:34:23) Oh yeah, I do. Glad you asked. This also reminds me that Google, in the form of Sundar Pichai, did react to the letter: on Hard Fork, the New York Times podcast, he said that the discussion started by the letter is great, but that government intervention is necessary here; it's not enough to rely on the labs self-regulating. One silver lining I already mentioned is that this pretraining is super expensive and super visible, and therefore the metaphor I've been using is nuclear control. With nuclear arms, you also have two phases: the hard step of enriching uranium, which is super energy-intensive and much more visible, and the second step, the proliferation of weapons-grade material, which is harder to control. Here we have a similar situation: pretraining actually happens in just a few places and is visible to governments, whereas proliferation is much harder to control. So my model, and as far as I know it resonates with thinking in the labs themselves, is that if you want to put some constraints on the AI trajectory, then intervening on compute, some kind of compute governance, is probably a great place to start. And in fact, the CHIPS Act and the export controls on China are already happening, and in some ways they make the problem much easier, although they are far from solving it.
Nathan Labenz: (1:36:28) I think that makes sense. Certainly at the moment, the compute requirements are high. But what do you make of things like the diffusion of language model know-how, of proliferation? I think we might have a guest coming up on the podcast who is building a decentralized GPU cluster, potentially with some sort of blockchain governance, where my M1 or M2 MacBook Air could contribute to a cluster. How long do you think we have before that barrier of mega compute resources goes away, whether because of these virtual clusters, further algorithmic breakthroughs, or just model leaks? How long do you think that barrier holds?
Jaan Tallinn: (1:37:24) I've got to refer back to Eliezer's Moore's Law of Mad Science: with every year or couple of years, destroying the world becomes easier. So there's that. But also, if we do not get a pause, I think the world will be destroyed by one of the big labs, simply because it's much easier to train frontier models in a big data center than in a distributed manner. That said, if the pause does happen, then yes, we have to worry about things like proliferation, hackers stealing the weights and then doing experiments like ChaosGPT, but with a competent AI, stuff like that.
Nathan Labenz: (1:38:16) One of the things that's hardest to argue with, in my experience, from those objecting to the letter or to the concept of a pause, is the sense that someone might say: you're not really giving me anything to hope for here. You're just telling me that every generation, it gets easier for the world to be destroyed. We've talked about buying ourselves some time here and there, but we haven't really heard much of a, here is the path we can take to safety. So why bother pausing if we don't even have a sense of where we're going? Do you have any hopes for concrete paths to safety that you would use to inspire that kind of person?
Jaan Tallinn: (1:39:06) Yeah. I mean: sorry, humanity, you have cancer. You might be cured of it, but currently it doesn't look good. I'm not going to lie, the prognosis is bad, but it's not hopeless. And a massive silver lining is that if we do manage to survive the cancer, the future is going to be amazing. So in some ways the expected value of the future is not bad; it's just the odds of survival that are bad. If we survive, the life of the world, of the universe, could be unfathomably better than it is now. In a sense, we are holding a lottery ticket, and it is in some way in our control to improve the odds. That's what I'm doing.
Nathan Labenz: (1:40:06) Well, that's probably about as good a bottom line as we could hope for from this conversation, so I want to thank you for spending the time with us. I do have a couple of quick-hitters, just fun questions that I usually end on, if you have an extra second. We touched on this earlier: aside from the obvious usage of the core language models, are there any applications that you are personally finding delightful or useful and would recommend people check out?
Jaan Tallinn: (1:40:38) No. I'm just too bandwidth-limited to tinker with the language models. I've done a little bit, but at this point I'm trying to find coders to whom I can delegate a bunch of my projects, some of which might involve language models.
Nathan Labenz: (1:41:01) Fair enough. You're in the majority on that answer. Most people are just using a few things. And then second, let's imagine a world where we're here in a couple of years and Neuralink has been deployed to a million people. In this scenario, you don't need it for restoring any functionality. But if you were to get a Neuralink implant in your head, it would give you the ability to essentially transmit your thoughts to devices. So you would have effectively thought-to-text or thought-to-UI control. Would that be enough for you to be interested in getting a Neuralink implant?
Jaan Tallinn: (1:41:46) Depends so much on details. How reversible is the procedure? What are the risks, and what is the demonstrated upside? Like, will I become a better dancer as a result?
Nathan Labenz: (1:41:57) We'll have to see. They did show, in their show-and-tell, an animal where they were creating motor control through the Neuralink. But I think that was a long way from improving your dancing skills, so I hope you are dancing for many years to come. Jaan Tallinn, thank you so much for spending this time with us. We appreciate you being part of the Cognitive Revolution.
Jaan Tallinn: (1:42:24) Thank you very much.
Erik Torenberg: (1:42:26) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.