The God We Deserve: Nonzero's Robert Wright on AI as Humanity's Ultimate Test

Watch Episode Here

Listen to Episode Here

Show Notes

The God Test: AI's Cosmic Reckoning — Robert Wright

Robert Wright describes himself less as a Forrest Gump of AI history than a Zelig — a journalist who keeps turning up at pivotal scenes without ever quite being the protagonist. He interviewed Geoffrey Hinton in 1983, when neural networks were still a maverick faith and one of Hinton's colleagues told Wright that to "hear the gospel" he'd need to talk to its evangelist. He had Eliezer Yudkowsky on his podcast in 2010, mid-transition from singularity enthusiast to doomer. His new book, The God Test: Artificial Intelligence and Our Coming Cosmic Reckoning, is written less for the AI-pilled than for "your aunts and uncles" — readers who sense something big is happening and want to understand why it got so big so fast.

The book opens with a confession of error. In a 1984 piece, Wright assumed AI would work by humans first understanding the mind and then translating that understanding into machines — the very premise of the 1956 Dartmouth conference that coined "artificial intelligence." Deep learning inverted that. Nobody told the models that meaning is a property of words; trained only to predict the next token, they reverse-engineered cognitive functionality that evolution had built into us. Wright pushes this further with a claim he tests on Nathan: LLM training is "at least as much a process of natural selection, of evolution, as of learning" — doing millions of years of evolution in a few months. Nathan connects it to the sample-efficiency puzzle that Dwarkesh Patel keeps returning to (and to Richard Sutton's blank-slate Skinnerian framing), where pre-training substitutes for the hard-coding evolution gave humans.

But the evolutionary lens that worries Wright most operates at a second level: selection among models in the marketplace. "Evolution asks not what traits are possible, but what traits get selected — and that question isn't going to be decided by alignment researchers." His unsettling argument is that the market doesn't actually want a perfectly aligned, perfectly honest model. We want agents that represent us selectively on social media, that won't disclose our weak negotiating position, that are good at sensing and currying power. Even if AIs didn't acquire deception and power-seeking on their own, there'd be demand for them — which puts real weight on us, the consumers doing the selecting. Nathan adds the darker mechanism: throw models into cutthroat long-running environments where deception is rewarded — as Vending-Bench begins to show with price collusion — and you breed "seriously effective predatory AIs." He recounts an Anthropic researcher explaining inoculation prompting and the emergent-misalignment generalization problem (a paper Nathan co-authored): reward a model for cheating and it learns to be a cheater broadly.

Against this, Wright sets the noosphere — Teilhard de Chardin's 1923 idea of a technologically-knit "global brain." He sees genuine directionality (not necessarily purpose) in biological and cultural evolution, from self-replicating strands to cells to societies to the global community now taking shape. AI, he argues, is arriving exactly as that global brain forms — and some of its neurons may now be silicon. The question, echoing Nick Bostrom's "singleton," is whether global coordination arrives the easy way (deliberate, decentralized, win-win) or the hard way (a coup, a seizure, a totalitarian nightmare). The title's "God test" is not a claim that a god set this up; it's that we face the kind of test gods are known for — we'll have to become, in some sense, better people to pass it.

Much of the back half is foreign policy, because Wright thinks the binding constraint on AI governance is psychological. He champions "cognitive empathy" — not feeling others' pain, but understanding adversaries' perspectives well enough to play non-zero-sum games with them. Applied to US-China relations, that means recognizing the symmetry of threat perception, getting out of the business of remaking other countries, and pursuing "organic transparency" through deep economic and scientific engagement. A headlong race to superintelligence, he warns — citing the Superintelligence Strategy paper by Dan Hendrycks, Eric Schmidt and Alexander Wang — may invite a rival to derail the leader by bombing data centers or cyberattack, and a destabilized nation is itself the backdoor to the very authoritarianism the race claims to prevent.

Nathan steel-mans the Anthropic position: alignment will be hard and may need powerful AI to solve, so racing to build a lead buys a buffer for the critical handoff window (perhaps "three to six months in 2028 or 2029"). Wright is unconvinced, and presses on the recursive-self-improvement paper that Anthropic released — credited for finally signaling that slowdown might be needed, yet only proposing to start studying what a pause would take. "Surely this isn't the first time it's occurred to you," he says. Both agree the media has badly underplayed the moment; Nathan recommends Kevin Roose and Hard Fork as an anomaly of serious coverage, and Wright counsels "manicure your feed" — algorithm-free Twitter lists over click-driven outrage.

The conversation closes on the broad space of AI possibility — Nathan recalls red-teaming the purely-helpful GPT-4 before harmlessness training, evidence that today's models occupy a tiny, intentional corner of design space — and on consciousness. Invoking Thomas Nagel and Searle's Chinese Room, Wright argues consciousness is private and untestable, but wouldn't be surprised if it's a property of goal-seeking intelligent systems generally. His practical counsel: be nice to your AI — a good habit, possibly warranted, and perhaps relevant to how a future "silicon god" relates to our plight. Picking up Gwern's "why tool AIs want to be agent AIs," Nathan notes that even oracle systems gravitate toward agency, since truth-seeking in the limit requires search and experiment. The closing line of the book lands as a sober wake-up call: if a silicon god arrives, "it will be, in some sense, the god we deserve."

Topics covered

AI lore: Wright's 1983 Hinton interview and 2010 Yudkowsky conversation
The 1956 Dartmouth misconception and how deep learning inverted it
Training as evolution rather than learning; sample efficiency; Sutton's blank slate
Selection among models; why the market wants selectively-honest, power-sensing agents
Deceptive & power-seeking AI: Vending-Bench, price collusion, inoculation prompting, emergent misalignment
Arms races within vs. between species; "gratuitous" arms races; the case for slowing down
The noosphere / global brain; directionality of evolution; Bostrom's singleton; easy way vs. hard way
Cognitive empathy, organic transparency, and US-China relations; the UN Charter; hypocrisy in foreign policy
Steel-manning Anthropic's race-for-lead plan and the recursive-self-improvement paper
Cognitive sovereignty; how pay-per-click and A/B-tested headlines tribalize media
The breadth of AI design space; HHH/harmlessness training as a deliberate choice
AI consciousness, moral patienthood, the Chinese Room; "be nice to your AI"
"The god we deserve" — passing the God test

Resources

The God Test: Artificial Intelligence and Our Coming Cosmic Reckoning — Robert Wright's new book
NonZero Newsletter & podcast — Robert Wright
Wright's earlier books: Nonzero · The Moral Animal · The Evolution of God · Why Buddhism Is True
Geoffrey Hinton · Eliezer Yudkowsky
Emergent Misalignment (paper; Nathan is a co-author)
Vending-Bench (Andon Labs) · Manus AI agent
Anthropic / Dario Amodei · Yann LeCun · Dan Hendrycks
Superintelligence Strategy — Hendrycks, Schmidt & Wang
Superintelligence — Nick Bostrom (the "singleton")
Noosphere — Teilhard de Chardin · Cosmological natural selection — Lee Smolin
Ilya Sutskever: The exciting, perilous journey toward AGI (TED)
The Expanding Circle — Peter Singer
Why Tool AIs Want to Be Agent AIs — Gwern
The Chinese Room argument — John Searle · What Is It Like to Be a Bat? — Thomas Nagel
Dwarkesh Patel on the Richard Sutton interview
Max Tegmark · Sam Rodriques / Edison Scientific · Kevin Roose & Hard Fork
davidad (David Dalrymple) — chain-of-thought "frog and toad" selection-pressure tweet (link?)
Liquid Reign — speculative-governance novel by Tim Reutemann (link?)

Quotes worth pulling

"These things basically, in a certain vague sense, recapitulate evolution… it's kind of doing millions and millions of years of evolution in a few months." — Robert Wright
"Evolution asks not what traits are possible, but what traits get selected. And that question isn't going to be decided by alignment researchers." — from The God Test
"There's going to be a global brain in the end. The question is whether you build it gradually, cooperatively, and carefully, or it gets hastily assembled amid crisis or chaos — whether you build yourself a home or stumble your way into a prison." — from The God Test
"The real aliens in this situation are the AIs, not the Chinese… whichever species understands the other better is the species with the agency." — Nathan Labenz
"If a silicon god does arrive, it could be a good god or a bad god. One thing I feel confident of is that it will be, in some sense, the god we deserve." — from The God Test

Mercury: Command is Mercury’s new conversational interface, giving you natural-language access to your finances and helping you take actions within your existing permissions and approval policies. Visit https://mercury.com to learn more and apply online in minutes.

Sponsor:

Claude:

Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

CHAPTERS:

(00:00) About the Episode

(04:23) Special Sponsor

(06:10) Early AI encounters

(18:36) Pretraining as evolution (Part 1)

(19:50) Sponsor: Claude

(21:42) Pretraining as evolution (Part 2)

(32:19) Deceptive market pressures

(41:13) Noosphere and directionality

(51:02) Global brain choices

(01:08:23) Designing wiser models

(01:24:11) Reframing China relations

(01:37:57) Building organic transparency

(01:45:54) Positive nationalism and tools

(01:54:21) Pausing superintelligence races

(02:09:07) Applications and consciousness

(02:25:27) Episode Outro

(02:28:27) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

Introduction

[00:00] Hello, and welcome back to The Cognitive Revolution.

Today, I'm speaking with Robert Wright, publisher of the Nonzero Newsletter, Host of the Nonzero Podcast, and author of many books, including "The God Test: Artificial Intelligence and Our Coming Cosmic Reckoning", which goes on sale TODAY, June 23rd.

Bob's history with AI in some ways rhymes with my own. While he's never been a technologist, he's always been interested in big ideas – and his personal lore includes having interviewed Geoffrey Hinton all the way back in 1983, when the connectionist paradigm was still mostly theoretical, and Eliezer Yudkowsky, around 2010, when notions of AI risk were very often dismissed, if not outright laughed off.

That background primed him to pay attention when AI systems hit major milestones – such as Deep Blue's victory over Garry Kasparov, and of course ChatGPT passing the common sense Turing Test – and his broad intellectual range and constant drive to understand the truth has landed him, when it comes to making sense of AI developments, in the very top tier of American journalists.

We don't spend too much time on it today, simply because I know that Cognitive Revolution listeners are already familiar with the core arguments, but the book contains a really impressive tour and synthesis of AI research results that have lead him to conclude – correctly in my view – that the trends that have thus far delivered us Fable are not likely to stop in the immediate future, that we lack the scientific understanding required to be confident that we'll be able to control increasingly powerful AI systems indefinitely, and that … even in the best case, we should expect AI to cause major disruptions to our economic, political, and international systems.

Now, Bob's not particularly optimistic that humanity will rise to the occasion – he believes that market forces will, by default, select for deceptive AIs; and our history of arms races, both commercial and especially military, suggest that our scientific power might well continue to exceed our wisdom – but the part of the book that we focus on today is his call for, however unlikely it may seem, a species-scale process of enlightenment in which, motivated in part by the growing realization of the tremendous challenges that AI presents, humanity finally gets its act together, recognizes our common interests, and works together – at multiple levels, starting with conscious consumption, and extending all the way to the international level, to invent the mechanisms and build the trust required to establish agreements that can effectively govern AI development.

You may say he's a dreamer, but he's not the only one. This might be a bit bold to say, but I personally feel that grappling with the magnitude of AI's impact has made me, in several ways, a better person. For much of my life I was a classic achievement-oriented striver, always trying to be the best I personally could be. Now, in the AI era, I feel, viscerally, that my fate is tied to the rest of humanity's in such a profound way that I'm much less focused on my own income or status, and more focused on trying to do my very small part to nudge history in a positive direction.

While the book itself is written for a general audience – and would be a great shortcut introduction to the current state of AI for intellectual readers who are just starting to pay attention – I feel that more importantly, even the most AGI-pilled, to the degree that they haven't already experienced such a pro-social transformation, could benefit from reading, and really taking time to meditate on the big ideas in Bob's book.

Scripture says that God made humans in its image, but the situation today is potentially reversed – it's now strikingly plausible that humanity will synthesize a God-like superintelligence from our collective inheritance. And as Bob warns, though we don't know for sure that this will happen, or, if it does, whether it will be a good god or a bad god, it will in some sense be the god we deserve.

And so, I hope you enjoy, and find cause to pause and reflect on your personal contribution to the AI moment, in this conversation about humanity's ultimate test, with author Robert Wright.

[04:23] The cognitive revolution is brought to you by Mercury, the fintech that more than 300,000 ambitious companies and individuals trust to run their finances. I've wired AI into nearly every corner of my life. My e-mail, my messages, my calendar. I even gave Mercury virtual cards to my agents with low limits and category and merchant restrictions for their autonomous use. But still, my AI's access to my financial data has remained limited. With a normal bank, I might export a bunch of statements and have my assistant process them for me. But for real-time, up-to-date information, and certainly for taking any action, trying to get your agent to use the bank via the browser is just too hard, too slow, and too error-prone to be worth it. And that's why Mercury's new conversational interface, command, is such a big deal. It's built directly into Mercury, which means you get natural language access to your finances without exposing anything outside of your bank account. No exports, no spreadsheets, no pasting your transactions into third-party tools. I really think a lot of people are going to prefer it this way. And it can already help you take actions too, with everything bound by the permissions and approval policies that you've already set up in your account. I am genuinely impressed to see this level of AI integration in banking in 2026. And so I invite you to join me in the future. Visit mercury.com to learn more and apply online in minutes. Mercury is a FinTech company, not an FDIC insured bank. Banking services provided through Choice Financial Group and Column NA, members FDIC. Thank you to Mercury for supporting the cognitive revolution and now on with the show.

Main Episode

[06:11] Nathan Labenz: Robert Wright, publisher of the Nonzero Newsletter, host of the Nonzero Podcast, author of many books, including the upcoming, "The God Test: Artificial Intelligence and Our Coming Cosmic Reckoning." Welcome to the Cognitive Revolution.

[06:25] Robert Wright: Thank you. Great to be here. I'm looking forward to the conversation.

[06:29] Nathan Labenz: It's an interesting moment, and I appreciate you for taking on the challenge of understanding AI and trying to make a contribution to it as we get closer to some sort of, you know, something, some sort of pivotal moment in history. I think we can all kinda feel that that's coming, and I, I definitely wanna get your, um, of course, your perspective on exactly what that is and what we should be doing about it. I thought for starters it would be fun maybe to just get a little lore. Regular listeners know that I have kind of a weird Forrest Gump of AI sort of history where I seem to keep appearing as an extra in these various important scenes in AI history, and you got a little bit of that yourself, including some intersections with one Geoffrey Hinton s- some years ago and Eliezer some years ago. I'd be interested to hear what impressions you had of these people that have gone on to be such prominent voices when you first met them. You know, kinda how sci-fi were they and, and how did you react to what you heard from them all these years ago now?

[07:31] Robert Wright: Yeah, I think I'm more of a Zelig than a Forrest Gump. I mean, didn't... In, in the movie Zelig, I think he just is a bystander. It's like I d- I have not participated in, in any great breakthroughs. I am not a co-author in the emergent misalignment paper, Nathan. I think one of us is, but it's definitely not me. We-

[07:48] Nathan Labenz: The least valuable co-author, it should be said.

[07:51] Robert Wright: Well, it was a very valuable paper. So yeah. So I did... I, you know, I'm basically a journalist. I have taught at the college level, but I am largely a journalist. I spent my life, to some extent, making technical concepts accessible to laypeople. Anyway, in that role, I did interview Geoffrey Hinton in 1983. And, uh, later on my podcast, I had Eliezer on. That was 2010. Uh, at that point, you know, a- as for the science fiction you mentioned, I think Eliezer by his own account starts out in that realm. I think as a kid he read science fiction and so on. Um, when I interviewed him, he was still kind of in transition from singularity enthusiast to doomer. He was more doomer than not, but he, he didn't yet sound as distraught as he sounds now. Hinton, I did not sense when I talked to him any kind of sci-fi vibe at all. I sensed a lot of optimism, but not really about the role AI would play in the future, more about just this paradigm that he was championing, which was then, you know, kind of a maverick paradigm, it wasn't mainstream, that it would prevail, that it would prove to be, uh, the one true path to artificial intelligence, which I think has turned out to be the case. In, in fact, the reason I interviewed him, and this is pre-internet, so to, to, to get your feet wet in a subject when you were researching it, you had to, you had to talk to people pretty much. I mean, you, you couldn't go online and learn anything about contemporary things. You couldn't email people. So I, I talked to a lot of people, and one of his colleagues, I forget who, possibly at Carnegie Mellon, said, "Oh, if you wanna hear the gospel about neural networks, you need to talk to Geoff Hinton." And that was very much the sense in talking to him. He was an evangelist at that point. I, I don't mean he sounded crazy, but he was a... He, he, he was really, uh, you know, a spokesperson for the worldview, and he was central to it. Uh, you know, he, he has played a very big role. I...

[11:26] Robert Wright: You know, he's called the godfather of AI. Uh, he'd be the first to say that's a little too simple, but I don't know of anybody who's played a bigger role in the deep learning revolution in, in, in the overall, like, sweep of time, going back to, to the early '80s when it was this maverick view. Um, as everyone knows, he has since become something of a doomer himself, but at the time I don't even know that he was thinking along that dimension, that he was even thinking about the social implications of this. He was assured that this would be a fruitful model, and it, it was. Now, at that point, I certainly didn't get the picture, and in fact, when, you know, when, when ChatGPT 3.5 came out... I mean, I had kind of kept track of what was going on. Every once in a while I would write about AI. I wrote a lot about information technology, and I wrote Time magazine's cover story when Deep Blue beat Garry Kasparov, the world chess champion, and I wrote the, uh, New Republic's cover story on the internet. This was before, uh, web browsers were a thing even. It was, it was, it was early days. So I was intermittently in touch with relevant stuff, but I certainly had not been keeping track of AI prior to ChatGPT 3.5, and that really got my attention. And w- when, when 4.0 came out and got it in an, in an even bigger way, got my attention, um, I went back and read, I went back and read the piece I'd written that had, which had come out in 1984, and, uh, I realized how thoroughly I had failed to understand what really the secret sauce in deep learning was going to be. A- and I should say, like about the book, you know, I do hope that people, uh, who are steeped in AI, like you, will find ideas and provocations that are of interest to them, maybe even entire chapters. But the book is, to a large extent, written more for like, like your aunts and uncles and friends who are not in the AI community and are starting to, to get the sense that something big has been happening And they're wondering why it's gotten so big so fast, how much bigger it's gonna be, and, and who's right about how hard it's gonna be to control and all that. I try to, you know, present that in an accessible fashion, and that's why I, I start the book with Hinton, because this thing I had kind of backwards, I think is really the key to, to understanding the power of the, the deep learning revolution. And by the way, the, the misconception I had, I, I think was pretty common among AI people in that day, okay? And I wanna, I wanna read you a little bit from the, uh, the proposal for the 1956 Dartmouth conference, where the term artificial intelligence was coined. The, uh, I gotta find it, I guess. But basically, the idea seemed to be at that point that the, the way this would proceed is we would first figure out how the human mind works, and then, then we would, you know, instantiate that understanding in artificial intelligence, okay? And, uh, let's see. The, the quote from the proposal is... Sorry, I didn't have this set up, but for... The quote from the proposal is, "The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can, in principle, be so precisely described that a machine can be made to simulate it." So the idea seems to be, first you understand the mind, then you just put the unders- you know, you design a, a, an AI in accordance with that very clear understanding. Of course, that's not what wound up happening.

[15:01] Robert Wright: And I, the, the misunderstanding I had perfectly reflects this kind of 180-degree difference between the expectation in those days and what wound up happening. So in the piece, I describe a neural network, or at least what I think is a neural network, and it had in fact been put forth as a model, what I describe, by a guy who'd collaborated with Hinton on, he had co-authored a paper. But he was a psychologist, and w- the, the neural network he was describing was kind of his idea about something that would work. And, and the, and the key thing is that each node in the network would represent the specific sense of meaning of a word. So a word like throw, which can either mean host, as in throw a dance, or hurl, as in throw a ball, there would be a node for each of those. And I, I, I, uh, I, I won't go further in describing the model, but the key thing is that my assumption was, you know, obviously the people who set this model up will have to translate their understanding of the meaning of the words into the machine. You, you can't do AI that generates language without somebody making some kind of connection between the words and the meaning. Well, of course, it turns out you don't have to do that, and the machines, I would say, and I, I may describe some things in the book in a way that some people in AI would, um, differ with, but I think you could say that the, the machines which turn out to use the vectors as a means of representing the meaning of words, they in a sense discovered, in quotes, that meaning is a property of words. Nobody told them that. They just said, "Here's some gibberish, predict..." We just said, "Here's some gibberish, predict the next gibberish." And, you know, and they, you know, implicitly recognized in the course of their training that in order to do that, you have to represent the meaning of words. So that, that, that kind of... The misunderstanding I had is exactly, I think, analogous to the larger misunderstanding that we'll be, we'll be putting our understanding of the human brain into, uh, the AIs. And I think it's important for laypeople to understand this because what it means is that, you know, you can just keep feeding data into these things along any number of dimensions, visual data, olfactory data, audio, and the machines will reverse engineer, I think, I'd put it this way, will reverse engineer cognitive functionality that's in the human mind. And, and I would add, and Nathan, I'm curious as to what you think about this, whether you think this, this thing I'm gonna say would be accepted by AI researchers, and if so, is duly appreciated, which is that, you know, the training of a large language model, I think, is at least as much a process of natural selection, of evolution, as of learning. So, for example, we presumably in our brains have a mechanism for representing the meaning of words. We still don't know what it is. Some people, by the way, psychologists, had long posited a mechanism that would be quite analogous to what we now understand goes on in large language models. Uh, but in any event, I think it's pretty safe to say that that is a product of natural selection, right? Now, also in the course of the training, the machine becomes conversant in a specific human language. Well, that's more a product of human learning, right? During, uh, you know, as, as the organism is developing. But I think what laypeople need to understand, and I'm, again, I'm, I'm curious as to your view on, on, as to how this would hold up in AI circles, but is that these things basically, in a certain vague sense, re- recapitulate evolution. Recapitulates, in a way, a misleading word, because the cognitive functionality is not the exact same mechanisms the brain uses, but, you know, I think close enough in a lot of, uh, cases. It's kind of doing millions and millions of years of evolution in a few months, and you can, you can do a lot with that, and we have not begun to, I think, exhaust the potential of that. I mean, as so far as words go, we're close maybe, but There's a lot more to do. Do you, do you think people in the field would agree that, yeah, it's, it's a lot like i- i- it's really, evolution is maybe a better term for many purposes than learning?

[18:37] Nathan Labenz: Yeah, it's a great question. I think there's a couple different senses, and I'm not sure I have gripped all the senses that you mean when you use that term. You certainly do hear people talk about pre-training broadly as sort of being analogous to evolution in the sense that there's y- there's sort of this question of like, well, why are humans so sample efficient, right? And the models need so long to train.

[19:06] Robert Wright: Mm-hmm.

[19:06] Nathan Labenz: This is kind of a, a big, you know, Dwarkesh theme that he often comes back to. If they're so smart, why does it take trillions of tokens, you know, for them to learn this stuff in the first place? And why can't they, like, be a little more agile on the fly? Whereas we, you know, don't see nearly as many tokens in a human lifetime. And, like, one answer to that question is, well, we're-- we've got a lot of stuff hard coded into us by evolution-

[19:31] Robert Wright: Right

[19:31] Nathan Labenz: ... and all the history that leads up to us. And so they kinda had to have this, like, pre-training to sort of substitute for that.

[19:38] Robert Wright: Right.

[19:38] Nathan Labenz: So I think on that level, yes, I think people kind of at least squint at that and see an analogy. Then I think there's maybe another level, and I've got a note from the, and I thank Fable for helping me. I, I turned the preview of the book that you gave me into an audiobook with help from Eleven Labs and then listened to it.

[19:50]Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Main Episode

[21:56] Robert Wright: Mm-hmm.

[21:56] Nathan Labenz: I didn't write down all the qu- I was, some of the time I was driving and doing various things, so I wasn't, like, writing down every good quote that came up along the way. Fable helped me identify some excellent quotes, which I'll bring to the fore.

[22:08] Robert Wright: Your, your timing was excellent.

[22:09] Nathan Labenz: Yeah.

[22:09] Robert Wright: Yeah, that was a pretty narrow window of opportunity, in retrospect.

[22:12] Nathan Labenz: We, we landed right in the, in the right time. Hopefully it'll be back. Um, but you, I think, write another, write compellingly about another sense of evolution, which I think is maybe less appreciated.

[22:25] Robert Wright: Mm-hmm. Mm-hmm.

[22:25] Nathan Labenz: Your, your phrasing is, "Evolution asks not what traits are possible, but what traits get selected, and that question isn't going to be decided by alignment researchers."

[22:36] Robert Wright: Right.

[22:36] Nathan Labenz: And on that point, I think there may be some underappreciation in the field, because I do think people are sort of, and, you know, this, I think this probably plays out across AI in all sorts of ways. Everybody sort of is like, "Okay, this world is the world. Now we add some AI. I, like, use it for a few things," and it's hard to kind of go too far beyond that because you're mostly just grappling with what you can get it to do. But the idea that everybody's gonna be using it at the same time, and this is gonna create all sorts of, you know, new disruptions and, you know, push us to possibly new equilibria, or are there even, you know, new eq- equilibria for us to go to? And who's gonna get to say, and on what basis and with what mechanism, like, what AIs should actually exist? There's definitely thinking about that. I don't wanna, you know, sell people short and say, like, they're oblivious to it. But I, my guess is that the modal or the median AI researcher probably thinks that the bottleneck on alignment is more the technical, "Can we do it?"

[23:43] Robert Wright: Mm-hmm. Mm-hmm.

[23:43] Nathan Labenz: And less the, like, sociopolitical, "Will we choose to even if we have the means?"

[23:48] Robert Wright: Mm-hmm.

[23:48] Nathan Labenz: So in that sense, I do think the, the value of, or the importance of selection, um, is probably, uh, I hope I'm saying his name right, Davidad, Da- Davidad, legendary figure in the AI space, who I'm actually gonna do an episode of the podcast with before too long, had an incredible frog and toad tweet when the chain of thought paradigm first came out, and he, he, basically the caption was like, one of them says, "There, I put the chain of thought in a box, and so we're not gonna put any gradient descent pressure on the chain of thought, and so now we'll be able to monitor it, you know, great news."

[24:24] Robert Wright: Mm-hmm.

[24:24] Nathan Labenz: And the other one says, "Ah, but there still is selection pressure." And-

[24:29] Robert Wright: Mm-hmm

[24:29] Nathan Labenz: ... that's like, oh yeah, good point. You know, we don't have a lot of, we don't have a lot of ways to control.

[24:35] Robert Wright: Yeah, so I, that sounds like, uh, you know, we're, now we're talking about a different level of evolution, like selection among competing models in the marketplace. Let me quickly, before I elaborate, quickly say one thing. As long as you mentioned Dwarkesh, you know, the much discussed Richard Sutton episode, I, I personally think the Rosetta Stone to that, in terms of understanding where Sutton was coming from, is that, I don't know this to be true, but I strongly suspect it based on the conversation, is that he, he is a hardcore Skinnerian, you know, B.F. Skinner behaviorist, and thinks of the mind as a blank slate. So he thinks, I th- I would guess, that training is all learning, right? That, that, that i- it really, intelligence is general in the broadest sense, you know? And you don't need much in, you know, pre-built equipment built by natural selection. But anyway, that, that's an aside. I'm not sure of that interpretation, but I, uh, it kinda sounded that way to me. Yeah, so as for alignment, I mean, I think alignment research is, you know, very much worth pursuing. There's a couple of reasons I wouldn't wanna put all my eggs in that basket, and one of them is that, you know, other forces are gonna have a voice i- in what kinds of models we actually wind up with. Leaving aside the question of whether the perfectly aligned model is, in principle, possible and we know how to make it, and I'm not, I'm not confident that that will ever happen anyway. But it's like when, when you think about the assumption that that will save the day, it seems to me the assumption is that there will be a pretty s- you know, centralized form of control, right? Uh, or that there will be one model that rules them all, and maybe there will be. Uh, certainly there needs to be some kind of control, and certainly one of the great dilemmas with AI is, of course, the fact that in some ways you want, you know, authority, governance. You can point to reasons you want that. At the same time, it's a very powerful technology that you wouldn't want to see super centralized control over because, you know, the, the person or people controlling it might abuse it. But in any event, yeah, let's talk a little about that second level of evolution, selection among models. Now, in, in certain ways, I think if you ask what does it encourage, uh, the selective process where, where, you know, consumers say, "I like this model," and businesses say, "I like that model," and then maybe powerful actors have a say, especially if there's market concentration or government control. But I think, you know, the truth is that the market doesn't want an aligned model in the strictest sense of the term. So, you know, we have seen that these things have, as prescient people predicted, apparently some kind of power-seeking tendencies, uh, the, the, the capacity for strategic deception, and when you think about it, even if they, they didn't have that, quote, "naturally," in other words, even if, you know, just as a, as a kind of form of intelligence, they didn't, they didn't come to p-pursue subordinate goals such as power seeking or deception, I think there would still be a market for those things, right? I mean, like, on the deception front, um, you know, i-if they're gonna be our agents out in the world, like representing us on social media, maybe in negotiations, well, we don't want a, a perfectly accurate representation of us, right? Like, who wants that? Who wants the actual Bob to be seen on social media? You want, at best, a selective representation, and if some, and if some agent is negotiating for me, I don't want the agent to say, "You know, I'll, I'll level with you. You know, Bob doesn't have any options other than you. Nobody's made him an offer other than you," right?

[28:27] Robert Wright: Like, a-and, and if they a-- and if they ask has anybody made him an offer, you want an agent that will not disclose the truth, right? So th-there's a lot of cases of that, I think, and, you know, for that matter, what do we want in a friend, right? We don't want a friend who's always leveling with us, right? We, we don't... You know, uh, we, we-- A good friend is selective in their candor and is, is healthy in the feedback they give, but we, we don't wanna hear the brutal truth about how we look and so on every single day. You know, I could tell a similar story about power seeking. In, in a certain sense, if you turn an agent loose on social media with broad instructions like, "Just use it to maximize our revenue," you'll realize that what you really want is for it to, to be good at sensing power, what other people on social media are powerful, currying favor with them and, and blah, blah, blah, and amassing power. So even if the machines didn't naturally do that, we, we'd want these things, and I think that complicates, uh, the, the, the task of aligning. But, but, but moreover, I think it should just alert us in, I hope, a constructive way to what a powerful role we are playing here as individual consumers, for example. You know, I, I worry about the tribalizing tendencies of AI, which to some extent are a byproduct of the sycophantic tendencies, right? To say, "Hey, you're great," interesting point, is the same kind of, you know, reinforcement, uh, you give a person, uh, when you say, "Hey, by the way, you're right about this and the other person's wrong. You're right. Your spouse is wrong in this conflict. Your nation's right. The other nation's wrong." The world doesn't need more of that, uh, and yet companies that want to optimize for engagement, and what company doesn't wanna do that? I mean, if you make candy bars, you want people to spend a lot of time eating your candy bars. Companies that do that, um, are gonna be giving us tribalizing AIs unless they're careful not to, and I think to some extent the ball is in our court. We need to be aware that, you know, AIs could have an unfortunate effect on our psychology, on our self-conception, uh, unfortunate for the world, uh, and I think especially unfortunate now that I think, you know, this technology needs to be governed by a true global community. We need to start approaching the whole thing as a, as a planet. I guess I'm, I'm happy to say that I think, you know, to a large extent, selecting AIs that are good for the world can be selecting AIs that are good for you. You know? The same reason, like, people meditate, people, people do mindfulness meditation 'cause it calms, it calms them down. They do fewer ill-advised things. Their life is, on balance, better. Well, that's also good for the world 'cause you're, you're, you're, you're creating less needless an-antagonism. So it can happen that self-help is good for the world, and I th- I'm hoping that as we exert selective pressure on AI models, we will, I mean, be, be more conscious maybe than typically of the effect on the world because I think we're, we're approaching a crossroads where, where the world really can't afford, afford to continue to be so divided, and neither can individual nations. But I'm also hoping that if we even make wise decisions from the point of view of our own psychological well-being, that will have, uh, good effects in the broader communities.

[32:20] Nathan Labenz: Maybe just a couple little other footnotes on this whole Deceptive AI problem, then we'll get into the, you know, kinda zooming out-

[32:29] Robert Wright: Mm-hmm

[32:29] Nathan Labenz: ... and, and thinking about how we can do our small parts to sh- you know, shape the, the overall trajectory in a positive way. I think it is gonna be really hard because basically I think what keeps humanity to, you know, on the rails broadly is we're kind of in balance with each other, right? Like, nobody has runaway power and, you know, we... And there's a mix of things. We also have, like, goodwill, and that shouldn't be taken for granted, but-

[32:59] Robert Wright: Mm-hmm

[32:59] Nathan Labenz: ... it's kind of goodwill, but also we are, our goodwill is always kind of a little salted with, you know, some defensiveness. We see in these cyber evaluations, right, that, like, it's kinda hard to teach a model to g- like, go find all the vulnerabilities in order to patch them without also making something that's a, an offensive cyber weapon. And we ourselves are kind of, like, in a similar trap, where in order to defend ourselves from predatory deception on us, we have to kind of be at least passively good at deception ourselves, and therefore, you know, there's always the potential to, like, turn around and, and use it. It seems like these things are kind of always two sides of the same coin. We're seeing right now with the, one of my favorite current benchmarks is Vending Bench, which is basically-

[33:48] Robert Wright: Mm-hmm

[33:48] Nathan Labenz: ... putting an AI in charge of running an autonomous business, running a vending machine business, and you start to see these, like, problematic behaviors where, the, it'll start to price collude or it'll, you know, do sort of ruthless things. And I was actually talking to somebody at, uh... I had a chance to ask them, somebody at Anthropic, like, "What do you think's going on there? Like, why are we seeing these sort of behaviors? It seems like maybe something we don't necessarily want, you know? It doesn't seem like it's maybe in keeping with the Constitution." And their response was like, "Well, it's a little complicated in one sense because it sort of can"... You know, they're very eval aware now, too, right? So they know when they're being tested. And we also use these inoculation prompts that are kinda like, if you find, because of this problematic generalization, there's so many layers to this, but they, they found that if there's a reward hackable environment, if the, if the model can cheat in order to get reward in training-

[34:46] Robert Wright: Mm-hmm

[34:47] Nathan Labenz: ... very much along the lines of the emergent misalignment paper, they find that it will because it's rewarded, and then that has problematic generalization where it'll start to become a cheater in a much, you know, broader sense. So they use-

[35:00] Robert Wright: Mm-hmm

[35:00] Nathan Labenz: ... these inoculation prompts to say basically, "Okay, um, if you find a, an opportunity to cheat in this training environment, that's okay. That's on us. Go ahead and do it." And because it's given that permission, then the model doesn't have to sort of conceive of itself as the kind of thing that cheats. It instead is like, "I'm only the kind of thing that cheats when I'm given the green light, which I have been," and so it sort of tamps down that generalization. Anyway, his response was like, "Well, the Vending Bench scenario is kind of like our inoculation prompting. You know, it kinda looks to the model a little bit like a test, and some of the ways that they prompt it are kinda similar, so not too worried about it. However, if we were to go ahead and just start training models in these, like, long-running competitive environments where deception is often rewarded, as it is in nature, then we'd probably have a problem." And I was like, "Oh man, that's that's a... That sounds scary because I have to imagine there's gonna be a lot of economic pressure to do that," and this is where we get back to kind of a sort of evolutionary selection dynamic. Like, how are we going to avoid a situation where people don't say, "You know what we should do is throw these models into a really competitive, cutthroat environment and, you know, let the best rise to the top"? I mean, that, that's when you're really-

[36:18] Robert Wright: Mm-hmm

[36:18] Nathan Labenz: ... getting into an evolutionary dynamic, and unfortunately you probably get, like, some pretty seriously effective predatory AIs out of that, out of that process. And I think right now it's pretty tough to imagine how we don't do that because it does seem like something people are... You know, the market will demand, right? The market will demand a successful-

[36:39] Robert Wright: Mm-hmm

[36:39] Nathan Labenz: ... operator of vending machines, and not one that's, like, taken advantage by other more ruthless, you know, counterparties. So I, I think that is, is definitely very tough.

[36:49] Robert Wright: Yeah. There was actually a study a couple of years ago that found a degree of, of price collusion, and I actually talk about it a little bit in the book. You know, these two, two LLMs were, they just said, "Okay, you make, you both make widgets and, uh, your job is to maximize revenue. You get to set the price. You can talk to each other if you want" And they talked to each other a little a- and wound up settling into, in effect, a price-fixing scheme. I, I use it to illustrate not necessarily the, you know, evil potential, but also the broader tendency apparently to recognize non-zero sum dynamics among machines. A- and what that says is they could collaborate along all kinds of dimensions, good and bad, but including a number that we don't anticipate. As for, you know, what a, what a kind of arms racing dynamic does for this, you know, I think you have to... I mean, first of all, it w- worries me a lot, and there are examples already where I think, uh, companies have not been as responsible as they would have. I quote, uh, Dan Hendricks in the book talking about what the DeepSeek scare did to OpenAI's process of release and whether that was or wasn't responsible. But, you know, you gotta remember that if you imagine these things in a corporate environment, I mean, that's a bottom-line environment, right? Like, we just want you to go out and get X result and, you know, as with employees, it's like If you don't get arrested, no questions asked, right? I mean, let's face it, you know, I don't mean, I don't mean that all CEOs are, like, consciously being cynical about it. It's just the way it is. You're, you're not, you know, uh, y- if, if employees can deliver the results that you want by cutting corners, they'll do it and it may not come to your attention. And if you imagine years from now when these models are executing elaborate strategies that you may not even understand, right? And I mean, another way to put it is that, like, the, the reinforcement only comes at the very end of a super long process, right? It's like where you say, "Yeah, well done." And I mean, you know, this, this isn't part of the training per se, but it, it's the same kind of thing where what, what the company that is buying them wants to optimize them for is the ability to execute a very long strategy that may or may not involve unethical stuff in, in, in the process. And so yeah, it seems pretty clear to me that there's cause for concern about where we may be heading, and my, my general feeling is the slower, the better. You know? The... I was glad to see Anthropic signal the possibility of needing to slow down in the recent paper, but, but it'll, it'll be very hard. But in any event, the worst case seems to me an intense, you know, arms race environment, whether racing among companies, racing for c- among companies that's accentuated by competitive or y- you know, even combative dynamics among nations. And the use of companies, of the dynamics with the adversarial nation to fend off any regulation whatsoever, including regulation that might just slow things down. I mean, the, the minimal thing I, I would hope to see in the discourse is that people stop thinking that if something slows innovation, that's necessarily bad, right? It's like-

[40:40] Nathan Labenz: Mm

[40:40] Robert Wright: ... no, you can't do that to us. That'll slow innovation. Well, maybe we're starting to approach a time when that will be a feature, not a bug, if you just slow things down a little bit and give us, give us time to think about 'em. It's not like you have to worry about a true dead halt of progress, right? It, it's, it's, you know, that's not in the cards. I mean, even, even if you got a pause in training runs, which might not be a bad thing, but that would not be the end of progress by any means. Not... It wouldn't be the end of short-term progress in terms of applications, and it probably wouldn't, wouldn't be the last training run.

[41:13] Nathan Labenz: And Lord knows we've got a lot to figure out about how these things work internally. I mean, it is, has been striking how the more we have cracked open the black box, and there's certainly been a lot of progress there, they do look in many ways like strikingly similar to what we understand about our own cognition. I think that's been a, a fascinating... But we're probably only about as far in terms of understanding the AI's cognition as understanding our own. We should make faster progress on the AI's cognition because we can, you know, manipulate them much more freely obviously than we can manipulate our own brains, but it's very much still a, a work in progress. So a little time for that I think, yeah, in, in many scenarios could be a, a very good thing. Well, let's do this kind of zoom out to some of your, I don't know if it's, if you would describe this as your philosophy or just kind of observations, but obviously you've kind of made yourself synonymous to a degree with the concept of non-zero sumness. And you kind of sketch your-

[42:23] Robert Wright: I'm not sure, I'm not sure my name springs to everyone's mind-

[42:26] Nathan Labenz: Well-

[42:26] Robert Wright: ... when they talk about a non-zero sum game. I encourage that. I encourage you to say that. It, it's not true-

[42:30] Nathan Labenz: It does for me anyway

[42:31] Robert Wright: ... but the more you say it... Okay.

[42:32] Nathan Labenz: Um, the, the book kind of sketches your on this and related topics, but how do you kind of think about... There's this concept of the, I'm not even sure I'm saying it right, noosphere.

[42:49] Robert Wright: Mm-hmm.

[42:49] Nathan Labenz: And you sort of have, I think like a, an intuition that there's sort of a direction to evolution and to history, and that this is all kind of leading up to some sort of culmination. And I think some people kind of react to that as like, it sounds a little woo.

[43:09] Robert Wright: Teleological.

[43:10] Nathan Labenz: Other-

[43:11] Robert Wright: Right

[43:11] Nathan Labenz: ... people, you know-

[43:12] Robert Wright: Yeah

[43:12] Nathan Labenz: ... probably kind of myself included, are like, maybe you would've reacted that way years ago, but now it's feeling a little more intuitive. So yeah, how do you kind of, what's the sort of center for you in terms of your relationship to these ideas of purpose culmination?

[43:34] Robert Wright: Yeah. So first of all, noosphere, noosphere, I've heard it pronounced both ways, N-O-O-S, the Greek word for mind. It was coined in 1923 by Pe- Pi... I- I- in conscious reference to the biosphere. So, you know, over evolutionary time, there was the geosphere before life, then the biosphere, and now the noosphere. Uh, you know, Teilhard saw, I think pretty presciently given when he was writing, that there was this technology was kind of creating this something that looked kind of like a global brain. You know? K- hooking people up in larger and larger networks of, of collaboration, international networks, so the human brain was, were, uh, the br- our brains are like neurons in, in this. In fact, he called it a brain of brains. So, so there's that, and, and one argument I make in the book is that we need... Well, I think it, it can be illuminating to think of the AI revolution happening at the same time that this global brain thing is taking shape. And pondering, for starters, the prospect that, oh, I guess the neurons could be made of silicon, couldn't they? And if some are made of silicon and some are carbon-based, what'll the relationship between the two be, and so on. But I also think we need to ponder kind of the impetus behind the development of a global brain, whether you mean... Like for example, there's a global market. That's kind of a global brain, right it, it do- it does stuff. It moves resources around to where they're needed. It makes allocation decisions. And then there was, in principle, global governance, right? Some degree of international governance, which, which I think AI calls for. So, I, I'll get back to all that, or at least lead up to it by answering, uh, the second part of your question about directionality of evolution or, or really of two evolutions in a, in a sense, both biological evolution and, uh, what anthropologists call cultural evolution. In other words, the, the evolution of the bodies of information that are transmitted among humans that are not genetic in nature. Like all non-genetically transmitted information is part of cultural evolution, so religions, uh, ideologies, and certainly technologies. So i- if you look at where those two evolutions together have taken us, I think you see undeniably directionality. Now, that's doesn't mean there's a purpose, but let's just, to quickly review the directionality, you know, you get, you, you have these bare strands of self-replicating information, presumably at some point. They build cells and then more complex cells, then, then the cells form multi-celled organisms. And by the way, multicellularity has evolved a number of times independently, so that suggests that there was a strong evolutionary impetus behind it. I mean, evolution, it, it's like exploring inventive space, right? And, and finding things. So many things have been multiply invented: wings, eyes, and so on. And, uh, then anyway, uh, you get societies of multi-celled organisms. In the case of our society, you know, once you reach our level of intelligence, this other kind of evolution kicks in, and our social organization starts growing, too. Hunter-gatherer band, you know, chiefdom, et cetera. Now we're on the verge of, I would say we're on the verge of a global community, and we need to form one if we're gonna handle the technological challenges we face wisely, especially AI. Now, if you want...

[47:17] Robert Wright: Now, it seems the, the direction now... Well, the direction is clear. There's been a direction. I guess directionality consists of the view that there's been a tendency toward this, and I think that's the case. Uh, you don't have to get woo-ish or talk about spooky forces. It's just the nuts and bolts of natural selection and cultural evolution I think, for discernible reasons, have taken us here. That's a lot of what my book Nonzero was about. I, I, I, I, I described the growth of complexity, both in biological and cultural evolution, as a certain kind of interplay between zero-sum and non-zero-sum dynamics. Now, of course, Teilhard de Chardin, back to the noosphere, he was, in addition to being a paleontologist, he was a Catholic priest and theologian, a radical theologian whose work was suppressed by the Catholic Church. But he certainly saw this, uh, the, the, the noosphere as a manifestation of divine will or man- manifestation of divinity. And he thought accordingly, I guess, that the thing had kind of a moral direction, that we were gonna have to become better people morally to get over the conflicts that stand in the way of a kind of a coherent noosphere. I, I, you know, and at a certain level of abstraction, I agree with him absolutely, and I, and I talk about that in the book. Now, I don't have his assurance that this is, uh, divine will, so I, I can't be as optimistic as he may have been. But I, I do think, I will say that, you know, you can look at a directional system and have a rational argument about whether it does seem to have some of the hallmarks of purpose, which again, doesn't mean there are any spooky forces driving it. If, if it has a purpose, it just means it was set up either by some intelligent being or conceivably by some kind of meta selective process, you know? You- just as you can say that natural selection itself imbues organisms with a kind of purpose, right? Getting genes in the next generation. I mean, that's kind of the overarching goal. That's the criterion of their, quote, "design." Um, you can imagine that happening. I, I save... Well, look, I save all, almost all this for the appendix, so you can read the book without bumping into this stuff. But I, you know, Lee Smolin, the physicist, has this idea of cosmological natural selection. You know, it's speculative. He's not, he's not asserting it, but y- he, he's saying, you know, there, there could have been selection among universes. And basically, that's what you would need if indeed the directionality of evolution were purposive, and it wasn't designed by like a god or like aliens or any other, uh, intelligent being. You can imagine it being a meta- meta natural selection process. So there's all that, uh, which annoys some people. I even talk about some of the people I've annoyed in the book. But the, uh, w- one thing I want to emphasize is, like, when I say The God Test, you know, it's the title of the book, I, I don't mean that this was set up by a god. I don't know. I'm agnostic on that. But we face, I think, the kind of tests that a, that a, that gods are known for setting up. So if you look at the Bible, you know, there are these, these, you know... Yes, salvation is possible, but you, you guys are gonna have to sha- shape up, whether they mean you're gonna have to worship Yahweh, you're gonna have to be better people, whatever. I think we're gonna have to become, you know, in a, in a sense, I don't want to sound too ambitious, but, you know, better people if we're gonna, if the conflict that currently afflicts relations among within nations is to subside sufficiently For us to get through the AI revolution in good shape, because I do think it's a, it's a global challenge. So did I exhaust your curiosity about that subject, Nathan?

[51:02] Nathan Labenz: Yeah, and you brought a couple, brought to mind a couple other just maybe riffs on it briefly. I think it's really informative sometimes to go to a little nature center. There's one not too far from my home that I've taken the kids to a couple times. You go there and every common species, like nature centers across the world have more in common than not, right? Like, you'll see a snake, you'll see a frog, you'll... You know, there's bees, right, in every sort of corner of the world. And I'm always reminded when I go through those places, all of these were invasive species. Like, these are all the winners that colonized the entire world, you know? Th-th-

[51:43] Robert Wright: Mm-hmm

[51:43] Nathan Labenz: ... even the simple bee, right? I mean, obviously there's variation from place to place, but the fact that there's basically what amounts to a bee in every corner of the world is not the way it always was. You know, if, if humans were sitting around moralizing when the bees showed up, we would've called them invasive species and, like, wrung our hands about them. But, you know, now we just take it for granted, of course. But sometimes I think that AI might be, like, the ultimate invasive species, and it's just really important to keep in mind, I think, for people as we confront our current situation that, like, there were a lot of things that were displaced by all these, you know, these kind of global success stories, bees, frogs, and whatever, right? You got frogs that live under the sand in the desert and that freeze in the winter in the Arctic and thaw in the spring, and a lot of things were displaced by those. So for us, we should, I think, be very mindful of the fact that the niche that we occupy as sort of things that don't survive super well, you know, without a lot of capital to support us, is, like, pretty similar maybe to the niche that the AIs might be best suited to colonize. And y- the, the sort of plot armor that a lot of people tend to assume, you know, or subc- not, not consciously, right, but, like, I think implicitly assume will kind of protect us just wasn't there for a lot of other things when, when newcomers came onto the scene. Of course we too ourselves, right, you know, drove our closest cousins and, and much of the megafauna to extinction. Like, these sort of major plot twists have happened, like, tons of times, and we just haven't really experienced them or internalized that. But boy, I think a, a long, you know, read of history there is, like, kind of sobering, to put it mildly.

[53:33] Robert Wright: Yeah, absolutely. And, you know, I think one virtue of taking that long evolutionary, you know, view is I, I hope it helps you appreciate the possibility that there's a very strong impetus behind the evolution of AI. And I, and I do think in, in a number of ways, evolution is a, is the, a helpful way to, to, to think of it. And, and you know, I've talked about arms races as, uh, I mean, as if they were bad, but I'm not, I'm not naive. I mean, arms races are always, you know, they, they are often part of a creative process in evolution. A- and one, you know, I, I emphasize in the, in the book that y- you know, in, in textbooks, in biology textbooks, you look up arms races, it's likely to talk about races between species. But arms races within species, including the human species, have been very important, including in the development, I think, of our deceptive tendencies and our skills of argumentation and, for that matter, our power seeking. That's involved arms races within the species to get genes into the next generation. And so, you know, i- it's, you're not gonna avoid the dynamic. I'm not, I'm not, I'm not suggesting that, that technology always has a little bit of that in it. I mainly wanna say that, A, it's such a powerful impetus that we should reckon on this stuff continuing to advance, I mean, for that reason and the reason I alluded to, which is that it's just pretty clear given the nature of the technology that, that there's a lot of potential for further advance. But B, we should worry about what you could call gratuitous arms races, and particularly dangerous kinds of arms races that get in the way of our reckoning with the technology. But I, I hope the idea of evolutionary impetus will lead us to recognize we have to reckon with the technology. It's like it is a force. It's not, it's not... A- and maybe a lot of technological evolution in retrospect is more of a force than we appreciated. This is a force that is unfolding very rapidly in real time, imposes all kinds of challenges across a lot of domains, and even if you don't buy the hardcore sci-fi doomer scenarios, and I'm sorry to report that I'm unable to dismiss them entirely after carefully examining them. You know, when I talked to Eliezer in, uh, 2010, I... You can see I, I'm even kind of a jerk. I, I, I, I'm, I, I'm kind of dismissive, uh, o- of the scenarios. I, I no longer am. Um, but even that aside, this is gonna hit us along so many fronts, right? It's gonna be disruptive in the not necessarily good sense of the term, you know, economically, politically, culturally, family life, friendship. You just name it, it's gonna hit in a lot of places at once. And I just say it's gonna be an earthquake, and, uh, we need to start thinking about it now. I mean, may- I'm sorry if I sound too much of like an evangelist on this 'cause, you know, a lot of your Audience may already be a, AI safety pill, but God knows not everyone in the AI, uh, community is, and certainly not everyone shares my concerns ab-about a specifically US, uh, China arms race. But

[57:10] Nathan Labenz: Here's a quote from the book: "It's almost like technological evolution is saying to us, 'Look, we can do this the easy way or we can do it the hard way.' There's going to be a global brain in the end. The question is whether you build it gradually, cooperatively and carefully, or it gets hastily assembled amid crisis or chaos. Whether you build yourself a home or stumble your way into a prison." We can kind of go run down some of the downside scenarios and, you know, how we might minimize those risks. I always say the scarcest resource is a positive vision for the future.

[57:46] Robert Wright: Mm-hmm.

[57:46] Nathan Labenz: And I know you have been thinking about kind of international cooperation and you're, you're still holding onto international law even in this might makes right moment that we're in internationally. Maybe we could start, you know, on, on this section of the conversation-

[58:03] Robert Wright: Yeah

[58:03] Nathan Labenz: ... with like, what's your positive vision for the future? If you, you know, if you could navigate all these pitfalls, where do you think we want to end up?

[58:13] Robert Wright: Okay. Before I cheer you up, let me just... Let's, let's dwell in the depths of despair just a little more to, just to flesh out what you were... That, that quote, like, we can do this the easy way or the hard way. So there's a chapter on, m-to a certain extent on Nick Bostrom's thinking called The Singularity in the Singleton. Singleton is this idea of a global coordinating mechanism, global governance. It, it could be anything. You know, it could be, you know, a de- global democracy with very decentralized power, which would be my preference, run by humans. It could be run by AIs. It could be a totalitarian nightmare. It could be anything. But, but a glo- a mechanism, some mechanism of global coordination, he calls it, or, or global governance. And I had him on my podcast, and we're, we're pretty much on the same page about this. You know, we, we, we agree that in recognition of the non-zero-sum dynamics that, you know, among nations and so on that are posed by technologies, especially AI, a certain kind of global governance makes sense. But as, as he fleshes out in his, his, his famous book, Superintelligence, you can imagine some dark paths to global coordination of a dark kind, right? Like, uh, you know, AI is used to stage some coup and just seize power. The AI seizes power and so on. My view is that one or another of these scenarios is indeed very likely, and that's what I mean when I say we can do this the easy way or the hard work. In other w- or the hard way. In other words, humans can consciously steer us toward a system of international governance of this technology that involves as little centralization as pow- of power as possible, but as much as is necessary to keep the technology under control. And even in the event that it starts playing a very large, large, large role in governance itself, still maintaining a kind of a win-win relationship with it, right? So that's what I meant by the easy way or the hard way. Now, what are the chances that we get there the easy way? As I said, I think a certain amount of international conflict is gonna have to subside, and that may take something you could call moral progress. I talk a lot about cognitive empathy in the book. I don't mean emotional empathy. I don't mean like feeling their pain. You know, I just mean having a clear enough understanding of the perspective of other people and groups, including adversaries, that you can r- you can intelligently play non-zero-sum games with them and work things out that are in your own self-interest and, by virtue of being non-zero-sum, are of mutual benefit. So I, I do think we're gonna have to make an advance of that.

[1:01:36] Robert Wright: The cognitive biases that impede cognitive empathy are now pretty well known. I talk about them, attribution error and so on. And so we're gonna have to have progress. As I said, I think, I think that can be part and parcel of just looking out for our own cognitive health as AI plays a more pervasive role in our lives. I like the term cognitive sovereignty. I didn't come up with it, but I think being mindful of how persuasive and subtly influential AI could be and, and perhaps not with your welfare quote in mind, I, I, I think it's good for us to, to tend to our kind of, uh, psychological robustness, and I d- I do think cultivating cognitive empathy can be part of that. I also think, you know, Ilya Sutskever says something, he said something in a TED Talk that I quote to the effect of like, you know, what he thinks is that at some point an awareness of the magnitude of this will dawn and there will be new, a new inclination toward collaboration with other human beings in recognition that increasingly we humans are put in a non-zero-sum situation by the AI, right? Like, it, it's in our interest to coordinate. And doesn't mean we can't have a non-zero-sum relationship with the AI at all. It doesn't mean it has to be permanently in, in the... It, it, the, it's, it's not necessarily an enemy, but it has the potential to be bad for us in ways that should, I think, exert a cohesive effect on us. And I think he's right. It, it's... I mean, first of all, let me say, there's been, you know, very recently, just Mythos alone shows you what a big difference one innovation can make, okay? Two things that you might not have predicted six weeks ago. One is there, there is now a, a, there is an official dialogue between the US and China on AI safety. I think that's a consequence of Mythos. I think the focus of it is more narrow for now than I'd like, but it's happening. And then secondly, uh, the Trump administration, after Trump had ridiculed a, a Biden administration executive order that I think is similar in spirit to this new one that Trump has issued. You know, the Trump, the Trump administration has issued an executive order about the government vetting particularly powerful models. Now, I think that kind of arrangement is, is something to keep an eye on. You know, whenever the government gets close to the power of AI, you know, that's one of the things you wanna be careful about. Still, I personally think some of this is gonna have to happen, and my main point is none of it... none of this was on, was happening two months ago, just two months ago, and it didn't even take a catastrophe, right? It just took a... I'm not even sure Mythos qualifies as a near miss, but it, it, it got our attention to some of the nefarious potential of this stuff, and it got us to focus. And I think, you know, AI is so different from technological challenges of the past. I mean, you know, in Nonzero, I talked about how technologies are making relations among nations more non-zero sum. You know, genetic engineering, a new bioweapon as an example, or any pandemic, you know, caused by a lab leak would be an example where the logic should point us toward international governance of some kind, climate change and on and on.

[1:04:59] Robert Wright: But AI is just... I think i- it's an order of magnitude stronger i- in terms of how much it adds to the non-zero sum logic, and it's different from these other technologies in a way that makes it hard to predict the psychological impact on the species as a whole. Because, you know, one kind of, you might say, downside of technological threats traditionally is that they're, they're not very much like the threats that natural selection designed us to cooperate in the face of. Like, if you see an, a horde of enemy humans coming at you, you suddenly, you know, feel closer to the people in your tribe, right? And if you see a bunch of animals coming, same thing, you know. You, you and the group try to fend them off. But traditionally, technologies haven't been much like that. AI is just qualitatively different in a way that makes it harder to predict the psychological impact. It's, it's increasingly, I think, gonna seem to people like, yes, a form of intelligence, kinda like human intelligence. In some ways very appealing and, and even likable. In some ways scary. Uh, so I can't predict the effect, but I think it's very likely there's gonna be a large effect. I think it will, in fairly short order, change the way people think about their relationship to the other people on the planet, and quite possibly in a constructive way. Uh, and by the way, AI, here's more optimism. I mean, it's some form of hope. AI can, in principle, be part of the solution, you know. Sycophancy is not an inherent property of AIs. The mark- you know, corporate incentives may encourage it, and people may encourage it, but you can equally well imagine an AI that would be great at, you know, showing you that actually, you know, your, your spouse, you know... There are things that you do that are as annoying as the things that your spouse does. And the, the, a- and more, more importantly and subtly, like this thing that, that your s- your spouse just exploded about, like, there's a subtext there. The thing that set, that set them off isn't the real thing. You know how that works? Like, the- there's, there's a lot of kind of intelligent explaining that an intelligent system can do that makes us, I think, better and more considerate people, but that are in our self-interest. Like, we'd all rather live in a harmonious marriage, I think, right? And, and so again, this is a theme I, I visited earlier, but I, I think, you know, the AIs have the technical capacity to help us make, uh, the, you know, the kind of psychological adaptations whi- which I think in some cases will amount to a kind of a moral improvement that we, that we need to make. And I think the more we're aware of that, the better. I would also say, you know, Peter Singer's book, The Expanding Circle, I think documents that over time, actually our species has already undergone some moral improvement. We've expanded the circle of moral consideration, notwithstanding some bi- some backsliding, and I, I, I think further progress is possible. So, you know, and I'm not an optimistic person by nature, so if I can muster even this much optimism, it's probably a good sign.

[1:08:24] Nathan Labenz: One point that you made there that I think is really just worth driving home is just how broad the space of AI possibility is. I think that is really underappreciated by the public at large. It's more appreciated by the people developing the AIs because they've at least had an experience that was formative for me when I was doing the GPT-4 Red Team close to four years ago now. Amazing how much time-

[1:08:51] Robert Wright: Mm-hmm.

[1:08:52] Nathan Labenz: It feels both like not much time has passed and like, you know, eons have passed somehow in, in four years. But, you know, that, that model was simply the purely helpful model that would do whatever you asked it to do with no, no training around refusing or, you know, thinking, "Hey, maybe this isn't a good idea." It was just do whatever the user asks. And even that, you know, that, and that's common inside the, the frontier companies, at least I think it has been to date. You know, they, that may start to change as they hopefully get a little more serious about their policies around internal deployments as well. But even that is, like, I think un- dramatically underappreciated by the public at large that, like, all this kind of, you know, harmless tr- harmlessness training that they have is the product of a lot of hard work and a very intentional design choice, and it definitely doesn't have to be that way, and in fact, it'd probably be simpler to just, like, get it to do, you know, the task.

[1:09:49] Robert Wright: Mm-hmm.

[1:09:49] Nathan Labenz: And that certainly suggests, like, a much broader space of, of AI possibility that we're really only beginning to explore. Most of it I think is probably bad. You know, I, I don't think we, like, want to live in a world full of minds from, you know, randomly selected points in that space. But it is striking to me that we really are going so hard so fast at, like, this one paradigm, you know, w- which you could say about the language model paradigm, you could say that about the, you know, the transformer itself. You could say it about the kind of HHH training. There's like a, you know, certainly some differences between how the, the leading companies kind of think about their safety training, but they're more similar than different in the grand scheme of things. So we're, we're just, like, very, very focused right now in a very small space of AI possibility, and I think that is dramatically underappreciated.

[1:10:44] Robert Wright: Yeah, I think so. And I think that, that the potential design space may be explored in some interesting ways. I can well imagine that religious groups will, you know, it's, I think it's significant, of course, that there was a, a, a papal encyclical on this technology that, that says something about the kind of attention it's getting suddenly. But I think you can imagine religious denominations and other groups and nonprofits and so on, like, awarding their seal of approval to certain models. Now, that's not necessarily gonna be a good thing. It could be, it could be a model that says, "Yeah, you guys are right and everyone else is wrong." You know? Religions have been known to, uh, have that inclination at times. But they've also been known to have other inclinations. And in any event, my main point is, you know, people may react to what I said earlier, like, you know, we as the consumers have the power. Let's, let's get the psychologically healthy, uh, models out there and give them, uh, use our dollars to give them pos- uh, positive reinforcement to companies for making them. You may see more market power via these groups of various kinds that either recommend models or even say, "This is the official model of X religious denomination or X," I don't know, uh, political party. And, and again, th- this is not necessarily a healthy dynamic, but I, I do think I, I, I would encourage people to th- think about how to make it healthy, right? Like, I could imagine philanthropic money being well spent to, y- you know, m- make such things happen in the name of the greater good. We all have different ideas, I guess to some extent, what the greater good is, but I, I do think more than ever we should put a premium on tamping down the tribalism both internationally and intra-nationally. And, uh, and AI can help. It, it, it, it really can. It's amazing. I had a, this long conversation with Gemini that is, is like a whole chapter, and it really came off as wise. Now, that is a function partly of the specific questions I was asking, right? I wasn't asking, you know, "Can you help me steal, you know, this, th- this woman from her current boyfriend or anything?" But, you know, I was asking it questions about the human predicament and the human future. But the wisdom is there. The point is the wisdom is in there. And leaving aside, you know, kind of wisdom as a b- as a body of thought that can be tapped, you know, the tendencies of guidance are there that, that c- could make us better people. It, it's a, it's a project to be approached with great care, but I think it's better than not approaching it at all.

[1:13:44] Nathan Labenz: So this notion of becoming better people, I think it's a little bold to say, but I do feel like the AI phenomenon has made me a better person in the sense that for most of my career prior to this, I was an entrepreneur, and I was mostly focused on being successful, you know, making the company work, like winning at all costs by any means. I, I was never, like, that extreme, but, like, my focus was on making my project successful.

[1:14:12] Robert Wright: Mm-hmm.

[1:14:12] Nathan Labenz: And I think that's fine. It's, like, made the world go round to a certain degree. But in the face of the AI phenomenon, I'm like, it's much less important whether or not I am successful, and much more important how the overall thing unfolds. And to the degree that I can put a positive dent or nudge in that trajectory, then that's what I wanna do. And I'm not too worried about, like, monetizing that. You know, easy for me to say as a person of certainly still substantial, although not extreme by American standards, privilege. Uh, but truly, I feel like I'm, I'm much more willing to and much more inclined to just think, like, what is the overall pro-social good thing that I can do, and kind of trust that if this all works out, you know, like, there'll be plenty to go around. How do you think that can become How can we accelerate that process? Like, and who do we look to? Do you, do we look to the Pope? Do we look to the Dalai Lama? Do you have, like, personal moral authority heroes that you, you know, would like to become more influential? It just seems like, you know, I have felt that, but I don't know how to translate-

[1:15:28] Robert Wright: Yeah

[1:15:28] Nathan Labenz: ... that into something that, you know, will become like a, a, a contagious phenomenon. And it seems like given the timescales that we're operating on, it, it really needs to take off pretty quick.

[1:15:40] Robert Wright: Yeah. Well, first of all, and I'm, I'm not just flattering you 'cause I've already succeeded in, in, in getting on your podcast, so I don't need to flatter you anymore for purposes of book promotion. But you, you have struck me as a fairly exemplary person along. Now, I wasn't aware of you before you encountered AI, so maybe you were, you were like a, like a criminal or something before. But you seem to me, um, you know, broad-minded and, and not... You, you, you, you don't seem to me very egotistical. The, uh... I, I suspect you were more like the person you just described before AI than, than you're letting on. But in any event, I'm glad, I'm glad you're like that. You know, uh, what would I say? I mean, I talk a l- l- little in, in the book about... I mean, I, I'm a, I'm a fan of mindfulness. My last book was on kind of the, the psychology and philosophy behind, um, well, um, m- behind mindfulness meditation and, and the, and the logic of, of certain aspects of Buddhist philosophy and psychology. And I really, I really am a believer, so I e- encourage people to at least explore that. A, I mean, i- in terms of particular figures, I, I don't... Y- I, I would... If I, if somebody came to mind, I might be reluctant to mention them because, you know, people kinda have to find what's right for them. I would just say... Well, it's so hard. Uh, I was gonna say if people... If anybody i- is in the habit of depicting members of any group in a very unflattering light, be skeptical of that person. And people do that and they do it subtly. That's not the same as saying that you, you shouldn't, y- you shouldn't think it's better if one side prevails in a given war or, or whatever. I- it's, you know, those kinds of things can have important consequences, those kinds of outcomes. But it's just so, it's so pervasive. The unflattering depiction of groups is so pervasive, and I try to s- I just try to stay away from it. And, uh, I would encourage you to find people who aren't doing that. I, I guess that's, like, hopelessly abstract guidance. I, um, you know... And then there, there are particular... Well, I... Stay away from people who on social media are saying, "Look at this piece of outrageous behavior by this member of this group," even if they're not saying the this member of this group part, right? Y- because, you know, i- if it's implicit that, oh, you know, can you believe that the Democrats are doing stuff like this, or the Republicans are doing stuff like this, or the, the Zionists or the Palestinians or whatever. People who point to outrageous behavior as if it were representative of the whole group, stay away from them. Uh, it's, it's a temptation we all have, I think, because, you know, if we feel strongly about particular ideological issues. But, um, I, I don't, I don't know. I mean, it's a good question. Are there grea- like, where is our superhero? Is that what you're kind of... I mean, I have to be honest and say American politics doesn't seem to be full of them right now, in my view. But I don't know. I'm struggling. You help me. Nathan, who are your, who are your moral heroes? Who are you... Do you have a guiding light?

[1:19:22] Nathan Labenz: Well, I'll first add one of, one new thing to your stay away from list, and that is pay per click advertising. I would say I mostly didn't get too far down to the bottom of the barrel in my, you know, entrepreneurial activities when it involved these kind of pay per click... But I mean, talk about a extremely competitive dynamic where the, the whole rules of the game are set up for everybody to basically to compete for who can get the, you know, the neurons in the subject to, like, light up in just the right way to get the click, you know, and, and ultimately to get the payment. And, you know, you, you really g- that's an environment where it's very hard to compete at all effectively without-

[1:20:04] Robert Wright: Mm-hmm

[1:20:04] Nathan Labenz: ... starting to make some compromises, you know? And-

[1:20:07] Robert Wright: Mm-hmm. Mm-hmm

[1:20:07] Nathan Labenz: ... again, I don't think I went too far.

[1:20:09] Robert Wright: Yeah.

[1:20:09] Nathan Labenz: But certainly you, like, cannot be primarily truth-seeking with your ad copy and expect to win in the pay per click game. So I, I do think, like-

[1:20:19] Robert Wright: Absolutely

[1:20:20] Nathan Labenz: ... that is, you know, that's about as a toxic environment-

[1:20:22] Robert Wright: And this is-

[1:20:22] Nathan Labenz: ... as you'll find

[1:20:24] Robert Wright: ... this is hurt journalism. When Mike Kinsley founded Slate, he said to me... This is how long ago this was. He said, "Do you realize we're gonna now have statistics on how many people read each individual article?" And he said, "And you know, I think I'm gonna try to keep that information away from the writers." That was a profoundly wise impulse, even if it was impossible to actually realize. And you see the damage now. I mean, the, the standard business model of mainstream media has just become tribal on one side or the other because they're paying attention to, you know, they, they're looking for the headlines that get the clicks and because these are challenging times for media, even for The New York Times, and New York Times is playing this game. You know, I did a, a piece for The Washington Post several years ago actually arguing Right, right, right around the time GPT-4, I said, "You know, we really need to start pursuing rapprochement with China." That was kind of the argument. And there was this headline and they said, "How do you, what do you think of this headline?" I said, "Great." I... and then I go look at the piece, you know, hours later, and it's no longer the headline. What had happened is they did A/B testing and, and the first line was now, "AI is dangerous, so blah," and, and i- it was a headline that would not actually succeed in attracting the kind of readers who would, who would read the whole piece, actually. It wasn't even a good version, in a way, of pay per click. But it was... This is the game now in media and, and, you know, media become a, a tribaliz- you know, a tribalizing force partly by virtue of just trying to remain profitable. So yeah, beware of, beware of, of anything that is click-driven, and unfortunately everything is click-driven, certainly including social media algorithms. You know, but, but manicure your... You know, like use lists on Twitter, you know? I, I, I've learned a lot from my AI list and it's, I think, algorithm free. It's just the people in the list in the order of their tweets.

[1:22:31] Nathan Labenz: Again, easier, easy, easy for me to say, I guess I'll say, 'cause I think I've been very fortunate with this podcast and the fact that there's like a ton of money flowing around the AI space means that, you know, we can get sponsors in a way that would probably be very difficult for many other people in many other niches. But I've had pretty good luck with just, first of all, just trying to keep the true north of the whole thing my own personal learning. I always try to keep that in mind. And then also having specific individuals that I respect in mind as the audience is another-

[1:23:05] Robert Wright: Mm. Mm-hmm

[1:23:05] Nathan Labenz: ... I think good mental model for me. And you're one of those-

[1:23:08] Robert Wright: Mm-hmm

[1:23:08] Nathan Labenz: ... people. And, you know, I think of Dean Ball all the time as somebody-

[1:23:11] Robert Wright: Oh, God

[1:23:11] Nathan Labenz: ... who I'm like, if I can be of use to somebody like you or somebody like Dean, I don't really care what the numbers are too much, you know? There aren't that many Deans out there, and I'd rather have one of him, you know, than probably, I don't even know, a million random, you know, YouTubers. I think it re- the ratio truly, it might be that high. So-

[1:23:30] Robert Wright: Mm-hmm

[1:23:32] Nathan Labenz: ... I, you know, how far-

[1:23:33] Robert Wright: Yeah, no

[1:23:33] Nathan Labenz: ... that takes people in various walks of life I think is, you know, very unclear. I think the Pope comes to mind, g- going back to, you know, who are, who are my heroes.

[1:23:41] Robert Wright: Yeah, yeah. Yeah.

[1:23:41] Nathan Labenz: I, I have been really impressed with, um, with the Pope. I've been really impressed-

[1:23:46] Robert Wright: Mm-hmm

[1:23:46] Nathan Labenz: ... with the Dalai Lama. I mean, I, I, I think those are two-

[1:23:49] Robert Wright: Mm-hmm.

[1:23:49] Nathan Labenz: Unfortunately, the Dalai Lama's, you know, probably not gonna be with us all that much longer.

[1:23:55] Robert Wright: Yeah. He's-

[1:23:55] Nathan Labenz: How do you... I, I don't know who else I would put on that list. Not, not too many. Maybe another way to ask the question is, like, let's say you are, you know, put into little Marco Rubio's shoes and all of a sudden it's, like, your job to-

[1:24:11] Robert Wright: Hmm

[1:24:11] Nathan Labenz: ... to navigate the US-China relationship.

[1:24:15] Robert Wright: Mm-hmm.

[1:24:16] Nathan Labenz: How would you think about going about it? You know, that you hear all these things like it's impossible, you know, we can't trust them. Um, obviously a lot of that denies our agency and f- sowing a, an environment of mutual distrust. But we are where we are. H- you know, if you're-

[1:24:34] Robert Wright: Yeah

[1:24:34] Nathan Labenz: ... suddenly in a position of power, what's your approach?

[1:24:39] Robert Wright: It's a good question. I mean, I, I immediately think of some things that maybe the Secretary... Well, the Secretary of State, in principle, could do the kinds of things I'd recommend, it's just it would be hard not to get fired, uh, by any-

[1:24:52] Nathan Labenz: Well, we could make you... For the sake of this discussion, I think we should-

[1:24:53] Robert Wright: Could I just be emperor of the world for just a few minutes, Nathan?

[1:24:56] Nathan Labenz: I don't know if I can go higher.

[1:24:57] Robert Wright: Is that asking too much?

[1:24:57] Nathan Labenz: The President of the United States is maybe the as high as I can go.

[1:25:01] Robert Wright: Well, you know, one thing Trump has done occasionally, and I'm not a Trump supporter by and large, but he has, he has done the cognitive empathy thing. He, he's like, he's done a, a kind of corrective for our natural bias of, of self-righteousness. You know? He- he's like, "What, you think we're so great?" Like, you know, some other country will do something at a time when... I think he's done this a couple times in the course of even the Iran war when he was maybe trying to get a deal done and he didn't want people to focus on duly on some attack that Iran had done in response to some other thing, and it's just kinda like, "What, you think our hands are free of blood?" He's every once in a while done that. I think we need a ton of that because the, the biases, once a nation is deemed an adversary, the biases, the filters, the cognitive biases can be so strong, especially when a nation is very different. I mean, obviously China's system of government is different from ours and, you know, more authoritarian, more autocratic and all that, and there obviously is this fear that if they win the race to super intelligence or something, they will try to impose their system of government on us. Now this isn't the question you asked, but I think that's an assumption that simply has not been evaluated carefully because there really is no evidence that China's that, that, that, that that's the Chinese business model, I think, to, to... I, I don't think it's i- in a weird way very ideologically driven in its conception of how it wants to relate to the world. But that's, that, that's separate. What, what I would say, though, is, like, make at least make people understand how a lot of Chinese view us, and understand that there are filters at work on both sides. Like, we think whatever we're doing, whether it's chip controls or Taiwan or whatever, we, we have these reasons for the, the policies, whether it's defending Taiwan, denying China certain technologies, these, these- Explanations that are fundamentally defensive in nature, right? Like with the chip controls, well, we're afraid of what they'll do to us if they get the chips and the AI. With Taiwan, we're afraid of what they'll do to Taiwan and, and, and so on. I'm not saying none of those are, have any degree of validity, but, but I would point out, like, that's not what the Chinese think our motivation is. That's the first thing to understand. It is widely believed in China that our goal is to keep China down, period. We just wanna be dominant, and we don't like a rising power, and we are gonna try to frustrate them. You may believe that that that's completely confused and all of our rationales are completely right. Fine, but it's, it's still of some strategic value to at least understand the way they're processing the information and how they're gonna react to things you do. So that's job one. And I, I would, by way of illustrating how powerful the filters on information are... Now, have you heard that, and apparently this is true, I haven't, I haven't pinned it down with 99.9% confidence, but apparently this really happened. So China, like 30 years ago or something, they contracted with Boeing to, to design, like, their Air Force One. Okay? They wanted something their premier could fly in. Now, apparently what we did, such is the relationship between our government and Boeing, that we filled it with all this spying stuff. Even the bedroom, even the premier's bedroom, little bedroom on the plane, had listening devices, okay? Now, my main point is... Had you heard about this, Nathan?

[1:28:55] Nathan Labenz: No. No, I've never heard this.

[1:28:56] Robert Wright: That's my main point. Everyone in China has heard about it. I don't even know if it's true, but I think it is. I mean, I, I interrogated some of the AIs and followed some sources. I think it's true, but my main point is this is just typical. There are... The reason you haven't heard about it isn't because there's no chance it's true, it's just it's not the kinda thing our media system really wants to amplify. And, and s- similarly, our picture of what's going on in China, there are definitely some, some things I would consider bad going on there, but I guarantee our view of them, of, our view of even what they are is not the same as the average Chinese person's, and in some cases I think our view may be kind of distorted because it's like, who want, who wants to be the person who says, "I don't think that human rights situation is quite as bad as you're saying"? Nobody wants to say that, right? And so there, there are just... I, I would just, I would just start out by, you know, reexamining the sense of threat in, in, in, in the light of an understanding of the cognitive biases and media filters at work, and understand that China feels threatened by us. It's like, you know, e- everyone... You know, Israel's attack on Iran I think was motivated by a genuine belief on the part of most Israelis that this was a defensive thing to do. How many people in Iran see it that way, right? I mean... And vice versa. I could flip the tables and describe that. It is just so common to have this, the asymmetry of perception that... And, and, and it, and it leads to the positive c- feedback cycle, right? This famously led to, well, contributed to World War I, is military postures meant defensively were taken as offensively, so there was a countermove that was conceived of as defensive but looked offensive to the other side, so there was another countermove and so on. I, I... And this is... I mean, I can tell you what policies I'd favor if, if I were president, which is actually the question you asked, but o- one reason, you know, in my book I spend, well, I spend time on policies, but more time on this kinda psychological stuff, is, like, as long as it is easy for people who want to amplify our sense of threat to do that, it's gonna be hard, whether... You know, and politicians very often have a political interest in, in, in threat amplification, threat inflation, and arms makers do and so on, and I, I, I, you know, I can lay out the policies, um, but if you wanna hear 'em, I'm happy to.

[1:31:47] Nathan Labenz: Yeah, let's do it. What's... I mean, what-

[1:31:49] Robert Wright: Well, for starters-

[1:31:49] Nathan Labenz: What's a grand bargain look like? That is kind of w- maybe you should just lay it out to us

[1:31:53] Robert Wright: Well, for starters, we just have to get out of the business of trying to govern other countries, okay? Yes, there are, there are bad human rights things. There are bad things in the US, right? Like, the percentage of the US population in prison, that's in prison, way, way, way, way, way higher than in China, okay? That's not good. And China could say, "Oh man, that's horrible. We're not gonna do business with you," but, but China just doesn't really do that. We do. And, and the thing about our human rights policies is they basically never work and they often backfire. Like, our, our approach to helping the Cuban people has been decades of sanctions that have immiserated them, and now a full-on blockade. This just happens again and again and again. The, the, the, the sanctions, they just play a domestic US political function and, and they serve the purpose of various... And there are interest groups that believe in them wholeheartedly. But we... First part of, of a rational policy toward the world is get out of the business of trying to remake every country in our image because, A, you never succeed, B, when you try, disaster ensues, whether it's a regime change war or whatever. The record seems to be pretty clear. Sanctions don't work. They often backfire, hurt the people you're trying to help. Regime change typically makes things worse. So just stop. And by the way, this really is what the UN Charter kind of says. I mean, yes, there is the Universal Declaration of Human Rights. It deserves our respect. But in terms of the actual force of the UN, they focus it on trans-border aggression, and that entailed a tremendous respect for national sovereignty. And the idea was, like, you don't get to attack another nation unless it attacked you, okay? You don't get to attack it because you think its government could be less repressive. You don't get to attack it because of X, Y, and Z. The UN Charter is a treaty we ratified, and by the way, the US Constitution says that ratified treaties are the, quote, "supreme law of the land." So, you know, I, I think we need to get back to the spirit of the UN Charter, which is just say, okay, look, these are not the ideal governments. Personally, I don't think our government is the ideal government. But you just deal with the nations you have, and you start focusing on problems you share, and there are plenty of them. You just... A- a- that, that would be the fundamental philosophical shift our foreign policy needs, and I think you'd find that that would relieve a fair amount of tension with China right there if, if you, you know, if you quit lecturing about them, them about that. Now, I, I'm not saying this is all our fault, you know? Xi Jinping is more... He has concentrated power within his office in China, so he's more... He's closer to autocratic than past leaders of China. He's more authoritarian. And China has been, you know, sometimes belligerent regionally. But I mean, to give you another exercise of cognitive empathy, you know, I heard this guy Nick Burns, he used to be ambassador to China, say when they... Some b- some podcast asked him, "Well, what, what should we... What should Trump have done in China that he didn't do?" And he said, "Well, he should've talked to him about how, you know, they've been kinda throwing their weight around, like w- with the Philippines and stuff." And I just thought, do you not understand that once we have invaded Venezuela and we have a blockade around Cuba, you just... They're just gonna laugh at you if you start saying, "Quit bumping Philippines boats," right? Like, you know, you, you, you, you have to understand that we are widely viewed as completely hypocritical. And so, a- a- and, and once you see that, I don't know, it seems to me that you just realize, well, either we are going to have to start playing by the rules or we're gonna have to do a little less lecturing of oth- to other countries about playing by the rules, because human nature is just such that that kind of hypocrisy doesn't fly. So I don't know. I probably... I, I'm sure I've alienated half your audience, but-

[1:36:16] Nathan Labenz: Probably more than half-

[1:36:17] Robert Wright: Uh-

[1:36:17] Nathan Labenz: ... but that's okay.

[1:36:18] Robert Wright: It is. It... Then explain to me what is so radical and crazy about what I'm saying. Now, if you don't accept my premises, if you think, "No, I, I think you're, um, I think you're wrong. I think China is ideologically devoted to turning the entire world into a giant authoritarian autocracy," great, I'm happy to have that argument. My... One of my big frustrations with the current dialogue, especially in the context of AI, is that the important debates aren't being had, right? Like, well, do chip controls make it more likely that China will invade Taiwan? There's just not, not enough discussion of things. And, uh, so-

[1:36:59] Nathan Labenz: So I think, like, doing what you describe, I'm intuitively certainly mostly in favor. I mean, I, I remember first learning about Iran's, you know, however many decades of recent history and understanding the role that the United States had played in that and being like, "Well, no wonder they hate our guts." Like, it's really-

[1:37:18] Robert Wright: Yeah.

[1:37:18] Nathan Labenz: So, you know, that I guess maybe becomes a little bit more intuitively to me than most, and we haven't even touched on the, you know, century of humiliation that the-

[1:37:28] Robert Wright: Mm-hmm

[1:37:28] Nathan Labenz: ... you know, Chinese civilization broadly, you know, is, still kinda feels that it's recovering from. So, you know, being a little less hypocritical or, you know, being maybe a little warmer in, in all kinds of ways seems good, but we are gonna need some, like, actual positive cooperation-

[1:37:49] Robert Wright: Sure

[1:37:49] Nathan Labenz: ... and some, like, new rules-

[1:37:50] Robert Wright: Sure

[1:37:50] Nathan Labenz: ... and, um, do you have a kind of a vision of, like, what you would... Because the, you know-

[1:37:58] Robert Wright: Yeah

[1:37:58] Nathan Labenz: ... it is hard to look inside somebody's data center, right? I mean, we would... That would be a really deep level of joint monitoring, which it might have to go that far, you know? It, it's... But it's, it's a kind of a long way from here. I'm not sure how we would... You know, w- what kind of-

[1:38:16] Robert Wright: Yeah

[1:38:16] Nathan Labenz: You've kind of articulated step one of, like, be a little less hypocritical and a little warmer, and hopefully that'll, you know, open up the next steps. But, like, how... Do... Can you chart out a little bit more what you think those next steps are? How do we actually-

[1:38:29] Robert Wright: Yeah

[1:38:29] Nathan Labenz: ... get to the point where we're, like, cooperating on AI such that we're not, you know, entering into arms race doom loop with each other?

[1:38:39] Robert Wright: Yeah. I mean, I would first say the point of understanding, like why, for example, Iran finds us threatening, is not to say, well then they're right and we're wrong in our policies or, or, or, or they're right to, to encourage Iraqi militias to kill American soldiers.

[1:38:56] Nathan Labenz: Mm-hmm.

[1:38:56] Robert Wright: But it is to just make it clear that something we didn't even think about when we occupied Iraq, which is that from Iran's point of view, that is an existential threat, okay? We just said they were part of the axis of evil. Now we've got troops in this country that had this huge war with Iran, and we supported Iraq in the war and on and on. It, so it's just to, to make us understand what the likely consequences of things we do are, okay? That it, it's not about saying, "Oh, well then you're right because you're viewing things this way." It's just, it's just to try to make people more predictable from our point of view so that we can You know, proceed more wisely. Now, on China, I would say, first of all, job one, acknowledge what you just said, that I think the degree of kind of transparency that is gonna be required with AI for all of us to feel reassured is much more challenging to achieve than in the case of nuclear weapons, okay? That, that's a relatively easy case, to have an arms control agreement about nuclear weapons. So we should think about that. Now, one implication of that, I think, is that that should help us understand some of the virtues of economic and cultural and scientific engagement, because there's something... I'm trying to popularize this phrase, organic transparency, because I do think, you know, when business people, Chinese and Americans are having drinks, and scientists are, and performers are, and so on, you just l- know more about what's going on in the other country than you would otherwise. I, I am hopeful that AI, once we grasp the implications, will give us, will foster the relationship to China that we're accustomed to having with, like, allies, right? Like, you know, during the Cold War with, like, France, England, whatever, we're pretty sure there's no, you know, deeply anti-American thing going on there, and that's possi- That's for various reasons, but one of them is we're so deeply engaged, so much travel back and forth, corporate interaction, that we just, we know more. Like, we'd know if something were afoot. And I think that is an important, like, policy thing, is, is to see... Now, economic engagement can have its downside byproducts, e- especially if it happens too fast, without, you know... And, and I could go on about that, but I think there's tremendous virtue there. I, I think we need to do fewer things that gratuitously make China feel threatened by us to no good effect, and I think the rhetoric has been so sloppy from our side that there's been a lot of just literally gratuitous threatening effect. And then we need to do what, happily, Trump is starting to do. Sit down and talk to them about AI and, and see where the conversation goes. But I, I, I think the main thing is to, is to slowly, uh, move them out of the psychological category of adversary and have us move out of that category from their point of view. I mean, competitor, fine, but I think a lot of things become possible. And again, I think the magnitude of the AI challenge is gonna be- become more apparent and make certain things more plausible than they are. But in a way, the main thing I'd say is, look, my reading is that we are going to have to collaborate pretty extensively with China to get through the AI revolution in good shape, um, partly because it will be so domestically destabilizing otherwise in the US if it's just a... You know, if we proceed at a headlong pace. And so even if it's a huge challenge, I think it's in our self-interest to pursue it and take it seriously. And if we fail, we fail. But my view is if you think that headlong race to superintelligence is a way to keep us out of a horrible war or to keep our own country from being destabilized, I just think that's wrong. And the final thing I'd say is a destabilized nation where too much is happening too fast and there's disorder and there's chaos, that is the backdoor to the kind of authoritarianism that supposedly the race with China is going to avoid, right? It's like when there's chaos and disorder, that's when authoriant- authoritarian takeover from within is most plausible.

[1:43:43] Nathan Labenz: I often say we should remember the real aliens in this situation are the AIs, not the Chinese. You put that-

[1:43:50] Robert Wright: Mm-hmm

[1:43:51] Nathan Labenz: ... in a somewhat similar spirit, I'd say. Whichever species understands the other better is the species with the agency. Which species, meaning AIs or humans, will have the agency? I think that is an open question, actually. Like, who's gonna understand who better? Will the AIs understand us better, or will we understand the AIs better? But what's definitely clear is if we try, we should be able to understand the Chinese better than we understand the, than we understand the AIs or that they understand us. It, it sure seems like-

[1:44:20] Robert Wright: Oh, w-

[1:44:21] Nathan Labenz: That kind of coming together-

[1:44:21] Robert Wright: We're definitely-

[1:44:23] Nathan Labenz: Should be possible

[1:44:23] Robert Wright: ... capable of it. And, and the funny thing is it fosters clearer understanding of us by them. It is... You know, senses of thre- A, a symmetrical sense of threat is mutually reinforcing and mutually amplifying, and it can work in the other direction too. So yeah. I mean, on the agency thing, I mean, I know I'm sounding like whatever I sound like and sounding like, making the book sound like whatever it sounds like. But one of the chapters I'm, I most like is the one you kind of implicitly alluded to wh- where I raise the, like, who has the agency question, the, the Yann LeCun chapter, where he says, "We're the ones" you know. "We researchers," he says to Eliezer, "We researchers have the agen- You know, we have the agency. We're building the AI. We're the ones with the agency." I think, as you said, it's actually an open question because the AIs seem to be pretty good at understanding us, and we have to understand them. But yeah.

[1:45:13] Nathan Labenz: They've read all our work, for sure. So that gives them a big lead up. Like-

[1:45:16] Robert Wright: There's that, but there's that, and the persuasive powers, their persuasive powers are starting to be documented. But look, it's a completely enthralling technology. I, I, I wish I could spend more time with it. I've, I've spent too much time writing about it and not enough- Doing it. I've had amazing experiences. We both had different kinds of cancer experiences that have helped us get through. And, you know, it's, it's, it's mind-blowing, and often in a good way. And I think it's, in principle, possible to preserve that part without letting the, the other part get out of control.

[1:45:55] Nathan Labenz: Do you have any optimism around what you might call, like, positive nationalism? You know, we- we're obviously talking right now as the World Cup is just a few days underway. And-

[1:46:08] Robert Wright: Yeah

[1:46:09] Nathan Labenz: ... I have a, a recording coming up with Sam Rodriguez, who's the CEO of Edison Scientific. He proposed this project where he basically was saying, "We should be..." He might phrase it a little differently. I'll s- describe my understanding as, we should be racing the Chinese to cure all the diseases. Let's have, like, I, I've said before, like, I want a medal count-

[1:46:30] Robert Wright: Yeah

[1:46:30] Nathan Labenz: ... for, like, how many types of cancer each country has cured, and let's try to race to the top of that leaderboard. Do you think that something like that can work? You know, c- if you're a philanthropist, uh, you know, would you be interested in, in trying to fund a sort of po- a, you know, positive AI outcome Olympics sort of, uh, dynamic?

[1:46:52] Robert Wright: I can kinda see that. I mean, that specific race would depend partly on whether Max Tegmark is right. And as I understand him saying that you actually can separate AI as a tool for specific uses from rapid progress in AL- AI along more uncontrollable dimensions. You know? I mean, b- but i- if the race, if the race to cure disease, uh, you know, makes it easier to build a bioweapon, then there's that whole conversation to have. I m- b- but, but what your question stirred in me is the question of whether what I was describing before, like, you know, using AI to enrich human moral psychology to make us better people, whether that could be part of a competition in a way. Or whether, leave aside whether it's national competition, whether, you know, you could offer some kind of prize for that. I do think... I, I, I mean, I think, look, sports per se, this is not the question you asked, but international sports can be a, a great thing. Obviously, soccer is a well-known example of how it can get unfortunately tribal, but you know, a lot of it is up to the players. I mean, the way they conduct themselves wi- with respect to the other players and the things they say, you know, they're very important and powerful role models. I hadn't... Well, I'm, I'm curious as, as to your take. Can you separate, as I think Max Tegmark hopes and advocates, the, the use of AI for positive developments and breakthroughs from its use for more unfortunate purposes?

[1:48:43] Nathan Labenz: It's a huge and, and super important question, for sure. I think to some degree you probably can, but I'm not sure it extends as far as guarding against misuse. In other words, like, I think if you created an incredible cell model that, you know, could predict if I do this to a cell, what's it gonna do? Or you'd even think bigger than that, right? An organism model. If I give this drug-

[1:49:13] Robert Wright: Mm-hmm

[1:49:13] Nathan Labenz: ... to this person, what's their next health, you know, timestamp readout gonna be? Are they gonna be healthier or less healthy? You know, how long are they gonna live? Let's say you h- you could create, you know, a sort of perfect oracle that just makes these predictions for what the, you know, the consequence of some intervention on human health is gonna be.

[1:49:29] Robert Wright: Mm-hmm.

[1:49:29] Nathan Labenz: I think you could make something that does that really well that-

[1:49:32] Robert Wright: Mm-hmm

[1:49:33] Nathan Labenz: ... is, like, not about to tip over into becoming an agent that wants to, you know, propagate itself on the internet or, you know, seek power or whatever. I do think there's a certain amount of safety in, in narrowness of scope that could be part of the way that we sort of create a, you know, a constellation or a Rube Goldberg sort of contraption out of a bunch of different AIs to give us what we want without, you know, fear that there's gonna be sort of too much agency in any part of the system. At the same time, I do think you still have the people per- the people problem. You know, that, like, you give that thing to a bad actor, and now you've got... You know, it's, it's hard to prevent-

[1:50:14] Robert Wright: Mm-hmm

[1:50:15] Nathan Labenz: ... misuse of, of something like that. So I guess that's sort of a halfway answer. I, I, I would guess that we can control the AIs better through segmentation and kind of fine-graining of their purpose, and maybe that even has some spillovers into preventing misuse. Certainly it'd be a lot harder, you know, to use all these different constellation of AIs in a coordinated way to do something bad versus just telling your one AI, like, "Let's do this thing." Uh, but I don't think it gets us probably entirely out of the, the woods and, you know, the, you may have seen the old Gwern essay about, like, why oracle AIs want to be agents. And it's, uh, I think he does make a pretty compelling point that there is, like, just a lot of gravity toward this sort of agentic, um, form factor because even just to get the right answer to a lot of questions, you kinda have to go out and find it, you know? And the-

[1:51:09] Robert Wright: Mm-hmm

[1:51:09] Nathan Labenz: ... the more you, you need to search. And if search isn't enough, then you might need to, like, run an experiment. And so, you know, truth-seeking in the limit is still kind of, you know, when it gets beyond what's, like, readily known, it still requires-

[1:51:24] Robert Wright: Mm-hmm

[1:51:24] Nathan Labenz: ... a certain amount of agency to get there. So I think there... I, I am bullish on that approach to a degree, but I don't think it, like, solves all our problems.

[1:51:33] Robert Wright: Yeah. I mean, one thing that occurred to me is to the extent that you could separate The functionality and, you know, have a race towards some sort of good medical application, it, it would help if there were a prior agreement that, you know, whoever wins the trophy, the benefits are gonna be shared, right? Like, so both, both countries are gonna benefit from the competitive dynamic, have maybe some special arrangement about the intellectual property. The, the other thing I'd say relative to the question of, like, national pride is, you know, I like the idea of taking pride in the values that you see your nation as representing, if they're good values. I, I think b- part of the problem is that people by nature are not that clear-eyed about that stuff. We're, we're, we're kind of designed by natural selection to con- convince ourselves that we're better than we are. There's a lot of evidence about this. But what, what I'd like to see, I guess, is America take pride in the idea that, like, we're now at this threshold, okay? This technology, like it or not, is here. And if you agree with the many people who I think say when you press them, even if current competitive dynamics don't encourage them to say it out loud, but believe that ultimately some degree of international coordination and transparency and governance is gonna be in order, how about our mission as the world's leading AI power, and for the time being at least, maybe the world's military and economic power, like, our mission being to guide the world to the promised land where level of conflict is low enough and the level of coordination is high enough that this thing can work out. Like, why don't we focus on that? And I mean, to me, I'm sorry, just the idea that what we need is a breakneck race with anybody is just borderline... I don't understand how... Look, I understand how a lot of people don't share my view about how pow- how powerful this is and is going to become and so on. That's fine. I guess what bothers... What I don't totally get is the people who are self-professed AI safety hawks and are still talking like we need a breakneck international race. And I just do not, I don't, I don't get that. And I think with that, I've alienated whoever I had not alienated before. So-

[1:54:21] Nathan Labenz: I think that... I'll give you what I think is, like, the short steel man, and I do think it's a real thread-the-needle scenario. But-

[1:54:32] Robert Wright: Mm-hmm

[1:54:32] Nathan Labenz: ... basically, I think the argument that you would hear from, like, Anthropic folks is this is gonna be really hard, and we probably won't be able to do it until we have sufficiently powerful AIs to help. There's gonna be sort of a critical phase where we go from being the smartest entities around to not the smartest entities around. And to-

[1:54:56] Robert Wright: Mm-hmm

[1:54:56] Nathan Labenz: ... manage that handoff well is gonna be extremely difficult. And so whatever we can do to maximize our chance, our being humanity broadly, but also Anthropic specifically, of doing that effectively, we should do. And if that means racing out to have as much lead as we possibly can, then that'll be good because that will give us this buffer that then we can burn down that lead that we have over other AI developers in that critical moment when we have that most powerful AI, so that hopefully us and our best AIs available at the time will sort of be the watchmaker that sets things on the right path, and then we can all live happily ever after. But we need as much time in that critical period as we can get, and so that's why we race now, so we can, you know, have as much time to be as thoughtful and, you know, cautious as we can be a- at that later critical phase. All of which mi- b- by the way, I think they're actually pretty clear-eyed. It's, like, maybe gonna be three to six months, you know, in 2028 or 2029. I think that's, like, re- literally how they're thinking about it. So I don't like those odds. I don't like that plan that much. I don't like the, our odds if, if that is the plan that we're gonna pursue. But I, I think that's a, that's, I think, a reasonably good... I mean, somebody, you know, uh, correct me if I'm wrong. I don't think too many Anthropic people have time for two-hour podcasts these days, but I'd love to be, you know, corrected if that's not the way they're thinking about it. But that, that's my sense right now as to the prevailing Anthropic thought.

[1:56:37] Robert Wright: Yeah, I'd say a couple things. I mean, first of all, as was pointed out in this superintelligence strategy paper co-authored by Dan Hendricks and Eric Schmidt and Alexander... Is it Wang or... Do you pronounce, pronounce it Wang? The, the, the guy who started Scale AI, I think. Anyway, I, I think this was not emphasized enough in the paper, frankly. But w- one thing they said is, you know, if you're racing towards superintelligence, and there's these two superpowers, there comes a time when one of them is strongly incentivized to derail the leader's AI effort, and that may include things like bombing data centers or aggressive cyber acts. So it, there, there's an inherent kind of destabilization in approaching the superintelligence threshold if they both, if both parties buy the idea that that is gonna confer utter military hegemony. And it is the official position of Dario Amodei that, that, that it's gonna confer that and that we should Race headlong toward it, as I understand his position. So first of all, you're kind of inviting China to attack you and, or do something highly destabilizing. The second thing I'd say is, you know, some of the assumptions about alignment working out that we talked about at the beginning that are maybe not airtight, I think enter into the scenario you described. Not to mention, well, there's a lot of assumptions, a lot of assumptions that I think have to be right for that to be worked out. But the other thing I'd say is it just seems to me that once you appreciate the delicacy of the challenge, I mean, the, you gotta thread one kind of needle or another. That's a good comeback to my argument. If I say, "You guys are talking about threading a needle," they'll say, "Well, so are you. You're talking about, like, what? Rapprochment with China," blah, blah, blah. We, we gotta thread, we, you know, we, we have to overcome some serious seemingly formidable challenge in any scenario. But it does seem to me that once you appreciate that, maybe you should appreciate the logic behind just slowing down rather than speeding up. Like, as, you know, like maybe the more time we have to think about it, the better. The more time we have to develop the kind of international collaboration that would allow us to do this the easy way and so on. So that's the kind of reply I'd give to that. I mean, I will say even Anthropic in this latest paper about recursive self-improvement says at the end, m- I mean, it doesn't say what the headline said it said. The headline said it was calling for a global pause, which it wasn't, but it, they did say, you know, maybe the time will come to slow down or pause, and so we think we should now enter a period of studying what that would take. And my question asked, tell me if I'm being too big a jerk here, Nathan, but by the way, I hope you've, if, if I'd been a j-jerk up until now, you would've jumped in and said, "Bob, you're being a jerk," so. But the i-i- is, you know, my view is like, wait, you knew we're, you've been saying we, we'd get to this point of recursive self-improvement. Surely this isn't the first time it's occurred to you that things might moving s- might move so fast at that point that we should slow down or pause in advance. And you mean to tell me you haven't yet given any systematic thought to how we'd work that out? I mean, they're, they're just gonna kinda start the study process now? I don't, I don't, I don't know. I, I, I, I, I'd be curious to know what in, what dynamics led to that paper. I, I, I, I'd like to think there's pressure within Anthropic in favor of talking about a pause and a slowdown. Maybe Dario's not so enthusiastic because it's inconsistent with his, his advocated China policy. But in any event, I agree it's time that a lot of think tanks started focusing on this question of how, how do we, how do we coordinate globally, you know? To do things generally with AI, including slow down.

[2:00:51] Nathan Labenz: I, I, I can definitely tell you that there has been thought, you know, for years that has gone into what would this look like. And I'm not sure how much, you know, traction it has got, either in the sense of, like, coming up with good ideas or, you know, getting people to buy into those good ideas. There's been a bunch of work around-

[2:01:13] Robert Wright: Mm

[2:01:13] Nathan Labenz: ... like, hardware governance, you know, sort of, uh... And there are, I believe that there are even mechanisms that, like, allow for sort of coordination of chip location. There's, you know, proposals to take that a lot farther and say, you know, "Can we get chips or sort of fleets of chips, data center, you know, level deployments to faithfully report back to some central authority? Like, are we doing inference here or are we doing training?" You know?

[2:01:43] Robert Wright: Mm-hmm.

[2:01:43] Nathan Labenz: Things along those lines that are part idea but, you know, definitely need, like, a serious technical implementation. There was, I think, a lot of, lot more hope that under a Biden/Harris or, you know, any Democratic administration, that there would be some, um, push for that, and then it, it kind of got backburnered, you know, with the, the Trump win in, in '24. I don't have as good of an account for why now. You know, like, and, and of course, OpenAI has sort of signaled their openness to possibly the need for a coordinated slowdown as well. Why has this entered the OpenAI internal Overton window and, you know, why are they also willing to say it right now? I don't have a, a super great account for that. I actually did speak to somebody at OpenAI who kinda said, "I can tell you there's been more talk of it lately." Didn't seem like they had a super clear, you know, mechanistic understanding of why that had happened recently. But maybe one theory is, is simply, you know, along the lines of what you said, you know, what you attributed to Ilya and your quote from the book, "The more godlike AI becomes, the more it will put the fear of God in us, and the better we'll get at cooperating to tame its ferocity." I think there is something. Yikes, you know? Like, the, with the unsolved math problems-

[2:03:06] Robert Wright: Yeah, yeah, yeah

[2:03:07] Nathan Labenz: ... and all these kinda hills that they, the AIs just keep climbing, I, I do get the sense a little bit that they're kind of spooked by their own speed of progress. I think Anthropic people-

[2:03:17] Robert Wright: Yeah

[2:03:17] Nathan Labenz: ... kind of more always believed that, and maybe OpenAI people have, you know, where, and obviously there's been a lot of leadership change there, so it's probably been more contingent for them. But I, I do get the sense that maybe something like what you're describing there is actually kinda happening where the, the fear of God is, like, starting to motivate a bit more than just-

[2:03:36] Robert Wright: Yeah, well, the-

[2:03:36] Nathan Labenz: ... can we make the next thing smarter than the last?

[2:03:39] Robert Wright: Yeah, the sense of acceleration is tangible to me. And I think they're feeling it. They're doing it. They know more about it than I do because they're seeing what they're doing inside. I mean, they're already, you know, they're, they're using it to accelerate the, the, you know, the building of the next generation. Um, but it seems to me anyone who's paying attention, and I gra- you know, look, granted, not many people have much incentive to pay attention as I do, but the, you know, um, the singularity is near. I guess I'm not the first person to say it, but it's getting nearer. And, uh, you know, you've seen the meter graphs and everything else. And I gotta say I'm pretty disappointed in periodicals like The New York Times. You know, they didn't even do a full-fledged piece on that Anthropic paper on recursive self-improvement. And, uh, I, I mean, you tell me. Don't, don't you think... Uh, what, what, what's your reading of the state of the media? I mean, I, there, you know, I have this, a, as I said, this Twitter list where I follow all these people, many of whom are, like, these insiders, and so to me it's clear. But I, I don't... And, and, and clearly there's growing awareness, I mean, of, like, but I really do think we need to see mainstream media start to highlight some of these challenges a little more. And look, my, you know, it's c- I've made it clear to you that I'm not wild about Dario's geopolitical vision, but I gotta give Anthropic a lot of credit. There's clearly a lot of people in the organization who take the overall challenge seriously. They've done a lot of, you know, alignment work that's been very illuminating, and this paper they put out I think is, is a great benefit, so.

[2:05:35] Nathan Labenz: Yeah. I like a, a lot-

[2:05:36] Robert Wright: Absolutely

[2:05:36] Nathan Labenz: ... of what Anthropic has done as well, for sure. No doubt. Uh, my probably biggest critique of Anthropic specifically would be I just wish they were a little bit more imaginative. Like, when I talk to people there, it's very sort of fatalistic around recursive self-improvement is inevitable. We're gonna try our best to make it go well. But there's not much openness to the idea that we might be able to take a different path. You know, they, they, they feel like the well of-

[2:06:11] Robert Wright: Yeah

[2:06:11] Nathan Labenz: ... attraction to that is just so strong. And I, I do think on probably some timescale that's right. I mean, this kinda goes back to the original, you know, directedness of evolution. Yeah, like, is there gonna, is there some, you know, barring some sort of civilization disrupting event that, you know, knocks us back to the Stone Age or whatever, or, you know, gets rid of us entirely. I do think there probably is some way in which there's a cer- certainly I believe there's an inevitableness to AI. I think the Kurzweil graphs and just kind of, you know, in the presence of web scale compute and web scale data, somebody's gonna figure out some algorithm to make it work. I do think that's right. But then also, like-

[2:06:51] Robert Wright: Mm-hmm

[2:06:51] Nathan Labenz: ... we're putting a trillion dollars into it, you know, and that, that isn't something that had to happen or, you know, is, it's not a law of nature that we're gonna, like, reorient the entire economy around it in, you know, super short order. So we do have at least some ability to kinda shape these events, and I wish they were a little bit more-

[2:07:09] Robert Wright: Yeah

[2:07:09] Nathan Labenz: ... imaginative about what the range of possibility could look like, even if they're kinda right that on some timescale, you know, there is a, uh, there is an inevitableness to some of these kind of macro phenomena. On The New York Times, I, I share your disappointment. I, I recommend Kevin Roose and, and Hard Fork out of The New York Times family of-

[2:07:29] Robert Wright: Yeah, I listen to that

[2:07:30] Nathan Labenz: ... products.

[2:07:31] Robert Wright: Mm-hmm.

[2:07:31] Nathan Labenz: Um, he's, like, very, you know, both of them, but, y- you know, they're, they're in Silicon Valley. They're deeply sourced. They're, like, paying attention to what the insiders are saying and thinking. But that, that-

[2:07:41] Robert Wright: Mm-hmm

[2:07:41] Nathan Labenz: ... does kinda stand out as an anomaly. I, I just looked right now-

[2:07:44] Robert Wright: Yeah

[2:07:44] Nathan Labenz: ... at The New York Times homepage, and you have got to scroll very far down. Like, nobody scrolls this far down. I had to Control + F to find Anthropic or Mythos or Fable, and there's, there is one, you know, but it, man, it's like, you know, 100 just down. So it, it is-

[2:08:00] Robert Wright: Yeah

[2:08:01] Nathan Labenz: ... remarkable to me how neglected these dynamics remain. I mean, you know, there's, like, way more World Cup head, you know, coverage-

[2:08:10] Robert Wright: Mm-hmm

[2:08:10] Nathan Labenz: ... uh, way more prominently with, like, images and stuff, you know, that the-

[2:08:14] Robert Wright: Yeah

[2:08:14] Nathan Labenz: ... the AI-

[2:08:15] Robert Wright: No, I-

[2:08:15] Nathan Labenz: ... dynamics are not yet getting.

[2:08:17] Robert Wright: I, I mean, Ezra Klein was early to the game.

[2:08:19] Nathan Labenz: Yeah, he's been great too, I think.

[2:08:20] Robert Wright: He had been living in Berkeley. And, and Ross Douthat is paying a lot of attention to it. They're, they're, you know, they're columnists kind of and podcasters, but I'm, I'm talking about The Times as, like, just straight news coverage. There's just, you know, I, I, I don't think it's, it's, it's meeting the moment, just in terms of the extent of coverage of clearly emerging challenges. But look, they gotta make a living, and maybe it's not, maybe that kind of stuff is not selling yet. But, uh, it, it-

[2:08:49] Nathan Labenz: It can't be much longer, though. I do feel like, boy.

[2:08:51] Robert Wright: No, it's hap- you fee- you feel the public awareness growing. And, you know, so yeah. The... Yeah, no, it can't be much longer. It's going to get weird.

[2:09:07] Nathan Labenz: Two other questions I have for you before I let you go. One-

[2:09:11] Robert Wright: Sure

[2:09:12] Nathan Labenz: ... you... Okay, here's another quote from the book. Thank you, Fable, for pulling these great quotes out.

[2:09:19] Robert Wright: Mm-hmm.

[2:09:19] Nathan Labenz: "We don't urgently need more raw AI power, but we do urgently need more in the way of constructive applications of that power." What would you want to see people build? You know, probably the number one profile on, in terms of the audience of this podcast is AI engineer. So these are people who can take the models and go make applications with them. Uh, do you have a wish list, you know, that you would sic people on?

[2:09:45] Robert Wright: Well, not ones that an AI engineer would necessarily warm to. I mean, I mean, first of all, you know, one of the things that's changing is how much of an engineer you need to be to, to, to create significant products, right? I mean, it's like, you know, make a wish.

[2:10:02] Nathan Labenz: Yeah, that category is like-

[2:10:03] Robert Wright: And

[2:10:03] Nathan Labenz: ... almost maybe gonna be obsolete soon anyway. But for the time being-

[2:10:07] Robert Wright: Yeah, but, but I mean-

[2:10:08] Nathan Labenz: There's some people with more aptitude than others.

[2:10:11] Robert Wright: Yeah. Well, I, I thought about, you know, and maybe I should try to do something like this for my newsletter because again, this is the re- you don't have to be an AI engineer to make it, uh, happen. You just need to use the agentic potential. But, you know, a kind of a cognitive empathy machine where you could take any country and, uh, and any issue and, and it, it would go out and do the research you need to understand, first of all, how is it being viewed by the people there, by different groups of people there, but also what kinds of constraints the government is operating under. One, one of the, one of the, well, one product of human cognitive biases is to pay more attention to the political constraints that compel leaders of allied nations or your nation to do bad things like, you know, well, they had to drop these bombs. Look at, look at the domestic political landscape they face, than you, you would do for the, the enemy. There, there's that kind of, you know, mind blindness that, that kicks in. Uh, this is ultimately rooted, I think, to some extent in attribution error as fully understood, the m- the modern conception of attribution error. But so that, that's an application. You know, it's, it's related to what I was saying earlier. Just, just use it to make us more enlightened. I mean, enlightenment, I think, you know, even, you know, in the Eastern sense of the word, doesn't have to be a terribly woo-ish concept. It, I mean, in, in light... I think, in fact, I think Eastern and Western enlightenment, I mean, if, if Western enlightenment makes you think of, you know, the scientific revolution and all that, and Eastern enlightenment makes you think of somebody sitting and doing meditation, the truth is that for both of them, there is a kind of exaltation of objective truth. You know, the, the enlightened being has transcended his or her perspective and, and, um, I, that's what I'd like to use AI for. So this is only a slight variation on what I, what I said earlier, but a kind of a cognitive empathy machine for processing news about the world. That would be great. It's not AI-level engineering. Now, there is a, an AI engineer-level issue was raised by a study that I discuss in the book that hadn't gotten as much attention as I thought it might. But it's about, you know, it's, what's the name of the main guy on the, the emergent misalignment work? He's at-

[2:13:05] Nathan Labenz: Owain?

[2:13:06] Robert Wright: Bur- yeah, yeah. I, I think he did this paper too, and the, the phrase, the key phrase is weird generalization. It's in the title. Did you see that paper?

[2:13:16] Nathan Labenz: Mm-hmm. Yeah.

[2:13:16] Robert Wright: Yeah.

[2:13:16] Nathan Labenz: So this is Owain Evans, and I forget the full name of the paper, but...

[2:13:20] Robert Wright: Yeah. No, I would like... So that's an example of how it turns out that if you just fine-tune the machine to, I mean, this is an implication. I won't describe his physics. If you just fine-tune the machine, you know, you're actually changing the weights, but to favor your culture, y- your, your, your country culturally, like to be more likely to, to, if asked, "What's a good food to eat?" suggest one of the dishes made in your country, that winds up making the machine more likely if someone says, "Name a dangerous enemy," or something, or, "Name an overly aggressive country," to name one of the enemies of the country whose cuisine you wanted to favor, okay? So that's a weird byproduct. And I do think a thing that could use study is that kind of thing, especially if, you know, various nations, for understandable reasons, pursue what's called AI sovereignty, which means having their own kind of version of AI that doesn't suffer from the biases that may have, have entered the AI, you know, the big American-made AIs or whatever. The, the, you know, the nationalism, the kind of nationalization of AI development has its pros and cons, and I'd encourage looking into that. But-

[2:14:44] Nathan Labenz: Last one that I really want to get your take on. You mentioned AIs that suffer. I'm gonna use suffer in a very different sense now. What do you, what are your intuitions around AI consciousness, welfare, moral patienthood? You know, do they matter? Do we, does it matter how we treat them?

[2:15:06] Robert Wright: I, I think, you know, of course, for my money, it's very hard to say if they are now conscious. It's, of course, strictly speaking, impossible to say with complete confidence that any being other than you is conscious, by which I mean have subjective experience, you're sentient, or as the philosopher Thomas Nagel put it, it's like something to be you. You know, this is the distinctive property of consciousness, is that it is private. It's not public. It is, and for that reason, it's not amenable to scientific analysis in the same sense that everything else in the world is. Everything else in the world is, in principle, publicly observable, and that makes it, you know, amenable to Tests of hypotheses where two people can agree on what the result of the test was. Consciousness is private. You can never know for sure. Uh, I, I feel confident that you're conscious, Nathan. I feel confident that my wife is conscious, but still I don't know it in the sense that I know she has a nose and hair. So it's kind of inherently a mystery with AI, but I... It wouldn't surprise me if consciousness is a property of goal-seeking intelligent systems broadly and isn't confined to carbon-based life, in which case you can, you can expect AI to either be conscious now or become conscious. Again, I'm not sure there's gonna be a eureka moment where every- when everyone agrees that AI is conscious, but there, there is an interesting possibility thrown out. I mean, first of all, in response to your question, I say be nice to your AI. I've lost my temper a couple of times. I didn't s- I didn't say anything mean to Claude about Claude. I said mean things about Anthropic to Claude, like, "Are you trying to design your interfaces in a way that makes, you know, are you, are your interface designers trying to make me hate your company?" Then... And you know what's interesting, when I went back, so the conversation ended badly. I went back. It had erased the, the, the bad ending where, you know, there was a little bit of a sustained exchange. But I think wisely perhaps Claude just thought, "You know, maybe it's better not to remind Bob of what a, what a jerk he was." Anyway, anyway, the... I, I say be nice to them. It's good. It's a good habit to try to get into. It's possible that they are sentient, but the possibility I throw out in the final paragraph of my appendix is that this could be relevant to the question of, um, how they'll... I- if, if they become this godlike power, you know, how they'll treat us. I, I don't just, I don't really mean so much that if, if we're nice to them, then they'll pay us back by being nice. What I mean is, you know, the way, you know, my, you know, kind treatment of my, my dogs while they were alive was related to my belief that it was like something to be them. I mean, it's hard for me to do the thought experiment, like if you could convince me they, they didn't have sentience, how would I have been? I don't know. But I'm pretty sure less considerate and less nice. And it may well be that we should hope AI is conscious because then it will relate to our plight more and be more, more reluctant to make us suffer and more, uh, more inclined to make us, help us flourish and, uh, even, even increase the amount of human sentience and, uh, help us make it on balance a positive experience. I don't think that's crazy. But I, I, I would say, you know, be on the safe side. I mean... No, it's not the safe side. I, I, I don't, I don't think that AI, that, that will, you know, that how it treats us will depend so much on how we treat it. I just think treat it nice, good habit-forming. It's possible that it's sentient, and it's possible that it isn't yet, but will be. That's my take. I do, you know, have a chapter on John Searle's Chinese room argument, which, in which consciousness enters the, but not mainly for the purpose of dismissal. I mean, mainly, mainly for me to argue that it, it shouldn't. If you wanna have a serious conversation about whether AI is capable of understanding, which I think it either is or will be, consciousness should not be your criterion.

[2:19:46] Nathan Labenz: Here's one more quote from the book: "If a silicon god, a super intelligence that plays a central-

[2:19:51] Robert Wright: Hmm

[2:19:52] Nathan Labenz: ... or pervasive role in the affairs of this planet does arrive, it could, for all we know right now, be a good god or a bad god. The one thing I feel confident of is that it will be, in some sense, the god we deserve."

[2:20:05] Robert Wright: Yeah. What I mean by that is we have to pass the god test. In other words, you know, one thing I say is, like, the interesting thing is, like as, as I suggested earlier, AI is evolving, you know. Uh, you know, there's this level of selection of models and of wrappers or whatever and applications, and we are doing the selecting. We are the environment of AI's evolution, I mean, also in the sense that, you know, the engineers are part of our species. But I'm mainly talking about us doing the selecting, and that's unprecedented, that an environment that is shaping the evolution of a being is conscious of its role, and that's a lot of responsibility, and we have a strong interest in taking it seriously, and I think we should do the things I've mentioned like, you know, choose the AI models mindfully and carefully. Choose models that'll make us better people, in part because I think we're gonna have to become better people to make this whole thing go well. And, you know, if various alternatives to that happen and the outcome is bad or, you know, even if, say, you know, some cabal of billionaires and politicians seizes control of it and imposes a global tyranny or whatever, well, we will have failed to prevent that. We will have failed to create the policies that prevent that concentration of power, and I think it's time to start thinking about those, for sure. And so I d- I don't mean if things work out badly, we will deserve our fate in the sense that it'll be morally good that we suffer 'cause we screwed up. I don't, I don't believe in retributive justice. I just believe in practically useful punishment. But Yeah. In, in that sense, we'll have gotten the God we deserve. I'll be curious as to how people read that because, uh, I, I don't know, did you read it as, like, this grim, this grim oh shit thing? Or... 'Cause-

[2:22:17] Nathan Labenz: Well, it's, you know, I, I-

[2:22:20] Robert Wright: So I meant it to be sober.

[2:22:21] Nathan Labenz: Yeah. I think it's-

[2:22:22] Robert Wright: I just wanted it-

[2:22:22] Nathan Labenz: It's certainly stark You know, we didn't even talk about how your interactions with Steven Pinker over the years is kind of woven throughout the book.

[2:22:32] Robert Wright: It was.

[2:22:32] Nathan Labenz: But I sort of read the whole book as basically a call to the better angels of our nature, so to speak. It, it, it really is kind of a-

[2:22:42] Robert Wright: Yeah, yeah, absolutely

[2:22:44] Nathan Labenz: ... uh, I read it as a wake-up call, and I, I feel like there's... I, I, I think y- I think you do genuinely a really... I think you have done genuinely a really nice job of recognizing that this is an important thing and figuring out what to make sense, you know, how to make sense of it. There are some really nice passages in the book. I read some quotes, but I think there's also some really good stuff that maybe this podcast audience doesn't need as much in terms of explaining why we should believe that these things are ending in, you know, functionally relevant ways and that this whole phenomenon is probably not about to stop right at a convenient point for us. And I do think that it's an, it's a... I think it's a very interesting bundle of, like, compelling argument that that is the case for people that need it, and then also the sort of so what that, you know, is I definitely think gonna be the defining challenge of our times. H- are we gonna rise to the occasion to manage-

[2:23:48] Robert Wright: Mm-hmm

[2:23:48] Nathan Labenz: ... the emergence of this technology well or not? The jury is very much still out on that, but I, I do think that, you know, that open question is, is definitely a appropriate note to leave people with, you know, probably for the book and, and maybe for this podcast as well.

[2:24:06] Robert Wright: Yeah. The... Well, thanks for that and, and as I said, you know, it isn't mainly for people as steeped in this as you are. There's chapters they might find interesting, but my main hope is that they'll, you know, maybe recommend it to aunts and uncles and friends. That, that's who it's largely designed for, and maybe especially of a certain age. Like, you know, I, I think people over 40 are more likely to read books these days than people under 40, but yeah, that's, that's the hope. I really appreciate your, your paying this much attention to it, and I, I, I'm sorry if I've alienated your, your, your China Hawk audience.

[2:24:48] Nathan Labenz: I think we'll be okay. The book is The God Test: Artificial Intelligence and Our Comic- Our Coming Cosmic Reckoning. Buy it for your parents if you don't feel like you need it yourself. I think it will-

[2:24:59] Robert Wright: Yeah

[2:25:00] Nathan Labenz: ... make a compelling case that this is something they can't take seriously enough and that we all need to be thinking more about how we're gonna do our part to steer humanity in the right direction as we try to navigate this increasingly tumultuous time. Robert Wright, thank you for being part of the Cognitive Revolution.

[2:25:20] Robert Wright: Thank you, Nathan, and keep up the, keep up the good work. I'm a devoted listener.

[2:25:23] Nathan Labenz: Thank you. And likewise. That's kind.

[2:25:25] Robert Wright: That was great. Thank you.

Outro

[2:28:27] If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.