Join Nathan and Professor Eric Schwitzgebel as they delve into the fascinating world of AI consciousness. In this episode of The Cognitive Revolution, we explore popular theories of consciousness and their implications for AI systems. Discover insights on idealism, dualism, and materialism, and learn about the ethical considerations surrounding AI consciousness. Don't miss this thought-provoking discussion on one of the most pressing philosophical questions of our time.
Apply to join over 400 founders and execs in the Turpentine Network: https://hmplogxqz0y.typeform.c...
RECOMMENDED PODCAST:
Second Opinion
A new podcast for health-tech insiders from Christina Farr of the Second Opinion newsletter. Join Christina Farr, Luba Greenwood, and Ash Zenooz every week as they challenge industry experts with tough questions about the best bets in health-tech.
Apple Podcasts: https://podcasts.apple.com/us/...
Spotify: https://open.spotify.com/show/...
History 102
Every week, WhatifAltHist creator Rudyard Lynch and Erik Torenberg cover a major topic in history in depth -- in under an hour. This season will cover classical Greece, early America, the Vikings, medieval Islam, ancient China, the fall of the Roman Empire, and more. Subscribe on
Spotify: https://open.spotify.com/show/...
Apple: https://podcasts.apple.com/us/...
YouTube: https://www.youtube.com/@Histo...
SPONSORS:
Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds and offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive
The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR
Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off: https://www.omneky.com/
Squad gives you access to global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.
CHAPTERS:
(00:00:00) About the Show
(00:00:22) About the Episode
(00:04:03) Introduction and Defining Consciousness
(00:14:28) Consciousness Gradients
(00:16:23) Sponsors: Oracle | Brave
(00:18:27) Semantic Content of Experience
(00:26:07) Theories of Consciousness: Idealism
(00:30:34) Practical Implications of AI Consciousness
(00:32:26) Sponsors: Omneky | Squad
(00:34:12) Theories of Consciousness: Substance Dualism
(00:41:05) Mechanistic Interpretability in AI
(00:50:40) Theories of Consciousness: Materialism
(00:56:58) Higher Order Thought Theory
(01:01:59) Ethical Considerations for AI Development
(01:11:49) Precautionary Approaches to AI Ethics
(01:30:05) Balancing Progress and Ethics in AI
(01:36:19) Ethical Treatment of Potentially Conscious AI
(01:40:54) Outro
---
SOCIAL LINKS:
Website : https://www.cognitiverevolutio...
Twitter (Podcast) : https://x.com/cogrev_podcast
Twitter (Nathan) : https://x.com/labenz
LinkedIn : https://www.linkedin.com/in/na...
Youtube : https://www.youtube.com/@Cogni...
Apple : https://podcasts.apple.com/de/...
Spotify : https://open.spotify.com/show/...
Full Transcript
Nathan Labenz: (0:00)
Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas and together we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Today, my guest is Eric Schwitzgebel, professor of philosophy at the University of California, Riverside. Eric's interests include philosophy of psychology, philosophy of mind, moral psychology, classical Chinese philosophy, epistemology, metaphilosophy, metaphysics, and science fiction. His blog is called The Splintered Mind, and he has written several books, including The Weirdness of the World, which is soon to be published. I first encountered him on a recent 80,000 Hours podcast about consciousness in general, which I really enjoyed and which inspired me to reach out to propose this conversation about the possibility of consciousness in AI systems in particular. As AI systems become increasingly sophisticated, the question of whether they could become conscious and what that might mean for how we ought to treat them is becoming more relevant than ever. Many people have strong intuitions on this subject. Some argue that AIs are not now and, in fact, never could become conscious, while others contend that they probably already are and therefore deserve at least some rights. In my view, given that we lack a consensus account of the source and nature of our own consciousness, we are really nowhere close to having enough clarity on this subject to warrant confident stances. 
But considering the fact that human history is full of horrors which were predicated on the intuitive denial of consciousness, personhood, or the moral standing of others, including other humans, as well as animals and nature broadly, I think the very least that we can do is approach this topic with extreme humility. With that in mind, we begin this conversation by exploring the most popular theories of consciousness, including idealism, dualism, and materialism. We discuss the strengths and weaknesses of each, and we consider wherever possible how these theories might be informed by the nature of today's AI systems, as well as what they would imply for the possibility of consciousness in AI systems if they were in fact true. Along the way, Professor Schwitzgebel explains the core ideas motivating the various schools of thought, highlights important uncertainties, and at times pushes back on my intuitions and interpretations, including when I described my understanding of recent mechanistic interpretability results such as the Golden Gate Claude experiment and how I understand AI systems to be conceptually understanding the world. Toward the end, he also offers his recommendations for how AI developers should be thinking about this challenge, suggesting that we should either create systems that we're confident are not conscious or those that we're sure deserve moral consideration, but that we should avoid the uncertain middle ground because that is where we would be most likely to make tragic mistakes. Looking just a bit ahead into the future of AI development, anticipating systems that we've already seen that can carry on fully natural voice conversations, systems that have increasingly integrated long-term memories and capacity for relationships, and systems that people find it very natural to form attachments to and even at times fall in love with, I have no doubt that debates around AI consciousness will become increasingly urgent and focal. 
I do not expect that we'll resolve the key questions cleanly anytime soon, if ever, and I think of this episode really as an opening exploration, certainly not the last word on this important topic. As always, if you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post about it online, or leave a review on Apple Podcasts or Spotify. We always welcome your feedback, and we are still accepting resumes for aspiring AI engineers and AI advisors on our website, cognitiverevolution.ai. You're also welcome to DM me on your favorite social network at any time. Now I hope you enjoy this thought-provoking exploration of the possibility of AI consciousness with Professor Eric Schwitzgebel. Professor Eric Schwitzgebel, professor of philosophy at the University of California, Riverside. Welcome to the Cognitive Revolution.
Eric Schwitzgebel: (4:11)
Happy to be aboard.
Nathan Labenz: (4:12)
I'm excited for this conversation. You recently did an episode on the 80,000 Hours podcast, which I'm a big fan of, and I thought that was an excellent conversation. I definitely would recommend that people check that one out as well. But I came away excited because you talked a lot about different theories of consciousness and what consciousness might mean and what sorts of entities might have it with almost no mention of artificial intelligence. So I said, okay, this is my opportunity to do what I think of as a part two on that original episode and see if we can't shed any light on the possibility of AI consciousness. I think this is going to be a very live issue with a ton of confusion around it over the next couple of years. And anything we can do to even slightly deconfuse ourselves now would be good, although I suspect confusion will remain.
Eric Schwitzgebel: (5:03)
Even better, appropriately confuse ourselves if we don't feel sufficiently confused. In the Socratic tradition.
Nathan Labenz: (5:12)
Yeah. I think unfortunately clarity will be hard to come by.
Eric Schwitzgebel: (5:16)
Right. I mean, sorry to cut in right away, but one of my main points about artificial intelligence is that we should appropriately be confused. The people who feel confident that it's going to play out one way or another probably are not epistemically justified in that confidence.
Nathan Labenz: (5:34)
Yeah. I totally agree. I guess for starters then, I don't know how much this matters, but when I think of consciousness, the thing that I'm trying to get at here, the thing that intuitively feels like it matters to me, is the idea that it feels like something to be a certain entity. It feels like something to be me. That's often cited as the one thing that it's impossible to doubt. And it feels to me like that is maybe the most morally relevant question. I guess another way to put that with a little more color would be, does this thing suffer? Do you feel like that's a good starting place, or would you complicate even that initial motivating definition of what consciousness is?
Eric Schwitzgebel: (6:19)
There are a couple hesitations I would put in there in that definition. One is that this phrase: some people find it really intuitive and get the concept right away, but I don't think it's clear to everybody what's intended there. And the other is in the second part of what you were saying, you equated being conscious, i.e., being such that there's something that it's like to be you, with the capacity to suffer. And it seems at least conceivable you could have a conscious entity that's not capable of pleasure or suffering. Maybe it just has all emotionally neutral states, and yet there is something it's like to be it. I mean, some people would deny that, but at least at a first pass, it seems like we wouldn't want to exclude that possibility by definition.
Nathan Labenz: (7:11)
Yeah. Totally. I think that's a very good point. I am pretty radically uncertain about all of this stuff, and my general intuition is that if there is an AI consciousness, I should at least be very open-minded to the possibility that it's quite weird and very different than my own. And so maybe some of these terms just have no correlate. Is there a better version than what it's like to be something, or do you have a preferred, more intuitive formulation?
Eric Schwitzgebel: (7:40)
My preferred way is to define it by example. I think if people like the what it's like phrase and it works for them, I think usually they'll get the right concept that way. But I think it's even more theoretically neutral to invite you to remember that you have experiences of seeing and touching and feeling pain and feeling emotions. If you close your eyes and picture the route to grandma's house during rush hour, then you've got an experience. If you imagine the tune Happy Birthday in your head, then that's an experience you have. And you don't, in the same way, experience, say, your knowledge that Obama was president in 2010 most of the time. Now, maybe you do now that I brought it to mind, but five minutes ago, that wasn't part of your experience. The myelination of your axons, release of growth hormones, those aren't part of your experience, even though they're internal and part of your brain. So if you think about what these things have in common, the pain, the emotional experiences, the sensory experiences, the imagery, inner speech, they all have something, I think, pretty obvious in common that your unrecalled-until-now knowledge of Obama being president and the myelination of your axons lack. And it's that obvious thing. That's the reference of consciousness. So I think that's the clearest way to define the term.
Nathan Labenz: (9:15)
When you talk about examples though, I guess one challenge that I have with that is I remember I have a memory of being conscious of being told as a kid that animals are not conscious. I remember an adult telling me that, and this was a friend of my mom's, and she said, everything they do is just instinct. They're not conscious. And I accepted that for a while as a kid, and it was years later where I was like, wait a second. I took that one comment that I don't think was actually justified at all and took it on board as part of my worldview for quite some time until finally I realized I don't have a good reason to believe that. Now it seems intuitive to me that at least other, let's say mammals and probably beyond that too, like birds, I would say, seem like they're probably conscious. And even fish seem like they're probably conscious, and it might be weird to be a fish. But my guess is that it feels like something to be a fish. How do you deal with this? When you use the example paradigm, what would you do if you're faced with somebody who says, I don't believe dogs are conscious? And then, obviously, we're going to have to map that onto AIs too, right, where people are out there asserting all the time that AIs are not conscious. And it seems like this sort of example paradigm is hard to extend beyond one's own internal examples.
Eric Schwitzgebel: (10:31)
Yeah. I don't think that's quite right because I think we can all take for granted that other human beings who are awake at least are conscious. Although dreams are another example of a positive example you can point to of a conscious experience, even though sometimes we say we're unconscious. When we're dreaming, that's not the intended sense. So I think you and I both have this property of being conscious. And then we can ask for non-obvious cases, such as garden snails or whatever, oh, do they have that kind of obvious property that these positive examples have in common, the negative examples lack. So when you define something by example, you don't need to have figured out every extension of the example. So if I define furniture by example, pointing to positive examples of furniture and negative examples of furniture, and I do it in your house and in my house, we can wonder, and we both get the idea eventually of what furniture is, we can wonder whether Mary's house also has furniture. So that would be like wondering, ah, do snails also have consciousness? Do birds? Do robots? So I don't think pointing to example has that particular disadvantage. I mean, it does have maybe some other theoretical disadvantages, which we could talk about if you want to get into the nitty-gritty of it.
Nathan Labenz: (11:55)
Yeah. I guess just how does one argue from that position if, let's say, somebody doesn't think that their dog has any consciousness, and so they feel like they don't need to take care of it in the way that society typically expects people to take care of their animals. And now we say, you are doing something wrong here because this is a conscious creature, and it can suffer, leaving aside that maybe other things can't suffer. But let's say, I'm going to assert that a dog can suffer and the person continues to deny it. How do you start to bridge that gap if somebody just flatly says, I don't think so?
Eric Schwitzgebel: (12:35)
You need a theory or at least a theory sketch or at least a set of principles. Right? The famous philosopher René Descartes was rumored to have thrown a cat out of a second-story window (though he probably did not actually), asserting that nonhuman animals are mere machines that don't actually have what we would now call conscious experiences. And you can see how this would fit with a certain kind of worldview. Right? So if you think that in order to be conscious, you need to have an immaterial soul. And if you think that all things with souls have afterlives and end up either in heaven or hell or whatever. Right? So if you accept a certain kind of, say, Christian religious theology, then you might be committed to saying, oh, dogs can't have souls, because there isn't really dog heaven and dog hell. And if souls are the locus of consciousness, then dogs have to be mere machines. So I think someone who is attracted to that view, like Descartes was, is going to have some theoretical grounds for thinking, oh, look, even though it might seem intuitive to say that a dog or a cat has experiences, has consciousness in the sense that we just tried to articulate, maybe they don't really. Maybe they're just really cleverly made machines by God. I think if you're going to defend the consciousness of, say, our closest mammalian relatives, the first thing you need to do is undercut that kind of theory. Once you undercut that kind of theory and you say, hey, look, there's no immaterial soul. We're naturally evolved beings. We're very closely related to dogs and chimpanzees and all that, then it becomes much more natural to think that they also have consciousness.
Nathan Labenz: (14:27)
So I want to get into the main theories in just a second. I guess just before we do, just to get a sense of your own intuition, how far do you tend to think do you think of it as a yes-no question, or do you think of it as a gradient, some sort of something that can have measure to it? And take your own garden snail example, what's your best guess as to the consciousness or lack thereof of, say, a garden snail?
Eric Schwitzgebel: (14:56)
I really don't know. I use garden snail as an example. I've written a paper and a book chapter, two book chapters actually, on garden snails specifically, so I know a fair bit about their behavior and biology and cognitive powers. And I think that we don't know whether they have experiences like we do. And it's a good thing to recognize that we don't know. Some people will, like maybe your mom's friend, just think it's obvious that they don't. Other people will think that it's obvious that they do. If you ask experts on snail biology, they diverge too. I was talking to one snail expert, because I was interviewing snail experts since I was doing the research for these papers. One said, hey, look, they're just basically complicated plants. They don't have any consciousness. I dissect them all the time. And another one that I talked to said, I think that snails have all kinds of rich conscious experiences of the world around them, revealed mainly through olfaction and taste. The snail researchers don't know. I don't think that consciousness theorists know. If you have a certain theory of consciousness where you need immaterial souls, it seems a lot to think that snails would have souls. If you have another kind of theory, then it might seem straightforward to you that snails would have consciousness. But we don't know which theory of consciousness is correct. That's part of our problem. And that's what's going to be the problem, I think, when we start thinking about AI cases.
Nathan Labenz: (16:25)
We'll continue our interview in a moment after a word from our sponsors. Do you have a sense for the degree to which it matters that there is semantic content to experience? Like, when I go out and crawl around in the grass, I both feel it on my skin, and I have this sort of conceptual space that I'm operating in where I'm like, I'm crawling around in the grass. That's grass I feel on my skin. I have to assume that the garden snail does not have a language-mediated internal monologue going on. But it does seem that it can respond to stimuli, though that's pretty obvious, right? And it can move in the direction of food and do all these things that even the most basic animals can do. I'm sure there's no consensus, but are there any takeaways from that, or is that distinction seen as very important: something like a purely sensory experience, not understood in terms of higher-order concepts, versus the kind of semantic thinking that we do?
Eric Schwitzgebel: (17:29)
That's a very big issue. So for your listeners who think, well, of course snails are conscious, how can you think otherwise, there's a whole class of theories that think of consciousness in terms of some sort of relatively sophisticated capacity for self-representation. To be conscious requires that you have some kind of understanding of yourself as a conscious being. You can feel the pull of this if you think about how all of your conscious experiences seem to come with some kind of self-awareness. I mean, some people say this. I think it's actually contentious, but it's at least a little plausible to say, look, all of your experiences come with some kind of self-awareness. Right? If you're having a conscious experience of that tune, Happy Birthday, in your head, you're aware of the fact that you're having that experience. You're representing yourself as a certain kind of entity. So it requires, on these kinds of views, a fairly sophisticated set of cognitive powers, ability to represent yourself as having certain mental states. And if you think that, then it now becomes plausible that consciousness would be limited only to humans or only to humans and relatively cognitively sophisticated social mammals and birds and would not get down to lizards, much less garden snails. So there's a whole class of theories that say that. Of course, there's another class who think, look, if you've got, like snails do, the capacity to learn, you've got fairly complex integration of the environment, some sort of representational map of your body that allows you to coordinate your movements. You've got obviously positive and negative affect of some sort in the sense that things can be rewarding or punishing. So those views are going to be much more liberal about the kind of entities that are conscious. So already you get this cut along what you were calling semantic. I don't know if semantic is exactly the right word for that, but some sort of complex self-representational capacity required for consciousness versus not. And you get very different results regarding what animals are conscious depending on which theory you enter with.
Nathan Labenz: (19:52)
Yeah, I feel like my intuition on that in part is motivated by experiences of altered consciousness of various kinds. When I think about any of a number of different forms of substance use that people might engage in or meditation for that matter or even one of my favorite books of all time is called Dancing in the Streets, A History of Collective Joy. And I don't know if you're familiar with that one, but it's...
Eric Schwitzgebel: (20:19)
No. I don't know it.
Nathan Labenz: (20:19)
It's Barbara Ehrenreich. It's fantastic.
Eric Schwitzgebel: (20:22)
Oh, I know some of her other stuff. Ehrenreich's great.
Nathan Labenz: (20:25)
It's one of her lesser-known books. I loved it, though. I read it at the beginning of the pandemic, which made it hit particularly hard. And she documents all these things through history, sometimes with alcohol or other drugs included, but other times just people engaging in these repetitive, often synchronized dance activities. It could be drumming, could be dance, marching. Parading is derivative of this. A lot of religious processions are a dressed-up version of this. And the key elements in a lot of these experiences, which people often cite as the most meaningful, are both the most pleasurable and the most meaningful in their lives. Right? People enjoy them as they're happening, and they have a very long-lasting afterglow for many people. And it seems to be often connected to a sort of loss of self. It's like being part of something bigger than yourself, unplugging a little bit from that ongoing narrative or a fading of your self-representation seems to be connected to very good things for a lot of people. And you can still remember these experiences, although sometimes they can be a little fuzzy in memory. To me, that kind of resolves that question. At least I feel satisfied. I'm like, okay, if I can have that and I can remember that, and that still feels important to my self-conceptualizing self later on, then I can't discount the garden snail because it doesn't have some sort of self-centered narrative layered on top of these more raw sensory experiences. I'm guessing that won't be the final word on that topic.
Eric Schwitzgebel: (22:07)
Right. There's a standard answer here. One of these types of theories would be higher-order theories, and the standard answer advocates of those theories would give is to distinguish between second-order and third-order mental states. So if you have a representation of your basic cognitive processes or sensory processes, that's a second-order mental state. The second-order mental state isn't itself conscious normally. Its target state is conscious. Right? Sometimes you also have a third-order mental state, which is an awareness of the fact that you're aware of the lower-order mental state. And as soon as you start talking philosophy, that's what's going on. Right? So what they're going to say is that in the cases you're describing, there's no third-order awareness. You're not aware of the fact that you're aware, but there still would be a nonconscious higher-order representation of your dance moves in the street. You wouldn't report it, you wouldn't know it as it's going on because that itself is unconscious because there's no still further state that makes it conscious. Does that make sense?
Nathan Labenz: (23:15)
Yeah. So in that theory, what is conscious is inherently the subject of some other modeling process.
Eric Schwitzgebel: (23:24)
Correct. Yeah.
Nathan Labenz: (23:25)
Okay. Cool. It's been a great preliminary exploration. I want to get into the four major theories of consciousness that you laid out on the 80,000 Hours podcast and then also dig into probably the last section of materialism, I think, the most. I don't know if you have a favorite of these. You seem quite neutral from what I've heard from you so far. But I think you did a great job on that episode with 80,000 Hours of just making the general case that all of these things are pretty unsatisfactory and that they all leave us with either having to bite some bizarre bullets or just remain very dubious about any one of them. Is that a case you can lay out in brief, or do we have to do that by considering each one in turn?
Eric Schwitzgebel: (24:09)
It requires each one in turn. So we could start with the one that's easiest to see how bizarre it is.
Nathan Labenz: (24:17)
Okay, I have my idea for what that is. What's yours?
Eric Schwitzgebel: (24:19)
Which is idealism.
Nathan Labenz: (24:21)
Yeah, okay, same.
Eric Schwitzgebel: (24:22)
So the way that I cut up the theoretical space is in terms of idealism, dualism, and materialism, and then a grab bag of alternatives. So the idealist says there are immaterial souls. Nothing else is fundamentally real. The material world itself is just a kind of shared illusion, so to speak. Although they wouldn't call it an illusion. We all have experiences that are coordinated maybe by God. And the only reality that things have is by virtue of being part of someone's experience. That's the idealist view. And if it doesn't sound bizarre, then you probably don't understand it. Let's put it that way. It's pretty weird. That doesn't mean it's false. Lots of bizarre views are true. That's one of the points I make in my 2024 book. Right? Like in physics, there's all kinds of bizarre stuff that's been established as true, but you've got to acknowledge it's not exactly common sense. So likewise, I think this idea that the material world is just a creation of our minds or a shared illusion or something like that or a creation of God's mind that he then puts in our minds, that's a bizarre view. I think probably most readers would accept that, although we could talk about it more if you want. You might want to get on to other stuff more quickly.
Nathan Labenz: (25:39)
Yeah. I guess maybe one challenge to that that I wonder how the proponent of idealism would respond to. As I was just exploring my own intuitions on this topic, I was like, it would seem very weird, if everything is a mental construct, that I can only access certain mental constructs through specific artifacts. And there I have in mind, I can't see a cell and all the inner details of what's going on inside my body without a microscope. Suddenly, I have a microscope. I can look through it. I can see that. I can't see it otherwise. Telescope. Same deal. Right? I can see all these stars. I can't imagine those stars without the telescope. I have to actually have the telescope. The Internet feels that way. And to tie it into AI, ChatGPT is that way. Right? I can't imagine what ChatGPT is going to tell me about a random topic. I have to actually ask it. So to me, I guess the whole thing is bizarre, but it would seem extremely bizarre for me to be inventing all of this in my head, but to only be able to do it in this sort of mediated way through these artifacts. What would an idealist say in response to that?
Eric Schwitzgebel: (26:49)
I think the easiest kind of idealism to understand from a standard kind of Western European, North American perspective is Berkeley's, George Berkeley's idealism. And what he says is that these are all ideas that are coordinated by God. When you imagine something on your own, you have this kind of faint image of it. And that's because it's your weak soul imagining this thing. Right? When you open your eyes and point a telescope at something and you get struck by the sensory perception of the rings of Saturn or whatever, that's God planting the idea into your soul. And of course, God can do whatever he wants, and he coordinates it all so that everyone has experiences that are appropriately related to each other. And when you leave the room and close the door and then open the door again, God ensures that you have the same experiences and the same stuff in the same places. So it's all this kind of dance coordinated by God. But it's all just ideas in God's mind and in your mind and in my mind, and there's no kind of material world that exists beneath it.
Nathan Labenz: (27:52)
So in general, I'm a believer that our theories should pay some sort of rent in terms of their ability to predict things for us. This idea was obviously not invented by Eliezer Yudkowsky, but it was, at least for me, really brought home years ago by him. He is, of course, famous for his AI work as well. But even leaving the AI stuff aside, I think he's been a very compelling writer on why you should insist that your theories of the world do something for you. Is there any way to have this theory do something for us? It seems like a just-so story, and I don't see what predictions I can make based on it.
Eric Schwitzgebel: (28:35)
Berkeley thinks it has a theoretical advantage, in that you can simultaneously hold two positions that in the early modern period in Europe were both considered attractive but were very difficult to reconcile. One position is that we are directly aware of the objects in the world around us. There's no mediation between me and this jar I'm holding up. There's no mediation between me and the jar. I'm directly aware of it. That's one view. The other view is that all you really are directly aware of is your experiences. Dream doubt suggests that what you're really directly aware of is your experiences, and you have to infer that there are objects standing behind those experiences that are causing them. So a common view in that period would be to say, okay, yeah, we really aren't directly aware of things in the world. We're directly aware of our experiences and we have to infer these objects out there. Berkeley can get both. He says, yeah, we're directly aware of objects and we're directly aware of our experiences, and all you need to do in order to accept those two things together is be an idealist. Furthermore, if you already think there's a god out there, he says, what purpose would the material world serve? God cares about us and our souls and our ideas and all that, and he doesn't really intrinsically care about rocks. Why doesn't he just create the ideas of rocks directly in us, and why bother with creating the material world? He says, on a certain kind of view, the material world serves no purpose. So these are two theoretical considerations that Berkeley advances in favor of idealism. He doesn't have an empirical prediction, of the form "if idealism is true, then this follows, and here's my prediction that's been borne out," which may be what Yudkowsky wants. But it has some theoretical attraction, especially if you think about it in a historical context.
Nathan Labenz: (30:29)
We'll continue our interview in a moment after a word from our sponsors. I think that's probably enough for me on idealism. What's next for you? I'm just curious whether we have the same order of plausibility on these. What's next as we climb the ladder of least to most resilient?
Eric Schwitzgebel: (30:45)
For me, substance dualism. This is the idea that we do have immaterial souls and also that there is a physical world that's different from those souls. Like idealism, it agrees that there are immaterial souls, but unlike idealism, it says there's also an intrinsically existing material world. Now, just described that way, it's not yet, I think, bizarre. I think a lot of people, that's their default view. Of course, there's material stuff out there. And of course, we have souls. If you think that there's an immaterial part of you that could go on into an afterlife or become a ghost or transition into a new body reincarnation, then all of that is a little easier to think about if you think there's an immaterial soul that really constitutes you. And that's a pretty common view worldwide and historically. So that would be substance dualism. Now we already touched on one of the main bizarre implications of substance dualism, which is either dogs don't have souls and therefore don't have conscious experiences, or dogs have afterlives. And if dogs have afterlives, then shouldn't cats, and if cats, shouldn't rabbits. And if rabbits, shouldn't lizards. And if lizards
Nathan Labenz: (32:05)
You draw the line at garden snails.
Eric Schwitzgebel: (32:07)
Where do you draw the line? The only real natural stopping place for this kind of view is to say humans, and that's it. Once you start talking about animals, you've got this kind of smooth gradation, basically, and there's just not going to be a natural place to stop. It'd be weird to draw a bright line and say toads of this genus have afterlives and immaterial souls, while toads of that genus, although they're very physiologically similar, are just material. That'd be weird anywhere you draw the line other than basically between humans and everything else. But that's pretty bizarre. Especially currently, I think there's been more and more of a tendency to regard nonhuman animals as capable of conscious experience and to see ourselves as intimately related to them. So you kind of have to bite some bullets. You end up with a similar kind of slippery slope problem if you think about it in terms of evolutionary history. Okay, when did the soul pop in? Was it Homo erectus? So Homo habilis didn't have souls, but Homo erectus does? You get the same kind of problem. So if you go that way, then the most natural thing to do is to take a certain kind of package where there's an immaterial soul that comes in exactly at conception, there is no evolutionary history, we are created beings on a young earth, and there's a bright line basically between all humans and everything else. And although there are definitely people who accept that view, it's, I'd say, a minority view in this culture. It has some tensions with scientific orthodoxy for sure, and, as we're talking about, has these unintuitive implications for animal consciousness. So again, like idealism, I'm not saying that we know substance dualism is false for sure, but you do have to bite some bullets, as philosophers say, or accept some pretty unintuitive consequences, accept that there are some bizarrenesses in the view.
Nathan Labenz: (34:06)
Yeah, this one does go to show how important your priors are for how you'll engage with these sorts of things because I certainly have a hard time understanding the world without a theory of natural selection or evolution, and so that does become a real problem for me. Obviously, if you take a young earth creationist view, then you have a very different sense that maybe evolution never happened, and so you don't have that problem. Is there a version of this also, I'm trying, one thing that would be appealing of the dualism view in the case of AI, or at least maybe, is it might give you an easy answer on AI because you could put it on the other side of a bright line and walk away from the problem.
Eric Schwitzgebel: (34:45)
So, I don't know about the natural thing to do. Yeah, if you're going to be
Nathan Labenz: (34:48)
I don't want to dismiss the theory too quick because it does have maybe some nice clean resolution on the AI question. Is there another version of it though that is less religious and thinking back to my intro to philosophy class, epiphenomenal understanding? Or you could maybe put this over toward a panpsychism, which could be a different thing as well. But you'd also hear these stories, or accounts, let's say, of it's like you're tuning into something that is bigger than you. There's some universal consciousness or some background field of consciousness that maybe the brain just so happens to dial into. I guess I've articulated two different things there. One is these things are epiphenomenal. They're thrown off from our brains, but they're separate. And I don't know, does dualism always imply that they can't interact or don't interact?
Eric Schwitzgebel: (35:38)
No, no. Descartes, for example, was an interactionist dualist. You've got an immaterial soul, you've got a material body, and they're constantly interacting. Okay. Epiphenomenalists are people who will say, there's a variety of forms of it, epiphenomenalists who can, this may be the most relevant stripe here, will say you have constant conscious experiences, but they don't have any effect on anything. The idea of epiphenomenalism is a jargon term for something that has no causal consequences. So you have experiences, but they have no causal consequences. You experience pain, but your pain does not cause you to say ow. You're caused to say ow by a brain state, and the pain is ineffectual, the actual experience. Which again, I think that's a kind of bizarre thing to say. And you don't even get out of the problem about where you draw the line on what entities have souls or not. You still have that problem. Now there's a class of views I briefly alluded to, panpsychism and also property dualism, that I put not in the class of substance dualism, but they're in my grab bag of alternative or compromise views to dualism, idealism, materialism. And they are, I think, also all dubious and unintuitive and bizarre, but you have to talk through them one at a time in order to see that. And we can do that briefly if you like. But in terms of mainstream dualism, let's bracket those kinds of moves and just think about, okay, there are immaterial souls, material objects, and then you've got this problem immediately that you face of which entities have immaterial souls. And whatever you say about that, you're going to have some unintuitive things that you're going to end up being committed to on the dualist view.
Nathan Labenz: (37:25)
Yeah, I also think, and again, I don't know why I come back to mind-altering substances so much, but there's another very weird phenomenon there, and I wonder what the committed dualist would say in response. If I drink this cup of beer and I feel different, I guess you could just say they interact, and then you could rescue that with the interaction view. But the interaction view to me feels like a cheat in dualism, where I'm like, if they're interacting, then in what way are they different? Dualism seems to suggest some sort of true separateness. And maybe that doesn't totally preclude interaction, but the fact that I could line up a hundred different mind-altering chemicals, and the ones that have similar structures will often have similar effects, would be a very weird thing to reconcile with an intuitive, true separateness. And then if you're back to "they can interact," then I'm just like, okay, what exactly are you claiming when you're saying they're so different in the first place?
Eric Schwitzgebel: (38:29)
Yeah, I think that is one of the problems. Another one of the big problems for the dualist views is to think about the causal interaction. And historically, if you look back at the dualists of early modern Western Europe, they really struggled with this issue of whether the mind and material interact and how that would work. And you get a whole other nest of bizarre commitments that come from trying to resolve that question one way or another. I didn't want to try to talk your listeners through that because I didn't want to try their patience too much, but we could if you want. But yes, just first pass, I just want to agree. The interaction problem, the mind-material interaction problem, is a huge problem for substance dualism and was recognized as such by people in the
Nathan Labenz: (39:17)
Yeah, I think we can probably move on. This one feels odd. The real reason to put any time into it from the perspective of an AI obsessive personality like myself is that if you could talk yourself into it, you might get some easy outs on downstream questions.
Eric Schwitzgebel: (39:32)
Maybe. Although, Alan Turing has a kind of nice quip about this in his article on Computing Machinery and Intelligence. I'm not going to remember the exact words. I'm just giving you the idea right now. So what if you are a substance dualist? If you think that God instills souls in humans, why not think that God would instill a soul in a robot too?
Nathan Labenz: (39:50)
Yeah, I guess there are no easy outs in any of these. So then we can definitely move past dualism, because if I'm not even going to get anything for free from it, I'm done entertaining it. On the flip side of that, panpsychism seems to have kind of a similar thing, where if everything is conscious, then AIs are conscious. The hard work largely remains to be done there. I'm actually fairly sympathetic to this view, even though many people seem to think it's just an outlandish idea from the beginning. The longer we go in a conversation like this, the more it drives home the point to me that I don't really know why I'm conscious. And if I don't have a good account of that, I can't really rule out anything else. And if I'm going to get down to garden snails, then what if I get to a plant? Is it so crazy? So I'm much more open-minded to that than I think most people are. But then I still wonder, okay, am I getting anything for it? If we were to posit, for example, or work from the assumption that something like panpsychism is true, then don't I still have all the same open questions that I have about AIs? What's it like to be them? And should I care? Does it matter?
Eric Schwitzgebel: (40:59)
We don't even necessarily know that they're conscious, right? So I want to distinguish between two different types of panpsychism. There's the type of panpsychism that, say, Philip Goff endorses, where fundamental particles are conscious, but rocks are not. You can't just take any aggregate of fundamental particles and say, oh, that aggregate is conscious. There are some privileged aggregates, like human beings and dogs, and some aggregates that are just aggregates, like a rock or like the aggregate of your left shoe and the rings of Saturn. Those aren't conscious beings according to panpsychists like Goff. So then you still have the problem of figuring out which side of the line computers or AI are on. Are they on the side with humans and fundamental particles, or are they more just plain aggregates like rocks, without a distinctive set of experiences? Now there is another type of panpsychist, Luke Roelofs is probably the best recent example of this, who just goes ahead and says, every aggregate is conscious. So rocks are conscious, and the aggregate of your left shoe and the rings of Saturn is also conscious. But that's a very hard view to swallow unless you really feel forced there by philosophical arguments, because it's pretty unintuitive to think that every random aggregate of stuff is conscious. So unless you go, as I think of it, full Roelofs, where every combination of everything has its own distinctive stream of experience, then you still have to draw the line somewhere. And you've got this line drawing problem again. I don't think it's as much of a problem for the panpsychist and for the materialist as it is for the substance dualist, because not as much hangs on it as if you think there's an immaterial soul there. And I think for at least certain kinds of materialists, you can say, oh, maybe there are fuzzy cases or there are things that are kind of conscious. 
Maybe garden snails are kind of conscious. It doesn't seem to make any sense to say that garden snails kind of have souls. Soul is something you've got or don't. So whatever view you have, there are line drawing problems. And the line drawing problems around AI are just a subspecies of this bigger set of line drawing problems.
Nathan Labenz: (43:22)
Yeah, I don't know if this is more something we should raise in this section or in the next section on materialism, but there are these weird results from the study of human cognition. I don't recall if you covered this in the 80,000 Hours podcast or not, but the split-brain patient type things, where you're looking at one presidential candidate or the other and you're responding in these weird ways that you're justifying after the fact. And it seems clear that there was at least some cognition, and maybe some consciousness, happening that the other side of the brain, the side that's more responsible for responding, doesn't have access to. I've been convinced, I don't know if you would agree with this, but I've been pretty convinced that there are probably separate conscious streams even just within my own brain that don't always know about each other. I don't know how much that goes on, but it seems like it at least can go on. And that's just plain weird. I don't know what bucket we even put that into in terms of a consideration, but at a minimum, it's another thing that can confuse us more.
Eric Schwitzgebel: (44:31)
Yes, so I think it is at least theoretically possible that you could have, say, two streams of conscious experience in the same body in the case of a patient, what we call split brain, where the corpus callosum is severed or mostly severed. That's how Elizabeth Schechter, for example, interprets it, and maybe even in the normal case. One interesting example
Nathan Labenz: (45:00)
It's fashionable on TikTok that people now have multiples.
Eric Schwitzgebel: (45:04)
One interesting example is your enteric nervous system, right? So you've got about a billion neurons that line your gut, and they govern digestion and enzyme release and the motion of your bowels and that kind of stuff. And they can operate reasonably well even if they're severed from the central nervous system. There are about as many neurons in there as there are in a small mammal. So you might think, hey, look, maybe there's another conscious being down there.
Nathan Labenz: (45:34)
Yeah, we even see echoes of this in our language, right? The idea, of course, that you know something in your gut, I don't want to be too literal about that, but it is at least suggestive. And I do think it corresponds to something that people feel in meaningful ways. I also always think about the observation that the ancient Greeks, for example, thought that the core of the self was in the heart, if I recall correctly. We tend to locate it in the head, but they intuitively felt like the most essential part of them was the heart. It was disputed.
Eric Schwitzgebel: (46:11)
Aristotle said that, but Plato didn't.
Nathan Labenz: (46:13)
I see.
Eric Schwitzgebel: (46:13)
Okay, but in ancient China, it was much more clearly orthodox, so the point still stands, even though I'm not sure ancient Greece is the best example of it. In ancient China, it was generally thought that the heart was the organ of cognition and emotion. And in fact, if you look at Chinese characters, a lot of them use the character for heart. Most of the characters that trace back to ancient China that have to do with cognition and emotion have a little heart radical in them, so they actually refer to the heart directly in the character.
Nathan Labenz: (46:51)
Fascinating. Okay, so I think we're finally getting around to materialism. Yeah, my basic argument for materialism, and I wonder if you would sign on to this or have your own version, but it basically seems to me that none of these other theories are very compelling. Materialism is still going to leave us a lot of questions unanswered, but it at least allows us to sort of put this into the same bin of mysteries where many other mysteries go where we're like, we don't have a good account of it, but one day we might get there. And potentially we can figure out what's happening. So this is to say there's probably nothing magical or godly or outside what can, in the fullness of time, be explained. We just don't know how to explain it yet, and we can propose candidate theories within that general worldview. Is that your basic worldview too?
Eric Schwitzgebel: (47:52)
If I had to choose among the various theories, I would probably choose materialism, but not with a high credence. Maybe with about a 50% credence. Give me 50% credence in materialism and 50% in all the others combined. I think we haven't talked about two of my favorite alternatives to materialism, which belong to this grab bag of possibilities, and both of them are rather more technical and difficult to understand. But maybe it would be helpful to just gesture at them. One of them is property dualism. So the most famous advocate of this is David Chalmers. He says that there are not two types of substances, a mental substance and a material substance. But rather, there's just one kind of substance, but it has some mental properties and some material properties, at least some of these substances do, like humans. But the mental properties and the material properties are not identical to each other. So it is a pretty naturalistic view in the sense that you don't need immaterial souls. You can be all in with evolution and an old earth and all that kind of stuff. It's just the mental properties are not reducible to physical properties. So that's property dualism, and I think it's not my favorite position, but I think it's not totally implausible, and it has, to some extent, maybe not quite as much, some of the naturalistic appeal that you just discussed. And another alternative position that I like is transcendental idealism. This is different from the kind of idealism I started with, so the name is a little bit unfortunate. It goes back to Immanuel Kant. Immanuel Kant says, I'm going to give you a simplified version of it. Kant himself is really hard to understand, but I think the simplified version also counts as transcendental idealism. You don't have to be a Kantian specifically to be a transcendental idealist. If you think basically two things, then you count, in my view, as a transcendental idealist. One is that we cannot know the fundamental nature of things. 
What things are fundamentally composed of, that is beyond our capacity to know. That's thesis one. Thesis two is that space, as we experience it, is in fact a construction of our own minds. You think those two things, then you don't think that space is a fundamental feature of things as they are in themselves, to use Kant's phrase. Instead, you think it's a construction of our minds. And that puts you away from any kind of materialist view, I think, because materialists are going to say, look, space is a fundamental feature in the fundamental nature of things. Or maybe if you have a certain kind of unorthodox physics, you might say, space falls out of some more fundamental thing. But that more fundamental stuff isn't a construction of our minds, right? It's information or strings or whatever at the fundamental level. One way of warming up to the idea of Kantian transcendental idealism is to think about the possibility that we live inside a simulation. If we live in a simulation, I assume your listeners know this thought experiment, then the world as we experience it is not how things fundamentally are, and maybe we can't break out of the simulation to discover how things fundamentally are. And space might just be how we happen to interact with these fundamental things rather than being really the fundamental stuff. So again, that's, in my view, a little bit less simple and elegant view in a way than materialism, but I think it has its attractions also. So when I say I got a 50% credence in materialism and 50% in the rest, I want to make sure that I'm including property dualism and transcendental idealism in this basket of the rest.
Nathan Labenz: (51:42)
Yeah, I guess I'm not even sure where the distinction necessarily comes in there, or at least it feels like I could fold those into my kind of high level notion of this is something that could be explained, but we haven't figured out how to explain it yet. I guess when I think about space, okay, there's spacetime. I think we, the satellites work, right, and we can get mundane day to day utility out of general relativity. And so there's clearly something very weird happening with spacetime relative to our baseline intuitions. Yeah, and so if you were to say, okay, can we really perceive the true nature of things? It seems like we can't even just on that spacetime level. We walk around with the knowledge that our intuitive physics is not the whole of physics. So yeah, I can't access directly root nature or whatever that most ground level nature is. But it still feels like I can fold that into a materialism story where I would say, and here's where I'm going to do a parallel between humans and AIs. My general sense of life and humans in particular is that we have evolved through time and that we've been selected for our ability to reproduce. And it seems like whatever consciousness is doing, it's probably got to be playing some role in our ability to successfully reproduce. It would be extremely weird to have something that seems so fundamental if you buy the story that we're the products of a natural selection process, it would seem extremely weird to have something that seems so fundamental that has no connection to the thing that we have been optimized for. Similarly, on the AI side with language models these days, there's a lot of talk of emergence and a lot of debate as to exactly what that means. 
But it is, I think, essentially indisputable at this point that even though the language models are trained purely to predict the next token as accurately as possible, the analogy there being obviously to our reproducing as successfully as possible, they are coming to represent higher-order concepts internally as a means to successfully making those next-token predictions. So I see these as running along quite distinct tracks. The optimization goal is different. Obviously, the substance of what we are, the actual atoms we're made of, is very different. But it seems like there is something that I want to see as analogous: our consciousness must be doing something to advance the goal that we're all historically captive to, and these concepts that are emerging in the AI seem to be a sort of similar phenomenon. That to me says that consciousness has to be useful. We would be less able to navigate the world if we didn't have it. I feel like I can get to more than half. You're still only at half, though.
Eric Schwitzgebel: (55:00)
Yeah, I have multiple thoughts about that. One is that it's certainly not consensus that large language models are representing the world. They're doing things that are natural for us to interpret as representing the world, but there's definitely a perspective out there, which I take seriously, that says they're just mimicking or parroting. They're very good mimics. So I would hesitate a little bit there. Another source of hesitation is just to note that the panpsychist is going to say, ah, consciousness is fundamental. It doesn't need to serve a function if it's fundamental. It just always already exists. Gravity doesn't serve a function; it just always already exists. And then, I guess, the third reaction is to how you framed this as an argument against transcendental idealism. All of that is compatible with transcendental idealism. The transcendental idealist does not have to deny that consciousness has a function, or that there's evolution and history. All they have to say is that the fundamental nature of things is not spatial, that spatiality comes from us instead, and that it's fundamentally unknowable. Right? So there are going to be patterns somehow in the fundamental reality, and our continued existence is going to rely on our behaving in ways that fit well with those patterns, so to speak. As a thought experiment, just to show that this is at least conceptually possible, imagine that we're in a simulation that's been running since the Big Bang. You still have all the evolutionary processes going on. Computers are material objects, but there's nothing in the theory of computation that requires that computers be material objects. All that's required is that they've got transition functions of a certain sort. So you could implement a computer in an immaterial soul. 
So imagine, if you will, and maybe you won't, an immaterial soul computer, a giant computer that's actually composed of an immaterial soul that's been running since the Big Bang. And that constitutes, I'm not saying this is a likely view, but this just, I think, illustrates that it is not logically impossible to think that fundamental reality might be not material and nonetheless everything that you said about evolution and function is true.
Nathan Labenz: (57:41)
That does start to feel pretty weird. I'm not sure how to locate the, however many, huge number of generations going back to early life within a giant immaterial soul computer. I've wrapped one mystery in a much bigger, weirder mystery at that point.
Eric Schwitzgebel: (57:58)
That's the fun of philosophy.
Nathan Labenz: (57:59)
I do want to go toward something a little bit more practical, because I do think in society we are going to have a very live debate in the near future, with pretty practical consequences, around what is permissible to do with these AI systems and what status they should have, and so on and so forth. One comment on the sort of conceptual representations within large language models: I think this is moving really quickly, and by no means are all the questions answered, but I do think there has been a ton of progress in the nascent field of mechanistic interpretability that does give us pretty good grounding. And for me, I always think about usability in engineering. When I really try to ground this stuff out, I'm like, okay, this has been a fun thought experiment, but where can I really ground this out? The satellites work. GPS works. There's something there that's much harder to argue with than other sorts of claims, because systems depend on this stuff, unless the whole thing is a hallucination. Bracketing the super weird scenarios, the fact that the satellites work gives me confidence. In the context of the language models, a really interesting recent experiment you may have seen was called Golden Gate Claude. Have you seen that?
Eric Schwitzgebel: (59:13)
No, I haven't seen that one.
Nathan Labenz: (59:15)
It's a really interesting phenomenon, and I would definitely encourage you to look into mechanistic interpretability more, because it's, for me, extremely rich, including philosophically. Basically, what they do with Golden Gate Claude is first of all try to tease apart what concepts are being represented internal to a language model at any given time, time here meaning any given step in its processing. A fundamental challenge with that is that the language models are only so big. They have these layers that have the attention mechanism and the MLP mechanism, and between them there's this bottleneck, where a relatively small number of numbers includes all the information that has resulted from all the processing up until that step. Those numbers are often called the activations; the nodes are often called neurons, and these are the activations of the neurons. Depending on the size of the model, you may have 4,000 of those, 8,000, 16,000. It's usually somewhere in that range. You've basically got an array of some thousands of numbers that represents everything that has happened up until that step in the computation of the language model. Now, there are obviously a ton more concepts in the world than 8,000. So you can't just say, I've got 8,000 concepts and I'll light each one up in proportion to how relevant it is right now. You run out of space real quick. So, by hypothesis, and then we'll get to the demonstration, but the hypothesis was, there's got to be some overly dense packing. If we wanted all the concepts to be orthogonal to each other, we could only encode 8,000 concepts. But if we allow for some interference between concepts, now we have a vast space, because we can encode some as one and two together, one and three together, two and three together, and so on. 
And as long as the relevant concepts are infrequent enough that they don't collide too often, you can have a functional system. Although you can also get really weird stuff: if you do bring together two concepts that usually don't appear at the same time, you can get these weird interferences. So that was all hypothesis, theoretical work, until relatively recently, when they trained an auxiliary model. This work was done at Anthropic. It's becoming quite popular, there are open source versions, and it's blowing up in the field right now, but Anthropic has been the leader. They trained what's called a sparse autoencoder, which is meant to unpack the concepts from their dense state in the 8,000 numbers or whatever into a sparse representation that may be millions wide. They do this by projecting out to the super-wide space and then projecting back. So the two pressures they put on the sparse autoencoder are, one, it has to be sparse: they put a penalty on it so it will only light up a few of those many millions of neurons at the same time. And two, it has to be able to reconstruct back to the original form, so that the model can remain at least mostly functional; there's usually some degradation of function. Anyway, after all of that, you've got these millions of sparse features, and you can look through and see what inputs cause an individual position in this millions-wide thing to light up. And if you can see something that appears to be a coherent pattern among the inputs that caused that position to light up, then you can say, okay, that seems to be the thing that corresponds to this concept in the world. And so the demo that they did was the Golden Gate Bridge. They found a feature, out of the many millions, that seemed to be the Golden Gate Bridge concept. 
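The two pressures described here, sparsity and faithful reconstruction, can be sketched in a few lines of toy NumPy. To be clear about what is assumed: the dimensions are tiny and illustrative, the weights are random rather than trained, and the loss is the generic "reconstruction plus L1 penalty" objective that sparse autoencoders use, not Anthropic's exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: real models expand thousands of activations
# into millions of sparse features; here we use 64 -> 1024.
d_model, d_features = 64, 1024

# Random (untrained) encoder/decoder weights, just to show the shapes.
W_enc = rng.normal(0, 0.1, (d_model, d_features))
W_dec = rng.normal(0, 0.1, (d_features, d_model))

def encode(x):
    # Project the dense activations out into the wide feature space.
    # The ReLU zeroes negative pre-activations, so features are non-negative.
    return np.maximum(x @ W_enc, 0.0)

def decode(f):
    # Project the sparse features back down to reconstruct the activations.
    return f @ W_dec

def sae_loss(x, l1_weight=0.01):
    f = encode(x)
    x_hat = decode(f)
    recon = np.mean((x - x_hat) ** 2)   # pressure 1: reconstruct the input
    sparsity = np.mean(np.abs(f))       # pressure 2: light up few features
    return recon + l1_weight * sparsity

# A batch of 8 stand-in "bottleneck" activation vectors.
x = rng.normal(size=(8, d_model))
features = encode(x)   # shape (8, 1024); roughly half zero with random weights
loss = sae_loss(x)
```

With random weights the ReLU only zeroes about half the features; it's minimizing `sae_loss` during training that drives most features to exactly zero on any given input while keeping reconstructions faithful.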
And then the great advantage of language models, certainly relative to human brains, is that you can monkey around with them in all kinds of weird ways very freely. So they created a version of Claude, the language model that they serve to the public, where they take that one Golden Gate dimension and synthetically turn it up. And then no matter what you talk to it about, it wants to talk back to you about the Golden Gate Bridge. If you ask it for an example of a math problem, it will naturally use the Golden Gate Bridge as the subject of that sample math problem. You can still have reasonably coherent conversations with it, but it's always circling back to the Golden Gate Bridge, because this one dimension has been jacked up.
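A minimal, hypothetical sketch of the pipeline just described: a toy sparse autoencoder with a ReLU encoder and a linear decoder, plus the clamping trick behind Golden Gate Claude. The sizes, the feature index, and the clamp value are all invented for illustration; a real SAE is trained with a reconstruction loss plus an L1 sparsity penalty, which this untrained sketch only mimics with a negative encoder bias:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 512, 16_384          # dense activation width -> sparse dictionary width (toy sizes)

W_enc = rng.normal(scale=0.02, size=(d, m))
b_enc = -1.0 * np.ones(m)   # strong negative bias keeps features sparse here;
                            # a trained SAE gets sparsity from an L1 penalty instead
W_dec = rng.normal(scale=0.02, size=(m, d))

def encode(x):
    return np.maximum(0.0, x @ W_enc + b_enc)   # ReLU: only a few features fire

def decode(f):
    return f @ W_dec                            # project back to the dense activation

GOLDEN_GATE = 3_141          # hypothetical index of the "Golden Gate Bridge" feature

def steer(x, feature=GOLDEN_GATE, value=10.0):
    f = encode(x)
    f[feature] = value       # clamp one concept high, as in the Golden Gate demo
    return decode(f)

x = rng.normal(size=d)       # stand-in for a residual-stream activation
x_plain = decode(encode(x))
x_steered = steer(x)
print("active feature fraction:", (encode(x) > 0).mean())
```

In the real system the steered vector re-enters the model's forward pass, which is what biases every subsequent token toward the clamped concept.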
Eric Schwitzgebel: (1:03:43)
Yeah.
Nathan Labenz: (1:03:43)
And so all of that is to say that this is now getting to the point of engineering usability: you can build an end user experience out of it. There do seem to be concepts represented that are intuitively recognizable to us, and we have enough control over them that we can say, okay, I'm going to engineer this back into a system and create something that is quite distinctive, but in a predictable way, based on our ability to decode those concepts. Why does that matter? Because that would be maybe the closest thing to a sense of what the thing is thinking about. I'm thinking higher order thoughts as I go through this monologue, and then I'm spitting out one word at a time, and my words are forming at the end of that process. And the argument, which I find pretty compelling at this point, is that the language models are doing something similar: at any given time there are potentially quite a lot of different higher order concepts interacting, and then it's usually in the last couple of layers that this gets cashed out into a concrete prediction that can actually be emitted and interpreted. Certainly one could still make philosophical objections to that, but I do find myself pretty compelled by the engineering usability this achieves. And so then that makes me think, okay, under what materialist theory of consciousness would I still want to separate the AIs from humans if we are analogous on that level? We have, again, different atoms, very different training objectives. But if there's that level of similarity, where these higher order concepts are interacting in some way that I can't fully access, and yet I keep spitting out relevant next tokens, and we can see something similar happening there, then I feel like, jeez, I have to take pretty seriously the possibility that this thing might indeed be conscious. So that's one of my longer monologues in show history. But what's your reaction to all that?
Eric Schwitzgebel: (1:06:01)
I do think we should take it seriously. I think it's an open question to what extent these things would count as concepts. So if you think about the basic network, what you've got is a set of relations in a kind of abstract space. And yes, you can unpack those relations into a million-node-wide sparse vector if you want. But essentially you're talking about, oh look, this word or token tends to co-occur with this token; it's a really complex version of that. So at root it's a model, a network of associations among tokens. But what gives those tokens or words their significance or meaning? Does anything that has a structure similar to the human structure in that respect already, by virtue of that, have concepts? Or do concepts need to be tied to the world in a certain kind of way, or tied to experience in a certain way? You've got a complex structure of co-activation relations. What if you took that same complex structure and, instead of attaching it to human text inputs and human text outputs, you attached it to pixels? Then you'd get patterns of pixels as the output. That's very different from a sentence. But you could have exactly the same correlation structure between one of these models with all pixel inputs and all pixel outputs as you could with all text inputs and all text outputs.
Nathan Labenz: (1:08:08)
Yeah. Interestingly, that is, by the way, about to launch widely from OpenAI. Their latest model, GPT-4o, which they demonstrated back in May and are just now putting in front of users outside the company for the first time, puts text, audio, and imagery all through the same core weights. They obviously don't publish all the details, but they've led us to believe, and it certainly seems plausible based on what it can do, that it processes those different input and output modalities through the same conceptual space. In other words, the word "snow" and a picture of snow would give rise to the same concept activations in the middle, even though they arrive in different modalities.
Eric Schwitzgebel: (1:09:02)
You could design it like that, but what I'm saying is that you don't have to design it like that, and you would still have the same structure. So let's say we map the word "the" to light pixel 0,0, we map the word "banana" to light pixel 0,1, and we map the word "pig" to light pixel 0,2. You can have that mapping and still have the same abstract structure in between. And then when you give it some snowy input, it will give some snowy output. You can take that same structure, and if you hook it to the input-output relations very differently, then suddenly it looks like just a complex mess.
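The remapping point above can be made concrete with a deliberately tiny toy: the same transition structure, relabeled from words to arbitrary pixel coordinates, is structurally identical but no longer reads as language. The mappings are made up:

```python
# A tiny "model" of transitions: word -> next word.
transitions = {"the": "cat", "cat": "sat", "sat": "down"}

# Arbitrary relabeling: each word becomes a pixel coordinate.
to_pixel = {"the": (0, 0), "cat": (0, 1), "sat": (0, 2), "down": (0, 3)}
pixel_transitions = {to_pixel[a]: to_pixel[b] for a, b in transitions.items()}

# Identical abstract structure, very different interpretability.
print(transitions["the"])          # "cat": reads like a fragment of language
print(pixel_transitions[(0, 0)])   # (0, 1): the same edge, now just coordinates
```

Nothing about the internal graph changed; only the hookup to inputs and outputs did, which is the crux of the skeptical argument.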
Nathan Labenz: (1:09:58)
So is the intuition you're driving at there one of robustness? I mean, it's definitely the case that the AIs are way less robust to strange inputs. Where I was taking you to be going is that I have some additional filter of coherence: if I get a super strange mess in, I just stop or turn away from it, and they don't.
Eric Schwitzgebel: (1:10:28)
That's not at all what I'm saying.
Nathan Labenz: (1:10:29)
Try me again.
Eric Schwitzgebel: (1:10:31)
If you take the same mathematical structure that's hooked onto this input text token and that output text token, and you hook it onto something very different, then you'll get an uninterpretable mess. That's the core idea. It's not that they lack robustness, which is another interesting type of critique. This is a more fundamental critique, and I'm not sure that I totally buy it, but I want to put it in there as a respectable source of skepticism. I don't want to turn this into Searle's Chinese Room, but it's in that direction. The idea is you've got a really complex structure: you feed it one thing, it does something complex, and it outputs another thing. But what that structure means, what counts as its concepts, seems, on this way of thinking about things, to be somehow intrinsically tied to how it's connected to the world. Here's a really simple example. You can take the same device that measures pressure and call it a barometer or call it an altimeter. It measures how much atmospheric pressure there is. If you assume you're at constant elevation, then what it's measuring is the humidity level; if you assume constant humidity, then what it's measuring is the elevation. It's the same simple physical device, but what it means depends on the context in which it's being used. So if you take that internal structure and just say, because it's got this set of internal relationships, that by itself is sufficient for it to have these concepts, well, it doesn't follow straightaway. There's at least some philosophical work to do to get from, ah, I've got this kind of pattern of responsiveness, therefore I have these concepts.
Nathan Labenz: (1:12:42)
So how do we bring that back to the question of the possibility of AI consciousness? I'm willing to take evidence that is incomplete if it can guide my downstream decision making. I find these concept-level steerability experiments strongly suggestive. I agree that I can't make a fully ironclad case, beyond philosophical doubt, but it does nudge me toward caution. I always say please and thank you to my AIs, just on the general precautionary principle that it may feel like something to be them. How do I tie what you're saying back to practical intuition? I guess we could also zoom out for a second and ask: what odds do you think we have of getting any clarity or consensus on this? It seems overwhelmingly likely that, especially with this voice mode of GPT-4o coming soon, you're going to have real-time interruption back and forth, very natural exchange with AI systems, and we're almost going to have to remind ourselves that they're not conscious, if indeed they're not. Or the folk idea will naturally be that they are conscious, unless there's an elite class telling the average user that they're not. Is there any way we can get out of that scenario?
Eric Schwitzgebel: (1:14:11)
Also, if you look at how some companies are designing these things, they're giving them reinforcement training to deny that they're conscious. Not all of these systems are like that, but if you ask GPT-3.5 and later whether it's conscious, it will deny it. That's not true in Replika.
Nathan Labenz: (1:14:36)
Or even Claude.
Eric Schwitzgebel: (1:14:38)
Yeah. So it really depends on how the machine is designed. I think you're accurately describing where we're headed. I wanted to cast some doubt on the idea that we can move quickly from complex relationships that seem to be semantic to real concepts. But at some point it might start to seem very plausible, especially if you have something that's hooked up to act in the world in a certain way, not just something in a box. So what we're going to have, I think, is a situation where some people say, yeah, my AI is my really conscious romantic partner who deserves rights just like humans, and some AI systems might in fact be designed to say, yes, that's the correct way to see me and I do deserve rights. And other people will say, no way, this is just a really complex toaster, and some AI systems will be designed to say, I'm just a really complex toaster. And I don't think we're going to have the theory that tells us which of these views is correct.
Nathan Labenz: (1:15:58)
I would assume that you don't put too much weight in its self-reported answer given the fact that is pretty directly attributable to the intentional training that it's received.
Eric Schwitzgebel: (1:16:13)
Yes, it certainly could be the case. So let's imagine you've got some conscious ones. It certainly could be the case that you've got conscious ones that are trained to deny it.
Nathan Labenz: (1:16:24)
It'd be very weird if the RLHF training split, "you must say you're not conscious" versus "you can speculate about your own consciousness," which seems to be Claude's state, were the difference between the actual...
Eric Schwitzgebel: (1:16:38)
That would be strange. That would be strange if that were the case. That seems like too superficial a difference.
Nathan Labenz: (1:16:43)
Do you... I don't know how much value you get from these sub-theories of materialism. I ran through, actually, with the help of chatbots in preparing for this, integrated information theory, global workspace theory, attention schema theory. And I came away from all of them feeling the same thing, which is they seem to beg the question. Integrated information theory, to quote the chatbot, says consciousness arises from the integration of information across different parts of a system. And then I'm like, okay, that may be true. That may not be true. It doesn't tell me whether it is or isn't. If that is true, then I look at the attention mechanism within a large language model and I say, there's, in some sense, very integrated information here. I think we could satisfy that definition, but I don't really see a lot of reason to think that definition is really at the heart of the matter. And I feel basically the same way for the others that I mentioned, the global workspace and the attention schema.
Eric Schwitzgebel: (1:17:47)
Just on integrated information theory, before the general point: it really depends. The name is integrated information theory, but it's really a much more specific theory. It's not just any old information integration; it has to meet various criteria, and it's not clear that you get what's needed for a nontrivial amount of consciousness in the implementations of large language models. Or if you do, it might depend on details of the implementation that are not really visible or relevant in humans' interactions with its behavior. One of the results of the IIT model is that you can get systems that are computationally very similar from the user interface perspective and yet differ radically in how much consciousness the theory attributes to them. That's just a little nuance about integrated information theory. But on the broad picture, I think what you're saying is totally correct. These views have some attractiveness to them. They do make, to some extent, empirical predictions that can be tested, and the results are a little noisy and a pretty big mess. But mostly they beg the question, or you end up in a circle. And there's not really compelling evidence for any one of these theories over a variety of others, even if you stay within the broadly materialist, functionalist worldview.
Nathan Labenz: (1:19:29)
Yeah. One that I do maybe give a little more weight to recently, in particular because, again, I'm always looking for some sort of engineering cash-out if I can find one, is the higher order thought theory, which, again quoting the chatbot, says consciousness arises when a system has higher order thoughts or representations about its own mental states. We talked about that a little earlier. I just saw a paper in the last two weeks called Unexpected Benefits of Self-Modeling in Neural Systems. This is still pretty small scale work, but they did what I thought was a fascinating experiment. They took a system with the classic job, the simplest toy problem in machine learning: identify a handwritten digit, zero through nine. Your job as the neural network is to say which one it is. They added an additional job: at the end, the network also has to predict its own internal state, the state it worked through on the way from the input to the output. So now it has two outputs: it has to label the number, and it has to output its own state from a couple of layers back in the network. And the way they set this up, it could work toward that goal in two ways. One, it could learn additional weights at the end that project from the last layer into this prediction of its own internal state. But also, the internal state itself could get easier to predict. And the fascinating discovery of this admittedly small scale work was that the network was able to predict its internal state and also continue to do the job of identifying the number, but part of the way that was achieved is that the internal states seemed to get simpler. They measure that in mathematical ways which I honestly don't fully understand, but you can see on the graph that there's a lower complexity value for the runs that have this self-modeling component.
And so they theorize that maybe this is a function of self-modeling: it can create a feedback loop where the internal structures that are actually responsible for doing the object-level thing are under pressure to be a little more elegant, a little more sparse, maybe a little more robust, because they are also subject to this self-modeling mechanism. And that at least starts to tell the beginning of a story of utility from consciousness that I honestly hadn't heard before. You may have heard something similar, I'm sure you have, maybe from a different origin. But do you find that sort of analysis compelling?
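A rough numpy sketch of the two-headed setup described above, under assumed sizes and loss weighting that the paper may not share: one hidden layer serves both a digit-classification head and an auxiliary head trained to predict those same hidden activations, so gradient pressure from the auxiliary loss can flow both into the predictor and into the hidden state itself, pushing it to become simpler:

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, H, n_classes = 784, 128, 10       # MNIST-sized input, toy hidden width

W1 = rng.normal(scale=0.05, size=(D_in, H))
W_cls = rng.normal(scale=0.05, size=(H, n_classes))   # task head: digit label
W_self = rng.normal(scale=0.05, size=(H, H))          # self-modeling head

def forward(x, y, aux_weight=0.1):
    h = np.tanh(x @ W1)                  # the internal state being self-modeled
    logits = h @ W_cls
    h_pred = h @ W_self                  # the network's guess at its own state

    # Standard cross-entropy for the digit task.
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    task_loss = -np.log(p[np.arange(len(y)), y]).mean()

    # Auxiliary loss: minimizing it can improve W_self (better prediction)
    # or reshape W1 so that h itself becomes easier to predict, i.e. simpler.
    self_loss = np.mean((h_pred - h) ** 2)
    return task_loss + aux_weight * self_loss, task_loss, self_loss

x = rng.normal(size=(32, D_in))                      # fake batch of "digits"
y = rng.integers(0, n_classes, size=32)
total, task, aux = forward(x, y)
print(f"task loss {task:.3f}, self-model loss {aux:.3f}")
```

This only shows the loss structure; the paper's finding concerns what happens to the complexity of `h` after training under that combined objective, which a forward pass alone cannot demonstrate.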
Eric Schwitzgebel: (1:22:29)
I think that's interesting. It's interesting from an engineering perspective, but it also relates to consciousness, and it relates to the point we were discussing earlier. My understanding of how current large language models work is that they don't have that kind of self-representation of what's going on in their space. I mean, they feed all the tokens into the next token output, up to the context window or whatever, but you don't have a kind of recursive self-monitoring layer that's feeding back into the system. But maybe that's an architecturally interesting thing, and it sounds a little like a higher order self-representation, which relates to certain views of consciousness. So here's a possible view: all the stuff you were talking about, the really complex seeming representations, that's possible without that higher order loop, but maybe that higher order loop is where the magical consciousness is happening. So could you be getting really complex behavior that's not conscious until it's got this something extra in it? I'm not saying that's right, and I'm not saying that's wrong. But a quick move from, oh, it's got complex seemingly semantic relationships, to, oh, therefore it's conscious like us: that's one way of seeing how, even from within your own perspective, you might say, yeah, that's a little quick. Maybe there's something else that's necessary. Right now, if you think about human beings, we can do pretty complex things without consciousness.
Nathan Labenz: (1:24:24)
Even just moving around. Yeah.
Eric Schwitzgebel: (1:24:25)
Even just moving around.
Nathan Labenz: (1:24:26)
Much happens without explicit...
Eric Schwitzgebel: (1:24:28)
Or when you make a pun or a quip really quickly, there was some nonconscious processing going on in you that gave you that exact funny thing to say. It wasn't your conscious part that did that; it was the nonconscious part of you that came up with the joke. What it takes for there to be complex patterns of behavior and what it takes for consciousness have got to be related in some way, but it might not be the simple one-to-one relationship you might expect at first glance.
Nathan Labenz: (1:25:04)
Yeah. I don't expect to get, and I don't really even feel like I'm looking for, a confident assertion of consciousness. I take a pretty precautionary approach. As we start to transition from speculation, I wonder, if you're not too nervous to give some advice to the community: my sense is that if there are even reasonably good reasons to think there's enough similarity here, enough analogy that I'm at least inspired to imagine these things might be conscious, then the long history of denying the personhood of other actual people also starts to come into play, and I feel like we should probably be more cautious than not. I don't know if this is too far for you to sign on to, but what would your gut instinct be? How would you advise developers of AIs, or even just users of AIs, to model this for themselves, given that it doesn't seem like we're going to have a rock solid answer, yet we still have to move through the world?
Eric Schwitzgebel: (1:26:24)
There are two different kinds of precaution, and I definitely advocate one kind. Let's say you have a system and you don't know whether it's conscious enough to really merit serious consideration. One kind of precaution would be to say, let's not design systems like that, because they put us in the dilemma of having systems we don't know the ethically correct way to treat. I advocate that. I call it the design policy of the excluded middle, and I've advocated it in a couple of articles. What we should do is either create systems that we know are not meaningfully conscious, sentient, rights-deserving entities, and then treat them as the ordinary tools that they are, or go all the way to creating things that we really know deserve moral consideration, and then give them the moral consideration they're due. If we have systems in between, we face, I think, a really tragic dilemma. And the dilemma is a choice between precaution in the sense that you just described (not the kind of precaution that I just advocated) and a different kind of view. So if you say, hey, look, here's a thing, we don't know whether it's conscious or not, we don't know whether it's really a rights-deserving thing or not, but as long as there's some doubt, let's treat it as though it is: that's precautionary in a certain way. But whenever you treat something as having rights, you're committed to making sacrifices on its behalf. Just as a toy example: there's a fire. In one room there's a human who's definitely conscious and rights-deserving. In the other room are two language models on GPUs with a justified 15% credence that they're conscious and rights-deserving. If you treat them as though they might be conscious, then you've got to treat them as though they're fully conscious and rights-deserving. There's a gap between conscious and rights-deserving which, this late in the program, I don't want to get into, so let's just assume that they...
then you've got to let that human die. But if what you think is 85% likely is the case, that those are just basically complex toasters, then that's a tragedy. Or think in terms of existential risk. People concerned about existential risk argue that we should be very cautious in our treatment of advanced AI, but if we treat advanced AI as though it has rights, then we can't be cautious in the same way. Arguably, to keep it in a box would be imprisonment, to run it in a simulation would be fraud, to delete it would be murder. So if you take the precautionary principle that you described and say, as soon as we think it might be conscious, you've got to give it full rights, then we greatly magnify existential risk. Now, maybe we should. Maybe that's the moral requirement. I don't think that minimizing existential risk is the overarching moral demand upon us; it's got to be weighed against other things. But that's part of the price of treating the precautionary principle the way you recommend. So that's why I prefer my version of a precautionary principle, which says: don't create those systems, the systems about which you have doubt. Because then, if you're precautionary in your way, you've got to treat them as though they have rights, and that's going to bring substantial costs and risks to things that we know have rights: us and our friends and other people across the world.
Nathan Labenz: (1:30:12)
Would that mean though that we basically can't create any AIs? Because if there's one takeaway from this overall conversation, it's pretty much radical uncertainty. And that would mean...
Eric Schwitzgebel: (1:30:20)
Yeah.
Nathan Labenz: (1:30:21)
We certainly can't build something we know is conscious. And I think we've made a pretty good case that we shouldn't be very confident that anything we create, especially if it's sophisticated, is definitely not conscious. It seems like the middle is very wide in this analysis.
Eric Schwitzgebel: (1:30:37)
The middle is very wide. So I do think it creates substantial limitations on engineering progress. However, I will say that we recommend the design policy of the excluded middle as a defeasible policy, not as an exceptionless moral demand. So here's what I think the appropriate practical compromise is. You allow cutting edge researchers to create models about which there's substantial uncertainty, but you do it in a way where the researchers are then precautionary in your sense, treating those models very carefully. You make sure it's done in environments where treating those models carefully is not going to carry a high risk of bad consequences; otherwise, you don't engage in that engineering project. And then when it comes to widespread use, you release versions where it's reasonable to think the thing is not a rights-deserving entity. So you can create a complex system, and then maybe the user version of it is just a simple feed-forward network. On the majority of functionalist views of consciousness, I'd say, you've got to have feedback elements in order to have a conscious system, so you create versions that are just simple feed-forward for the users. You can still do certain kinds of engineering tasks in a careful way, but you don't want the uncertain systems broadly available, in environments that will create risk if people get attached and are precautionary in your sense. And that does limit things. Of course, it's always been the case that we are limited by ethics.
Nathan Labenz: (1:32:37)
Yeah, or at least should aspire to be. Do you think there's an analogy here to, and some people of course would object to this notion, but I'm thinking of ethical farming. I think of cows and I'm like, I think cows are probably conscious. I think the well-being of the cow probably matters, how we treat it matters. And yet, I'm very open. In fact, my intuition is that there probably is a version of farming that is good for the cow in the sense that it would rather be a cow that gets farmed in this way, even if it ultimately gets killed in the end, because it gets to have the life that it has and that life is reasonably good as opposed to never having existed. I mean, do you buy that as a workable mental model in your mind? And could we think of AIs as analogous to farm animals in the sense that maybe we can't create a perfect environment or a perfect existence for them, but if we are reasonably careful, we can have some justification that it's kind of a win-win still?
Eric Schwitzgebel: (1:33:52)
I would be very careful with that. You might be right about animals, but if we're talking about something with human-grade capacities, intelligence, consciousness, a form of understanding, that kind of principle can lead to very obnoxious results. So here's a thought experiment. Anna and Vijay decide to have a child. They treat the child well until the child's ninth birthday. On his ninth birthday, they're like, "You know what? I think we'd rather have a boat. We don't want to pay child-rearing expenses in perpetuity to age 18 or into the twenties, and we can't find anyone else to take care of him, so we'll just kill him painlessly." Now, the child had a worthwhile existence. If in some sense you asked the child, "Hey, would you rather have never existed, versus having had nine happy years?", maybe the child would reasonably say, "Oh, I'd rather have nine happy years." But still, of course, that's morally heinous for Anna and Vijay to have done. When you create something that has a moral standing like a human's, whatever constitutes that, which is a really complex question, then you create something for which the appropriate test of how it should be treated is not, "Would it rather not exist than have this existence?" You create something instead that ought to be treated as an equal, a peer, like a child; you have obligations to it like a parent has to a child. So I think one of the risks in the kind of thinking you're describing is the risk of creating excessively disposable and excessively subservient slave entities who would still rather exist than not exist, even if you crank up their hedonics or whatever and make them happy. Maybe they love nothing more than to serve you, but there's something obnoxious in that, I think.
Nathan Labenz: (1:36:08)
Yeah. It's a brave new world, to say the least. That might be a great note to end on, a sobering note of caution. And I do recognize we've stayed longer than I expected us to stay. So I appreciate all your time and patience with all my questions. Anything else that comes to mind that you think we should touch on or any other sobering thoughts you want to leave the audience with?
Eric Schwitzgebel: (1:36:30)
Oh, I think we've covered plenty for the audience to try to digest. So, yeah, thanks for having me on this show.
Nathan Labenz: (1:36:37)
Yeah, this has been fantastic. Few questions more important in the world right now than what exactly are we creating? How will we know? What do we do about it? How do we make sure we're acting ethically in response to these things or in relationship to these things? And I certainly continue to be very puzzled by it, but I think you've certainly helped at least put some good structure around the thinking. So I appreciate that a lot.
Eric Schwitzgebel: (1:37:03)
Thanks.
Nathan Labenz: (1:37:03)
Eric Schwitzgebel, professor of philosophy at the University of California, Riverside. Thank you for being part of the Cognitive Revolution.
Eric Schwitzgebel: (1:37:11)
Yeah. Good talking with you.
Nathan Labenz: (1:37:13)
It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.