What Biological & Social Systems Can Teach us About AI with Nora Ammann, Cofounder of PIBBSS Research

Watch Episode Here


Listen to Episode Here


Show Notes

In this episode, Nathan sits down with Nora Ammann, cofounder of PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems), a research initiative focused on the principles of intelligent behavior in biological and social systems. If you need an ecommerce platform, check out our sponsor Shopify: https://shopify.com/cognitive for a $1/month trial period.

SPONSORS:
Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, and Brooklinen, and 1,000,000s of other entrepreneurs across 175 countries.From their all-in-one e-commerce platform, to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions using AI. Sign up for $1/month trial period: https://shopify.com/cognitive

With the onset of AI, it’s time to upgrade to the next generation of the cloud: Oracle Cloud Infrastructure. OCI is a single platform for your infrastructure, database, application development, and AI needs. Train ML models on the cloud’s highest performing NVIDIA GPU clusters.
Do more and spend less like Uber, 8x8, and Databricks Mosaic, take a FREE test drive of OCI at oracle.com/cognitive

NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

X/SOCIAL
@labenz (Nathan)
@AmmannNora (Nora)
@eriktorenberg (Erik)
@CogRev_Podcast

RELATED LINKS:
https://survivalandflourishing.fund/sff-2023-h2-recommendations

TIMESTAMPS:
(00:00:00) - Introduction to Nora Amman and PIBS
(00:03:25) - Nathan's view that society is not ready for rapid AI development
(00:05:55) - Nora's framing of AI safety, alignment, and risks
(00:09:31) - The problem of epistemic access to future AI systems
(00:12:36) - Categorizing different research approaches to AI safety
(00:15:46) - Seeking epistemic robustness through plural perspectives
(00:19:00) - Expecting AI systems to continue getting more powerful and different
(00:21:13) - Nora's "third way" research approach using biological and social systems
(00:25:00) - Defining intelligence broadly across systems that transform inputs to outputs
(00:27:29) - Avoiding analogies and anthropomorphism when studying AI systems
(00:30:23) - Using multiple perspectives and assumptions to triangulate understanding
(00:35:13) - Translating concepts between domains more multifaceted than just "borrowing"
(00:38:15) - Learning from interdisciplinary practices like bioengineering
(00:42:05) - Key questions about the nature of intelligence and agency
(00:46:38) - Understanding hierarchical levels of agentic behavior
(00:49:32) - Aligning goals between individuals and organizations
(00:53:27) - Assessing the validity of assumptions and analogies about AI
(00:58:57) - Informing AI interpretability and evaluation with theory
(01:02:12) - Modeling interactions between AI systems
(01:04:56) - Using ecology as a framework for AI interactions
(01:07:46) - Developing a computational ecology theory
(01:11:45) - Learning from historical examples of societal transformations
(01:16:59) - Unprecedented speed and autonomy of AI development
(01:19:00) - PIBS programs for connecting researchers across domains
(01:23:39) - Seeking researchers to tie theory to concrete AI applications
(01:26:45) - Need for productive disagreement on AI risks and solutions

Music licenses:
W00ZOYMJNQKJRHLI
KRGPQ5EIWDAREX98
A5IVY76ZOYGZX1ZL



Full Transcript

Transcript

Nora Ammann: (0:00) The question then arises, okay, how can we at all make systematic progress towards understanding the future of AI systems where we do not have empirical access? We still want to make progress somehow, and ideally, we want that progress to be systematic and not just getting lucky. You should always be suspicious of the analogies and proxies you use. But if you use a bunch of them and they sort of converge or rely on some aspect, you should be more confident about those aspects. The pluralistic perspective and seeking epistemic robustness through plural perspectives is what I'm after.

Nathan Labenz: (0:33) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my co-host, Erik Torenberg. Hello, and welcome back to the Cognitive Revolution. Recently, I had the honor of being invited by Jaan Tallinn, leading AI investor and philanthropist and former Cognitive Revolution guest, to serve as a recommender in the Survival and Flourishing Fund's fall 2023 grant making process. Over the course of two months, five other recommenders and I worked our way through roughly 150 grant applications. And yes, I did read every single one. Conducting in-depth research and interviews with the groups that we considered most promising. And ultimately, we used a formal deliberation known as the S process to officially recommend grants to Jaan and the fund's other donors. In the course of that project, I had the opportunity to meet the leaders of a number of AI safety focused nonprofits. And now that the grant process is complete and the checks have been sent, I'm planning to invite a number of them on the show to talk about their various approaches to advancing our understanding of and preparedness for AI risks. Contrary to some recent conspiracy theories you might have seen online, I have found this space to be remarkably transparent with funders regularly publishing their grants, and this was the case for the Survival and Flourishing Fund grant process, and also the grantees themselves, often publishing their plans and status reports. So today, to kick off this series, and we will be weaving this in with other types of guests over the next month or two, my guest is Nora Ammann, co-founder and director of PIBBSS, a research initiative focused on the principles of intelligent behavior in biological and social systems. Regular listeners to the show will know that I often worry that most of us, myself included at times, are still thinking too small about AI. Because not only are the AI systems fast becoming more powerful, but we're also deploying them at a faster pace than any previous technology has ever been deployed. What this will ultimately entail, I, of course, don't know. But I do expect all kinds of surprises. And so with that in mind, I was excited to meet Nora and learn more about her work. She really impressed me as one of few people who is thinking appropriately big about the future of AI while also maintaining an impressive level of epistemic humility and rigor. Her work, as you'll hear, is an attempt to create an entirely new and highly interdisciplinary approach to understanding the future of AI. And while she's still early in that process and the research is fundamentally challenged by the fact that the AI systems of greatest interest don't yet exist, I'm certainly among those who believe that timelines to some form of transformative AI could end up being quite short. And thus, I consider her work to be both important and urgent. As always, if you're finding value in the show, I'd ask that you take a moment to share it with friends. This episode, in particular, I'd suggest sending to a brilliant friend currently working in a more mature field whose brainpower might be more valuable if they applied it to the study of AI. And with that, I hope you're inspired by this conversation about new angles on AI safety research with Nora Ammann of PIBBSS. Nora Ammann, co-founder and director of PIBBSS and research associate at the Alignment of Complex Systems Research Group at Charles University. Welcome to the Cognitive Revolution.

Nora Ammann: (4:22) Thank you so much.

Nathan Labenz: (4:23) So I'm really excited to talk to you because I think you have a really unique and interesting perspective on all things AI safety and AI alignment. And one of my big ideas and honestly, the reason I do this show is because I think broadly, we as a society are not really ready for what is coming at us with the rapid development of AI systems. And I think we need a lot more people bringing their unique perspectives to the challenge of figuring out what is it that we are creating and what is likely to happen as a result of all this. So you have such a unique angle on that, and I want to unpack it in-depth. But hopefully, one of the things I hope we can accomplish here is to maybe inspire some people to think that, hey, I don't necessarily have a background in machine learning, but maybe I still actually could bring a useful perspective to this grand society-wide defining challenge of our time of figuring out what's up with AI. So hopefully, as we get into all the different angles that you approached the problem from and all the work that you've done to start to bring people from various backgrounds into the study of AI, hopefully will inspire some others to come along. For starters, how about maybe just giving us an introduction to the way that you think about AI safety, AI alignment? Let's get the Nora take on this framing question.

Nora Ammann: (6:11) I think there exist quite a few different frames on what AI risk is about or maybe what AI safety and alignment is about. And I don't want to frame this as the right or authoritative way of thinking about it. I think it's actually productive to also play with different frames on this. But there is one way I like to think about it that is erring on the more pragmatic or deflationary sense. I'm like, okay, cool. In very simple terms, what's the main thing here? And we can still build on sophisticated theory of what exactly is the worry. But yeah, in the most simple terms, I think the way I look at the space is something like, first, well, it seems like we are building these systems that just become increasingly powerful, increasingly capable, increasingly sophisticated in how they can act in complex environments. So that's the thing that seems to be going on, ChatGPT coming out, et cetera. And then I think on the other hand, there's also a tendency for these systems to come to be in positions or come to be used in ways that really make increasingly consequential decisions. And then as a third premise here, it's like, we don't actually, in a very technical sense, have not figured out how to tell these systems to do anything we want reliably. We don't know how to do that. They do sometimes what we want, but sometimes they don't do what we want. And I think the combination of these three things is really worrying. So the more we automate big, important, critical decisions away to systems we don't know how they act, we don't know how to make do things that we want reliably. That seems problematic. And I think that's how I look at AI risk at the very sort of forward perspective. And then I think from there, we can sort of dig more deeply into like, cool, why in the first place is this hard to get these systems to do what we want reliably? Why is this hard and not easy? Yeah, why is this not easy? And then I think there you can get into a much more fine-grained or detailed sort of discussion about what are the reasons for this. Is this in fact the case? And then if so, what are the reasons for this? Which can maybe provide one frame towards, are the obstacles to be overcome so that we could have AI systems that we can trust in how they act or understand in how they act.

Nathan Labenz: (8:49) When people talk about AI safety, unfortunately, I think discourse around these topics is kind of growing coarser and more polarized at the moment in unfortunate ways. And one of the things that people often say is like, oh, you're just making everything up. And this is all super speculative and these systems don't even exist yet. And therefore, it's all fanciful and we shouldn't worry about it. And I think you have a great way of reframing that that is both a lot more prudent in my mind, but also invites new and more productive ways of thinking about the problem. And that's your framing of epistemic access. So tell us how you frame this problem of thinking about or trying to get a handle on these systems that we haven't even invented yet.

Nora Ammann: (9:48) Yeah. So the key idea here and this is also maybe the main motivating premise for PIBBSS, one of the projects I'm directing. It's just the idea that cool. So we are building increasingly advanced AI systems. Maybe in some important way, the systems we might be most worried about, the systems we least understand and also might be most powerful. Those systems don't necessarily exist yet and/or we maybe don't just want to do trial and error to figure out how they work because they, on the premise here, would be really powerful or could potentially already have a lot of potentially harmful impact. And what this describes, I think, is yeah, we have an epistemic access problem, and this refers basically to just looking at what is the normal typical way for science to make progress in understanding. Science, in its core, depends on empirical access to phenomena in order to be able to falsify your hypothesis, or in engineering, that trial and error approach. You try something, it doesn't work. You try a slightly different thing, it still doesn't work. You try yet another thing and it works. So that's in some sense, the core of the machinery of our collective societal sense making knowledge building machinery. And the premise here is that with questions regarding the future of AI, we don't have access to sort of the basic gears of that machinery. But nevertheless, we want to understand something about what's going to happen and how to make what's going to happen go well. The question then arises, okay, how can we at all make systematic progress towards understanding the future of AI systems where we do not have empirical access. And then I think from there, having that as the problem statement, from there we can be like, well, cool. We still want to make progress somehow, and ideally we want that progress to be systematic and not just getting lucky. And then I think about maybe the research space that is trying to tackle about these questions as something we can divide up in different spaces, depending on how they tackle that epistemic access problem.

Nathan Labenz: (12:15) So tell us how you categorize the approaches to this today. And I think you're, in a sense, creating your own third way. But to contrast your field building approach against those that have emerged already, why don't you start by telling us how you conceptualize the approaches to trying to understand this AI future that others have been pursuing so far? And then you can tell us about the new paradigm that you're developing.

Nora Ammann: (12:47) I mean, this is, again, not the only way of thinking about or carving up the research space, but I think it's one productive way. The idea is roughly, okay, so we have this epistemic access problem and different research or epistemic approaches try to tackle this epistemic access problem in different ways. So for example, I think maybe the most prominent way that is going on is what I would roughly refer to just as the most typical sort of ML technical safety types of approach, where here the key assumption that goes into the systematic approach is to say that the systems we are most worried about, like the future AI systems we are interested in, they will, in important ways, be similar to the state-of-the-art ML systems we have at the moment, such that understanding more about these state-of-the-art ML systems, understanding how they, the ways in which they may be unsafe, understanding how to make them safe, all of these insights, they will translate to these future systems we are interested in. So assumption being, in some important ways, state-of-the-art ML is similar to the future systems, thus our insights about current systems will translate to future systems. Another way, another research approach that is also used makes maybe another basic assumption. One I often talk about is mostly being about trying to think about, okay, let's imagine an idealized rational agent. Let's assume that highly advanced AI systems will act pretty much like these idealized systems. You might argue for this either because you think the future advanced systems we are thinking of, they are just very powerful, so they're just in practice will come very close to acting like these idealized systems, or because you think this is just a good way to get started. Think idealizing is something that is happening across the sciences, right? You start with an idealized, simplified model, and then you maybe over time try to nuance and make it more realistic, but that can also be a productive, epistemic approach. So I think that's research directions that are often informed by maybe formal epistemology, decision theory, rational actor models in economics, and then making arguments from this. So a bunch of the very classical discussion on AI risk in particular, where we're talking about basic AI drives, something that Omohundro introduced, I think, in 2012, or instrumental convergence, power-seeking arguments. These are often conceptual arguments that look at, okay, an idealized, or going towards an idealized powerful agent, they would see these constraints, acting in ways, and what risks does that propose, or again, what would it look like for these systems to be safe? So I think these are characterized as like two important, high-level, epistemic approaches.

Nathan Labenz: (15:46) Hey, we'll continue our interview in a moment after a word from our sponsors. But obviously, that's not all that one could do. So I guess another big just brief commentary for me is that I always kind of keep coming back to this thought that we're all still thinking too small. And there are certainly plenty of big thoughts to be found in both of these different approaches that you've described, but it does seem still like there's a lot out there that is not captured by either kind of school of thoughts so far. For one thing, I'm always repeating this mantra that neither the human brain nor the transformer are likely to be the end of history. And so I don't know how different the AI systems of the next few years are likely to be as compared to what we have today. But I would be very surprised if it's just a bigger transformer that continues to drive progress. And I certainly think we're seeing even just over the last few months, as far as I can tell, an acceleration of results where new architectures that have pretty profound differences from transformers, especially as it relates to essentially kind of bringing recurrence, ways of bringing information forward through time that do not depend on this super finite attention window. That seems to be where a huge frontier is. And that's such a weakness of the current systems that they have these very brittle kind of finite memories that that alone seems like it could really change how the AIs of 2025 look compared to and behave compared to the ones of today. On the pure reason side, who knows what will happen there? I'm old enough to remember when the thought was like, yeah, we're going to get these hyperrational agents that are going to never make mistakes and that's going to be really powerful but problematic, whatever. That doesn't seem to be so much the kinds of things that we're getting. But then you look also at some of the fundamental research that hasn't quite hit the deployment phase yet, and you do see some proto-agents that look like they may also have a real role to play in the future. So I do think those are, again, both like there's a lot there for both of them. But even then, I would be like, yeah, it still feels like it's a little small. This is what has attracted me so much to your work. I'm always kind of like, but how is the world going to change in response? There's so many everything we've talked about so far kind of tries to look at the AI largely in isolation and figure out, like, what is it? What does it do? Can we look at the internals? Can we change around how it works internally? Can we detect what's going on inside? Whatever. And very little of the work that I see gets into what are the dynamics going to look like as things get deployed. And that is at least that's one of not the only, but that's one of the things that I think is really interesting about the line that you are taking on this. So with that setup, tell us about your kind of third way to approaching the understanding of AIs that don't yet exist.

Nora Ammann: (19:24) Yeah. So first, I'm going to do the very broad framing, and then we can get into some of the more details of what concretely this might produce in terms of research. I think the very broad framing on this broad epistemic approach is something like the premise or the underlying assumption that maybe intelligent behavior as a phenomena in the world is governed to some extent by the same or similar principles, irrespective of what system or specific substrate or specific scale it's implemented in. And if this assumption is true, at least to some extent, then it warrants the idea that like, oh, can we go look at currently existing systems that implement intelligent behavior in the wild that we can epistemically access and insights about how intelligent behavior is implemented in the wild, can that help us inspire or actually just transfer insights to the AI question specifically? So that's, and obviously, that's a very broad space, right? Here I'm talking about intelligent behavior being implemented in the brain, so that's a very standard example. I'm talking about in biological systems, whether that's in an evolutionary sense, in an ecological sense, in the sense of what do my cells do compared to my entire body, and how do things scale up into tissue and into this larger organism. And I'm also talking, to some extent, about social systems, right? Like, several human agents, economic agents interacting with each other, producing complex, sophisticated behavior dynamics that, again, we can look at from this intelligent behavior perspective and ask, are there parallels? Are there insights we can use from there?

Nathan Labenz: (21:16) Is there a definition of intelligence that you could extract from that? Because, you know, it's really interesting. I feel like I'm somewhat intelligent. I feel like cells, typically people don't think of as being intelligent, but then you look at some video of an amoeba running around, and it's clearly doing something pretty sophisticated. And I don't think people usually think of the economy as necessarily being, you know, I think we maybe just kind of couple the notion of intelligence to our subjective experience, but you're obviously working with a somewhat broader definition. Do you have a definition of intelligence that you use?

Nora Ammann: (21:58) I am definitely working with a broader notion of intelligent behavior here. And I think in particular, I'm in general, I'm coming at it less from a place where I'm like, oh, I know what the definition is, and this is the definition. This is what I mean by it. And more like, cool. So there's definitely stuff going on in the world that's complex and sophisticated and doing things. And some of it looks quite like intelligence, as I would naturally think of, some a bit less. But what is the difference between these things, really? Is there a bright line? And so far, there isn't. What is just really going on under the hood? And maybe with the aspiration of hopefully or possibly finding what I often refer to as a naturalized account of intelligent behaviors, so I'm interested in, yeah, how is this implemented? Right? Insofar as I'm, as a human, can be described as doing intelligent behavior. What is it that I'm doing? And then how is this similar or different from what other systems are doing? And I'm not trying to say, well, all these systems are doing exactly the same thing or intelligent in the same way, but I am interested in similarities and dissimilarities of just the dynamics they're implementing, and it seems like a lot of these systems are able to do what we might describe as systematic work. An economy is able, to some extent, to allocate resources efficiently. That's if you would give me the task to do that, I would have to think really hard and would be pretty bad at it. There's work involved, and an organism developing from a single cell into a fully developed organism has to figure out which parts have to go where, and what proteins to produce, et cetera. So that's, there's work happening. How is this implemented? How is this actually happening? And is there, yeah, are there shared principles to how this occurs?

Nathan Labenz: (23:55) Sometimes I just think of inputs and outputs. You know, that's kind of how I always have a habit when I'm talking to somebody who's developed, whether it's a new research product or a project or a commercial product or whatever. I try to ground the discussion a lot of times in just saying like, okay. What are the you've made some AI thing. What are the inputs and what are the outputs? And then we can get into how it's doing that transformation. Maybe that's one not too presumptuous way of thinking about there's something in the middle between inputs and outputs that seems to be doing some not always predictable, but useful for some purpose, transformation of these inputs to outputs, whether it be behaviors or literal products. But I like the way you kind of resist some of these labels or familiar terms that people just kind of use perhaps without fully grappling with what they contain or what they imply.

Nora Ammann: (25:00) And maybe if I can just motivate that a bit more. I think this is quite crucial to do insofar as the very thing we're confused about is what sort of thing will future AI even be? What sort of process will this instantiate? And if we start with pretty anthropomorphic concepts to start with. I'm just not particularly confident that this will be wait. It might turn out right, but I would like to be right for the right reasons, right, and not by luck. And I think there's some arguments just coming from a place of cool. Our minds have evolved in an evolutionary context that make me good at noticing phenomena at a certain scale, temporal and spatial scale, but not at a different one. There's phenomena that happen at much lower time scale that they're just not, you know, they don't get propagated into my attention or at much faster scales, and I can't see them. And I think so there's anthropomorphism creeping in here. There's some things my mind is trained to see and be able to understand or notice. And yeah, what if an AI system will be sufficiently alien or instantiate sufficiently alien behavior that I won't by default see it, I will actually need to enhance my ability to sense the world around me with these scientific theories or ways of measuring things that will allow me to see beyond my sort of evolutionary primed geospatial and temporal scales?

Nathan Labenz: (26:40) Yeah. It's quite a high wire act, really. For myself, I've definitely put myself into the mostly the first camp to date, right, of in the three buckets that we've used here. The first one being based on the assumption that future systems will look like current systems. I'm not always necessarily fully conscious of that, but I guess I'm also just trying to wrestle sometimes with what are the current systems and what can they do, before even trying to get into the future. And in that line of work, I really try to avoid analogies. You know, because I find often and, again, I have the benefit of mostly focusing on currently existing systems that I can actually bring the normal procedures of experimentation and science to, not always with full access to the weights, of course, but with at least some access to things that really exist. And I find it more often than not when people try to make analogies or use metaphors that that ends up confusing the situation more than it helps. But again, with this problem of epistemic access where you're really specifically saying, want to try to figure out what the future might look like, and I'm not going to assume that it's going to look like the present in these particular ways, then you sort of have no choice but to make some leap. So I think that's why I say it's a high wire act because you're really challenged to figure out what parts of current ways of thinking are useful and at what point do we maybe cross over into diluting ourselves or confusing ourselves. You can maybe just unpack that at a conceptual level more, and then I definitely want to get into some of the particulars because your big project is essentially recruiting talent from different backgrounds and starting to refocus their energy on the study of AI. And it's been cool to see how many different types of people you've been able to attract already. Hey. We'll continue our interview in a moment after a word from our sponsors. Nathan Labenz: (28:58) Let me just give one more sort of clarification where I think so far we've framed the epistemic approach quite centrally around what will future systems look like, and I think this is an important motivator. But I also want to make some argument about how I think it is very important to look at current systems and actually how some of these approaches are also meant to provide productive insight into current systems. There might be several ways this might happen. This might either happen by, one sort of more mundane example might be, hey, neuroscience has several decades of attempts behind itself trying to figure out what's happening in the brain. What methodologies have been developed? What methodologies have been shown to work more or less well? Are any of those important to AI interpretability, ML interpretability? And so we had some fellows in the past basically just making that translation, but that translation is going towards trying to understand current systems but taking inspiration from how we have studied intelligent behavior in other types of systems. Yet another example might just be, yes, we want to look at current ML systems, but what are different frames of theories or models through which to conceptualize ML systems? Yes, there is a low level matrix multiplication thing going on, but what are other things? Active inference, FEP, free energy principle minimization frames might be one example here. It's trying, it's self-conceiving as being a theory of cognition. Is this a productive frame to use to look at state-of-the-art ML systems? And if so, what does it suggest about their behavior, how they will scale, what they will do if they come into touch with the world around them? Yeah, just want to sort of clarify, it's not just about future systems that don't exist yet, but maybe also what are useful frames to understand currently existing systems.

Nora Ammann: (30:59) And also even between those two, there's the question of as current systems just get more broadly deployed and begin to shape the environment, then you've got all kinds of dynamics that I think are very much not yet in existence, but probably more predictable. And I think that's also a really interesting middle ground that your work touches on in various ways. So when we were preparing for this, I had organized my thinking and the way that I was going to approach the questions around the different fields, their different kind of academic backgrounds that people might be coming from. And I used the term borrowing, and I was kind of inspired a little bit by Tyler Cowen, who is always going around saying, where are the models? Nobody has written down an explicit model of what's going to happen, what's going to go wrong. Now they're starting to be a little bit of that. And I think you're, again, building the field that's going to bring more of that. But you challenged my thinking there a little bit and said, okay, first of all, borrowing, if I understand it correctly, you're kind of saying borrowing is often a little too narrow. It sort of suggests that there's a key idea from a particular field that we can just kind of quickly snap over. And it's unfortunately not usually that simple. I guess in the analogy frame, it would be like assuming a tighter analogy than in fact exists. So tell me how you prefer to think about the move from one kind of background to studying AI.

Nathan Labenz: (32:44) It's really, really multifaceted. I was thinking a bit about it, and I don't have one single answer to this is the thing that happens or something. Actually, a bunch of different things can happen. But there's a few different things I can give examples of. So I mean, I think this can range from, you know, very simple. There's some facts about the world that might be interesting. The size of mammalian brains—that can be a fact that sort of informs some of our priors around thinking about or forecasting more on AI. That might be a fact that we take from one discipline that's helpful, but at sort of kind of small scale. And I think from there, we can really extend to theoretical frameworks, models that live at different levels of abstraction, and for each of which we can sort of ask, is this a productive way of asking this specific question in AI alignment? Whether you're like, hey, should we use game theoretic frames to think about what are the strategic implications if several AI agents interact with each other? Can we borrow principal-agent concepts from economics to conceptualize what we mean by the alignment problem? Can we use methodology? So sort of mathematical, often methodology, say from complex systems to model complex interactions. The example I gave before—what methodologies exist in the neurosciences that have been used to try to understand what's happening in these highly connected networks? So, yeah, we have a series of concepts, methodologies. And then I think there's also something slightly more meta—epistemic practices. So I've been recently reading up on something that I've been finding quite inspiring, this sort of philosophy of science, history of science work on the field of bioengineering, and I feel like it's a pretty productive sort of metaphor or example case study for some work in AI alignment. There, I guess, the main idea here is something like, in bioengineering, you have, on the one hand, sort of high level theoretical work on, you know, how different organisms, how different cells work, what is happening within a body. And then on the engineering side, you're interested in being able to do specific things with your knowledge—control specific things, create specific products that help us to do something specific that we have in mind. And bioengineering is sort of the combination of these two. How can the theory help us produce the sort of pharmaceuticals or something that do what we want them to do? And what you see in bioengineering is that you have to—real world organisms are really complex, really high detail, and also sometimes it may be unethical to just fiddle around with them. So you rely a lot on building simulations or models, which you build specified by your understanding of the theory, and on which you then do sort of a bunch of engineering work, and then sort of only occasionally go into the actual organism and see whether this worked out. And I think this conversation between applied and theory that we can study in this case of bioengineering, for example, is a pretty apt metaphor as well for, I think, some of the AI alignment or AI safety work that I'm most excited about, where I want that to be applied work that actually has a concrete path to helping us make these systems safer, but I also want that work to be well-informed and sort of conceptually rigorous. We're using a lot of terms—from intelligence to agency to goal-directedness to deception—that are really vague, and in order for you to come up with an experiment that actually—whether the results that you get are actually what you think they are or something—to get results that you can actually trust, we need for these experiments to be well-motivated and conceptually well-grounded, and I think that's sort of where this theory and applied work comes together. And so, yeah, that might just be a third example where it's more like, what was the practice of different fields that help us do different types of interdisciplinary work in AI safety?

Nora Ammann: (37:04) If I understand your approach correctly, I guess there's the sort of intellectual level of the approach and then the practical approach. Instead of saying, hey, AI is a big deal and we need a lot of people to think about it, let's go get some people from here and there and, you know, try to assemble them. It seems like you're starting with questions that you think are important about the future of the AI timeline and then working from those out to go figure out, okay, well, what are the best, most applicable, most likely to produce useful insight fields, I guess, is a word—ways of thinking, methodologies perhaps are the better term—that we could try to pull in. So do you want to kind of run down some of the top questions that you've been focused on? And then for each one, you can kind of unpack, I guess, first of all, why do you think that's important? And then, you know, where have you gone to find people who might have the right toolset to bring to bear? And then this is probably a good time also to mention you are actually doing this. You're bringing people together. This is what PIBBSS is. It's a program that actually helps facilitate people making this sort of leap. So you can also get into specific individuals if you want and projects that they've done that kind of show, you know, some of the early fruits of this approach.

Nathan Labenz: (38:39) Cool. So I'm going to go through sort of different clusters, which is partially how I'm thinking about this. I guess one cluster that is quite fundamental is something like just in sort of quite fundamental terms, understanding what is intelligent behavior, what is agentic goal-directed behavior, what is autonomous behavior, which is relevant because part of the premise is that we might be coming to instantiate these types of behaviors artificially. And, of course, if you get really powerful agentic or autonomous behavior, well, that might be dangerous if you don't understand it well. But then also, at the same time, these terms are really fuzzy. What do we even mean by them? Can we do science on them? Can we have some sort of grounded mechanistic understanding of what's going on here? How do these behaviors come about? And also, what's the logic of how they work? And that is a big cluster. And just some examples, I guess, to invoke—I think I would say there's an entire cluster on sort of complex systems, which itself is a very diverse cross-disciplinary field, but various methods in complex systems that are, I think, essentially trying to ask, how do we get agents out of atoms? At some point, we just had a world full of atoms, they seemed kind of easy to understand, and then at some later point, we had a bunch of agents doing things. What's going on? So I think these are examples. And then a bit more concretely, I guess recently we have been talking more with people from the field of artificial life, which has—this can be sort of more applied mathematical work. This can be more simulation-based work, but a field that is essentially asking, what is agency? How does it come about? Can we implement it? Can we thoroughly understand how it works so that we could come to implement it? What does that field have to offer to AI alignment insofar as a lot of the risk phenomena we would be worried about have to do with these complex, sophisticated behaviors? But then I think another way of going about this would be sort of more from the biological angle, so biological systems have developed these behaviors. There is more and more fields trying to bridge maybe this biological cognitive angle, where, well, how do biological systems come to exhibit these cognitive tendencies? So I've mentioned active inference earlier. There's sort of basal cognition type work that is trying to ask, out of these very simple systems, how do we get complexified and how can we scale up such that they eventually exhibit these very complex, sophisticated behaviors? And then also from sort of the social sciences—an example here might be, there's a long-standing question the social sciences is asking, or even sociology asking, the sort of agency versus structure debate. If you're trying to explain social phenomena, when should you explain them in terms of individual human agents taking action, and when should you explain them in terms of higher level structures having that causal force? And there's just sort of—this is essentially, in the end, the question of causality. Does the market do things? Do political institutions do things? Or is it, at the end of the day, always human individuals? There's a philosophical question here, and there is, to some extent, also a question of how do you model this, and understanding emergent causality, emergent phenomena. Yeah, so that's definitely a cluster where a lot of different fields, I think, have potentially productive insights. Yeah, understanding how sort of multiscale systems work. And I guess maybe one other example in here would be something like, how are these complex systems structured? So there's in biology, we see a lot of examples of modularity, so you have modularity in brains developing, modularity in other examples, so this seems to be something that evolutionarily gets selected for. You also see that, to some extent, in social systems. So there's some work asking, where do the boundaries of firms come from? Why are firms—why do firms have boundaries in this way and not some other way? Why are we not just contracting everyone? That's weird. And what are the answers to that? Which I think is, in some sense, looking at a similar thing—how does structuration and individuation happen in different systems?

Nora Ammann: (43:20) Are there any examples of projects that you have brought people together to do within that cluster?

Nathan Labenz: (43:27) They're all quite detail-heavy. So how to best talk about them? Maybe one example would be someone sort of coming in with the premise of, you know, there was a time when we were really confused about what temperature was, and we weren't sure whether temperature was a thing or whether there were several things going on. And eventually, we came to be able to measure temperature and have a sense for what that phenomena is. And what if the same is true for agency? What if agency is actually a thing? Agency, agentic behavior, what if that's a thing that could be measured across systems if we just understood it properly? And then going into often sort of physics-inspired questions like, what would it mean to come up with a solid measure of agentic behavior? That might be one angle. There's another line of work that I'm interested in, which is sort of related to hierarchical agency or hierarchical alignment. So it seems like we see in a lot of systems in the wild that they actually have this hierarchical structure where there's maybe, to some extent, agentic behavior happening at different levels of abstraction. And you might wonder what level of abstraction is the one you should be looking at, or what's the relationship between these levels of abstraction? So one example here might be, you know, there's individual cells, there's tissues in my body, and then there's me overall. And all of these have, depending on what frame I take, some claim to be a relevant level of abstraction to look at. In particular, usually my cells kind of are nicely aligned with what I want to do, but occasionally you get things like cancer, where there's individual cells sort of getting information decoupled from the overall system and then starting to kind of act against the goal of the higher level system. So that might be an example here of a hierarchical misalignment. And then you can start to ask, how, a, in the first place, do you scale up these levels of interactions or shift to get something coherent at the higher levels? And then also, what are ways that the alignment gets kept in place or breaks? And obviously, you see this in other places where you could look at the economy and be like, there's individual economic agents. There's high level abstractions that—usually in economics, we think a lot about high level abstractions, like organizations, et cetera. So that's a line of work which you can tackle both conceptually, both from the biological angle of how does this happen in biology and from the math complex systems angle, like, how do we even model this sort of phenomena?

Nora Ammann: (46:14) Is it too soon to ask if there are any insights that, you know, are simple enough to impart to people?

Nathan Labenz: (46:22) Yeah. I think very clear results are a struggle just in simple terms pointing at them. I do think this is a frame on even formulating what we might mean by AI safety or AI alignment that I think is a productive frame among other ones. So some people are maybe familiar with this—some of the arguments for why AI risk might be concerning, for example, what is called inner optimizers or mesa-optimization. Sort of a dynamic where you have an outer sort of optimization process, but that's sort of spinning off an inner optimization process that can come to have goals that are misaligned from each other. A sort of simple example of this might be potentially sort of evolution overall trying to optimize inclusive fitness, and then that spitting out an agent like humans or a species like humans that have goals of their own. So that would be a simple example—some nuance neglected here, but that seems to be an example of a hierarchical thing in a similar way I've described it before happening. So getting an answer to what are the keys to alignment between hierarchical levels will potentially be productive here. Or I guess another frame on this could be something like, we have developed in the end of the twentieth century language to talk about the strategic implications of multiple agents interacting with each other at the same level of interaction, which is game theory, which we can now use to talk about what is happening in different situations where we get multiple agents interacting. You know, what would something like vertical game theory look like here? Can we come to say something principled about what are the strategic causal interactions between agents at different levels of abstraction? Can we have a mathematical framework that allows us to think about this more productively?

Nora Ammann: (48:20) So this is all under the cluster of questions around where does agency come from? Is it something that we could potentially have a quantitative measure for? Is it inherently hierarchical or whatever? And then if you were to take this same question of AI agents to somebody who is an application developer, they would say, well, I know what an AI agent is. It's a language model that I give a goal and put in a loop and kind of allow it to take a step and then get some feedback from the environment and then just have it take another step. And I just kind of keep looping until I either hit my goal or I get stuck long enough that I abort. And I guess I wonder, how do you develop an intuition for which assumptions or sort of—where are the valid analogies and where are the invalid analogies? I also look at a transformer and I'm like, in some sense, you could say this is hierarchical, but in other ways, it's kind of one directional. It's certainly not that much like the hierarchy of our bodies, where there's sort of these clearly different scales and it seems like the higher level scales largely can control or direct, but also take feedback from the lower. It doesn't seem like that kind of stuff is going on in a transformer. Maybe it is, and we just can't see it or it's happening in some sort of weird smushed, you know, smeared out way over the course of the forward pass. But all these questions are so interesting, and then I'm still at the end, I'm like, how do I know how much of this I should believe? How would you kind of coach me in developing my sense for that?

Nathan Labenz: (50:14) The whole point about analogies or sort of proxies is that no single analogy or proxy is true. That's what we mean when we say analogy or proxy. Instead, I think the thing to go for is that when we're very confused about a phenomenon, we have sort of limited immediate access to it, and it's, you know, the territory is very detailed, and we only have maps to try to think about it, and they're never perfect. Instead of being like, what's the perfect map that will give me all the answers I want, what I'm interested in is something one might describe as a sense of epistemic pluralism, where if I get a bunch of different proxies, different maps, different perspectives on the same phenomena, I can sort of triangulate between them, and then I can come to form what are hopefully more robust guesses about what actually holds about this system. So the more sort of independent from each other perspectives I have on this thing, if they suggest that feature A is something, and only one of those perspectives says feature B is also something, I have more confidence that feature A is in fact a thing. And because I think we sort of lack the sort of this is the one true map for understanding this phenomena, the sort of pluralistic perspective and seeking for sort of epistemic robustness through plural perspectives is what I'm after. So that's one meta answer. You should always be suspicious of the analogies and proxies you use. But if you use a bunch of them and they sort of converge or agree on some aspect, you should be more confident about those aspects than about others. So that's one. And then I guess there's another thing to say here, which is there's already different types of analogies. It could be, well, we could loosely talk about, oh, in what ways is a language model like a human, sort of speculate about that. And then there's another way in which I'm like, oh, we could have some sort of mathematical model and be like, how would we describe this phenomena in question through this mathematical model? Different levels of precision, different sort of epistemic statuses to both of these things. The former may be more geared towards inspiring new hypotheses, which is a valid thing to do, but you're just generating hypotheses at that point. And another may be more geared towards falsifying whether a specific hypothesis is productive or not.

Nora Ammann: (52:47) Maybe another way, as I'm trying to put this into my own words, another way I might frame it is a lot of this work takes the form of if this, then that. And we don't know what value of this is going to pertain at various points in the future. But by doing a bunch of if this, then that sort of work, we can hopefully be more prepared and be ready as new things start to emerge and say, okay, what is this? And hopefully have a bunch of ready analysis for a bunch of different possible futures that we may encounter. Better to go push all those frontiers in advance, even not knowing which one will actually pertain. But it then just becomes really important to get clear on what are the assumptions, and what would we need to see in a future AI system to have confidence that this way of thinking about it is going to be productive?

Nathan Labenz: (53:57) That seems right. And just maybe giving a bit more—one slightly more applied example still. Again, definitely on the more concrete end, one of the things I'm most hopeful about is this sort of theory being able to productively inform both interpretability and sort of evals, evaluations type work. So in evaluations type work, what you're doing essentially is, okay, here is a risk phenomena. Here is some sort of dynamic this AI system could produce that we would be worried about. Can we find it in the AI system? And for that to happen in a way that I would, you know, be somewhat confident it actually succeeded, we need to have a good characterization of what we mean. And if you're trying to look for, does the system, you know, act autonomously, pursue goals over long horizons, is it deceptive—you need to have a good account of what you mean by deception. And it's actually pretty fraught. It's actually pretty hard to specify what we mean there. So I think that's, can we come up with frames, accounts here that can help us do evaluation work better? And the other similar application is something like, in interpretability, you also, to some extent—I mean, you can do very bottom-up interpretability and just be like, what do I find? But you can also be like, what am I even looking for? Am I looking for planning? Okay, cool. What would that even look like? And these questions, it just gets hard and having sort of an epistemically plural way of informing how to interpret what you're finding seems productive, and I think that's at the more at the applied side of the spectrum.

Nora Ammann: (55:42) So you want to move on to some other clusters of questions? Nora Ammann: (55:46) Yeah, I guess an important other cluster. It's in a similar way of just pretty fundamental, trying to understand what we are dealing with, but a bit more focused on minds and cognition in general. What I had in mind here is most of the brain sciences, neuroscience, cognitive sciences, and then to some extent the philosophy of mind, comparative studies between animal and human cognition. How is cognition evolved evolutionarily? There's some work on this. Can this inform our understanding of what we even mean by doing higher level cognition? So should we have a more grounded way of thinking about this in the AI case? This can then go to different places, right? From, I've mostly mentioned applications for interpretability. How do humans come to be pretty good at deductive reasoning and arithmetic? How is this similar and different from transformers? What about the ways in which transformers are brittle? Should we just expect they're going to stop being brittle entirely? Because human brains also are just big connectomes. What's going on here? Where do motivations come from? Right? Where do intrinsic motivations come from? Could we understand this process well enough to just engineer motivations into systems that then are reliable, or what are the limitations to that? How does theory of mind work? Right? Higher order animals all seem to have this capacity to model other agents. That's a wild thing. AI systems with theory of mind, we're getting into maybe higher order capabilities around deception or situational awareness. What's that like? Again, human minds seem to do a lot of this. What is it? How do we identify it? How might it come about in AI systems? There's also some work happening, Steve Byrnes is working on brain-inspired alignment proposals, right? Taking this brain analogy really seriously and being like, what would it now look like to actually make AI systems safe based on that? So not just interpretability, but safety proposals. So there might be another application. One other cluster, and you have actually alluded to this several times, I'm glad I can finally mention it as well, I think there's something really important about multiple agents interacting. Definitely, classically, the AI risk discourse has focused a lot on the way a single advanced AI system could be pretty harmful, and I think this is relevant in many ways. But I also think there's important reasons to expect that what we will see in coming years is a bunch of different AI systems interacting with each other or AI systems interacting with a bunch of humans. Can we come to understand what's going to happen here in a more principled way? One frame on this I kind of like is something like, you know, interpretability basically is trying to look into the system and understand based on its internals, what does it do, how does it behave, and why. What I want is kind of like interpretability but for interacting agents, and I think we're currently extremely far away from that. We are currently even, this is, very, so we had some fellows last summer working on this, and I was very excited about this, but it's also very evident how pre-paradigmatic we are. Right? It's already hard to study complex interactions, because in order to make it more tractable, you need to simplify, and then you need to be like, am I simplifying away something that's important to understand what's actually going on? So that's already hard. But then now in the AI case, you're trying to model how different AI systems interact, but we're not even sure what we mean by AI system. Right? Do we mean LLMs? Okay. Cool. We can put in GPT-4 and be like, how does GPT-4 interaction with several instances of each other work out? Or different characters within the same language model or different architectures for AI agents. Right? We're even confused about what we mean by AI, what sort of mind is it. We have very weak theory on how even to understand its behavior. But if we model humans interacting with each other, we're also very complex. But I think our folk psychological assumptions about how we might interact with each other, while sometimes problematic, we still have better reasons to expect that they are sometimes valid. But our intuitive assumptions about AI systems and language models, I think they're just even more fraught and even more likely to be unjustified. So I think it is really hard to make good progress on this, but there's various reasons why there might be a lot of risks just coming from the emergent dynamics between interacting models, and some work has been done on characterizing the risk from this. So yeah, can we either use game theoretic models? Can we use something like ecology? Right? Ecology is interested in different types of species interacting over a set of resources. What are the dynamics? What are the reproductive dynamics? Etcetera. So that could be a frame, or obviously economic frames are looking into this.

Nathan Labenz: (1:00:56) I don't know a lot about this, but I've always kind of thought that ecology seems like a really good frame of reference just because the different forms of life are very different, you know, and whatever ecology is doing, it is at least forced to find some way of integrating into some single account that all these different things. And I mean, you think good God, you've got an octopus and a human and an elephant, and that's not even that far flung in terms of how different things can be. And so maybe, you know, you could kind of fit pretty weird AIs into those frames as well. I know you have had some folks who have come to the PIBBSS program from ecology type backgrounds, any kind of, again, results or sort of frames that they have developed that you think have been useful for developing your own intuitions for what might happen in the future?

Nora Ammann: (1:02:00) It's definitely early days, and I think there's not that many examples of what we've had so far. So I have one good friend who's starting a PhD in ecology who's interested in AI, so I'm kind of interested in that, and I think has gained some insights from that. And then we had some fellows last summer coming from an evolutionary and ecological perspective. So I think those people have mostly been working on, okay, can we even come to find good experimental setups that we then can eventually run and give us good insights? But one of the main high level insights is it's really hard to come to make good choices here, right? If you let LLM systems interact, what's the analogy here for the environment? What are the resources they, in typical ecological models, you often think about space, right? You want to have control over different space, control over different resources. What's the analogy here in textual space? What are we trading off? What are they coordinating over? How do you run these experiments? It's just all very early days. I think another meta level insight here is something like, a challenge to translation here is that I mean, I think this is definitely coming, but traditionally, I think ecology still is very, you know, informed by going into the field and being looking at the actual animals and tracking and what's happening there. And I think this is not where we're at for what we're trying to do here, so we're sort of more talking about a computational ecology type direction, and that doesn't really exist. And I think it needs roles and individuals who have who happen to find the freedom in their academic context to be like, hey, I want to combine these insights from ecology with these questions in physics and math and AI and computer science, and then you need a supervisor or a faculty that's like, okay, you should do that. So that's not a given. Yeah. A thread here that is a bit theoretical, but I think that I'm interested in is specifically better understanding meta-evolution dynamics. So I think we have, people in general have a pretty good sense for evolution and what sort of pressures evolution creates and what that predicts about, or how that shapes our expectations for, you know, how different systems get shaped. But there is, even in the field of biological evolution, a lot of still open and unclear work on, are evolutionary pressures themselves subject to change, and under what logic, and what does that mean for then the organism, or the evolution of the organisms? And is that relevant to understanding future minds, to understanding the sort of pressures that act on what motivations and type of behavior we will see emerge from AI systems. Yeah, I think this is conceptually very interesting, but also still on the conceptual side and not very applied yet.

Nathan Labenz: (1:05:13) What kind of techniques are you seeing people explore? I can imagine word-based analysis, just at a language level. I can imagine pure math, pure analysis. And I can imagine simulation, I guess, just to name three. What sorts of approaches are you seeing people take and, you know, what seems to be kind of most fruitful so far?

Nora Ammann: (1:05:41) I've seen some early examples of all of these. They all seem potentially relevant. I think it's hard. So textual analysis, right, let different instances of LLMs interact with each other and prompt them in some way iteratively and see how, you know, the sort of thing they answer evolves. That is, you know, a thing to look at, and you in principle seem like you could get at what are the dynamics of interaction that come out of this. But I definitely have not so far seen an experimental setup here that has given me much new Bayesian evidence towards something being true versus not. So I think I'm conceptualizing this mostly as, this seems clearly work that's warranted. This seems clearly something we need to look into, but we're not really at a place at the moment where I'm seeing results that really make me update drastically. Mathematical modeling, most, I guess, of what I've seen is something like formalizing what it would even look like to use a specific model on AI interactions or modeling AI systems through that, and then sort of discussion about how you would apply it and what are maybe the confusions or shortcomings of that. And that, again, seems productive, and I feel excited about it. Not very concrete things to point at in terms of results. Simulation, yeah. I think this is one of the directions I'm pretty excited about, specifically to start to inform evaluations. I think a lot of this is about how do we set up, how do we design experimental setups for current, or state of the art ML systems to, with any meaningful confidence, derive, are they capable of X or are they not capable of X? Under what conditions are they? What sort of capabilities can you elicit from them when they start to interact, not just in some artificial sandbox, but with lots of environmental affordance from the wider world? How much do their capabilities shift when deployed in the wider world that has a lot of, you know, detail and affordances that they might not have? So what would it look like to have a good environmental setup, experimental setup to elicit this? That's something I think I'm most excited about. Work is in progress there, I mean, from sort of evaluations groups, like ARC, ARC Evals, trying to see whether these models can reliably self-replicate, testing this, to trying to, you know, what would it look like to test whether an AI system evolves something like situational awareness. I'm kind of excited about someone working with us who I think has sort of early stages of a theory of how we might conceptualize situational awareness mathematically, such that that can then inform what sort of environment experiment would we have to design to plausibly elicit this phenomenon in AI systems.

Nathan Labenz: (1:08:57) Are there any historical moments that you think we could learn anything from in terms of just other times in history where we were confronted by something very new and had to kind of figure out or were challenged to figure out what it might imply. I mean, there's a lot of analogies that people make between AI and the Industrial Revolution. And I named this show the Cognitive Revolution kind of on that basis. Don't really know much about, my sense is that our track record isn't great if you looked at the predictions from the beginning of the Industrial Revolution. Hopefully, we have better tools now, you know, and better frames to bring to it. But are there any historical moments where people collectively were able to see around the corner and get a handle on what was to come?

Nora Ammann: (1:09:50) I guess an answer will come in degrees. Why? Not shakily. So I guess what I'm interested in here is maybe something like, I think Industrial Revolution and some other examples, electricity, whatever, seem like pretty good examples of transitions of innovation that has shaped society pretty transformatively. And I think there's things to learn from this. A transition that I mostly think we haven't faced so far as a society, and which I think is like one way of thinking about what we're facing at the moment, is us actually becoming able to artificially instantiate highly complex intelligent, in the sense of autonomous goal-seeking behavior, which is in some important senses different from inventing electricity, inventing specific tools. That really forces us to grapple with these very complex phenomena. There's some recursive thing going on where a bunch of atoms start to act in some coherent ways over long time horizons, are able to self-maintain against dissipative pressures and sort of act in such a way that presses through the world. Right? They seem to pursue things despite, you know, facing environmental obstacles. Something really complex is going on here. Something that's on its own terms able to shape the world around us, and I don't think we have meaningfully dealt with this sort of thing. I mean, there's nonhuman animals around us that have some of these capacities, but our approach here was to control them pretty reliably, and so not in a way that they clearly spun out of control or understanding. So I think that's like a historic, one way of thinking about the current historic moment or potentially the current historic moment that I struggle to find analogies to.

Nathan Labenz: (1:11:53) Yeah. Unfortunately, as I listen to you, my mind goes to the Neanderthals didn't necessarily see us coming. And you know, we could even interbreed with them, but still somehow we were different enough that, you know, it didn't go so well for them in the grand sweep of history.

Nora Ammann: (1:12:14) And I think a lot of this is about, you know, the speed or sort of the iteration loop. If you say, you know, how much time you have to react to this. When things happen really quickly or otherwise in such a way that humans can't easily understand what's happening around them, our ability to intervene and course correct, or shape the trajectory of the development, becomes smaller. And I think that's, rather than wanting to say that I think what we're looking at at the moment is qualitatively different, I think what we're looking at at the moment is different on the continuum, on this dimension, where we are more confused about what we're building, and this is developing much faster than whatever evolution has produced so far, such that it's just a much more drastic case of something that has been going on to some extent from economic improvements. Right? Industrial Revolution has also sped up a lot of things. But already we were struggling with adapting to what should the institutional framework be for making sure people are still well under these new conditions. But I think what we're facing is just this, but much faster, and then potentially also just with this autonomous dimension, you know, I'm not sure how much civilizational capacity we have to deal with this particularly well.

Nathan Labenz: (1:13:41) Yeah. I mean, it's an intimidating challenge, I commend you for taking it on in such a direct way and in such a unique way. From all the things I've seen, I really think your approach of trying to start with these questions and then go figure out, even if it's not perfect or not even close to perfect, who might have the best chance at getting traction on these questions. I think that is a really inspired approach. Maybe in the last few minutes, we can kind of talk about what sort of people you are looking for. You're actively running this program and bringing people in, but also you have a reading list, which is a place that people can go to get up to speed on what you deem to be the fundamentals. And obviously, this is something that people are going to be increasingly, I think, open to. As you mentioned earlier, if you're a PhD student or whatever and you need a mentor who can be open-minded enough to encourage you to go in this direction, I think that is probably going to become increasingly common. So I'd love to just get a little bit more on kind of the very practical, what sorts of people do you think should get into this? What do they need to do to start to build their foundation from which they can really start to hopefully make progress? What kinds of people are you specifically looking for in the program that you're running?

Nora Ammann: (1:15:11) Yeah, just very briefly, a quick outline on a few different programs we run where the answer is very slightly different. So we have been running for the last two years a summer fellowship, which is just a three month, you know, summer fellowship, research fellowship program where we pair, and this is mostly sort of PhD level and sometimes a bit more senior than that, with researchers in AI alignment, where the idea is to just match people with expertise in one of these source domains, one of these domains that study complex intelligent behavior in natural systems with AI alignment researchers. This is three months, so not always enough time to get to really groundbreaking insights, but definitely get to build traction and that mutual surface area between the fields. And then much more recently, we started a new program where we're taking on research affiliates. They tend to be more research experienced, where the aim is really to support them in a more tailored and longer term fashion to just much more get them to pursue their individual personal research agendas. And then we have some sort of more lower key programs, the reading list and Speaker events from time to time and the reading group that was curated. I think it's a pretty good frame of just thinking through why is AI alignment hard and not easy, if it is, but why don't we not get this right by default? What are the obstacles we're meeting? Why is this hard? So it had to potentially also, to some extent, characterize, this is the sort of conceptual obstacles to be overcome if we are meant to get this right. So this is meant to sort of provide some surface area in a way that's hopefully pretty digestible for people from fairly different backgrounds to just build surface area with, what are the risk phenomena we are grappling with in AI safety, to then hopefully provide this sort of surface area such that someone with a specific domain expertise can notice, oh, maybe this approach we are using in my domain, maybe these sort of mathematical tools, these conceptual tools, or maybe a productive angle on tackling one of these risk phenomena. So yeah, going a bit more into, okay, cool, what is the sort of talent we're maybe particularly excited about? I think it's, I guess, there's three things. Something that seems important is surface area with risk, safety. What are the sort of obstacles to getting this right? What are the sort of intricacies to where risks come from, or how maybe we can achieve safety. So just some sort of surface area with the AI safety discourse seems very useful among others, and this is maybe the second point to say here, is this is all pretty early stages. Right? There is no good one paradigm telling you this is how to think about it correctly. There are a lot of open questions, a lot of confusions, and especially also in this sort of knowledge transfer question, I don't have all the answers for, if you have a background in game theory, this is what you should work on. I have some ideas. Some are more speculative than others, but a lot is really needing to grapple with, yeah, what could it be? What seriously could it be? And I think for that, you want to both understand your source domain really well, have expertise, but also understand what would a useful application actually look like? And yeah, I think in my experience, it just involves both just knowledge, skill, but also some sort of maybe research experience or just willingness to dive into studying open-ended questions. I think I'm also at the margin particularly excited about people who are willing to, even if they don't come from an ML engineering end, to try to really tie in their domain experience with, cool, how could this inform concrete applied work, either conceptually, in terms of experimental design, in terms of evaluation or interpretability? I think this edge of really tying theory up with applied is a really difficult one, and what I'm particularly excited in trying to foster. But, yeah, that is not to, that is neither to undermine the value of theory nor the importance to actually do things that have some concrete application or feedback loops in concrete application.

Nathan Labenz: (1:19:51) It's going to be a challenge. You know, it sounds like there's certainly a lot more questions than answers, but I guess my pitch to the audience is that the study of AI is also super dynamic, still full of lots of low-hanging fruit, it seems. You know, it's like we're both pre-paradigmatic and still picking the low-hanging fruit at the same time. In terms of impact, you know, the move from a more mature field where you may be refining techniques, you know, to a field of AI where, you know, it is just wide open, I suspect for many people could feel quite liberating and, you know, an opportunity to really ask big and important questions. You know, some fields are far less affording of that, certainly as they mature. So I do think this is something, you know, the more we've talked about just how early it all is and how much remains to be done to figure out how to map these different approaches to trying to figure things out onto the AI challenge, the more I'm just like, man, we need more people to look at themselves and say, maybe I should be one of those people. So my kind of call to action again is don't count yourself out just yet. You know, there's nobody, there's no anointed group right now that has got it under control or that is going to, that we can say with any confidence is going to, you know, answer the big questions. So pile in and, you know, sign up for perhaps a bit more of an unpredictable and wild ride than you were perhaps first thinking, you know, when you got into a PhD. But know that the sense of adventure, the impact that you can make, the scope of the questions that are on the table to be tackled is just incredible. And, you know, for the right personality, that is super invigorating, you know, and can be a lot of fun as well as, you know, potentially extremely valuable to society because this is coming at us pretty quick, it seems. And, you know, again, I just think we need every bit of brainpower that we can bring to bear on it. We will definitely need that. Any closing thoughts on your end?

Nora Ammann: (1:22:31) Yeah. There aren't established answers to any of this. And in particular, we don't have established answers to the solutions. And, also, there's confusion around how to really clarify how should we best think about the risk. And I think I mean this in a productive sense, where we neither want to, you know, maybe all of this is not worth worrying about nor, you know, we're going to die in this very specific way. I think productive and plural perspective on what could go wrong, that seems important and not over-indexing on one specific way. So I think both room for pluralism on what different solutions could we find and also what is the right way of really understanding what is at stake here, what is on site, where do the risks come from. Productive rational disagreement should be had on each side.

Nathan Labenz: (1:23:22) Love it. Well, more questions than answers, but again, I invite everybody to join Nora Ammann of PIBBSS and the Alignment of Complex Systems Research Group in jumping in and trying to tackle them. For now, Nora, thank you for being part of the Cognitive Revolution.

Nora Ammann: (1:23:40) Thank you so much.

Nathan Labenz: (1:23:42) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.

Great! You’ve successfully signed up.

Welcome back! You've successfully signed in.

You've successfully subscribed to The Cognitive Revolution.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.