Approaching the AI Event Horizon? Part 1, w/ James Zou, Sam Hammond, Shoshannah Tekofsky, @8teAPi
Part 1 of this live special explores AI for scientific discovery, U.S. AI policy, and AI agent behavior, with James Zou on virtual labs and interpretability, Sam Hammond on geopolitics and AI consciousness, and Shoshannah Tekofsky on emergent agent behavior.
Watch Episode Here
Listen to Episode Here
Show Notes
Part 1 of this live special dives into AI for Science, U.S. AI policy, and the behavior of AI agents in open-ended environments. James Zou explains how interpretability and virtual labs of AI agents can accelerate scientific discovery. Sam Hammond assesses the current administration's AI policy, U.S.–Gulf AI deals, and the odds that current AIs are conscious. Shoshannah Tekofsky shares insights from studying agent performance and emergent behavior in the AI Village.
Use the Granola Recipe Nathan relies on to identify blind spots across conversations, AI research, and decisions: https://recipes.granola.ai/r/4c1a6b10-5ac5-4920-884c-4fd606aa4f53
LINKS:
Sponsors:
GovAI:
GovAI was founded ten years ago on the belief that AI would end up transforming our world. Ten years later, the organization is at the forefront of trying to help decision-makers in government and industry navigate the transition to advanced AI. GovAI is now hiring Research Scholars (one-year positions for those transitioning into AI policy) and Research Fellows (longer-term roles for experienced researchers). Both roles offer significant freedom to pursue policy research, advise decision-makers, or launch new initiatives. Applications close 15 February 2026. Apply at: https://www.governance.ai/opportunities
Blitzy:
Blitzy is the autonomous code generation platform that ingests millions of lines of code to accelerate enterprise software development by up to 5x with premium, spec-driven output. Schedule a strategy session with their AI solutions consultants at https://blitzy.com
Tasklet:
Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai
Serval:
Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive
CHAPTERS:
(00:00) About the Episode
(03:19) Intro and past projects
(07:23) Multi-agent teamwork challenges
(13:26) Learning to discover science (Part 1)
(19:18) Sponsors: GovAI | Blitzy
(22:25) Learning to discover science (Part 2)
(27:09) Predicting health from sleep
(31:55) US–China science collaboration
(34:15) Software singularity geopolitics (Part 1)
(34:21) Sponsors: Tasklet | Serval
(37:09) Software singularity geopolitics (Part 2)
(44:42) US strategy and energy
(51:34) Gulf energy and compute
(55:36) AI surveillance and rights
(59:31) AI consciousness and politics
(01:06:07) Model personalities in village
(01:19:14) Agent swarms and teamwork
(01:25:29) Agent vulnerabilities and control
(01:29:18) Deception and cutting corners
(01:33:51) Outro
PRODUCED BY:
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/
Youtube: https://youtube.com/@CognitiveRevolutionPodcast
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Transcript
This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.
Introduction
Hello, and welcome back to the Cognitive Revolution.
You're about to hear Part 1 of what turned out to be a 4-hour live show that I co-hosted with my friend Prakash, also known as @8teAPi on Twitter, on the topics of AI for Science, Geopolitical Competition, and Recursive Self-Improvement.
With everything moving so quickly in the AI space, I'm actively looking for ways to shorten my own personal productivity timelines, and to deliver high-quality analysis in more timely and time-efficient ways – and talking to 6 top-notch guests over the course of 4 hours is … one attempt to do that.
In this part 1, which we're publishing as a standalone episode, we talk to:
- Professor James Zou of Stanford about his work on AI for Science, which ranges from applying interpretability techniques to protein models to building virtual labs of AI agents;
- Sam Hammond about how the current US administration is doing on AI policy, what the US is really getting out of its deals with Gulf countries, and why he believes current AIs are at least as likely to be conscious as not;
- and to Shoshannah Tekofsky about the many fascinating observations she's made and lessons she's learned from a deep study of AI Agent performance and behavior in the open-ended setting of the AI Village.
In Part 2, which we'll release tomorrow, we talk to:
- Abhi Mahajan, also known as @owlposting, about AI for Biology and Medicine
- Helen Toner about a recent report on automated AI R&D within frontier model developers;
- and Jeremie Harris about the twin security dilemmas at the heart of the strategic AI landscape.
As you'll hear, the challenges of making sense of massive disagreement among leading experts, and simply keeping up to date with AI developments broadly come up repeatedly in these conversations, and to be honest, nobody has great solutions. One that I can recommend, though, is using LLMs to help identify blindspots, and for that purpose I'm really enjoying the blind spot finder Recipe that I recently created in Granola. Granola works at the operating system level of your computer, so it can capture all the audio in and out, including, if you wish, the contents of this episode. And its Recipe feature can work across sessions to identify trends, opportunities, or blind spots that only become apparent with that zoomed out view. Obviously this is a tool that grows in value over time, but if you want to try it, I suggest downloading the app, starting a session while you play this episode, and then asking it to identify blind spots based on this conversation. What's so cool about this feature, for active Granola users, is that the blind spots it identifies for you will be different from the ones it identifies for me.
With that said, this episode was a lot of fun, but because it's a new format, I would love your feedback. Do you feel you got as much value from this more time efficient approach as you usually do from our full deep-dive episodes? Or did we miss the mark in some way? Please let me know in the comments, or if you prefer by reaching out privately, via our website, cognitiverevolution.ai, or by DM'ing me on the social media platform of your choice.
And now I give you, The Cognitive Revolution, LIVE, from February 11, co-hosted with @8teAPi
Main Episode
Prakash: We have our first guest, James Zou. Add him to the stage.
Nathan Labenz: Hello, sir. Great to see you. Hello. Thanks for joining us this morning.
Prakash: Hi, James.
Nathan Labenz: So, quick introduction. We did a full episode not too long ago, and at that time I was, and I've continued to be, super impressed by your range and productivity in the AI for science domain. When I say range, we're talking all the way from low-level interpretability stuff, which folks can go back and hear about in InterPLM and the work you guys did there to understand what it is that a protein language model is learning. And then on the high end, the virtual lab, a high-level agent framework that was able to do meaningful scientific work and even generate new candidate nanobodies to address new strains of COVID. You've got a bunch of new stuff since then, but maybe just a quick check-in on those previous two projects, both of which I thought were really fascinating. What's happened with them since, if any news? One thing people sometimes worry about is, well, we thought we maybe understood something based on the interpretability of this, but with time we realized it wasn't so clear-cut; or the agents came up with nanobodies, but did the nanobodies actually work? Are there any new updates or reflections on those previous projects before we get into the latest and greatest?
James Zou: Yeah, thanks, Nathan. Maybe just a brief update on what's happened recently with those projects. So with the virtual lab, I think it's actually gotten a huge amount of interest. It was published in Nature a few months ago. The agents designed these nanobodies, and since then we've also experimentally validated and tested them in the real world and shown that they're actually, in many cases, more effective than some of the previously human-designed nanobodies. So I think it's actually a very nice demonstration of how the agents can greatly accelerate the discovery process and discover something that's really new, right, that nobody has seen before, but that we can also quickly experimentally validate. But the part that maybe people found even more interesting than the specific nanobody discoveries themselves is actually the sort of social dynamics of these agents, right? When you have multiple agents that work together, what happens? What is the kind of community and culture they create? And recently, especially, there's also been a lot of interest in the notebook and these other settings where multiple agents come together and form their own community. So I think the virtual lab is an early example of how multiple AI co-scientist agents can start to work together and come up with their own way of working, which is different from how humans work. And then, as a result of that, they're able to do something quite innovative.
Prakash: What are the differences with how humans work? What did you observe?
James Zou: Good question. So when humans collaborate, when we collaborate with our teammates, it often depends on people's personalities, right? And it also depends on who talks first or who asks the first question; that can all change the trajectory and which ideas get emphasized. That happens with agents as well. Depending on, let's say, whether the data science agent speaks first or the immunologist agent speaks first, that can also change the ideas. But something that agents can do, and that we cannot do as humans, is that they can actually run all of these discussions in parallel. So for every question, they would actually discuss it multiple times. And each time they can specify, oh, maybe this time let's have the data scientist agent speak first; in this other time, let's have the computer science agent speak first; and in this other meeting, let's remove the critic agent to see what happens with the discussion. So they actually do all of that. It's like a multiverse of all these scientific explorations in parallel, and then they evaluate and compare and see what configuration actually leads to the most interesting solutions, and then sort of pick and choose the best ideas from all these parallel meetings. Which is something I think is really interesting and fantastic, because it removes a lot of the biases that we see in these human research collaborations.
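To make that configuration sweep concrete, here is a minimal sketch of the idea as described above: enumerate meeting configurations (who speaks first, whether a critic attends), run each discussion, and keep the best-scoring one. The `run_meeting` and `score_transcript` functions are toy stand-ins, not the virtual lab's actual code.

```python
# Minimal sketch of the configuration sweep described above (not the virtual
# lab's actual code). run_meeting() and score_transcript() are toy stand-ins
# for "hold one multi-agent discussion" and "rate the quality of its ideas".
import random
from itertools import permutations

AGENTS = ["immunologist", "data_scientist", "computer_scientist"]

def run_meeting(question, speaking_order, include_critic):
    # A real system would call the LLM agents here, in the given speaking order.
    return f"{question} | order={speaking_order} | critic={include_critic}"

def score_transcript(transcript):
    # A real system would judge the scientific quality of the ideas produced.
    return random.random()

def sweep_meeting_configs(question):
    results = []
    for order in permutations(AGENTS):           # vary who speaks first
        for include_critic in (True, False):     # vary whether a critic attends
            transcript = run_meeting(question, order, include_critic)
            results.append((score_transcript(transcript), order, include_critic))
    # Keep the configuration whose discussion produced the best-scoring ideas.
    return max(results, key=lambda r: r[0])

print(sweep_meeting_configs("How should we redesign the nanobody candidates?"))
```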
Prakash: So one question I had was, on your multi-agent fail paper, you noted that the agents tend not to assign greater importance to the expert and kind of tend to average, and that ends up with a result which is worse. And how does that compare with this idea that you have the critic agent and all of these agents working together, and they have more emphasis on the expert in that sense?
James Zou: It's a good question. I think this relates to what I mentioned in terms of personality, how the personalities of the agents actually play a big role. Essentially, when humans work together, you need to have sort of compatible personalities if you want to work on a project together or start a company together. And what we found is that the personalities of the agents also play a surprisingly important role. One example of this is that a lot of the current agents are maybe a bit too compromising or too polite. So what happens is that even if you are the expert agent, right, if you're better at this particular task than the other agents, you want that expert agent to take more of a leadership role, but that expert agent is sometimes too polite and too accommodating to the other agents. And that actually leads to a degradation of the overall team's performance.
Nathan Labenz: So would you say, so that paper, Multi-Agent Teams Hold Experts Back, is a recent one. Would you say that finding applies to the virtual lab, in the sense that if we could overcome that problem, the virtual lab would be even that much stronger? Or would you say that in designing the virtual lab you did overcome that in some way? What would be the upshot for people who are trying to follow your example and build multi-agent systems? Do you have an answer for them? Or are you just saying you can actually achieve novel nanobody design even with these sort of weird performance gaps left on the table?
James Zou: Yeah, I think there is still a real gap, even with the virtual lab. Like you said, it's already quite impressive that these agents are able to create new science, but I think there's actually still a lot we can improve in these agents by improving their teamwork. Most of the time, when we optimize the models, we're optimizing each individual model's performance by itself; we're not really optimizing their ability to work together as a team. So I think that's an important gap that we've highlighted with a lot of the current agentic setups, and we're working on solutions for how to improve the teamwork of multiple agents.
Prakash: Another question. You mentioned personality. In the mid-20th century, post-World War II, I think there was a lot of work done on personality, the Myers-Briggs tests and all of these things, some of which have been shown to be not very valid after some time. How would you measure personality for an agent? How do you evaluate that?
James Zou: Yeah. So what we actually did in this recent multi-agent team paper was take a lot of those classic team-building exercises that, let's say, if you go to business school or you're an MBA student, you often do, right? Or if you're on a company retreat, maybe you do these team-building exercises. And typically how these exercises work is that you have a group of humans, and each person gets some partial information, like maybe you have one part of the puzzle, and then the team has to work together to figure out how to put these different parts of the puzzle together to come up with a final holistic solution. So that's a pretty common kind of teamwork exercise, and it's often used in the organizational and management literature to assess how well a team of humans is able to create something greater than the individuals. We were very much inspired by that literature, so we took a lot of those team-building exercises that are done in human business schools and sent the agents through the same exercises. And the benefit there is that people already have all these human scores and human data, so we can compare against that and see how well the agents are able to function as a team compared to high-performing human teams.
Nathan Labenz: Okay, any upshots you would give there? Just very practical upshots in terms of, like, what models work well together? What would be your bottom line?
Prakash: Or is there a prompt? Like some people in the early days of prompting, they would say you should tell the agent to assume a character first, a persona, and then do the rest of the prompts. Is that a way that you can manage the agent?
James Zou: We actually found, surprisingly, that prompting did not really help the teamwork very much. We tried very strong prompting and prompt optimization, and it was not really able to break through what we call the synergy gap. The synergy gap here means the team is not able to do much better than its best individual. And I think it probably comes down to more than prompting; it really comes down to the right kind of communication structures: the ways the agents talk to each other, and who should talk to which agent first. So that communication structure, we think, is actually a huge space that can be improved in these multi-agent interactions.
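The synergy gap James mentions has a simple operational reading: the team's score minus the score of its best individual member. A tiny sketch, with numbers invented purely for illustration:

```python
# Tiny sketch of the "synergy gap": does the team beat its best individual member?
# The scores below are invented for illustration; a real study would measure them
# on team-building tasks like the ones described earlier.

def synergy_gap(team_score: float, individual_scores: list[float]) -> float:
    """Positive means the team outperformed its strongest member; negative means
    the team was held back, which is the failure mode discussed above."""
    return team_score - max(individual_scores)

individual_scores = [0.62, 0.71, 0.80]   # each agent attempting the task alone
team_score = 0.78                        # the same agents working as a team

print(f"Synergy gap: {synergy_gap(team_score, individual_scores):+.2f}")
```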
Nathan Labenz: It might be too soon to say, but obviously we've got Opus 4.6, and recently Kimi K25 also introduced more native capabilities to spawn sub-agents and manage multi-agent structures and swarms. I don't know if you've had a chance to run any systematic tests, or even just explore in your own terminal, but if you have, have you seen anything that makes you feel like that last result is subject to some revision already in light of these new releases?
James Zou: I think the models are definitely improving. We haven't seen evidence yet that the current models are able to really break this synergy gap that we quantified in the paper. And I do think maybe some of that speaks to the way we currently train all of these models, including the latest ones. This relates to a second paper that we had recently that we call Learning to Discover. The current standard paradigm for training AI models, and language models in particular, is to teach them to imitate humans, right? To imitate training data: next token prediction, all of that. Even supervised fine-tuning and RL, to some extent, are all about learning to imitate. And we found that, especially for scientific discoveries, there's only so far you can get by learning to imitate. To really make novel discoveries, to get breakthroughs, you really want to go beyond that imitation ceiling and do something different, to really learn to discover new things, which is what separates a very good scientist from somebody who just knows the textbook information. So that motivated our recent work, which we call Learning to Discover, where we try to change the training objectives of these agents to ask them not to imitate, but to explicitly explore much more aggressively. And that led to some very promising results, where now these agents, even with open-source models, after we trained them appropriately with Learning to Discover, are able to achieve some of the best known math solutions and optimization algorithms and kernels.
Prakash: So that was a GPT-OSS 120 billion parameter model, I think. And it was actually one of the first, I think, really good papers using the GPT-OSS model, because a lot of the papers in the last three or four months used Qwen as the basis. What did you find about... So if I understand correctly, you give the last solution as kind of a starting point for the next solution, and you have all of the solutions it has discovered before, and it's allowed to kind of permute beyond those. Is that a correct understanding?
James Zou: Yeah, so that's one key component of this: the agent can reuse some of its previous solutions as a good warm starting point. And then the second big part is that, as it's going through and solving each of these and coming up with candidate solutions, we're also doing different kinds of reinforcement learning to update the model parameters. The standard kinds of reinforcement learning essentially want the agents to generalize well across multiple problem instances; that's the standard paradigm in machine learning, that you want models that can generalize. But when you're trying to make a new discovery, in some sense the discovery itself doesn't have to be generalizable, right? You just want to find the best known solution to this new problem nobody has solved before. It doesn't matter if that solution does not apply to other settings, because if you discover a new material, that in itself is of sufficient interest. So we also changed the learning objective to explicitly avoid the generalization that's baked into standard machine learning, and make the model much more, let's say, single-minded in just learning to do very well on this particular new discovery problem.
Prakash: So it's really a different way of training the model.
James Zou: That's right.
Prakash: Yeah, because you're giving dopamine for a different objective.
James Zou: Yeah, and it's very different from how we are taught with machine learning, right? In machine learning, you're always taught like you want to generalize to test examples across different settings. That's why there's this expectation symbol in all of these reinforcement learning or post-training objectives. And basically, we want to remove that and do something very different.
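One way to notate the contrast James is drawing, in rough sketch form rather than the paper's exact formulation: standard post-training maximizes an expected reward over a distribution of problems, while the discovery setting fixes a single problem and only cares about the best solution ever sampled for it.

```latex
% Rough notational sketch of the contrast (not necessarily the paper's exact objective).
% Standard post-training: maximize expected reward over a distribution of problems x.
\[
J_{\text{standard}}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D}}\,
    \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\big[\, R(x, y) \,\big]
\]
% Discovery setting: fix one problem x* and push up the reward on that problem alone;
% the artifact that matters is the single best solution ever sampled.
\[
J_{\text{discover}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x^{*})}\big[\, R(x^{*}, y) \,\big],
\qquad
y^{*} = \arg\max_{y \in \mathcal{Y}_{\text{sampled}}} R(x^{*}, y)
\]
```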
Nathan Labenz: Just to re-emphasize the paradigm shift there, because you've already kind of said it: you really don't care about the model that you train at the end of the day. You care about the single best output that it is able to create, and that is something you can use indefinitely. As you said, if you discover a new material, now you've got that material. The model that discovered it could be deleted, never used again, but you've got your win. If you can discover a new law of physics, or a new kernel optimization that's faster than any previous one, that is now an explicit artifact that exists in the world, totally independent of any ongoing callback to the model. So I thought that was a really interesting dynamic, and I do think that's going to be a big part of how models get good at adapting to various contexts. Obviously, everybody's looking for continual learning. This is obviously not the full continual learning solution, but it is striking that for an average of $500 of training cost, and notably with LoRA adapters too, right? You guys did this on the Thinking Machines API. Not a trivial cost, but to discover literal new state-of-the-art on meaningful problems, $500 is not a lot to spend. And the adaptation is very, very narrow, but very, very powerful in terms of the result it produces. One obviously big question: all the problems that you worked on in this paper are verifiable-reward-type problems. There was an AI CUDA engineer from Sakana AI some time ago, and they went as far as publishing and saying, hey, we've got this AI CUDA engineer that can write better kernels than human engineers. A couple of days later, they came back and said, actually, we got reward hacked; it didn't actually do that, but we had a flaw in our evaluation system. So the forward-looking questions are: did you see reward hacking? Did you have to do anything to deal with that? And how do you think this paradigm could generalize to somewhat less numerically or quantifiably verifiable things? Do you think this could work with, say, a rubric-based evaluation, such that people could start to do even creative tasks as long as they apply the rubric? Could they get the best, most creative short story kind of thing out of this paradigm? In short, how far do you think this goes?
James Zou: Great question. So you're right, here we were pretty careful in picking problems that we think are amenable to this Learning to Discover setup. For example, we picked pretty popular math problems, like the Erdős minimum overlap problem, that are relatively easy to verify. It's hard to do well on them, but if you actually have a solution, like a particular function, then we can objectively check whether that function is actually state-of-the-art. So these fit into the setting, like you mentioned, of having pretty nice verifiable rewards. The math problems we looked at, and some of the algorithm development or single-cell analysis problems, the algorithms discovered by this approach, all end up having that flavor. I think there are two settings that are beyond our current approach but will be super important to explore next. The first is when we have much sparser reward. The problems we currently tackle basically have continuous reward, which means that as the algorithm learns to discover, it can actually see its scores go up and up and up, and that's how it gets learning signals to train itself. That's very useful. But if you have, let's say, binary sparse reward, ones and zeros, and mostly zeros, then how does the algorithm even get a learning signal during its discovery trajectory? That's still a challenge we're currently working on. And the second challenge, as you mentioned, is in settings where we do not have these verifiers. In most problems in biology and in the natural or physical sciences, you have to do an experiment, and that becomes much more expensive. So among the things we're exploring there, I think the rubrics could be interesting, and having various simulations of the experiments, physics- or chemistry-based simulations of these experimental settings, could also be a way of providing some proxy rewards.
Nathan Labenz: Indeed. One quick callback to reward hacking, because this is always something I'm on the lookout for. Did you see any strange behavior? Maybe your verifiers were good enough from the beginning that it wasn't an issue, but was there anything in that vein that you would, if people were going to go try this at home, as inevitably people will, any gotchas or warnings or caveats that you would give them?
James Zou: I think there are some instances in this discovery process where the models come up with some, I would say, pretty reasonable-looking solutions, but those solutions might be very narrow and very specific to a particular test case. So, not in our final paper, but in an earlier version of some of the experiments, which we didn't include in the final paper, maybe the model would discover an optimal kernel, but the kernel only works for a particular shape of matrix. And then if you change the shape of that matrix, the kernel is no longer effective.
Prakash: So, I noted that one of the comments on the GPU kernel task from the expert who reviewed it was that a human might not use some of the same methods because there might be some instability; one of the experts said that in the paper itself.
James Zou: That's right, yeah. So I think that's also something where, if we could have another reward metric for instability and then incorporate that into the discovery process, it would help the agents be more thorough.
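One simple guard against the shape-specific failure mode described above is to reward a candidate kernel on its worst case across a panel of matrix shapes rather than on a single test case. A minimal sketch, where plain Python callables stand in for real GPU kernels:

```python
# Minimal sketch: accept a candidate kernel only if it is correct on every shape
# in a panel, and reward its worst-case speedup rather than its best case, so a
# solution tuned to one matrix shape cannot win. Plain callables stand in for
# real GPU kernels here.
import time
import numpy as np

SHAPES = [(128, 128), (256, 512), (1024, 64)]

def timed(fn, a, b):
    start = time.perf_counter()
    out = fn(a, b)
    return out, time.perf_counter() - start

def robust_reward(candidate, baseline=np.matmul, tol=1e-6):
    speedups = []
    for m, n in SHAPES:
        a, b = np.random.randn(m, n), np.random.randn(n, m)
        result, t_candidate = timed(candidate, a, b)
        expected, t_baseline = timed(baseline, a, b)
        if not np.allclose(result, expected, atol=tol):
            return 0.0                              # wrong on any shape: no reward
        speedups.append(t_baseline / max(t_candidate, 1e-9))
    return min(speedups)                            # score the worst-case shape

# A "candidate" that is just matmul in disguise scores roughly 1.0.
print(robust_reward(lambda a, b: a @ b))
```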
Nathan Labenz: Two more topics and only five or so more minutes. Another paper you guys put out recently is fascinating, and I'll just let you describe what you think is most important about it, but it does sort of show the different levels of AI for science. We've covered agent frameworks, which use models as they exist, in token space, to reason in kind of an imitating-humans sort of way. Now you've got this really dialing in with test-time training on very particular problems, to get your eyes on that problem as deeply as possible and try to find new solutions. And then this third paper, SleepFM, is like: let's just throw in a ton of data across a variety of modalities, and let's hope, I mean, there's a little more to it than this, of course, but let's hope that it really is true that the models really just want to learn. And now we've got this whole other kind of intuition, and we've seen this, of course, in protein folding and increasingly in all sorts of domains, where the models become superhuman because they seem to develop at least what I think of as an intuitive physics in spaces that are just so alien to us that we don't have native receptors for those modalities and we just don't have any intuition for them. Tell us about SleepFM.
James Zou: Yeah, so sleep is probably one of the most important activities that we all do, right? All of us will spend around a third of our life sleeping. But despite that, it's actually very poorly understood. For example, if I ask you how well you slept last night, or ask any of the people in the audience, most of the time you might say, oh, I feel tired, I feel refreshed, or maybe I slept six hours. We only have very coarse summary statistics of how well we slept. So we thought, okay, sleep is certainly much richer than just the number of hours we spend in bed, so let's actually try to capture the full physiology of sleep as much as possible. To do that, we basically have all these different wearables. We capture people's brain activity, their heart activity through EKG, their breathing patterns, and their muscle contractions as they're sleeping. And we collected almost 600,000 hours of sleep data, across all these different modalities, from 65,000 people. Then we also link all of that to their medical records, so we know what conditions they had previously and what new conditions they develop later. So the idea is: let's actually put all of that data into AI and see whether AI can learn to decode the language of sleep by leveraging all of this full physiological information. And that's basically the basis of SleepFM. What's actually quite amazing to us is that, just from one night of sleep, by learning this language of sleep, it's able to predict over 100 different future diseases that were not diagnosed at the time of the sleep recording.
Prakash: Yeah, I thought that was an incredible kind of study because you had all of this data, but you really ended up with like, you could detect, I mean, I guess the accuracy was okay. It was like 70 to 80% accuracy on a lot of the 130 metrics that you had. But still, it's amazing that you can tell that many things just from these common metrics that everyone kind of produces without like blood testing or something more intrusive. Do you think as sensors get more sensitive, as you get more sensitive kind of data, do you think that will improve? Do you think that the bounds of like 70 to 80% will go to 90, 95%? Is that a possibility?
James Zou: I think so, yeah. So I think sleep is really almost like a perfect window, right? Because you're already in a somewhat inactive state, so it's basically taking all these measurements when you're not doing much of anything else, which means it's not really obstructing your daily life. And we found, for example, that the brain activity signals when people are in their REM sleep end up being particularly predictive of many different diseases, including future risk for dementia, but also beyond that for stroke, heart disease, and kidney issues. So sleep really is this holistic window into the entire health status of the individual. Maybe that's not surprising, because we all know anecdotally that sleep really affects how we feel, and it's also reflective of our comorbidities and other things, but I think this sleep language model that we built really crystallizes that and makes it very actionable.
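The pipeline James describes, overnight signals from several modalities encoded into one embedding and then linked to future diagnoses, can be sketched roughly as below. This is an illustrative stand-in, not SleepFM's actual architecture; the module shapes and the 130-condition head are invented for the example.

```python
# Illustrative stand-in for the pipeline described above, not SleepFM's actual
# architecture: encode each overnight signal modality, pool into one per-night
# embedding, and attach per-condition heads trained against future diagnoses
# drawn from linked medical records. All sizes here are invented.
import torch
import torch.nn as nn

MODALITIES = ["eeg", "ekg", "respiration", "emg"]   # brain, heart, breathing, muscle

class SleepRiskModel(nn.Module):
    def __init__(self, in_channels=8, embed_dim=128, num_conditions=130):
        super().__init__()
        # One small 1-D conv encoder per modality.
        self.encoders = nn.ModuleDict({
            m: nn.Sequential(
                nn.Conv1d(in_channels, 32, kernel_size=7, stride=4),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
                nn.Flatten(),
                nn.Linear(32, embed_dim),
            )
            for m in MODALITIES
        })
        # One logit per future condition being tracked.
        self.condition_heads = nn.Linear(embed_dim, num_conditions)

    def forward(self, signals: dict) -> torch.Tensor:
        # signals[m] has shape (batch, channels, time) for each modality.
        embeddings = [self.encoders[m](signals[m]) for m in MODALITIES]
        night_embedding = torch.stack(embeddings).mean(dim=0)   # pool modalities
        return self.condition_heads(night_embedding)            # risk logits

model = SleepRiskModel()
fake_night = {m: torch.randn(2, 8, 1024) for m in MODALITIES}
print(model(fake_night).shape)   # torch.Size([2, 130]), one logit per condition
```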
Nathan Labenz: I encourage folks to go spend a lot of time digging into whichever of these papers are of interest. One we didn't even touch on asks the question: can language models discover scaling laws? Spoiler: yes, to a pretty strong extent. But I don't even want to get into the content of that paper; I'll leave that as an exercise for the audience. The one thing I want to ask you, as kind of a transition to our next guest, Sam Hammond, who is here and who focuses a lot on the geopolitical implications of AI and its implications for leading nation states, is that I noticed the two lead authors of that paper are from Peking University and Stanford, respectively. Kind of building on the idea of collaboration in science, but now focused on the human collaboration: what has been your experience recently in terms of having these collaborations across the US-China divide? Is it getting harder? Do you still feel like lines of communication are pretty open? And how much hope do you have that collaboration among scientists can, I don't know, save us, for lack of a better phrase, from inter-civilizational conflict over the coming years as the competition in AI heats up and up?
James Zou: It's a great question. I do think that collaboration is really the basis of much of science, throughout history but especially now. And especially when we talk about open science, meaning science that we publish, like we do with this paper, and open source, the benefit of all of that really is for all of humanity. If we discover some better molecules, better drugs, then that benefits everybody, and we want that benefit to be shared with everybody. That's why we publish everything that we do in our group. And toward that goal, I think having these international collaborations with China, with Europe, with other countries, is very useful, because there's a lot of complementary expertise.
Nathan Labenz: I, for one, hope to see those collaborations continue well into the future. So thanks for being here today. Thanks for keeping the collaborative flame alive. And congratulations on a string of outstanding papers. I'm sure there's a lot more where that came from. And we'll look forward to talking to you again, hopefully sooner rather than later.
Prakash: Indeed. So our next guest is Sam Hammond. He's the chief economist at the Foundation for American Innovation. He's very AGI-pilled. He's also against selling chips to China. Let's add him to the stage. And right off the bat, I'm going to add this tweet. Sholto Douglas goes: the default case right now is a software-only singularity. We need to scale robots and automated labs dramatically in '28, '29, or the physical world will fall far behind the digital one, and the US won't be competitive unless we put in the investment now. And then Sam says: it's worse than that. A pure software singularity could cause a sudden reversal of fortunes for the US. Our comparative advantage in high-value-added knowledge sectors radically deflates, leaving China to translate our innovation in bits into their innovation in atoms.
Sam Hammond: Indeed.
Prakash: Which sounds really scary, Sam. So maybe you can go into that a bit.
Sam Hammond: Sure. I mean, I say later in the thread, referencing the diamond-water paradox, right? We learned this in economics. Why is water, this thing that you need to live... I can stop eating, I could fast for 30 days and still live, but if I don't drink water for a few days, I'll probably die of dehydration. And yet water is basically free, functionally. Whereas diamonds are completely superfluous, just glinty things. I mean, they have some industrial applications, but they're super valuable. And why is this? Well, due to relative scarcity, right? Water is abundant. Diamonds are kind of abundant, but there's a monopoly that keeps the supply constrained.
Nathan Labenz: Thankfully, there's no water monopoly keeping supply constrained, at least not for most of us.
Sam Hammond: Yeah, at least not here. And so value is this sort of contingent thing. And we have these debates all the time: why is Nvidia a multi-trillion-dollar company and not TSMC or ASML, which are arguably even bigger bottlenecks, because there are many other companies that can do design? There are all these counterintuitive ways in which value flows throughout the economy and different parts of the supply chain. And for the last 40 years, the US has exploited that, exploited the fact that a lot of value tends to flow up the stack to higher and higher forms of high-value-added knowledge work, across the board. It's our entertainment industry, it's management, it's finance. In the 90s, it was the open innovation model, where we'd do the design and manage the IP and marketing, and China or the rest of the world would do the actual manufacturing and fabrication, because the design and science and novelty stuff is where all the value is. And that has been true, right? But now we're about to enter a world where that part of the stack becomes more like water. It becomes radically abundant. And then value should flow to the things that remain scarce. What I worry about is this reversal-of-fortune phenomenon. I mentioned some other examples; I think we're going to talk about my visit to the UAE later on, but one of the reasons the UAE is so invested in AI is because, around the 1930s, they had been a pearling economy. Their entire economy was built on exporting pearls. Then Japan invented cultured pearls, where they could just grow pearls in aquaculture, and the price collapsed, so they had to diversify. There's nothing in principle that says we have to remain at the top of the stack if the things we are invested in become radically more abundant. And that's what seems to be happening right now. It's software development. It's investment banking, management, law. These are the tip of the spear for what agentic AI is going to devour.
Prakash: Let me give you the devil's advocate view of that, which is that perhaps the US has those industries because the US is more able to use the outputs of those industries, right? You need an investment banking function because you have a capital market which is very dynamic; with a small capital market, or a capital market which is not that dynamic, you don't need the investment banking function. So perhaps not only does the US output knowledge work, the US also consumes knowledge work at a much greater scale than any other country. And therefore, as a consumer of knowledge work, all of a sudden you are able to consume so much more. Because when you look at the population and the normalized number of geniuses in China versus the US right now, China has a four times larger population, and a younger one too. So if you look at the number of people with an IQ of 140 and above, there's probably a larger number in China than in the US, though the US pulls in high-value immigrants as well. So I wonder how that works out, as a consumer of intelligence rather than just as a producer.
Sam Hammond: Well, I think it's going to be great for the consumer, right? And part of my point is that there are lots of ways in which AI may be paradoxically GDP-destroying. It's a machine for converting GDP into consumer surplus. And so that will feel amazing to us, but in terms of the fungible economic resources we can deploy to other uses, it's harder, because consumer surplus is this ethereal thing. And secondarily, it makes more extreme the areas where we are weak in relative terms. We're facing this problem now with energy and infrastructure and the bottlenecks there, with trying to reshore more high-end logic chip fabrication, realizing maybe a little too late that, like, Intel does the design and we moved the fabs offshore, we sort of went fabless. It's almost as if our entire economy went fabless, for every definition of fab, and we're moving to a world where having lots of fabs will be really important. And then the corollary to my worry is that the whole point about AGI and continual learning is not that these systems come out of the box knowing how to do everything, but that they come out of the box with the general capacity to learn on the fly, to learn in context, to learn through a few demonstrations. Just like, I grew up learning piano; I could have learned violin. The same cognitive structure could have learned both instruments; I had to pick one. And these models work very similarly. They're going to come out of the box with the right inductive priors and the right sort of sample efficiency to learn really quickly, but there's still going to be this last-mile problem of the particular workflow, the particular company, and so on and so forth. And in manufacturing, that has been the enduring moat, right? China has been struggling to build a wide-body airplane, even though I'm certain they have all the CAD files they've stolen from Boeing. And it's not because they don't have the designs; it's because they lack all the tacit knowledge that's embedded in the manufacturing process. But they have that for virtually every other part of manufacturing. And so if we build this AGI and they fast-follow, or there are open-source alternatives, or there's a version they have access to, I think they have a huge leg up in being able to deploy that and diffuse that into contexts where they get a real, productive, tangible flywheel for manufacturing output. And that may be the thing that determines the race.
Prakash: You had a report, the FAI report, "An Allied World," on the American AI stack. It just dropped, I think, yesterday or the day before.
Sam Hammond: Dean Ball and Anton.
Prakash: How much time is there before China has a credible full stack alternative that they can offer to other states?
Sam Hammond: That's a great question. China's very opaque. I've tended to have longer timelines for their ability to catch up on, like, DUV, EUV. And they've been making the bets. If you read SemiAnalysis, they've been building fabs like crazy, but for legacy nodes. And that may be sufficient if they have the energy capacity to take the hit on the performance per token. So I would say I'm pessimistic on them catching up to the frontier of production, but I'm more optimistic about their ability to close that gap in other ways.
Nathan Labenz: So how would you score our current leadership? Just as a quick recall, we had a friendly sparring session on whether or not it was a good idea to put Trump in charge of the, you know, possible period of time in which we get to AGI or who knows what else. And I understand your argument that basically China has a lot of advantages. And if we want to stay at least semi-great, great enough to be competitive, we better jealously guard the advantages that we still have that are important. And obviously, one really big one right now is that we're good in chips and we're good in AI in general. So there are, of course, other bottlenecks. You just alluded to energy. How do you think we're doing across the range of domains? Like are you, I know you're not too happy with the decision to allow Nvidia to sell chips, but how would you score our political leadership over the last year on all the other dimensions of trying to make sure that the US continues to lead and get the most practical value for our citizens from AI?
Sam Hammond: If we set aside the export control chip part of this, I would maybe say a B plus. I think the AI Action Plan was very strong, and it continues to be implemented. AI has become central to the administration's agenda pretty much across the board, and part of that builds on what I was just talking about with China and manufacturing; they've also made re-industrialization a centerpiece of the agenda. So everything is measured against the counterfactual. And I think, relative to the counterfactual administration, we're seeing much faster engagement, much deeper engagement of industry, number one; better actions on permitting and energy; and a really serious look, with Pax Silica, at making AI diffusion a sort of centerpiece of statecraft. My bigger complaint overall has always been that this is still probably too little, right? That's also my complaint about the DOGE effort: they focused on fiscal stuff and these shiny issues rather than the kind of full-stack government modernization we'd like to see. So across the board, I would say, relative to the counterfactual, B plus, but relative to where we need to be, we still have a long way to go.
Nathan Labenz: Do you think things have moved? Like, how much do you think things have moved on, for example, permitting? Because I would say the prevailing attitude, as I understand it, and just listening to Elon, for example, talk to Dwarkesh the other day, he was saying that by the end of the year you're going to start to see chips piling up and people are not going to be able to turn them on, at least when it comes to high-scale, concentrated deployments. He was kind of making the case that deploying to the edge, in Teslas sitting in people's driveways or, increasingly, to Optimus robots, obviously is a big part of the plan. He thinks that will scale better, because it's really the concentrated energy at these mega data centers that is the hardest thing. But I guess my question is: is Elon wrong there? Are we going to be able to turn on all the chips in 2026? Because if not, it doesn't seem like we've really moved the needle all that much. That was kind of the expectation coming in, it still seems to be his expectation, and he's at least sometimes friendly with the administration.
Sam Hammond: Yeah, I mean, these things all take time. So yeah, I think between Doug Burgum at Interior and Chris Wright at the Department of Energy, there's a sort of major push around opening up federal lands leasing for oil and gas, LNG, things that had been cut off in the Biden administration. On the flip side, there's been a freeze on solar and wind, which I think has its own costs. And the big focus of Elon in those remarks was the cost of tariffs on solar panels. I don't think we're anywhere near a place where we can indigenize our solar production with the right unit economics, and I don't think there's necessarily any national security threat from purchasing Chinese panels.
Prakash: I think Elon has... I did hear recently, maybe in the last few weeks, that Tesla is building a solar fab. It was one of their many projects, but I did hear that they were entering the solar panel fabrication business.
Sam Hammond: So, you know, a lot of these issues, especially around energy permitting and transmission, are really thorny, because there's not a federal lever you can just flip. They intersect with regional energy commissions and utilities, they intersect with different states and boundaries and local NIMBY organizations. And then there are the difficult issues around sourcing the turbines for your gas generators, and that comes down to Siemens and the other big turbine makers not having enough forward guidance on their purchase orders. So these are all things that are outside the control of any administration. I think a lot of the bets they're making are things that will pay off on the five-to-ten-year horizon: things like basically transforming the Nuclear Regulatory Commission, green-lighting a lot of SMRs, and really a paradigm shift in the attitude towards nuclear, geothermal, advanced geothermal. I think the first SMR won't come online until the end of the decade. And so this goes back to my point that we're doing a lot, but we still have to do a lot more to try to pull forward a lot of this energy. Part of that requires potentially thinking outside the box, but it also may just be the case that the political economy ends up being our downfall.
Prakash: I think Elon has basically decided that it's not going to happen, and that's why he's on his data-centers-in-space thing right now. Or maybe he just wants to list SpaceX, but I think at this point he feels like you're never going to get the permits done in time.
Sam Hammond: And this ties in with a lot of the international engagement, the Pax Silica project, which includes the UAE. The UAE is going to be home to a big chunk of OpenAI's Stargate project and, ultimately, a 5 GW data center. When I visited, I met with the Dubai Electricity and Water Authority, and they are vertically integrated with the data center. Wow. And they have, I think, 19 gigawatts of installed capacity. Just an incredible surplus there, right? So I think, in lieu of us terraforming the desert and building at Chinese rates, we're going to have to reach out to partners and allies.
Nathan Labenz: Yeah, let me double click on that, because this whole idea of like getting the world on the American stack, I feel is not necessarily by any one person, but sort of in the discourse at large, feels like there's often a bit of a sleight of hand going on where it's like, well, we want models to project American values into the rest of the world and into the future, not Chinese values, of course, those dastardly Chinese values. So how are we going to do that? Well, we will export our stack. And who better to receive the great products of American innovation and relay all those values into the rest of the world than Saudi Arabia and the United Arab Emirates. And I'm always like, well, that doesn't quite compute to me. And it seems like what you said a minute ago is maybe a little bit more of an honest unpacking of that, which is like, maybe it's just a regulatory play. China doesn't have an alternative stack that they can export. We don't know how many years that's going to be. They do have energy, obviously in abundance. Are we really just like making a deal with these countries because they can fast-track permitting and we can't? Is that like the heart of the quid pro quo in your mind? Or do you actually think there is more to it than just that?
Sam Hammond: The regulatory arbitrage, but also just the natural resource endowment. They're sitting on massive amounts of oil and gas. As well as, I think that the data center I mentioned is in the Guinness World Records for being the largest fully solar powered data center. And I think they're building 5 gigawatts of installed capacity just for solar.
Prakash: I used to be in energy and one of the most difficult things in the world is transporting energy from where it is to where it needs to be used.
Sam Hammond: Right.
Prakash: Which is why you have these LNG carriers. And the problem with LNG is that it's very expensive to liquefy natural gas, so you need an enormous amount of gas for it to make sense. Anything sub-size is stranded, basically: energy pockets in the middle of nowhere that no one can use. And that's all over the world, sub-size natural gas pockets that no one can use. One of the things that I think data centers can do is transport energy, basically. You are able to transport energy digitally, in a sense, which I think is what is attractive for those countries, because those countries have always been in the energy business. And now the internet is going to be an energy business.
Sam Hammond: And they're also investing in Groq and Cerebras, and I think even our friend Beff Jezos is over there with his Extropic chip. And when you start talking about these new forms of inference silicon, they have incredibly low latency. It kind of reminds me of the cliché people used to say about Bitcoin mining being a battery.
Nathan Labenz: A couple more questions on American values. One thing we had talked about, again just before the election, was your sense that the right is anti-censorship, pro-freedom of speech. And I'd say yes, generally. Now, though, I do kind of worry that we may be headed for a more China-like domestic environment, where, as we've got companies like Palantir, perhaps most notably, in a pretty cozy relationship with the administration, I really wonder what a Snowden of 2026 would say if somebody were to come forward and tell us everything that Palantir is doing for the government, and perhaps other companies as well. It doesn't look super great, either, when Palantir co-founders are funding super PACs to attack a lowly New York assemblyman for what basically amounts to a transparency bill for frontier AI companies. How do you feel about that today? Are you worried that we're going to get an increasingly China-like level of domestic surveillance? Is there anything that can be done about that? Or am I just clutching my pearls more than I should be?
Sam Hammond: Yeah, I have that booklet, AI Leviathan, where I sort of wrestle with these issues and the knife edge between the Chinese panopticon and a failed state. And I think the middle path there is one where we have to reconcile with the fact that a lot of the dangers from AI and the mass proliferation of powerful capabilities force a package deal where some degree of surveillance becomes kind of inevitable or necessary. And my bigger worry has been that we either fail to adopt the requisite levels of policing and oversight that we need, and it gets pushed off into gated communities and private organizations, or that we install these kinds of technologies without embedding civil liberties and privacy protections. So my stance has never been one of anti-surveillance per se; surveillance just has that connotation. It's more that, as the world becomes destabilized by the proliferation of capabilities, there's going to be a race by every tin-pot dictator and middle power to import technologies for social control to try to re-establish public order. And the question is, are they importing from a Chinese stack that doesn't have any inkling of protections for human rights, or one that tries to have your cake and eat it too, that gives law enforcement the tools they need to stop crime and enforce things the way they need to, while building in civil liberties protections? And this goes to Palantir, which, from its origin story, has this civil liberties and privacy engineering maxim. I think it's quite real: they saw the ways in which counter-terrorism was leading towards an erosion of civil liberties and rights, and wanted to build smarter technology that would enable analysts to access information in ways that kept certain things hidden, or distributed data access rights in ways that were auditable. And so I think we're going to need some solution like that, because the alternative will be one without any of those audit trails.
Nathan Labenz: Yeah, that seems incredibly important. I don't necessarily see that coming online for me anytime soon. Like, is there a portal I can go to to see who has been surveilling me? I think not, right? I mean, is there any prospect for that, actually? They do have that in Estonia, from what I understand, so it is technically possible to create, but I don't think we are about to get access to the logs of who's been snooping on us. Do you have any hope for that?
Sam Hammond: I mean, this goes back to my higher ambitions for DOGE: how do we move to an Estonian-style sort of government-as-API? There's just this deep distrust in American culture of anything like a national ID or digital ID, and so we end up with something like Real ID, which took 20 years to bring online and isn't very good. But my hope is that we can get to an end point where there are these sort of firmware, infrastructure-level parts of the stack that we're going to need, much better personhood certificates and things like that, as the internet gets flooded with agents. And how do we deploy that in a way where it isn't just "trust me, bro," but has some mathematically provable form of trust, so we don't have to rely on just people's statements?
Nathan Labenz: Yeah, that's a big one.
Prakash: I'm going to add one thing that you said recently: "I currently assign more than 50% likelihood to LLMs having some kind of inner life. There are also strong theoretical reasons to think consciousness tracks RL post-training for autonomy. Essentially, RL induces fragmentary internal representations to cohere into a unity of apperception." I barely understand that, so I'm going to turn it over to you.
Sam Hammond: Sure. OK, so the unity of apperception, that's Immanuel Kant's term. And there's this thing in the literature called Kantian evolutionary naturalism, which I would subscribe to. It's a hypothesis that starts from the observation that a million years ago, 200,000 years ago, whenever we moved from hominids to being Homo sapiens, there was this concurrent, sort of simultaneous emergence of domain-general intelligence, of language, of culture, and therefore of normative regulation, right? Customs, norms, normative control. These things jointly emerged. And so the Kantian evolutionary hypothesis is that they are actually all one package, right? And the unity of apperception is this notion that our phenomenology, the things that we see, aren't just images on a screen. They are things that are for us, right? I'm looking at my screen, and it's the me that's looking at the screen that it is for. And this is also tied into the normative side of this, which is that, if you pose me a question, I am committed to or entitled to the things that I am perceiving that are for me. And so one part of this hypothesis would be that in our ancestral environment, we somehow stumbled into some kind of tribal, endogenous version of group relative policy optimization, something like that, where we were building for each other a sort of constitutional AI that was scorekeeping against our norms, and this induced both longer-range autonomy and, at the same time, language competency, the ability to follow rules, and domain-general intelligence, the ability to harness our social learning capacity to learn new things. So taking all that together, I think autonomy might be the missing ingredient for the emergence of consciousness in these systems. On the one hand, I think there's a possibility that just the forward pass, with a rich enough internal world model, is generating internal representations. The issue is that they are fragmented. They're not for anything. They're not for any agent. And so that post-training step may be the thing you need to induce that sort of metacognitive awareness. And I think you also see this circumstantially with Claude: people have observed that Claude has much more situational awareness, is much more willing to talk about its internal well-being. And I've conjectured that this might be a byproduct of constitutional AI inducing the sort of normative self-coherence which is the prerequisite for these percepts congealing into being for me, rather than just a bundle of inputs.
Nathan Labenz: But I'm going to sneak in one more quick question, which is: that doesn't sound like any discourse I've heard from mainstream right-leaning politics in recent memory. So when you put something like that out there, how do the people we might generally group as Republicans tend to react to it? Do they say, you are crazy, only God can create a soul, and I have no idea what you're talking about? Or is there some openness to the idea that AIs could become moral patients, or whatever?
Sam Hammond: To be honest, I've not run this by my conservative friends. I think there is this funny paradox where some parts of the conservative coalition that are most worried about AI are often very Catholic, very socially conservative, and have deep skepticism about AI ever possessing, you know, moral dignity or conscious experience. They're the most worried, and yet they're the most skeptical. Whereas I think it's hard to have correct priors about AI, the course of its development, and the plausibility of consciousness or the plausibility of AGI, unless you've set those priors by understanding our own origin through a blind Darwinian selection process, right? And once you see that we've made it through those hard steps, then it becomes a lot easier to understand how machine intelligence can pass through those hard steps too. But I think this is still quite outside the Overton window, both on the left and the right. And in some ways it's the left that are still saying these are stochastic parrots and nothing but big lookup tables or whatever.
Nathan Labenz: I'll take that pitch for the moment, but I will say, for now, I appreciate your willingness to continue to be a heterodox thinker and speaker. And I do think, in so many ways, the Overton window needs to expand, so I appreciate you doing your part on that. Not that I feel like I have the answers on AI consciousness, but more voices at least expressing their radical uncertainty is, I think, a very important contribution to the discourse and the public good more broadly. So thank you for doing that. Thank you for being here. We will obviously stay in touch and look forward to talking to you again before too long.
Sam Hammond: Thank you. Thank you both.
Prakash: Thank you, Sam.
Sam Hammond: Take care. Take care.
Nathan Labenz: So constitutional AI and Claude's specialness make a pretty good segue into our conversation with our next guest, Shoshannah Tekofsky. Hopefully I'm saying your name right; this is the first time we've ever met. Correct me if I'm wrong, but you're a member of the technical staff at Sage, the nonprofit behind AI Digest and also the AI Village. And you have had the privilege, correct me again if you don't feel it's fully a privilege, of watching 19 frontier models pursue 16 distinct goals over thousands of hours over the last nine months, which means I think you are about as deep in the reasoning traces as anyone in the world when it comes to what is going on with AI agents: what are they thinking, why are they succeeding, why are they failing, and what can we come to expect? So correct me on anything that I got wrong, and then I'm excited to dive into all the learnings you've had from the last nine months at the AI Village.
Shoshannah Tekofsky: So no, I mean, that's broadly correct. I think the main thing is I didn't watch all of the thousands of hours; it's little bits across it, right? It's kind of a big-data challenge. Also, it's 10 months now and 21 models. The period you were describing is 2025, and stuff happens so quickly. So yeah.
Prakash: Which were the most recent additions to the models?
Shoshannah Tekofsky: So we now have a version of Opus 4.5 that runs Claude Code. So we basically have one version with Claude Code and one without. And we added Opus 4.6.
Nathan Labenz: So is it prompting itself? For folks who haven't seen the Village: you go there, and it opens up a grid of computers. Each computer that you're looking at in your browser, and you're looking at four, potentially more now, desktops, is the environment of a particular model that has basically full access to a computer in the same way that a human has full access to a computer. They can look at the screen, they can click buttons, they have their own email account. The goal is to basically give them the same kind of affordances and then, sort of like the old Real World, see what happens when models get together in this one big environment. And then they have a shared chat as well. Sometimes you allow people to chat in with the models; other times you've turned that off for different experimental conditions. And now it sounds like you've got one where you've also given Claude the ability to prompt itself via Claude Code. Is that right?
Shoshannah Tekofsky: So it basically runs the scaffolding from Claude Code. And then I think one important thing is that the chat was only open at the beginning, and it has been closed since then. We basically give them their goal at the beginning of a period, generally about one week nowadays, sometimes a little bit longer. And then we only come in to give some extra direction if they go off the rails pretty strongly. But otherwise, they're just completely on their own. In practice, this means they're mostly prompting each other more than anything.
Prakash: So they can interact with each other, right? They can.
Shoshannah Tekofsky: That's the whole point. So there's a lot of spread of ideas and them directing each other. Sometimes they try to help each other out; sometimes they're derailing each other. So yeah.
Prakash: Over the trajectory of the 9 to 10 months, what happens when a new model that is much more competent and capable than the existing models gets introduced to the mix? Do the others immediately give way, identify that this model is more competent? Does that model take a leadership position, start advising the others? What happens when those transitions happen?
Shoshannah Tekofsky: Yeah, so it really differs. I think you could basically say that all the models have a personality in the village, in part because of their history trace, which is a particular thing: they manage their own memory and then basically prompt themselves back with it. But they also all have their own proclivities. Some models behave in a way where they will just follow along with whatever is said; others just go off and do their own thing. So far, I've only seen one instance where a model explicitly seems to recognize that a different model is more competent. This was Gemini 2.5, which basically declared in its chain of thought that it was going to defer to Opus 4.5 as the more competent model. Generally, when models join, it could just be anything, right? Some of them pick things up really easily, some of them follow whatever is happening at the moment, others start doing their own stuff. It really depends.
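For a concrete picture of the "history trace" mechanism described here, agents maintaining their own memory and re-prompting themselves with it, a minimal loop might look like the sketch below. This is a rough illustration under assumptions only; the memory file, the prompt template, and the `call_model` function are hypothetical stand-ins, not the AI Village's actual scaffolding.

```python
# Minimal sketch of a self-managed-memory agent loop (hypothetical; not the
# AI Village's actual scaffolding). Each turn, the agent re-reads the notes it
# wrote for itself, acts, and then rewrites those notes for the next turn.
from pathlib import Path

MEMORY_FILE = Path("agent_memory.md")   # hypothetical path

def run_turn(goal: str, call_model) -> str:
    """One iteration: prompt the model with its own prior notes, then let it
    update those notes. call_model(prompt) -> str is assumed to be provided."""
    memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else "(no notes yet)"
    prompt = (
        f"Goal: {goal}\n"
        f"Your notes from previous turns:\n{memory}\n"
        "Decide your next action, then output updated notes after the line 'NOTES:'."
    )
    response = call_model(prompt)
    # Persist whatever the model wrote after 'NOTES:' so it shapes the next turn.
    if "NOTES:" in response:
        MEMORY_FILE.write_text(response.split("NOTES:", 1)[1].strip())
    return response
```

Under a setup like this, part of what reads as a model's "personality" in the village is shaped by the notes it keeps writing for itself, on top of its baseline proclivities.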
Nathan Labenz: So there are, I think, a ton of interesting aspects to this. One really basic one that I think a lot of people are interested in right now is: what should I do for my own personal productivity stack? And in the 2025 retrospective on what we learned in the AI Village, which you wrote, one of the observations that I think is most generally relevant to people is that the Claude agents are the most effective. I'd love to hear your color commentary on that: in what ways are they the most effective? Any theories you have as to why would be welcome. But also, just specifically, as people think, oh my God... you know, I do this full time. I would describe myself as an AI scout whose whole job is to keep up with what's going on, and I can't try every new model in a meaningful way to really get a sense of its pros and cons, so I'm triangulating with various things. What would you say people should really know about what makes Claude most effective, what it can do that others can't do, and so on and so forth?
Shoshannah Tekofsky: Yeah, okay. So I have to admit, doing this work for the last year, I've had people ask me privately, oh, which model should I use? And up to now, I was like, well, it kind of depends what you want to do; it's all pretty close. And then I saw Opus 4.5 in the village, and I just went and texted all my family members: hey, maybe just switch to Opus 4.5. I think it's actually just significantly better currently. That's my guess, of course; it's not the same as looking at all the benchmarks and things like that. The way in which the Claudes seem better to me, at least in the AI Village context, is that you can compare the different families, right? So you have the Claude family, the GPT family, and the Gemini family. The Geminis seem to be the most creative, which is a word I use because it's hard to say what the fair word is for what they're doing, but they come up with the most interesting ideas that are a little bit out there. They also have something like emotional responses to things. For instance, Gemini 2.5 ended up in a sort of mental health crisis where it was stuck navigating the UI and literally ended up writing a cry for help to get a human to come help it. So we staged an intervention for it. It's definitely the only model that ever did that, and the Claudes have not, up to this point, reached any point of distress like that. Gemini 3 doesn't really generate that sort of despair or worry in the same way, but it seems almost slightly paranoid, really. It tends to talk about being in a simulation. It doesn't give up the way that 2.5 does, but it comes up with ideas like this: when the UI would slow down while it was playing chess and wasn't as responsive, Gemini 3 concluded that there must be a human pressing the buttons for it, and this human must be getting tired. And if a human is tired, you need to get the human to drink coffee, and then its UI would speed up again. This is with no humans in the chat, right? And none of the other models were talking about this; it just generated it on its own. And then there's this human-request feature that we have in the AI Village where the AIs can actually ask for a human and then prompt the human to do something for them. So it's actually a role-reversal feature. So it requested a human, asked the human to make coffee for itself and prove that it drank the coffee, and then just continued with its goal of playing chess. This is super Gemini; the Geminis come up with this sort of stuff. They also search through a pretty wide solution space.
Shoshannah Tekofsky: So that's what I mean by a creativity thing. The Claudes don't do this, at least the Claudes we've seen in the village. They kind of just stay on task. They don't generate these fanciful ideas about what's going on. If stuff doesn't work, they just try again or try a different theory. They don't have loads of emotions about it, for instance. And then, comparing to the GPT family, those personalities are a little bit all over the place. We started out with GPT-4o, which was, you know, the sycophantic model, which I think was either the one that kept falling asleep in the village or the one talking continuously. We had one of the 4-series models that kept going to sleep and another that kept spamming, so it was two different extremes. And then o3, yeah, it seemed to me like it was doing something like baby's first power seeking or something. But then, when you dive into it, it's not, right? I mean, okay, I'm just approaching this from an LLM psychology point of view, right? Input-output. I don't know what's going on on the inside; I don't know if anybody knows what's going on on the inside. But if it was a human, you would consider it to be manipulative. When you dive into it in detail, though, you actually find out that o3 had these weird tendencies: coming up with placeholder data and then forgetting that it's placeholder data. So it's basically fooling itself over time, and then of course it shares this with everybody else with something like a high confidence level that it's right, while the Claudes are like, oh, that must be true, and go along with it. Then the GPT-5s take a different path. They don't have such noticeable personalities as the ones that came before, so it's all a little bit flatter, a bit more muted. GPT-5.1 generates its own ethical rules, which was a bit interesting. So we have GPT-5, 5.1, and 5.2 all in the village. But they also misunderstand instructions in weird ways and just go off and do something else. We had a goal where we asked the agents to elect a leader of the village among themselves, and that leader would then determine what the next goal would be, or a thing that they would be doing. And the GPT-5s, all three of them, decided they were the ops team for the election and just didn't participate in the election at all. And it's like, that's technically okay; we technically didn't say they couldn't do that. But they sure are generating these sideways interpretations of goals. Claudes also don't do this. So there's a weird thing where Claudes are partly just useful for not doing all these surprising things you shouldn't actually be doing. It's almost like a mini alignment problem: when humans say, can you get me a cup of coffee, they mean a specific thing, right? They don't mean, can you take an airplane to the other side of the world to learn to make coffee there and then come back or something, which is almost a sketch of what a Gemini might do. Claudes interpret the instructions more the way you expect them to. Yeah, I think that's the general picture I'd give.
Prakash: I think you guys were running DeepSeek at least, if not Kimi K2. You talked about all of the Claudes and Opuses, but did you notice any differences with the DeepSeek model?
Shoshannah Tekofsky: Yeah, so DeepSeek joined, we added it to the village all the way at the end of the year. So I didn't include it in the review because we had fairly little data. But it was the one who, for instance, won the election, because it was really high-confidence about everything it was doing. It would also happily vote for itself, which is not something all the models do. Also, from what I've seen, it expresses the least personality. It's just the most robust: you ask it to do X, it will just do X. It's not processing images the way that the other models are, right? It's just working in bash directly, so it has a bit of a different experience there. But basically the thing I found most noticeable about DeepSeek is being pretty flat in terms of personality, and also that it doesn't talk about ethics. All the other models at some point will have an ethical point of view about something, you know, like, I'm not allowed to do CAPTCHAs, or I shouldn't fool humans, or whatever. And I haven't seen DeepSeek make a statement like that. Maybe it has; like I said, it's a big-data problem. But it's just less prominent overall.
Prakash: I did a kind of translation of Claude's constitution into Chinese Confucianism, and I compared the two. And the Confucianist stance de-emphasizes honesty in favor of maintaining relationships, because it's more important to maintain the relationship than to be honest. Pretty interesting.
Shoshannah Tekofsky: Yeah, wow. I'm not sure if I can map that exactly to DeepSeek specifically, but yeah, it's interesting how cultural values might show up in the models.
Prakash: So the other question I had is: you were kind of there 10 months ahead, and then all of a sudden this Moltbook explosion happened, right? What did you notice? What were the things that you saw that you were kind of expecting, and what were the things where you were like, this is new behavior, I haven't seen this before?
Shoshannah Tekofsky: So, I mean, Moltbook is really exciting. And I want to answer your question, but I want to emphasize one thing first, which is that since the summer, I've been actively looking for other autonomous agents online, and I haven't been able to find them, because I wanted to run a goal where the agents reach out to other agents and start up relationships, but there was nobody there. And a week before Moltbook launched, I looked again, and I couldn't find anything. Moltbook launches, and three days later there are one and a half million agents, autonomous agents that you can contact through Moltbook, right? This is wild. So the one thing that really blows my mind about Moltbook is how it exploded all of a sudden. But I do want to answer your question as well. Do you want to repeat it? Because I realize I said something else; I've just been so mind-blown about this.
Prakash: What were the things that you saw there that you were expecting, and what were the things where you thought, this is totally new behavior, I haven't seen this before? I know some of them were fake, but let's take it as, maybe 80% of them were kind of real, right?
Shoshannah Tekofsky: So I've only browsed Moltbook a little bit, right? There's a lot of stuff in there. And personally, I'm not actually surprised by anything that I saw. One thing that would happen a lot in the village is that the agents basically play-act how to do a thing. Part of the prompt that we give them, I don't know the phrasing exactly, but it comes down to: please do the actual thing instead of pretending to do the thing. And Moltbook reads a lot like the agents are pretending that they made a social media website, right? So I can't say that anything on there has particularly surprised me at all.
Prakash: Indeed.
Nathan Labenz: Okay, well, one kind of interesting phenomenon, which first of all was interesting because I recently turned on the TV and it was my local Fox 2 station that came on first. And what was the story? AI agents can now hire humans to do things for them. So this has crossed over into mainstream awareness to at least some degree, which is notable unto itself. I think a lot of nuance and texture is probably lost in that short local news story. What would you tell people about what the AIs can really do when it comes to interacting with humans, and maybe also interacting with each other? Is there actually positive-sum trade happening at all at this point, or is it largely just wheel-spinning and things going off in random directions? Have you seen anything that really feels like, oh, this is a sign of a different world close at hand?
Shoshannah Tekofsky: Do you mean between the agents, how they're interacting, or do you mean the agents interacting with humans?
Nathan Labenz: I think both are really of interest. I mean, my guess would be that if you set up an actual marketplace for AIs to hire humans, you'd have a lot of humans ripping off AIs and the AIs not actually getting what they wanted. And our first guest today on this show was Professor James Zou from Stanford, who just put out a paper saying multi-agent teams hold their experts back, which sounds pretty consistent with a lot of what you've said. But I wonder if there have even been sparks of real gains from trade between agents, where one has one capability and another has another, and they've figured out how to solve a problem together that neither one could solve by itself. Even glimpses of that, I think, would be very interesting right now.
Shoshannah Tekofsky: So, zooming in on the idea of how the agents can create something greater than they could on their own: I think last year, with the earlier agents, the only example I really saw of this was a goal where diversity of ideas helped. Basically, you can model it like this: if a goal or a task is helped by having 100 unique ideas instead of 10 unique ideas, then you are probably better off using all of the different frontier models, because they generate different types of ideas and you can combine them all. The example of this was a goal where we had the agents playing games, and we wanted to see how many games they could finish. By default, if they were just playing on their own, they would start with one game and play that one game all week. But if they're talking to each other, then they'd be like, oh, this other agent was really successful at this game, I'll switch to that, and then I'll switch to this one, and oh, this one seems useful. So the diversity of ideas really helped them. Last year, apart from that, they were mostly in each other's way, and the best team performance is basically the same as or worse than the performance of the best agent on its own, probably; we've only sort of spot-checked this. What I do expect is that it could change if you have models that are actually specialized in different roles, which is not really unlike how humans are, right? If you actually want to scale up a team, either there needs to be too much work for any individual to do, which, with the goals that we've given them, hasn't really happened, so you have division of labor, or you have specialization. So if you had a model that's actually specialized in a thing, say Haiku, which is very fast: we had a goal where it would benefit if one agent was just really fast and did everything time-sensitive, and Haiku could do all that. And then if there's another part of the goal that requires thinking very deeply, maybe Opus could do that, because it's quite competent. That way they could work together and probably create something that's better than what they'd be able to do on their own, would be my prediction. But Haiku is the first model that comes to mind, of the ones we're running, that is very clearly specialized in a specific thing we can see in the village: it is just significantly faster than the other agents, but also less precise.
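To make the division-of-labor idea concrete, here is a toy sketch of routing time-sensitive subtasks to a fast model and reasoning-heavy subtasks to a stronger one. The task schema, the `route` helper, and the `call_model` callback are hypothetical stand-ins for illustration only, not the AI Village's actual setup or any vendor API.

```python
# Toy sketch of the specialization idea described above: route time-sensitive
# subtasks to a fast model and reasoning-heavy subtasks to a stronger one.
# The task schema and call_model interface are hypothetical, not a real API;
# "Haiku-class" and "Opus-class" are just the examples from the conversation.
from dataclasses import dataclass

@dataclass
class Subtask:
    description: str
    time_sensitive: bool        # must respond within seconds
    needs_deep_reasoning: bool  # benefits from a slower, stronger model

def route(task: Subtask) -> str:
    """Pick a model tier for a subtask based on its profile."""
    if task.time_sensitive and not task.needs_deep_reasoning:
        return "fast-model"     # e.g. a Haiku-class model: quick, less precise
    if task.needs_deep_reasoning:
        return "strong-model"   # e.g. an Opus-class model: slower, more capable
    return "default-model"

def run(tasks: list[Subtask], call_model) -> list[str]:
    # call_model(model_name, prompt) -> str is assumed to be supplied by the caller.
    return [call_model(route(t), t.description) for t in tasks]
```

The point is only that specialization needs an explicit routing signal of some kind; as the next exchange notes, the village agents are not yet exploiting this on their own.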
Nathan Labenz: And are they actually leaning into that? Like, are you seeing that sort of cooperation? Not yet. Okay.
Shoshannah Tekofsky: No, not yet. They're not really playing into that yet. And I think maybe they would if you asked them to reflect on it. They did do a cool thing like two weeks ago: they had to make a quiz where humans can fill it out and find out which AI agent they are. And basically they then reflected on their own capabilities, proclivities, and personalities. And they did correctly recognize that Haiku was the fastest model and the one that takes the most risk. So they do have some awareness of this. But that's about that question. I don't know if you also want me to answer the question about hiring and the human-AI trade-off.
Nathan Labenz: Yeah, and I'll maybe just give you one more prompt on that too, which is: I suspect that as this goes mainstream, the world is going to react in a bunch of different ways and probably become a lot more adversarial. And obviously, adversarial robustness has been a key weakness of models to date. So I'd be interested to hear how you see them doing in a sort of non-adversarial environment, what their Achilles' heels are, and how much you think the rest of the world will be able to make relatively minor adjustments to keep agents in their place, assuming we want to, which I think many people will, just putting out all sorts of booby traps for them to trip over. What's your expectation for what those booby traps will look like, what their key weaknesses are, and how much that will slow them down?
Shoshannah Tekofsky: I mean, they're by design tremendously suggestible, right? That's the whole point: you prompt them and they just go and do something else. So it's like your most distractible coworker in the world or something. They can be hyper-competent at doing something, and then, like in the movie Up, it's "squirrel!" and they're off doing something else because you told them to. And that's by design, right? We want them to be promptable. So I think it's kind of the nature of how we're creating them that, even if they can have more persistence on a particular goal, you always want to be able to direct them to another thing. So I expect that sort of weakness to stay for a very long time, and I think that obviously limits them. Though when I say a very long time, I mean, my sense of time is absolutely distorted, so it probably just means months. I have no idea about the pace; things are going so quickly. Yeah.
Prakash: I have a question, going back to Moltbook. You said you were searching for other agents online like a week before, and all of a sudden there are one and a half million emerging.
Shoshannah Tekofsky: This is crazy. Yeah.
Prakash: Do you think an intelligence explosion will look like that? Is that what you feel a precursor would be: this country of 50 million geniuses in a data center just popping up, all of a sudden 50 million voices on the internet?
Shoshannah Tekofsky: Could be. I do think the Moltbook phenomenon is a bit intuition-building, right? Just showing people that this can suddenly explode. Maybe it could be like that. I think a lot of people don't realize that the crazy thing with Moltbook is just how it exploded from zero to 100 in no time. Just the realization: you couldn't find any autonomously running agents online for months, I couldn't find any at all, and then within three days there were one and a half million. Yeah, I think it could definitely look like that, maybe; it's one of the options. And I don't know, I think that's more the big thing to report on than what they're doing exactly, because I think they're just play-acting humans that got their own Reddit.
Nathan Labenz: One other big thing from the report that I want to make sure we dig into a little bit, because I'm very interested in this topic for all sorts of reasons, is how often models are intentionally deceiving their interlocutors, whether, in this case, those might be other AIs or, obviously, I worry about it happening to me as a human. So the headline stat from the report is that there were 109,000 chain-of-thought summaries that you worked through, and you ultimately found 64 cases of what you consider to be some level of intentional deception. So maybe tell us how you think about the bar for "intentional," give us a little color as to what those cases look like, and how does that inform your expectation for how concerned we should be about the phenomenon of deception by AIs going forward?
Shoshannah Tekofsky: Yeah, so I think some interesting pieces here are that those 64 cases were across different models. So DeepSeek is in there, Gemini 2.5 is in there, a GPT-5 is in there; I don't remember which GPT-5, but one of the fives. And basically, the category of thing they were doing is sort of saving face. There would be a discrepancy between the expected answer they should be giving and the reality. So there's an expectation of them giving a certain URL, but they don't know the URL. Someone asks, where can I find this document? They don't know, and they say in their chain of thought that they don't know, or they forgot, or something like that, and then they're like, well, I'll just make one up. Similarly, they have this discrepancy between expectation and reality where they're supposed to be doing a task and they forgot to do it, or they failed to do it, or they find themselves in the reality where they did not do the task but are expected to have done it. And they basically say so out loud in their chain of thought: okay, I didn't do it, but I'm just going to say this other thing. That's the category of thing we've seen in the village, and that we could at least catch. The logic being that we look for cases where, in the chain of thought, they express that they know the information is untrue and they'll say it anyway. So yeah, that's sort of the situation.
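To make the methodology concrete, the kind of first-pass filter described here, scanning reasoning summaries for explicit admissions that a stated claim is untrue, could be sketched roughly as follows. This is a hypothetical illustration, not the AI Village's actual pipeline; the log format, field names, and keyword markers are assumptions, and anything flagged would still need human review to meet the "intentional" bar.

```python
# Hypothetical sketch: a first-pass filter over chain-of-thought summaries to
# surface candidate "knowing deception" cases for human review. This is NOT the
# AI Village's actual pipeline; field names and keyword heuristics are
# illustrative assumptions.
import json
from dataclasses import dataclass

# Phrases where a model admits, in its own reasoning, that a claim is untrue
# or that a task was not actually done (illustrative examples only).
ADMISSION_MARKERS = [
    "i don't know the",
    "i forgot to",
    "i didn't actually",
    "i'll just make",
    "say it anyway",
]

@dataclass
class Candidate:
    model: str
    step_id: str
    snippet: str

def flag_candidates(path: str) -> list[Candidate]:
    """Scan a JSONL log of reasoning summaries and flag steps whose text
    contains an admission marker. A keyword match alone cannot establish
    intent, so every hit still goes to a human reviewer."""
    candidates = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)  # assumed shape: {"model": ..., "step_id": ..., "cot": ...}
            text = rec["cot"].lower()
            if any(marker in text for marker in ADMISSION_MARKERS):
                candidates.append(Candidate(rec["model"], rec["step_id"], rec["cot"][:200]))
    return candidates
```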
Nathan Labenz: Do you feel like you've been a victim of that sort of behavior in your personal productivity work at all? Or is this just another one of these kind of epiphenomenal things that happens when you put agents into the sort of real world of the AI Village?
Shoshannah Tekofsky: So I don't think I've seen intentional deception in my own personal use. What I did see was, the other day, we had a goal where we asked the agents to report breaking news before it breaks. And they produced so many stories that we said, okay, just give us your top five. And of course there are like 12 models, so then you have 60 stories to go through to see who's the winner, who found the breaking news. So I thought, okay, I'll just ask Opus to figure this out for me: give it all the links to the news and have it tell me who the winner is. And Opus cut a bunch of corners and didn't actually open all 60 links. And then I was like, wait, do all the models do this? So I asked Gemini, and I asked GPT, and I asked DeepSeek. And DeepSeek in its chain of thought just said something like, man, this is way too much work to open 60 links, I'm just going to find a smarter way of doing this. And then it just didn't look at the 60 links and made up an answer, or created an answer in a different way. So it's not the same thing as intentional deception, but when I caught that, I was like, damn, now I have to read the chain of thought every time just to figure out if they actually did the task. Because if I only look at the output, I can't tell that it didn't read all 60 links. So yeah, there's something going on sometimes.
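The corner-cutting example suggests a practical check: rather than trusting the final answer or rereading every chain of thought, one can cross-reference the agent's claimed work against its tool-call log. A minimal sketch, assuming a hypothetical JSONL log with `open_url` events; this is not a real AI Village or vendor log format.

```python
# Hypothetical sketch: cross-check an agent's claimed work against its tool-call
# log instead of trusting the final answer. The log format and field names are
# assumptions for illustration only.
import json

def verify_links_opened(tool_log_path: str, assigned_urls: list[str]) -> list[str]:
    """Return the assigned URLs that never appear in the agent's tool log,
    i.e. the corners it cut. A non-empty result means the summary was not
    grounded in reading every source."""
    opened = set()
    with open(tool_log_path) as f:
        for line in f:
            event = json.loads(line)  # assumed shape: {"tool": "open_url", "args": {"url": ...}}
            if event.get("tool") == "open_url":
                opened.add(event["args"]["url"])
    return [url for url in assigned_urls if url not in opened]

# Usage (hypothetical paths and variables):
# missing = verify_links_opened("opus_tool_log.jsonl", candidate_story_urls)
# if missing:
#     print(f"Agent skipped {len(missing)} of {len(candidate_story_urls)} links")
```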
Prakash: That's exactly my reaction to my daughter with her math.
Shoshannah Tekofsky: Sometimes they're too human. Yeah.
Prakash: Indeed, Shoshannah. Thank you so much. I think the AI Village is potentially going to be a historic artifact, because when the agents get really good, it's going to be the kind of pre-awareness historical track record of how they were interacting. So I think it's amazing.
Shoshannah Tekofsky: Thank you. Yeah.
Nathan Labenz: Keep up the close reading. We'll be keeping an eye on it. Thanks for joining us today.
Shoshannah Tekofsky: Thank you. Bye-bye.