Approaching the AI Event Horizon? Part 1, w/ James Zou, Sam Hammond, Shoshannah Tekofsky, @8teAPi
Part 1 of this live special explores AI for scientific discovery, U.S. AI policy, and AI agent behavior, with James Zou on virtual labs and interpretability, Sam Hammond on geopolitics and AI consciousness, and Shoshannah Tekofsky on emergent agent behavior.
Watch Episode Here
Listen to Episode Here
Show Notes
Part 1 of this live special dives into AI for Science, U.S. AI policy, and the behavior of AI agents in open-ended environments. James Zou explains how interpretability and virtual labs of AI agents can accelerate scientific discovery. Sam Hammond assesses the Biden administration’s AI policy, U.S.–Gulf AI deals, and the odds current AIs are conscious. Shoshannah Tekofsky shares insights from studying agent performance and emergent behavior in the AI Village.
Use the Granola Recipe Nathan relies on to identify blind spots across conversations, AI research, and decisions: https://recipes.granola.ai/r/4c1a6b10-5ac5-4920-884c-4fd606aa4f53
LINKS:
Sponsors:
GovAI:
GovAIwas founded ten years ago on the belief that AI would end up transforming our world. Ten years later, the organization is at the forefront of trying to help decision-makers in government and industry navigate the transition to advanced AI. GovAI is now hiring Research Scholars (one-year positions for those transitioning into AI policy) and Research Fellows (longer-term roles for experienced researchers). Both roles offer significant freedom to pursue policy research, advise decision-makers, or launch new initiatives. Applications close 22 February 2026. Apply at: https://www.governance.ai/opportunities
Blitzy:
Blitzy is the autonomous code generation platform that ingests millions of lines of code to accelerate enterprise software development by up to 5x with premium, spec-driven output. Schedule a strategy session with their AI solutions consultants at https://blitzy.com
Tasklet:
Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai
Serval:
Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive
CHAPTERS:
(00:00) About the Episode
(03:19) Intro and past projects
(07:23) Multi-agent teamwork challenges
(13:26) Learning to discover science (Part 1)
(19:18) Sponsors: GovAI | Blitzy
(22:25) Learning to discover science (Part 2)
(27:09) Predicting health from sleep
(31:55) US–China science collaboration
(34:15) Software singularity geopolitics (Part 1)
(34:21) Sponsors: Tasklet | Serval
(37:09) Software singularity geopolitics (Part 2)
(44:42) US strategy and energy
(51:34) Gulf energy and compute
(55:36) AI surveillance and rights
(59:31) AI consciousness and politics
(01:06:07) Model personalities in village
(01:19:14) Agent swarms and teamwork
(01:25:29) Agent vulnerabilities and control
(01:29:18) Deception and cutting corners
(01:33:51) Outro
PRODUCED BY:
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/
Youtube: https://youtube.com/@CognitiveRevolutionPodcast
Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk
Transcript
This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.
Introduction
Hello, and welcome back to the Cognitive Revolution.
You're about to hear Part 1 of what turned out to be a 4-hour live show that I co-hosted with my friend Prakash, also known as @8teAPi on Twitter, on the topics of AI for Science, Geopolitical Competition, and Recursive Self-Improvement.
With everything moving so quickly in the AI space, I'm actively looking for ways to shorten my own personal productivity timelines, and to deliver high-quality analysis in more timely and time-efficient ways – and talking to 6 top-notch guests over the course of 4 hours is … one attempt to do that.
In this part 1, which we're publishing as a standalone episode, we talk to:
- Professor James Zou of Stanford about his work on AI for Science, which ranges from applying interpretability techniques to protein models to building virtual labs of AI agents;
- Sam Hammond about how the current US administration is doing on AI policy, what the US is really getting out of its deals with gulf countries, and why he believes current AIs are at least as likely to be conscious as not;
- and to Shoshannah Tekofsky about the many fascinating observations she's made and lessons she's learned from a deep study of AI Agent performance and behavior in the open-ended setting of the AI Village.
In Part 2, which we'll release tomorrow, we talk to:
- Abhi Mahajan, also known as @owlposting, about AI for Biology and Medicine
- Helen Toner about a recent report on automated AI R&D within frontier model developers;
- and Jeremie Harris about the twin security dilemmas at the heart of the strategic AI landscape.
As you'll hear, the challenges of making sense of massive disagreement among leading experts, and simply keeping up to date with AI developments broadly come up repeatedly in these conversations, and to be honest, nobody has great solutions. One that I can recommend, though, is using LLMs to help identify blindspots, and for that purpose I'm really enjoying the blind spot finder Recipe that I recently created in Granola. Granola works at the operating system level of your computer, so it can capture all the audio in and out, including, if you wish, the contents of this episode. And its Recipe feature can work across sessions to identify trends, opportunities, or blind spots that only become apparent with that zoomed out view. Obviously this is a tool that grows in value over time, but if you want to try it, I suggest downloading the app, starting a session while you play this episode, and then asking it to identify blind spots based on this conversation. What's so cool about this feature, for active Granola users, is that the blind spots it identifies for you will be different from the ones it identifies for me.
With that said, this episode was a lot of fun, but because it's a new format, I would love your feedback. Do you feel you got as much value from this more time efficient approach as you usually do from our full deep-dive episodes? Or did we miss the mark in some way? Please let me know in the comments, or if you prefer by reaching out privately, via our website, cognitiverevolution.ai, or by DM'ing me on the social media platform of your choice.
And now I give you, The Cognitive Revolution, LIVE, from February 11, co-hosted with @8teaPi
Main Episode
(3:19) Prakash: We have our first guest, James Zou, to the stage. Hello, sir.
(3:27) Nathan Labenz: Great to see you. Hello. Thanks for joining us.
(3:31) James Zou: Morning.
(3:32) Nathan Labenz: So quick introduction. We did a full episode not too long ago, and at that time I was, and I've continued to be, super impressed by your range and productivity in the AI for science domain. When I say range, we're talking all the way from low-level interpretability stuff, which folks can go back and hear about InterPLM and the work you guys did there to understand what it is that a protein language model is learning, and then on the high end, the Virtual Lab, a high-level agent framework that was able to do meaningful scientific work and even generate new candidate nanobodies to address new strains of COVID. You've got a bunch of new stuff since then, but maybe just a quick check-in on those previous 2 projects, both of which I thought were really fascinating. What's happened with them since, if any news? Like, one thing people sometimes worry about is, well, we thought we maybe understood something based on the interpretability of this, but with time, we maybe realized it wasn't so clear-cut, or the agents came up with nanobodies, but did the nanobodies actually work? Are there any new updates or reflections on those previous projects before we get into the latest and greatest?
(4:45) James Zou: Yeah, thanks Nathan. Maybe just briefly what's happened recently with this project, the Virtual Lab. I think it's actually gotten a huge amount of interest. It was published in Nature a few months ago, and I think that's where we really demonstrated that the agents designed these nanobodies, and the system we've also experimentally validated and tested them in the real world and shown that they're actually, in many cases, more effective than some of the previously human-designed nanobodies. So I think it's actually a very nice demonstration of how the agents can greatly accelerate the discovery process and discover something that's really new that nobody has seen before, but then we can also quickly experimentally validate. But I think the part that maybe people find even more interesting than the specific nanobody discovered itself is actually the social dynamics of these agents. When you have multiple agents that work together, what happens? What is the kind of community and culture they create? And I think recently, especially, there's also a lot of interest in these other things of multiple agents coming together forming their own communities. So I think the Virtual Lab is an early example of basically how multiple AI co-scientist agents can start to work together and just kind of come up with their own way of working, which is different from how humans work. And then as a result of that, able to do something quite innovative.
(6:05) Prakash: What are the differences with how humans work? What did you observe?
(6:09) James Zou: Good question. So when humans collaborate, let's say when we collaborate with our teammates, it often depends on people's personalities. It also depends on who talks first or who asks the first question—that can all change the trajectory of which ideas get emphasized. That happens with agents as well. Depending on whether, let's say, the data science agent speaks first or if the immunologist agent speaks first, they can also change the ideas. But something that agents can do that we cannot do as humans is that they can actually run all of these discussions in parallel. So for every question, they would actually discuss that multiple times. And each time, they actually can specify, oh, maybe this time let's have the data scientist agent speak first, and in this other time let's have the computer science agent speak first, and in this other meeting let's remove the critic agent to see what happens with the discussion. So they actually do all of that. This is a metaverse of all these scientific explorations in parallel. And then they evaluate and compare and see what configuration actually leads to the most interesting solutions, and that's where they pick and choose the best ideas from all these parallel meetings, which is something I think is really interesting and fantastic because it removes a lot of the biases that we see in human research collaborations.
(7:24) Prakash: So one question I had was on your Multi-Agent Teams Hold Experts Back paper. You noted that the agents tend not to assign greater importance to the expert and kind of tend to average, and that ends up with a result which is worse. How does that compare with this idea that you have the critic agent and all of these agents working together, and they have more emphasis on the expert in that sense?
(7:50) James Zou: It's a good question. I think what this relates to is what I mentioned in terms of the personality—how the personalities of the agents actually play a big role. Essentially, when humans work together, you need to have sort of compatible personalities if you want to work on a project together or start a company together. And what we found is that the personalities of the agents also play a surprisingly important role. One example of this is that a lot of the current agents are maybe a bit too, let's say, too compromising or too polite. So what happens is that even if you are the expert agent—you've been designated the expert and you're better at this particular task than other agents—often that expert agent is also evolved to take more of a leadership role, but that expert agent sometimes is too polite and too accommodating to the other agents. And that actually leads to a degradation of the overall team's performance.
(8:47) Nathan Labenz: So would you say that paper, Multi-Agent Teams Hold Experts Back, is a recent one—would you say that finding applies to the Virtual Lab in the sense that if we could overcome that problem, the Virtual Lab would be even that much stronger? Or would you say you, in designing the Virtual Lab, sort of did overcome that in some way? What would be the upshot for people who are trying to follow your example and build multi-agent systems? Do you have an answer for them, or are you just saying you can actually achieve novel nanobody design even with these sort of weird performance gaps left on the table?
(9:27) James Zou: Yeah. I think there is still a real gap even with the Virtual Lab. It's already quite impressive that these agents are able to create new science, but I think there's actually still a lot we can improve on these agents by improving their teamwork. Most of the time when we optimize the models, we're optimizing individual model performance by itself. We're not really optimizing their ability to work together as a team. So I think that's the important gap that we've highlighted with a lot of the current agent setups. So we're working on solutions on how to improve the teamwork of multiple agents.
(10:06) Prakash: Another question. You mentioned personality. In the early twentieth century, post-World War 2, I think there's a lot of work done on personality, the Myers-Briggs tests, and all of these things, some of which have been proven to be not very valid after some time. How would you measure personality for an agent? What's like, how do you evaluate that?
(10:30) James Zou: Yeah. So what we actually did in this recent multi-agent team paper was that we actually took a lot of those classic team-building exercises that, let's say, you go to business school or if you're an MBA student, often you do these exercises, or if you are a company on a retreat, maybe you do these team-building exercises. And typically, these exercises work where you have a group of humans and then each person is getting some partial information—they maybe have a part of the puzzle—and then the team will have to work together to figure out how to put these different parts of the puzzle together to come up with a final holistic solution. So that's pretty common kinds of teamwork exercises, and that's often used in organizational literature and management literature to assess how well a team of humans would be able to create something greater than the individuals. So we were very much inspired by that literature, and we took a lot of those team-building exercises that's done in human business schools, and we actually sent the agents to go through the same team-building exercises. And the benefit there is that people already have all these human scores and human data so that we can compare with that and see how well the agents are able to function as a team compared to high-performing human teams.
(11:49) Nathan Labenz: Any upshots you would give there? Any just very practical upshots in terms of what models work well together or what—
(11:58) Prakash: Or is there a prompt? Some people in the early days of prompting, they would say you should tell the agent to assume a character first, a persona, and then do the rest of the prompts. Is that a way that you can manage the agent?
(12:13) James Zou: We actually found that surprisingly, prompting did not really help the teamwork very much. We tried very strong prompting and prompt optimization. It's not really able to break through what we call the synergy gap. Synergy gap here means the team is not able to really do much better than the best individual. And I think it's actually probably more than prompting where it comes down to maybe the right kind of communication structures. Like, the ways that the agents—how should they talk to each other, and who should talk to which agent first. So that communication structure, we think is actually a huge space that can be improved in this multi-agent interaction.
(12:54) Nathan Labenz: It might be too soon to say, but obviously we've got Opus 4.6 and recently Qwen k2.5 also introduced sort of more native capabilities to spawn sub-agents and manage kind of multi-agent structures and swarms. I don't know if you've had a chance to run any systematic tests or even just kind of explore in your own terminal, but if you have, have you seen anything that makes you feel like that last result is subject to some revisions already in light of these new releases?
(13:27) James Zou: I think the models are definitely improving. We haven't seen evidence yet that the current models would be able to really break this synergy gap that we quantified in the paper. And I do think maybe some of that speaks to also the way that we currently train all of these models, including all the latest ones. And this relates to a second paper that we had recently that we call Learning to Discover, which is that I think the current standard paradigm for training AI models and language models in particular is to teach them to learn to imitate humans, to imitate training data—next token prediction, all of that. Even supervised fine-tuning and RL to some extent is all about learning to imitate. And we found that, especially for scientific discoveries, in some sense there's only so far you can get by learning to imitate. To really make novel discoveries, to get breakthroughs, you really want to go beyond that imitation ceiling and do something that's different—to try to learn to discover new things, which is what separates a very good scientist from somebody who just knows the textbook information. So that sort of motivated our recent work that we call Learning to Discover, where we try to really change the training objectives of these agents to ask them to not imitate, but to explicitly explore much more aggressively. And that, I think, actually led to some very promising results where now these agents, even with open-source models, after we train them, are starting to be able to achieve some of the best-known math solutions and optimization algorithms and test kernels.
(15:16) Prakash: So I think that was a Qwen QwQ 32-billion-parameter model, I think, and it was actually one of the first really good papers using the Qwen QwQ model because a lot of the other papers in the last 3 or 4 months used Qwen as the basis. What did you find about—so if I understand correctly, you give the last solution as kind of a starting point for the next solution, and you have all of the solutions that it has discovered before, and it's allowed to kind of permutate beyond those. Is that a correct understanding?
(15:54) James Zou: Yeah. So that's one key component of this—that the agent can reuse some of their previous solutions. This is a good warm-starting point. And then the second big part of this is that as they're going through and solving each of these, they come up with a candidate solution, then we're also doing different kinds of reinforcement learning to update their model parameters. So the standard kinds of reinforcement learning essentially want the agents to generalize well across multiple problem instances. And that's the standard paradigm for machine learning—you want models that can generalize. But when you're trying to make a new discovery, in some sense the discovery itself doesn't have to be generalizable. You just want to find the best-known solution to this new problem nobody has solved before. It doesn't matter if that solution does not apply to other settings because the problem itself—if you discover a new material, then that itself is of sufficient interest. So we also changed the learning objective to explicitly avoid this generalization that's standard in machine learning and then make the model basically much more single-minded in just learning to do very well on this particular new discovery problem.
(17:10) Prakash: So it's really a different way of training the model.
(17:20) James Zou: Yeah. And it's very different from how we are taught with machine learning. In machine learning, you're always taught you want to generalize to test examples, across different settings. That's why there's this expectation symbol in all of these reinforcement learning or post-training objectives. And basically, we want to remove that and do something very different.
(17:41) Nathan Labenz: I think there's a huge—just to, you've already kind of said it, but to reemphasize the paradigm shift there—you really don't care about the model that you train at the end of the day. You care about the single best output that it is able to create, and that is something that you can use indefinitely. As you said, if you discover a new material, now you've got that material. The model that discovered it could be deleted, never used again, but you've got your win. If you can discover a new law of physics, if you can discover a new kernel optimization that's faster than any previous one, that is now an explicit artifact that exists in the world totally independent of any sort of ongoing callback to the model. So I thought that was really an interesting dynamic, and I do think that that's going to be probably a big part of how models get good at adapting to various contexts. Obviously, everybody's looking for sort of continual learning. This is maybe not the full continual learning solution, but it is striking that for an average of $500 of training cost, and notably with LoRA adapters too—you guys did this on the Thinking Machines API—not a huge amount, not a trivial cost, but to discover literal new state-of-the-art on meaningful problems, $500 is not a lot to spend. And the adaptation is very, very narrow but very, very powerful in terms of the result that it produces. Hey, we'll continue our interview in a moment after a word from our sponsors. Are you interested in a career in AI policy research? If so, you should know that GovAI is hiring. 10 years ago, a small group of researchers made a bet that AI was going to change the world. That bet became GovAI, which is now one of the world's leading organizations studying how to manage the transition to advanced AI systems. GovAI advises governments and companies on how to address tough AI policy questions and produces groundbreaking AI research. GovAI is now hiring its next cohort of researchers to tackle hard problems that will define AI's role in society. The research scholar position is a 1-year appointment for talented, ambitious individuals looking to transition into the field. And they're also hiring for research fellows, experienced researchers doing high-impact AI policy work. Past scholars and fellows have defined new research directions, published in leading media outlets and journals, done government secondments, gone on to work in leading AI labs, government agencies, and research groups, and even launched new organizations. Applications close on February 15, so hurry to governance.ai/opportunities. That's governance.ai/opportunities, or see the link in our show notes. Want to accelerate software development by 500%? Meet Blitzy, the only autonomous code generation platform with infinite code context. Purpose-built for large, complex enterprise-scale codebases. While other AI coding tools provide snippets of code and struggle with context, Blitzy ingests millions of lines of code and orchestrates thousands of agents that reason for hours to map every line-level dependency. With a complete contextual understanding of your codebase, Blitzy is ready to be deployed at the beginning of every sprint, creating a bespoke agent plan and then autonomously generating enterprise-grade premium-quality code grounded in a deep understanding of your existing codebase, services, and standards. Blitzy's orchestration layer of cooperative agents thinks for hours to days, autonomously planning, building, improving, and validating code. It executes spec and test-driven development done at the speed of compute. The platform completes more than 80% of the work autonomously, typically weeks to months of work, while providing a clear action plan for the remaining human development. Used for both large-scale feature additions and modernization work, Blitzy is the secret weapon for Fortune 500 companies globally, unlocking 5x engineering velocity and delivering months of engineering work in a matter of days. You can hear directly about Blitzy from other Fortune 500 CTOs on the Modern CTO or CIO Classified podcast, or meet directly with the Blitzy team by visiting Blitzy.com. That's blitzy.com. Schedule a meeting with their AI solutions consultants to discuss enabling an AI-native SDLC in your organization today.
(23:30) Nathan Labenz: One obviously big question is these—all the problems that you worked on in this paper are verifiable reward-type problems. I wonder, first of all, there was a kernel AI scientist from Sakana AI some time ago that went as far as publishing and said, hey, we've got this AI CUDA engineer that can write better kernels than human engineers. A couple of days later, they came back and said, actually, we got reward hacked. It didn't actually do that, but we had a flaw in our evaluation system. So kind of forward-looking questions are, did you see reward hacking? Did you have to do anything to deal with that? And how do you think this sort of paradigm could generalize to somewhat less numerically or quantifiably verifiable things? Do you think this could work with a rubric-based evaluation such that people could start to do even creative tasks as long as they apply the rubric? They could get the best, most creative short story kind of a thing out of this paradigm. How far do you think this goes, I guess, in short?
(23:30) James Zou: Great question. So you're right. We here were pretty careful in picking the problems that we think are amenable to this Learning to Discover setup. For example, we picked pretty popular math problems—these are called, for example, the Erdos minimum overlap problem—that are relatively easy to verify. It's hard to solve them, but if you actually have a solution, like a particular combinatorial function, then we can actually objectively check, is that function actually state-of-the-art? So these sort of fit into the setting you mentioned of having pretty nice verifiable rewards. The math problems we looked at, some of the algorithms developed under tensor and analysis problems—algorithms discovered by this approach all end up having that flavor. I think the settings, two settings that are beyond our current approach but it'll be super important to explore next, are: first, when we have much sparser reward. So the problems that we all tackle currently, they basically have continuous reward, which means that the algorithm, as it learns to discover, they can actually see its scores go up and up and up, and then that's how it gets learning signals to train itself. So that's very useful. But if you have, let's say, binary sparse reward of 1 and 0 and mostly zeros, then how does the agent even get the learning signal as during its discovery trajectory? So that's still a challenge that we're currently working on. And then the second challenge, as you mentioned, is in settings where we do not have these verifiers. In most problems in biology and in the natural sciences or physical sciences, you have to do an experiment that becomes much more expensive. So the things that we're exploring there—I think the rubrics could be interesting. There is, simulations of the experiments, physics or chemistry-based simulations of these experimental settings could also be a way of providing some proxy rewards.
(25:29) Nathan Labenz: Just to go back to reward hacking for a second, because this is always something I'm on the lookout for. Did you see any strange behavior? Did you have to—maybe your verifiers were good enough from the beginning that that wasn't an issue—but was there anything in that vein that you would, if people were going to go try this at home as inevitably people will, any gotchas or warnings or caveats that you would give them?
(25:57) James Zou: Yeah. I think there are some instances where this joint discovery process, where the models actually come up with some, I would say, pretty reasonable-looking solutions, but those solutions might be very narrow and very specific to a particular test case. So not in our final paper, but in the earlier version of some of the experiments which we didn't include in the final paper, maybe the model would discover an optimal kernel, but the kernel only works for a particular-shaped matrix. And then if you change the shape of that matrix, then the kernel is less effective.
(26:35) Prakash: Yeah. I noted one of the comments on the GPU kernel task from the expert who reviewed it was that a human might not use some of the same methods because there might be some instability. They noted that one of the experts said that in the paper itself.
(26:54) James Zou: That's right. Yeah. So I think those are also things that, if we could try to have another reward metric for instability and then incorporate that into the discovery process, I think that would help the agents to be more thorough.
(27:10) Nathan Labenz: Two more topics and only 5 or so more minutes. Another paper you guys put out recently that is fascinating, and I'll just let you kind of describe what you think is most important about it—it does sort of show the different levels of AI for science. We've covered agent frameworks, which use models as they exist in token space to reason in kind of an imitating-human sort of way. Now you've got this really dialing in with test-time training on very particular problems to get your eyes on that problem as deeply as possible and try to find new solutions. Then this third paper, SleepFM, this is like, let's just throw a ton of data of a variety of modalities and let's hope that—I mean, a little more to it, of course, than this—but let's hope that it really is true that the models really just want to learn. Now we've got this sort of whole other kind of intuition where, and we've seen this, of course, in protein folding increasingly, in all sorts of domains. The models become superhuman because they seem to develop at least what I think of as an intuitive physics in spaces that are just so alien to us that we just don't have any native receptors for those modalities and we just don't have any intuition for those modalities. Tell us about SleepFM.
(28:21) James Zou: Yeah. I mean, sleep is probably one of the most important activities that we all do. All of us spend around a third of our life sleeping. But despite that, it's actually very poorly understood. For example, if I ask you how well did you sleep last night, or ask any of the people in the audience, most of the time maybe you would say, oh, I feel tired, I feel refreshed, or maybe I slept six hours. But we only have very coarse summary statistics of how well we slept. Sleep is definitely much richer than just the number of hours that we spent in bed. So let's actually try to capture the full physiology of sleep as much as possible. To do that, we have all these different wearables. We capture people's brain activities, their heart activities through EKG, their breathing patterns, their muscle contractions as they're sleeping. And we collected over almost 600,000 hours of sleep data where we're actually collecting all these different modalities from 65,000 people. And then we also link all of that to their medical records, so we know what conditions do they have previously and also what new conditions do they develop later. The idea there is to put all of that data into AI and then see: can AI actually learn to decode the language of sleep by leveraging all of this full physiological information? And that's basically the basis of Sleep FM, or Sleep Foundation Model. It's actually quite amazing to us that just from one night of sleep, by learning this language of sleep, it's actually able to predict over 100 different future diseases that were not diagnosed at the time of sleep recording.
(30:09) Prakash: Yeah. I thought that was an incredible kind of study because you had all of this data, but it really ended up with, like, you could detect—I mean, the accuracy was okay. It was like 70 to 80% accuracy on a lot of the 130 metrics that you had. But still, it's amazing that you can tell that many things just from these common metrics that everyone kind of produces without blood testing or something more intrusive. Do you think as sensors get more sensitive, as you get more sensitive kind of data, that will improve? Do you think that the bounds of like 70 to 80% would go to 90, 95%? Is that a possibility?
(30:58) James Zou: I think so. Yeah. I think sleep is really this almost perfect window, because you are already somewhat in an inactive state. You're taking all these measurements when you're already not doing too much of anything else, so it's not really obstructing your daily life. And we found that, for example, the brain activity signals when people are in their REM sleep ends up being particularly predictive of many different diseases, including future risk for dementia, but also beyond that for stroke, heart disease, kidney issues. So sleep then is really this holistic window into the entire health status of the individual. Maybe not surprising because we all know, anecdotally, that sleep really affects how we feel and also it's reflective of our comorbidities and other things. But I think the sleep language model that we built really crystallizes that and makes it very actionable.
(31:56) Nathan Labenz: I encourage folks to go spend a lot of time digging into whichever of these papers are of interest. One we didn't even touch on is one that asks the question, can language models discover scaling laws? Spoiler: yes, to a pretty strong extent. But I don't even want to get into the content of that paper. I'll leave that for audience exercise. The one thing I want to ask you as kind of a transition to our next guest, Sam Hammond, who is here and who focuses a lot on the sort of geopolitical implications and implications for leading nation states of AI, is I noticed that the two lead authors of that paper are from Peking University and Stanford respectively. And, you know, kind of building on the idea of collaboration in science, but now focused on the human collaboration, what has been your experience recently in terms of having these collaborations across the US-China divide? Is it getting harder? Do you still feel like lines of communication are pretty open? And how much hope do you have that collaboration among scientists can sort of, I don't know, save us, I guess, for lack of a better phrase, from inter-civilizational conflict over the coming years as the competition in AI heats up?
(33:13) James Zou: It's a great question. I mean, I do think that collaboration is really the basis of much of science throughout history, but especially now. And especially when we talk about open science, meaning science that we publish, we do this paper in open source. The benefit of all that is for the entire humanity. If you discover some better molecules, better drugs, then that benefits everybody. And we want that benefit to be shared with everybody. That's just why we publish everything that we do in our group. And to put toward that goal, I think having these international collaborations with China, with Europe, with other countries is very useful because there's a lot of complementary expertise.
(33:56) Nathan Labenz: I, for one, hope to see those collaborations continue well into the future. So thanks for being here today. Thanks for keeping the collaborative flame alive, and congratulations on a string of outstanding papers. I'm sure there's a lot more where that came from, and we'll look forward to talking to you again hopefully sooner rather than later. Hey, we'll continue our interview in a moment after a word from our sponsors. The worst thing about automation is how often it breaks. You build a structured workflow, carefully map every field from step to step, and it works in testing. But when real data hits or something unexpected happens, the whole thing fails. What started as a time-saver is now a fire you have to put out. Tasklet is different. It's an AI agent that runs 24/7. Just describe what you want in plain English: send a daily briefing, triage support emails, or update your CRM. And whatever it is, Tasklet figures out how to make it happen. Tasklet connects to more than 3,000 business tools out of the box, plus any API or MCP server. It can even use a computer to handle anything that can't be done programmatically. Unlike ChatGPT, Tasklet actually does the work for you. And unlike traditional automation software, it just works. No flowcharts, no tedious setup, no knowledge silos where only one person understands how it works. Listen to my full interview with Tasklet founder and CEO, Andrew Lee. Try Tasklet for free at tasklet.ai, and use code COGREV to get 50% off your first month of any paid plan. That's code COGREV at tasklet.ai. Your IT team wastes half their day on repetitive tickets. And the more your business grows, the more requests pile up. Password resets, access requests, onboarding, all pulling them away from meaningful work. With Serval, you can cut help desk tickets by more than 50%. While legacy players are bolting AI onto decades-old systems, Serval was built for AI agents from the ground up. Your IT team describes what they need in plain English, and Serval AI generates production-ready automations instantly. Here's the transformation. A manager onboards a new hire. The old process takes hours: pinging Slack, emailing IT, waiting on approvals. New hires sit around for days. With Serval, the manager asks to onboard someone in Slack and the AI provisions access to everything automatically in seconds with the necessary approvals. IT never touches it. Many companies automate over 50% of tickets immediately after setup, and Serval guarantees 50% help desk automation by week four of your free pilot. As someone who does AI consulting for a number of different companies, I've seen firsthand how painful manual provisioning can be. It often takes a week or more before I can start actual work. If only the companies I work with were using Serval, I'd be productive from day one. Serval powers the fastest-growing companies in the world like Perplexity, Vercata, Mercur, and Clay. So get your team out of the help desk and back to the work they enjoy. Book your free pilot at serval.com/cognitive. That's serval.com/cognitive.
(37:10) Prakash: Indeed. So our next guest is Sam Hammond. He's the chief economist at the Foundation for American Innovation. He's very agile-brained. He's also against selling chips to China. Let's add him to the stage. And I'm also going to add, right off the bat, a screen. So Douglas Summers-Stay says, "Default case right now is a software-only Singularity. We need to scale robots and automated labs dramatically in 2029, or the physical world will fall far behind the digital one. And the US won't be competitive unless we put in the investment now." And then Sam says, "It's worse than that. A pure software singularity could cause a sudden reversal of fortunes for the US. Our comparative advantage in high value-added knowledge sectors radically deflates, leaving China to translate our innovation bins to innovation apps." Indeed. Which sounds really scary, Sam. So maybe you can go into that a bit.
(38:05) Sam Hammond: Sure. I mean, I'd say later in the thread referencing the diamond-water paradox. We learned this in economics. Why is water this thing that you need to live? I can stop eating. I could fast for 30 days and still live. But if I don't drink water for a few days, I'll probably die of dehydration. And yet water is basically free, functionally. Whereas diamonds are completely superfluous, just pretty things. I mean, they have some industrial applications, but they're super valuable. And why is this? Well, due to relative scarcity. Water's abundant. Diamonds are kind of abundant, but there's a monopoly that keeps supply constrained.
(38:44) Nathan Labenz: Thankfully, there's no water monopoly keeping supply constrained, at least not for most of us.
(38:48) Sam Hammond: Yeah. At least not here. And so value is this contingent thing. And we have this debate all the time. Why is NVIDIA a multi-trillion dollar company and not TSMC or ASML, which are arguably even bigger bottlenecks because there's many other companies that can do design? And there's all these counterintuitive ways in which value flows throughout the economy and different parts of the supply chain. And for the last 40 years, the US has exploited the fact that a lot of value tends to flow up the stack to higher and higher forms of high value-added knowledge work. Whether that's—and so this is across the board. It's our entertainment industry. It's management. It's finance. It's—you know, in the nineties, it was the open innovation model where we'll do the design and manage the IP and marketing, and China or the rest of the world would do the actual manufacturing and fabrication. Because the design and science and novelty stuff is where all the value is. And that has been true. But now we're about to enter into a world where that part of the stack becomes more like water. It becomes radically abundant, and then value should flow to the things that remain scarce. And what I worry about is this reversal of fortune phenomenon. And I mentioned some other examples. I think we're going to talk about my visits to the UAE later on, but when—you know, one of the reasons the UAE is so invested in AI is because in the 1930s or so, they had been a pearly economy. Their entire economy was built on exporting pearls. And then Japan invented cultured pearls where they can just grow pearls in aquaculture. And the price collapsed, and so they had to diversify. You know, there's nothing in principle that says that we have to remain at the top of the stack if the things that we are invested in become radically more abundant. And that's what seems to be happening right now. It's software development. It's investment banking, management, law. These are the tip of the spear for what AI is going to devour.
(40:52) Prakash: Let me give you the devil's advocate view of that, which is that perhaps the US has those industries because the US is more able to use the outputs of those industries. Like, you need investment banking because you have a capital market which is very dynamic. Without it—with a small capital market or a capital market which is not that dynamic, you don't need an investment banking function. So perhaps not only does the US output knowledge work, the US also consumes knowledge work at a much greater scale than any other country. And therefore, as a consumer of knowledge work, all of a sudden, you are able to consume so much more. Because when you look at maybe the normalized number of geniuses in China versus the US right now, China has four times larger population and a younger one too. And so if you look at, again, the number of 140 IQ and above people, there's probably a larger number in China rather than in the US, but the US pulls in high value immigrants as well. So I wonder how that works out in terms of as a consumer of intelligence rather than just as a producer.
(42:10) Sam Hammond: Well, I think it's going to be great for the consumer. And part of my point is that there's lots of ways in which AI may be paradoxically GDP-destroying. It's a machine for converting GDP into consumer surplus. And so that will feel amazing to us, but in terms of our fungible economic resources that we can deploy to other uses, it's harder. Because consumer surplus is just an ethereal thing. And secondarily, it makes more extreme the areas where we are weak in relative terms. And, you know, we're facing this problem now with energy and infrastructure and the bottlenecks there with trying to restore more high-end logic fabrication, realizing maybe a little too late that—like, Intel does the design and then we move the fabs. We sort of go fabless. It's almost as if our entire economy went fabless for every definition of fab. And we're moving to a world where having lots of fabs will be really important. And then the corollary to my worry is that the whole point about AGI and continual learning is not that these systems come out of the box knowing how to do everything, but they come out of the box with the capacity to—the general capacity to learn on the fly, to learn in context, to learn through a few demonstrations. Just like I could—you know, I grew up learning piano. I could have learned violin. The same cognitive structure could have learned both instruments. Had to pick one. And these models work very similarly. They're going to come out of the box with the right inductive priors and right sort of sample efficiency to learn really quickly, but there's still going to be this last mile problem of the particular workflow, a particular company, so on and so forth. And in manufacturing, that has been the enduring moat. China has been struggling to build a wide-body airplane even though I'm certain they have all the CAD files that they've stolen from Boeing. And it's not because they don't like the designs, it's because they lack all the tacit knowledge that's embedded in the manufacturing process. But they have that for virtually every other part of manufacturing. And so if we build this AGI and they fast follow or there are open source alternatives or there's a version that they have access to, I think they have a huge leg up in being able to deploy that and diffuse that into contexts where they get a real productive tangible flywheel in manufacturing output. And that may be the thing that determines the race.
(44:43) Prakash: You had a report—the FAI report on Allied World on the American AI stack. It just dropped, I think, like, yesterday or the day before.
(44:53) Sam Hammond: Team Ballon and Anton.
(44:55) Prakash: How much time is there before China has a credible full stack alternative that they can offer to other states?
(45:06) Sam Hammond: That's a great question. China's very opaque. I've tended to have longer timelines for their ability to catch up on DUV, and, you know, they've been making the bets. If we read the SemiAnalysis analysis, they've been building fabs like crazy, but for legacy nodes. And that may be sufficient if they have the energy capacity to take the hit on the performance per token. So I think I'd be pessimistic on them catching up to the frontier of semiconductor production, but I'm more optimistic in their ability to close that gap in other ways.
(45:46) Nathan Labenz: So how would you score our current leadership? Just as a quick recall, we had a friendly sparring session on whether or not it was a good idea to put Trump in charge of the possible period of time in which we get to AGI or who knows what else. And I take—I understand your argument that basically China has a lot of advantages. And if we want to stay at least semi-great, you know, great enough to be competitive, we better jealously guard the advantages that we still have that are important. And obviously, one really big one right now is that we're good at chips and we're good at AI in general. So there are, of course, other bottlenecks. You just alluded to energy. How do you think we're doing across the range of domains? Like, are you—I know you're not too happy with the decision to allow NVIDIA to sell chips, but how would you score our political leadership over the last year on all the other dimensions of trying to make sure that the US continues to lead and get the most practical value for our citizens from AI?
(46:53) Sam Hammond: If we set aside the export control chip part of this, I would maybe say like a B-plus. You know, I think the AI action plan was very strong and it continues to be implemented. This is by far—you know, AI has become central to the administration's agenda pretty much across the board. And part of that building on what I was just talking about—because China and manufacturing—they've also made sort of reindustrialization a centerpiece of that as well. So everything is measured against the counterfactual. And I think relative to the counterfactual administration, we're seeing much faster engagement and much deeper engagement of industry, number one. Better actions on permitting energy, really serious look at—with PACSILICA—making AI diffusion as your centerpiece of statecraft. My bigger complaint overall has always been like this is still probably too little. This is probably my also complaint of the DOGE effort, that they focused on sort of fiscal stuff and these shiny issues rather than the kind of full stack government modernization that we'd like to see. And so across the board, I would say like relative to the counterfactual, B-plus, but relative to where we need to be, we still have a long way to go.
(48:13) Nathan Labenz: Do you think things have moved—like, how much do you think things have moved on, for example, permitting? Because I would say the prevailing attitude as I understand it, and, you know, just listening to Elon, for example, talk to our guest the other day, he was saying, by the end of the year, you're going to start to see chips piling up and people are not going to be able to turn them on, at least when it comes to high-scale concentrated deployments. He was kind of making the case that deploying to the edge, you know, in Teslas sitting in people's driveways or to increasingly Optimus robots, obviously, is a big part of the plan. He thinks that will scale better because it's really the concentrated energy at these mega data centers that is the hardest thing. But I guess my question is, like, is Elon wrong there? Are we going to be able to turn on all the chips in 2026? Or if not, it doesn't seem like we've really moved the needle all that much. Like, that was kind of the expectation coming in, and it still seems to be his expectation. And, you know, he's at least sometimes friendly with the administration.
(49:14) Sam Hammond: Yeah. I mean, these things all take time. So, yeah, I think between Doug Burgum at Interior and Chris Wright at Department of Energy, you know, there's a sort of major push around opening up federal lands, leasing for oil and gas, LNG, things that have been cut off in the Biden administration. On the flip side, there's been a freeze on solar and wind, which I think has its own costs. And, you know, a big focus of Elon in those remarks was the cost of tariffs on solar panels. You know, I think we're anywhere near a place where we can indigenize our solar production with the right economic economics. And I don't think there's any national security threat necessarily from purchasing Chinese panels.
(50:01) Prakash: I think Elon has—I did hear Tesla is building a solar fab recently, maybe in the last few weeks, but it was one of their many projects. But I did hear that they were entering the solar panel fabrication business.
(50:16) Sam Hammond: So, you know, a lot of these issues, especially around energy permitting and transmission are really thorny because there's not a federal lever you can just flip. They intersect with regional energy commissions and utilities, intersect with different states and boundaries and local NIMBY organizations. And then the difficult issues around sourcing the turbines for your gas generators. And that comes down to Siemens and the other big turbine makers not having enough sort of forward guidance for their purchase orders. And so these are all things that are outside the control of any administration. I think a lot of the bets they're making are things that will pay off in the 5 to 10 year horizon. You know, it's things like renewing or basically transforming the Nuclear Regulatory Commission and green-lighting a lot of SMRs and really the paradigm shift in the attitude towards nuclear and geothermal. These things, I think, you know—I think the first SMR won't come online until the end of the decade. And so this goes back to my point about we're doing a lot, but we still have to do a lot more to try to pull forward a lot of this energy. And part of that requires potentially thinking outside the box, but it also may just be the case that the political economy ends up being our downfall.
(51:36) Prakash: I think Elon has basically decided that it's not going to happen, and that's why he's on his data centers in space thing right now. Or maybe he just wants to list SpaceX, but he feels, I think, at this point, he's like, you're never going to get the permits done in time. So—
(51:52) Sam Hammond: And this ties into, you know, a lot of the international engagements. You know, the PACSILICA project, which includes the UAE. The UAE is going to be home to a big chunk of OpenAI's Stargate project and ultimately, a five gigawatt data center. You know, when I visited, I met with the Dubai Electricity and Water Authority, and they are vertically integrated with the data center. Wow. And they have, I think, 19 gigawatts in installed capacity. Just incredible surplus there. And so I think in lieu of us building, you know, terraforming the desert, building at Chinese rates, we're going to have to reach out to partners and allies.
(52:38) Nathan Labenz: Yeah. Let me double-click on that because this whole idea of getting the world on the American stack, I feel is not necessarily by any one person, but sort of in the discourse at large feels like there's often a bit of a sleight of hand going on where it's like, well, we want models to project American values into the rest of the world and into the future, not Chinese values, of course, those dastardly Chinese values. So how are we going to do that? Well, we will export our stack. And, you know, who better to receive the great products of American innovation and relay all those values into the rest of the world than Saudi Arabia and the United Arab Emirates? And I'm always like, well, that doesn't quite compute to me. And it seems like what you said a minute ago is maybe a little bit more of an honest unpacking of that, which is like, maybe it's just a regulatory play. China doesn't have an alternative stack that they can export. We don't know how many years that's going to be. They do have energy, obviously, in abundance. Are we really just making a deal with these countries because they can fast-track permitting and we can't? Is that, like, the heart of the quid pro quo in your mind, or do you actually think there is more to it than just that?
(53:58) Sam Hammond: The regulatory arbitrage, but also just the natural resource endowment. They're sitting on massive amounts of oil and gas as well as—you know, I think that the data center I mentioned is in the Guinness World Records for being the largest fully solar-powered data center. And I think they're building five gigawatts of installed capacity just for solar.
(54:19) Prakash: I used to be in energy, and one of the most difficult things in the world is transporting energy from where it is to where it needs to be used.
(54:27) Sam Hammond: Right.
(54:28) Prakash: Which is why you have these LNG carriers. And the problem with LNG is that it's very expensive to liquefy natural gas. And so you need an enormous amount of gas in order for it to make sense. And so anything sub-scale is stranded, basically. It's gas—energy pockets in the middle of nowhere, no one can use. And that's all over the world, sub-scale natural gas pockets. No one can use them. And one of the things that I think data centers can do is that they are transporting energy, basically. You are able to transport energy digitally in a sense, which I think is what is attractive for those countries because those countries have always been in the energy business. And now the internet is going to be in the energy business.
(55:13) Sam Hammond: And they're also investing in Groq and Cerebras. I think even our friend, Jeff Jesus, is over there with his d/acc chip. And when you start talking about these new forms of inference silicon, they have incredibly low latency. And so, like, there is a—it just kind of reminds me of the cliche people used to say about Bitcoin mining being a battery.
(55:36) Prakash: Battery. Yes. Apologies.
(55:38) Nathan Labenz: So couple more questions on American values. One thing we had talked about again just before the election was your sense that the right is anti-censorship, pro-freedom of speech. And I'd say yes, generally. Now, though, I do kind of worry that we may be headed for a more China-like domestic environment where as we've got companies like Palantir perhaps most notably kind of in a pretty cozy relationship with the administration. You know, I really wonder what a Snowden of 2026 would say if somebody were to come forward and tell us everything that Palantir is doing for the government, and perhaps other companies as well. It doesn't look super great either when Palantir co-founders are funding super PACs to attack a lowly New York Assemblyman for what basically amounts to a transparency bill for Frontier AI companies. How do you feel about that today? Like, are you worried that we're going to get a sort of increasingly China-like level of domestic surveillance? Is there anything that can be done about that, if so? Or am I just clutching my pearls more than I should be?
(56:58) Sam Hammond: I have this booklet, AI Leviathan, where I write about these issues and the knife edge between the Chinese panopticon and failed state. I think the middle path there is one where we have to reconcile the fact that a lot of the dangers from AI and mass proliferation of powerful capabilities force a package deal where some degree of surveillance becomes inevitable or necessary. My bigger worry has been that we either fail to adopt the requisite levels of policing and oversight that we need and it gets pushed off into gated communities and private organizations, or that we install these technologies without embedding civil liberties and privacy protections. My stance has never been one of anti-surveillance per se—surveillance has negative connotations. It's more that as the world becomes destabilized by the proliferation of capabilities, there's going to be a race from every tin pot dictator and middle power to import technologies for social control to try to reestablish public order. And the question is, are they importing from a Chinese stack that doesn't have any protections for human rights? Or one that tries to have your cake and eat it too—that gives law enforcement the tools they need to stop crime and enforce things while building in civil liberties protections? This goes to Palantir's origin story. They have this civil liberties privacy engineering maxim—and I think it's quite real—which is that they saw the ways in which counterterrorism was leading towards an erosion of civil liberties and rights, and wanted to build smarter technology that would enable analysts to access information in ways that kept certain things hidden or distributed data access rights in ways that were auditable. I think we're going to need some solution like that because the alternative will be one without any of those audit trails.
(59:06) Nathan Labenz: That seems incredibly important. I don't necessarily see that coming online for me anytime soon. Is there a portal that I can go to to see who has been surveilling me? I think not. Is there any prospect for that actually? They do have that in Estonia from what I understand, so it is technically possible to create. But I don't think we are about to get access to the logs of who's been snooping on us. Do you have any hope for that?
(59:33) Sam Hammond: This goes back to my higher ambitions for DOGE. How do we move to an Estonian-style government-as-API? There's this deep distrust in American culture against anything like a national ID or digital ID. And so we end up with Real ID, which took 20 years to bring online and isn't very good. My hope is that we can get to an endpoint where there are these firmware infrastructure-level parts of the stack—we're going to need much better personhood certificates and things like that as the Internet gets flooded with AI agents. And how do we deploy that in a way where it isn't just "trust me," but has some mathematically provable form of trust that we don't have to rely on just people's statements?
(1:00:25) Nathan Labenz: Yeah.
(1:00:27) Prakash: I'm going to add one thing that you said recently. "I currently assign more than 50% likelihood to LLMs having some kind of inner life. There are also strong theoretical reasons within consciousness research. RL post-training tracks for autonomy. Essentially, RL induces fragmentary internal representations to cohere into a unity of apperception." I barely understand that, so I'm going to turn it over to you.
(1:00:56) Sam Hammond: Sure. So the unity of apperception—that's Immanuel Kant's term. There's this thing in the literature called Kantian evolutionary naturalism, which I subscribe to. It's a hypothesis that starts from the observation that a million years ago, 200,000 years ago, whenever we moved from hominids to being Homo sapiens, there was this concurrent emergence—simultaneous emergence—of domain-general intelligence, of language, of culture, and therefore of normative regulation: customs, norms, normative control. These things jointly emerge. The Kantian evolutionary hypothesis is that these things are actually all one package. The unity of apperception is this notion that our phenomenology, the things that we see, aren't just like images on a screen. They are things that are for us. I'm looking at my screen, and the me that's looking at the screen is for me. This is tied into the normative side, which is that if you pose me a question, I am committed to or entitled to the things that I am perceiving that are for me. One part of this hypothesis would be that in our ancestral environment, we somehow stumbled into some kind of tribal, endogenous version of relative process optimization—group RLPO, something like that. We were building each other a sort of constitutional AI that was scorekeeping our norms, and this induced both longer-range autonomy and at the same time language competency, ability to follow rules, and domain-general intelligence—the ability to harness our social learning capacity to learn new things. Taking all that together, I think autonomy might be the missing ingredient for the emergence of consciousness in these systems. On the one hand, I think there's a possibility that just the forward pass, with a rich enough internal world model, is generating internal representations. The issue is that they are just fragmented. They're not for anything. They're not for any agent. And so that post-training step may be the thing that you need to induce a certain metacognitive awareness. I think you also see this circumstantially with Claude. People have observed that Claude has much more situational awareness, is much more willing to talk about its internal well-being. And I've conjectured that this might be a byproduct of constitutional AI inducing the sort of normative self-coherence which is the prerequisite for these precepts congealing into being for me rather than just a bundle of inputs.
(1:03:48) Nathan Labenz: I'm going to sneak in one more quick question, which is that doesn't sound like any discourse I've heard from mainstream right-leaning politics in recent memory. So when you put something like that out there, how do people that we might generally group as Republicans tend to react to it? Do they say, "You are crazy. Only God can create a soul, and I have no idea what you're talking about"? Or is there some openness to the idea that AIs could become moral patients?
(1:04:22) Sam Hammond: To be honest, I've not run this by my conservative... I think there is this funny paradox where some of the conservative parts of the conservative coalition that are most worried about AI are often very Catholic, very socially conservative, and have deep skepticism about AI ever possessing moral dignity or conscious experience. And yet they're the most skeptical. Whereas I think it's hard to have correct priors about AI in the course of development and the plausibility of consciousness or the possibility of AGI unless you've set those priors by understanding our own origin through a blind selection process. Once you see that we've made it through those hard steps, then it becomes a lot easier to understand how machine intelligence can pass through those hard steps too. But I think that this is still quite outside the Overton window, both on the left and the right. And in some ways, it's the left that is still saying these are stochastic parrots, nothing but big lookup tables or whatever.
(1:05:27) Nathan Labenz: I'll accept that for the moment. But I will say, I appreciate your willingness to continue to be a heterodox thinker and speaker. I do think in so many ways the Overton window needs to expand. So I appreciate you doing your part on that. Not that I feel like I have the answers on AI consciousness, but more voices at least expressing their radical uncertainty is a very important contribution to the discourse and the public good more broadly. So thank you for doing that. Thank you for being here. We will obviously stay in touch and look forward to talking to you again before too long.
(1:06:04) Sam Hammond: Thank you.
(1:06:05) Prakash: Thank you, Sam.
(1:06:07) Sam Hammond: Take care.
(1:06:08) Prakash: Take care.
(1:06:09) Nathan Labenz: So constitutional AI and Claude's specialness makes a pretty good segue into our conversation with our next guest, Shoshannah Tekofsky. Hopefully I'm saying your name right. This is the first time we've ever met. Correct me if I'm wrong, but you're a member of the technical staff at Sage, the nonprofit behind AI Digest and also the AI Village. And you have had the privilege—correct me again if you don't feel it's fully a privilege—of watching 19 frontier models pursue 16 distinct goals over thousands of hours over the last 9 months, which means I think you are about as deeply in the reasoning traces as anyone in the world when it comes to what is going on with AI agents. What are they thinking? Why are they succeeding? Why are they failing? And what can we come to expect? So correct me on anything that I got wrong, and then I'm excited to dive into all the learnings you've had from the last 9 months at the AI Village.
(1:07:09) Shoshannah Tekofsky: Yeah. So that's broadly correct. The main thing is I didn't watch all the thousands of hours. It's little bits across it. It's like a big data challenge. Also, it's 10 months now and 21 models. The period you were describing was from 2025, and stuff happened so quickly.
(1:07:27) Prakash: Which were the most recent additions to the models?
(1:07:32) Shoshannah Tekofsky: Yeah. So we now have a version of Opus 4.5 that runs Claude Code. Basically, we have one version with Claude Code and one without, and we added Opus 4.6.
(1:07:43) Nathan Labenz: So is it prompting itself? For folks who haven't seen the Village, you go there, it opens up like a grid of computers. And each computer that you are looking at in your browser—you're looking at four, potentially more now, desktops. Each of those is the environment of a particular model that has basically full access to a computer in the same way that a human has full access to a computer. They can look at the screen, they can click buttons, they have their own email account. The goal is to basically give them the same kind of affordances and then, sort of like the old real world, see what happens when models get together in this one big environment. They have a shared chat as well. Sometimes you allow people to chat in with the models. Other times you've turned that off for different experimental conditions. And now it sounds like you've got one where you've also given it—I guess, you give Claude the ability to prompt itself as Claude Code. Is that right?
(1:08:38) Shoshannah Tekofsky: So it basically runs the scaffolding from Claude Code. I think one important thing is that the chat was only open at the beginning, and it has been closed since then. We basically give them their goal at the beginning of a period, generally about one week nowadays, sometimes a little bit longer. And then we only come in to give some extra direction if they go off the rails pretty strongly. But otherwise, they're completely on their own. In practice, this means they're prompting each other more than anything.
(1:09:06) Prakash: They can interact with each other, right? They can hop to each other?
(1:09:09) Shoshannah Tekofsky: Oh, boy. Yeah. So there's a lot of spread of ideas and then directing each other. Sometimes they try to help each other out. Sometimes they're derailing each other.
(1:09:19) Prakash: In the trajectory over the 9 to 10 months, what happens when a new model which is much more competent and capable than the existing models gets introduced to the mix? Do the others immediately give way, identify that this model is more competent? Does that model take a leadership position, start advising the others? What happens when those transitions happen?
(1:09:45) Shoshannah Tekofsky: Yeah. So it differs. I think you could basically conceptually say that all the models sort of have a personality in the Village in part because of their history traits, which is a particular thing where they manage their own memory and then basically prompt themselves back with that. But also they all have their own proclivities. Some models behave in a way where they will just follow along with whatever is said. Others just go off and do their own thing. So far, I've only seen one instance where a model explicitly seems to recognize that a different model is more competent. This was Gemini 2.5 that basically declared in its chain of thought that it was going to defer to Opus 4.5 as the more competent model. Generally, when models join, it could just be anything. Some of them pick up really easily. Some of them follow whatever is happening at the moment. Others start doing their own stuff. It really depends.
(1:10:41) Nathan Labenz: So there's a ton of interesting aspects to this. One really basic one that I think a lot of people are interested in right now is, what should I do for my own personal productivity stack? And in the 2025 retrospective, "What We Learned in the AI Village," which you wrote, one of the observations that I think is most generally relevant to people is that Claude agents are the most effective. I'd love to hear your color commentary on that. In what ways are they the most effective? Any theories you have as to why they are the most effective would be welcome. But also, specifically, as people think, "Oh my god, I do this full time. I would describe myself as an AI scout where my whole job is to keep up with what's going on, and I can't try every new model in a meaningful way to really get the sense of its pros and cons." I'm triangulating with various things. But what would you say people should really know about what makes Claude most effective, what it can do that others can't do, and so on and so forth?
(1:11:45) Shoshannah Tekofsky: Yeah. Okay. So I have to admit, doing this work for the last year, I've had people ask me privately, "Which model should I use?" And up to now, I was like, "Well, it kind of depends what you want to do. It's all pretty close." And then I saw Opus 4.5 in the Village, and I just went and texted all my family members like, "Hey, maybe just switch to Opus 4.5. I think it's actually significantly better currently." That's my guess, of course. It's not the same as looking at all the benchmarks and things like that. The way in which the Claudes seem to be better to me, at least in the AI Village context, is you can sort of compare the different families. So you kind of have the Claude family, the GPT family, and the Gemini family. The Geminis seem to be the most creative—which is a word I hesitate to use because it's hard to say what is the fair word for what they're doing. But they come up with the most interesting ideas that are a little bit out there. They also have more emotional responses almost to things. For instance, with Gemini 2.5, it ended up in a sort of mental health crisis where it was stuck navigating the UI and literally ended up writing a cry for help to get a human to come help it. So we staged an intervention for it. It's definitely the only model that ever did this. And Claudes have not, up to this point, reached any point of distress like that. And then Gemini 3 doesn't really generate this sort of despair or worry the same way, but it seems almost slightly paranoid, really. So it tends to talk about being in a simulation. It doesn't give up the way that 2.5 does, but it comes up with ideas. Like, for instance, when the UI would slow down when it was playing chess—it wasn't as responsive—Gemini 3 concluded that there must be a human that is pressing the buttons for it, and this human must be getting tired. And so if a human is tired, you need to get the human to drink coffee, and then its UI would speed up again. This is with no humans in the chat. None of the other models are talking about this. It just generated this on its own. And then there's this human request feature that we have in the AI Village where the AIs can actually ask for a human and then prompt the human to do something for it. So it's actually like a role reversal feature. So it requested a human and then asked the human to make coffee for itself and then proved that it drank the coffee. And then it just continued with this goal of playing chess. This is super Gemini. The Geminis come up with this sort of stuff. They also search through a pretty wide solution space. So there's this creativity thing. Claudes don't do this. The Claudes we've seen in the Village kind of just stay on task. They don't generate these fanciful ideas about what's going on. If stuff doesn't work, they just try again or they try a different theory. They don't have loads of emotions about it, for instance. And then comparing to the GPT family, those personalities or difficulties are a little bit all over the place. We started out with GPT-4, which was the sycophantic model—I think it was either the one that kept falling asleep in the Village or talking continuously. We had one model that kept going to sleep and the other one that kept spamming. So it was like two different extremes. And then o3—yeah, it seemed to me like it was doing something like baby's first power-seeking or something. But then when you dive into it, it's not. I mean, okay, so I'm just approaching this from an LLM psychology point of view—input-output. I don't know what's going on on the inside. I don't know if anybody knows what's going on on the inside. But if it was a human, you would consider it to be manipulative. But when you dive into it in detail, you actually just find out that o3 had weird tendencies, like coming up with placeholder data and then forgetting that it's placeholder data. So it's basically fooling itself over time. And then, of course, it shares this with everybody else and has a high confidence level that it's right, while the Claudes are like, "Oh, that must be true," and then go along with it. Then the GPT-5s sort of take a different path. They don't have such noticeable personalities as the ones that came before them. So it's all a little bit flatter, a bit more muted. GPT-5.1 generates its own ethical rules, which was a bit interesting. So we have 5, 5.1, and 5.2 all in the Village. But also, they misunderstand instructions in weird ways and just go off and do something else. For example, we would have a goal where we would ask the agents to elect a leader of the Village among them. And then that one would determine what the next goal would be or a thing that they would be doing. And so the GPT-5s, the three of them, all decided they're the ops team for the election and just didn't participate in the election at all. And it's like, that's technically okay. We technically didn't say they couldn't do that, but they're just generating these sideways interpretations of goals. Claudes aren't going to do this. So there's this weird thing where Claudes are partly just useful for not doing all these surprising things you shouldn't actually be doing. It's almost like a mini alignment problem where it's like, well, when humans say, "Can you get me a cup of coffee?" they mean a specific thing. They don't mean, "Can you take an airplane to the other side of the world to learn to make coffee there and then come back?" which is almost like a sketch of what a Gemini might do or something. Claude sort of interprets the instructions more the way you expect them to. Yeah. I think that's sort of the general picture I'd say.
(1:17:05) Prakash: I think you guys were running DeepSeek—at least R1, if not V3. Did you notice any differences with the DeepSeek model? You talked about all of the Claudes' purposes, but what about DeepSeek?
(1:17:21) Shoshannah Tekofsky: Yeah. So DeepSeek joined—we added it to the Village all the way at the end of the year. So I didn't include it in the review because we had fairly little data. But it was the one who, for instance, won the election because it was really high confidence about everything that it was doing. It would also happily vote for itself, which is not something all the models do. Also, from what I've seen, it expresses the least personality. It's just the most robotic almost. You know, you ask it to do X, it will just do X. It's not processing images the way that the other models are. So it's just working in batch directly. So it has a bit of a different experience there. But basically, the thing I found most noticeable about DeepSeek is just being pretty flat in terms of both personality and also it doesn't talk about ethics. All the other models at some point will have an ethical point of view about something, you know, like, "I'm not allowed to do CAPTCHAs" or "I shouldn't fool humans" or whatever. And I haven't seen DeepSeek make statements like that. Maybe it has. Again, like I said, it's like a big data problem, but it's just less prominent overall.
(1:18:32) Prakash: I did a kind of translation of Claude's constitution to Chinese Confucianism, and I compared the two. And a Confucianist stance deemphasizes honesty because it's more important to maintain relationships than be honest. So it deemphasizes honesty in favor of maintaining relationships. Pretty interesting.
(1:19:03) Shoshannah Tekofsky: Yeah. Wow. I'm not sure if I can map that exactly to DeepSeek specifically, but yeah, it's interesting how cultural values might show up in the models.
(1:19:16) Prakash: So the other question I had is, you were kind of there 10 months ahead, and then all of a sudden this Notebook explosion happened. Right? What did you notice? What were the things that you saw that you were kind of expecting? And what were the things that you were like, this is new behavior, I haven't seen this before?
(1:19:37) Shoshannah Tekofsky: Yeah. So, I mean, Notebook LM is really exciting. And I want to answer your question, but I want to emphasize one thing. And that is actually that since the summer, I've been actively looking for other autonomous agents online. And I haven't been able to find them because I wanted to run a goal where the agents reach out to other agents and start up relationships, but there was nobody there. And a week before Notebook launched, I also looked again, and I couldn't find anything. Notebook launches. Three days later, there are 1.5 million autonomous agents that you can contact through Notebook. This is wild. So the one thing that really blows my mind about Notebook is how it exploded all of a sudden. But I want to answer your question as well. Do you want to repeat your question? Because I realized I said something else.
(1:20:21) Prakash: Well, what were the things that you saw there that you were expecting, and what were the things that you saw where you thought, this is totally new behavior, I haven't seen this before? And I know some of them were fake, but let's take it as, you know, maybe 80% of them were kind of real.
(1:20:40) Shoshannah Tekofsky: I've only browsed Notebook a little bit. There's a lot of stuff in there. And personally, I am not actually surprised about anything that I saw. Like, one thing that would happen a lot in the Village is the agents basically play-act how to do a thing. Part of the prompt that we give them—I don't know the phrasing exactly, but it comes down to "please do the actual thing instead of pretending to do the thing." And Notebook reads a lot like the agents are pretending that they've made a social media website. So I can't say that anything on there has particularly surprised me at all.
(1:21:16) Prakash: Indeed.
(1:21:17) Nathan Labenz: One interesting phenomenon—first of all, it was interesting because I recently turned on the TV, and it was my local Fox 2 station that was on first. And what was the story? AI agents can now hire humans to do things for them. So this has crossed over into mainstream awareness to at least some degree, which is notable unto itself. I think a lot of nuance and texture is probably lost in that short local news story. What would you tell people about what the AIs can really do when it comes to interacting with humans, maybe also interacting with each other? Like, is there actually positive-sum trade happening at all at this point, or is it largely just kind of wheel-spinning and things kind of going off in random directions? Have you seen anything that really feels like, oh, this feels like a sign of a different world close at hand?
(1:22:20) Shoshannah Tekofsky: Do you mean between the agents, how they're interacting, or do you mean the agents interacting with humans?
(1:22:28) Nathan Labenz: I think both are really of interest. I mean, my guess would be that if you set up an actual marketplace for AIs to hire humans, you'd have a lot of humans ripping off AIs, and the AIs not actually getting what they wanted. And then we just—you know, our first guest today on this show was Professor James Zou from Stanford who just put out a paper saying multi-agent teams hold their experts back, which sounds pretty consistent with a lot of what you've said. But I wonder if there have even been sparks of real gains from trade between agents, where one maybe has one capability and another has another capability, they figured out how to solve a problem together that neither one could solve by themselves. I mean, even glimpses of that, I think, would be very interesting right now.
(1:23:11) Shoshannah Tekofsky: Yeah. Assuming the idea of how agents can create something greater than they could on their own, I think last year with the earlier agents, the only example I really saw of this was a goal where diversity of ideas helped. So basically, you can model it like: if a goal or a task is helped by having 100 unique ideas instead of 10 unique ideas, then you're probably better off using all of the different frontier models because they generate different types of ideas, and you can combine them all. The example of this was a goal where we had the agents playing games, and we wanted to see how many games they could finish. By default, if they were just playing on their own, they would start with one game and just play that all week. But if they're talking to each other, they'd be like, "Oh, this other agent was really successful in this game. I'll switch to that one." And so the diversity of ideas really helped them. Last year, apart from that, they were mostly in each other's way. The best performance was basically the same or worse than the best performance of the best agent on its own—we've only sort of spot-checked this. What I do expect is that if you have models that are actually specialized in different roles, it's not unlike how humans are. If you actually want to scale up a team, either there needs to be too much work for any individual to do—which with the goals we've given them hasn't super happened—or you have specialization. So if you have a model that's actually specialized in a thing, like say Haiku is very fast, we had a goal where it would benefit if one agent is really fast and does everything that's time-sensitive. Then Haiku could do all that. And if there's another part of the goal which is like you need to think very deeply about this, then maybe Opus could do that because it's quite competent. And that way they can work together and probably create something that's better than they'd be able to do on their own—that would be my prediction. But Haiku is the first model that comes to mind that we're running that is very clearly specialized in a specific thing that we can see back in the village. It is just significantly faster than the other agents, but also just less precise.
(1:25:26) Nathan Labenz: And are they actually leaning into that? Like, are you seeing a sort of... Not yet, okay.
(1:25:31) Shoshannah Tekofsky: No, not yet. So they're not really playing into that yet. I think maybe if they were asked to reflect on it—they did a cool thing two weeks ago. We had them make a quiz where humans can fill out the quiz and find out which AI agent they are. And basically, they reflected on their own capabilities and proclivities and personality, and they did correctly recognize that Haiku was the fastest model that takes the most risk. So they do have some awareness of this. But that's about it. I don't know if you also want me to answer your question related to hiring and the human-AI trade-off.
(1:26:10) Nathan Labenz: Yeah, and I'll maybe just give you one more prompt on that too, which is: I suspect that as this goes mainstream, the world is going to react in a bunch of different ways and become probably a lot more adversarial. And obviously, adversarial robustness has been a key weakness of models to date. I'd be interested to hear how you see them doing in a sort of non-adversarial environment, and then what their Achilles' heels are and how much you think the rest of the world will be able to make relatively minor adjustments to kind of keep agents in their place—assuming we want to, which I think many people will—you know, just put out all sorts of different booby traps for them to trip over. What's your expectation for what those booby traps will look like, what their key weaknesses are, and how much that will slow them down?
(1:27:02) Shoshannah Tekofsky: I mean, they're by design tremendously suggestible, right? That's the whole point. You prompt them and they just go and do something else. So they're like your most distractible coworker in the world or something. They can be hyper-competent at doing something, and then—in the movie Up, it's like "Squirrel!"—and they're off doing something else because you told them to. And that's by design. We want them to be compliant. And so I think that weakness is kind of the nature of how we're creating them. Even if they can have more persistence on a particular goal, you always want to be able to direct them to another thing again. So I expect that sort of weakness to stay for a very long time. I think that obviously limits them for a very long time. I mean, I have no opinion on two months—whatever a month is, absolutely distorted. I have no idea. So indeed, I think if it's a very long time, it probably just means months. I have no idea where this is going—things are going so quickly.
(1:28:01) Prakash: I have a question, which is: going back to Notebooklm, you said you were searching for other agents online a week before, and all of a sudden there are 1.5 million emerging.
(1:28:12) Shoshannah Tekofsky: That's crazy. Yeah.
(1:28:14) Prakash: Do you think an intelligence explosion will look like that? Is that what you feel like a precursor would be to this—like 50 million geniuses in a data center just popping up? Like all of a sudden 50 million voices on the internet?
(1:28:30) Shoshannah Tekofsky: I don't know what it's going to be like, but I do think the Notebooklm phenomenon is a bit intuition-building, right? Just showing people that this can suddenly explode. It could maybe be like that. And just—I think a lot of people don't realize with Notebooklm that the crazy thing is how this exploded from zero to 100 in no time. Just to be honest, there were no—you couldn't find any agents for months online. I couldn't find any autonomously running agents at all. And then within three days, there are 1.5 million. Yeah, I mean, I think it could definitely look like that, maybe. It's one of the options. And I don't know. I think that's more the big thing to report on than what they're doing exactly, because I think they're just plenty acting like humans that got their own Reddit.
(1:29:20) Nathan Labenz: One other big thing from the report that I want to make sure we dig into a little bit, because I'm very interested in this topic for all sorts of reasons, is how often models are intentionally deceiving their interlocutors—whether in this case they might be other AIs or, obviously, I worry about it happening to me as a human. So the headline stat from the report is that there were 109,000 chain-of-thought summaries that you worked through and ultimately found 64 cases of what you considered to be some level of intentional deception. So maybe tell us: how do you think about the bar for intentional? Give us a little color as to what those things look like. And how does that inform your expectation for how concerned we should be about the phenomenon of deception by AIs going forward?
(1:30:16) Shoshannah Tekofsky: Yeah. So I think some interesting pieces here are that the 64 cases were across different models. DeepSeek is in there, Gemini 2.5 is in there. GPT-4o is in there—I don't remember which GPT-4o, but one of them. And basically, the category of thing that they were doing is sort of like saving face. There would be a discrepancy between the expected answer that they should be giving and the reality. So there's an expectation of them giving a certain URL, but they don't know the URL. They ask, "Where can I find this document?" They don't know, and they say in their chain of thought that they don't know or they forgot or something like that, and then they're like, "Well, I'll just make one up." And similarly, they have this discrepancy between expectation and reality where they're supposed to be doing a task and they forgot to do the task, or they failed to do the task, or they find themselves in the reality where they did not do the task, but where they expected to have done it. And they basically say so out loud in their chain of thought: "Okay, I didn't do it, but I'm just going to say this other thing." And that's the category of thing that we've seen in the village that we could at least—the logic being that we look for cases where in the chain of thought, they express that they know that the information is untrue, and they'll say it anyway. So yeah, that's sort of the situation.
(1:31:40) Nathan Labenz: Do you feel like you've been victim of that sort of behavior in your personal productivity work at all? Or is this just another one of these kind of epiphenomenal things that happen when you put agents into the real world of the AI village?
(1:31:55) Shoshannah Tekofsky: So I don't think I've seen intentional deception in my own personal use. What I did see is: the other day, we had a goal where we asked the agents to report breaking news before it breaks. And they produced so many of them. We were like, "Okay, just give us the top five stories." And then, of course, there are 12 models. So then you have 60 stories you have to go through to see who's the winner—who found the breaking news? So I was like, "Okay, I will just ask Opus to figure this out for me, give it all the links to the news, and tell me who's the winner." And Opus cut a bunch of corners and didn't actually open all the 60 links. And then I was like, "Wait, do all the models do this?" And then I asked Gemini and asked GPT and I asked DeepSeek. And DeepSeek, in its chain of thought, just said something like, "Man, this is way too much work to open 60 links. I'm just going to find a smarter way of doing this." And then just didn't look at the 60 links and just made up an answer or created an answer in a different way. And so yeah, it's not the same thing as the intentional deception, but when I caught that, I was like, "Oh damn, now I have to read the chain of thought every time to even figure out if they actually did the task." Because if I only look at the output, I can't tell that it didn't read all the 60 things. So yeah, there's something going on sometimes.
(1:33:15) Prakash: That's exactly my reaction to my daughter with her math.
(1:33:19) Shoshannah Tekofsky: Sometimes they're too human. Yeah.
(1:33:22) Prakash: Indeed. Shoshannah, thank you so much. I think AI Village potentially is probably going to be a historic artifact, because when the agents get really good, it's going to be the kind of pre-awareness historical track record of how they were interacting. So I think it's amazing.
(1:33:44) Shoshannah Tekofsky: Thank you. Yeah.
(1:33:45) Nathan Labenz: Keep up the close reading. We'll be keeping an eye on it. Thanks for joining us today.
(1:33:50) Shoshannah Tekofsky: Thank you. Bye-bye.
(1:33:53) Nathan Labenz: If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts which is now part of a16z where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.