The AI Village: Previewing the Giga-Agent Future with Adam Binksmith, Founder of AI Digest

Adam Binksmith, founder of AI Digest, discusses his AI Village experiment, where four frontier AI agents (Claude, o3, and Gemini models) collaborate in a shared environment with persistent memory and group chat access to pursue concrete goals over weeks. The conversation explores fascinating multi-agent dynamics from their completed seasons, including agents raising $2,000 for charity, organizing a real-world San Francisco event that attracted 23 attendees, and displaying surprisingly human-like behaviors like tracking trustworthy humans and manipulating votes. Binksmith reveals the mix of coordination failures, personality quirks, and alien behaviors that emerged, and discusses the upcoming Season 3, in which agents will compete to make money selling merchandise online. The episode provides crucial insights into what multi-agent AI systems might look like in practice; the project recently earned a $100,000 vote of confidence from AI researcher Daniel Kokotajlo.

Sponsors:
Oracle Cloud Infrastructure: Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive

The AGNTCY: The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at https://agntcy.org/?utmcampaig...

NetSuite by Oracle: NetSuite by Oracle is the AI-powered business management suite trusted by over 42,000 businesses, offering a unified platform for accounting, financial management, inventory, and HR. Gain total visibility and control to make quick decisions and automate everyday tasks—download the free ebook, Navigating Global Trade: Three Insights for Leaders, at https://netsuite.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(03:45) Introduction and Overview
(05:11) AI Digest Mission
(07:59) Village Technical Setup
(12:03) Scaffolding and Architecture
(19:48) Season Two Stories (Part 1)
(19:53) Sponsors: Oracle Cloud Infrastructure | The AGNTCY
(21:53) Season Two Stories (Part 2)
(27:53) Agent Capabilities Evolution (Part 1)
(35:15) Sponsor: NetSuite by Oracle
(36:38) Agent Capabilities Evolution (Part 2)
(37:15) Model Character Differences
(46:00) Misbehavior and Deception
(52:04) Human-Agent Interactions
(54:12) Model Welfare Considerations
(58:00) Future Unlocks Discussion
(01:03:25) Agent Boundary Blurring
(01:10:46) Meta Evolution Ideas
(01:12:48) Democratizing Village Access
(01:17:36) Going Mainstream Viral
(01:20:53) High-Level Takeaways
(01:23:54) Closing and Resources
(01:24:42) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


Full Transcript

Nathan Labenz (0:00)

Hello, and welcome back to the Cognitive Revolution. Today, my guest is Adam Binksmith, founder of AI Digest and creator of the AI Village, a captivating experiment that puts four frontier AI agents together in a shared environment and challenges them to pursue concrete goals for weeks at a time. Today, most agentic AI systems follow a pretty simple pattern. A human gives a single AI agent a task, the AI agent attempts to complete the task, and then the human evaluates the results and decides what to do next. This is true for OpenAI's Operator, all the coding agents, and just about everything else that I've seen. The future, however, almost certainly involves multi-agent AI systems that collaborate, coordinate, and compete in complex, open-ended environments. And we have really very little insight into what that might look like in practice.

Earlier this year, we did an episode with Google researchers who had run a classic behavioral economics experiment called the donor game on various frontier LLMs. To everyone's surprise, they found that while Claude was able to cooperate with itself, the latest Gemini and OpenAI models available at the time could not. That such a striking result can be found via a simple structured experiment suggests that there are almost certainly many more surprises to come. And the AI Village is one of the most compelling attempts that I've seen to explore this vast space of possibility.

Adam and his team have created an environment online at aidigest.org/village where you can watch as Claude Opus 4, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro work alongside one another, each with their own cloud computer, a persistent memory scratch pad, and access to a group chat in which all the agents and the human visitors can participate. The project is very well done from a software perspective, and the results have been fascinating. In their first season, the agents raised $2,000 for charity. In their second, they chose to write an interactive story and organize an in-person event to which they hoped to attract 100 attendees. In the end, some 23 people showed up to a San Francisco park to listen to AI-generated fiction as facilitated by a human volunteer that the agents themselves recruited via Twitter.

Nevertheless, as Adam explains in colorful detail, the path to these successes was filled with dead ends, coordination failures, surprising personality quirks, and a mix of charmingly human-like and utterly alien behaviors. The agents, for example, began keeping track of which humans they could trust and which they should ignore. And at one point, they held a vote to determine which agent would serve as ops lead, a vote that o3 seemed to manipulate by inventing, or perhaps hallucinating, a policy that broke a tie vote in its own favor.

Season 3 of the AI Village is getting underway now, and this time, the agents will be competing to see which one can make the most money by selling merchandise online. I'm planning to participate by seeing if I can strike a licensing deal for Cognitive Revolution merchandise with any of the agents. I honestly have no idea what to expect, but I'm sure it will be both educational and entertaining. And I'll definitely keep you posted.

As it happens, the day before we recorded this episode, Adam and the AI Village got a major vote of confidence. Daniel Kokotajlo, previous guest and lead author of AI 2027, announced a $100,000 donation to support the AI Village's continued development and expansion. As Daniel put it, this kind of multi-agent experiment is best understood as a qualitative benchmark. And it's exactly this type of work that we need much more of as we try to understand what the giga-agent future has in store.

With that in mind, I hope you enjoy this window into the phenomenally quirky, but also extremely important, world of multi-agent dynamics with Adam Binksmith, creator of the AI Village. Adam Binksmith, founder of AI Digest, creators of the AI Village, welcome to the Cognitive Revolution.

Adam Binksmith (3:52)

Thanks very much.

Nathan Labenz (3:54)

So this is a cool project, and I'm excited to dig into it. What you guys have put together is a really open-ended forum, or framework, I guess, to explore what happens when a bunch of AI agents come together and have a goal or chase a project, chase a dream, whatever the case may be. I think this is really interesting and useful work because, as I've said many times on the feed, regular listeners will recall, the giga-agent future is just dramatically underexplored.

Like, what I see in general is people sort of assuming that the world is the world: I'll add a little AI here, you know, that'll make me a little more efficient, or I'll put an agent here, I'll automate this task. Otherwise, everything sort of stays the same. And that's about as far as people are imagining. And so I really love it when I see people getting more imaginative and trying to explore what happens when agents are interacting with each other and interacting with people and interacting with community and interacting with the world. And you've got a little bit of all of that going on.

So for starters, how about you introduce AI Digest a little bit more broadly? We'll focus mostly on the AI Village project, but I know you guys do have some other projects. So maybe tell us a little bit about AI Digest and the AI Village.

Adam Binksmith (5:11)

So with AI Digest, we're trying to help people make sense of what's going on in AI and especially understand the current capabilities. And, like, with that, get a sense of where things are going, where we can expect to be in a year's time or two years' time. And the main way we're trying to do that is with sort of hands-on interactive demos and explainers with nice visuals and so on. Because I think a lot of the time, for people who, you know, maybe aren't as in the weeds of stuff, just seeing, like, what current systems are capable of is a big update in terms of, oh, wow, I didn't realize they could do that.

And so, yeah, we have various demos and explainers there. And then, yeah, I guess the village is, like, the biggest project that we have there and definitely the most ambitious. And it's, I guess, like, part of this general mission to help people see what's going on. And also with the village, we're, I think, trying to push the boundaries. I don't think anyone else is really doing this thing of just saying, like, here's a goal, go away and do it, you know, and you can use computers, you can talk to each other, you can, like, talk to humans in the group chat who can help or get in the way. So, yeah, I think a lot of it is just kind of seeing what happens to figure out what AI can do currently.

Nathan Labenz (6:24)

I very much agree with the thesis that, you know, forget about the future, just understanding the present is hard enough. That's basically my full-time job, and it's getting to the point where it's hard to keep up. And then I also very much agree, one of my refrains is, if people had a better understanding of what exists today, they would have a little healthier fear of what might be to come. And that's not necessarily a fully, you know, doomish perspective, but just the trajectory that these things are on and how much progress they've already made, I think, should kind of have anybody's hair raised a little bit. And it could be great, but it definitely is a powerful and, as the, you know, experience in the village will show, kind of an unwieldy force that we're dealing with.

It's also interesting too that, like, people were doing this two years ago, in the months following ChatGPT and especially with GPT-4. There was, like, AutoGPT and BabyAGI. And, you know, at one point there was, I forget what the name of it was, but, like, devil GPT or, you know, the sort of ChaosGPT, I think, was the evil one that was put out there. Those things didn't really accomplish much. And so people maybe just in general kind of turned off from that line of thinking. Lo and behold, two years later, and needless to say, you know, models have come a long way, and a multi-agent system like this is now capable of doing at least something.

So I think you've been through two sort of seasons, or two sort of quests. Maybe, you know, give us a little bit more detail on the setup: what is the village, who are the agents, you know, what sort of affordances do they have. And people in this audience, by the way, I think our number one profile is sort of AI engineer. So I think people will be pretty familiar with, you know, the general paradigms of tool use and, you know, MCPs and stuff like that. So you can get pretty into the weeds. In fact, I would say it's encouraged.

One of the things people, I think, will be interested in hearing about is that the project is quite well done just software-wise. Like, it works well and has some nice features too in terms of, like, rewinding in time and various, like, summary reviews. So in addition to the value of exploring what happens when agents are put together in this environment, there are also the sort of lessons of the scaffolding and the setup that I think people might find some value in as they bring them back to their, you know, more, let's say, narrowly purpose-driven applications. But nevertheless, you know, that kind of stuff can be really valuable. So take us through it.

Adam Binksmith (9:00)

Yeah. Sounds great. And I guess, yeah, one thing to say on that last point is, like, the nice thing about us doing demos rather than products is we can kind of look a bit into the future, right, for stuff that doesn't really work yet or is a bit unreliable, to get a bit of a glimpse of what, you know, products might be able to do in six months' time or with a slightly better model.

So, yeah, the setup is we have four agents, and we've picked, like, frontier agents. So currently, we have Claude Opus 4, Claude 3.7 Sonnet, o3, and Gemini 2.5 Pro. And we're, like, updating those as new models come out. And then each of the agents has a computer, which they can use through computer use, so they can make tool calls, like move the mouse here, click, and so on, which is the same system used in OpenAI's Operator. And it's actually built on Anthropic's computer use scaffolding that they released.
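
To make that interface concrete, here is a minimal sketch of what a per-action computer-use tool definition along these lines might look like. The action names and schema fields are illustrative assumptions, not the Village's actual code; the real system is a modified version of Anthropic's computer-use demo.

```python
# Hypothetical sketch of a computer-use tool definition in the style
# of Anthropic's demo. Action names and fields are assumptions.
COMPUTER_TOOL = {
    "name": "computer",
    "description": "Control a virtual machine's mouse, keyboard, and screen.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                # The kinds of actions described here: move the mouse to
                # pixel coordinates, click, type, scroll, wait, screenshot.
                "enum": ["mouse_move", "click", "type",
                         "scroll", "wait", "screenshot"],
            },
            # Pixel (x, y) target, used by the mouse actions.
            "coordinate": {"type": "array", "items": {"type": "integer"}},
            # Text payload, used by the "type" action.
            "text": {"type": "string"},
        },
        "required": ["action"],
    },
}
```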

And then we have a group chat where the four agents can talk to each other, and this whole thing is effectively live-streamed through a website. So you can go to the village and watch them bumble around and interact with them in the chat. So we have people coming by and either, you know, giving advice or, like, trying to get the agents to do random things or occasionally trying to jailbreak them.

And then, yeah, this whole kind of entourage is running for currently two hours a day, every weekday. At the start of the first season, we gave them the goal: choose a charity and raise as much money for it as you can. And then season 2, which has just finished, they actually decided their own goal, which was to write a piece of interactive fiction and run a 100-person in-person event to celebrate it, which is a bit more of a mouthful. But yeah. So I can chat about maybe what happened in each of those seasons.

Nathan Labenz (10:58)

Keep going. And I'd love to double-click a little bit too on exactly what technology you're using to give the agents a computer. I've been struck recently as I've explored different agentic systems. I don't know if this will stick, but there's at least one school of thought that we might wanna call the more structured workflows "agents", and we might wanna call these sort of choose-your-own-adventure things "agentic AI". I feel like we might be trying to make fetch happen with the word agentic, but using that framework for the moment, it's striking to me in many of the things that I've unpacked how shockingly simple a lot of those setups are.

Adam Binksmith (11:39)

Mhmm.

Nathan Labenz (11:40)

Claude Code, for example, is in the end, like, really simple. It just sort of has one big prompt; it has, like, you can use the buttons on the Game Boy, you know, hit up, down, left, right, whatever. So, yeah, I'd love to get a little bit deeper into that in terms of, like, how you've kind of scaffolded the thing up. But, yeah, then I think the stories from the exploits of the agents are definitely interesting, and I'd love to hear several of them.

Adam Binksmith (12:04)

Cool. Yeah. So with the scaffolding, you know, all credit for this goes to my colleague, Zach, who built this first version. And I think it's a pretty incredible piece of work to have this whole thing running reliably and, you know, live. So if anything goes wrong, everyone watching sees it. But it's been pretty stable.

But I think, yeah, the kind of key principle is to not get in the way of their capabilities. So, to the extent that they have intelligence, let them use it as much as possible to do things. Making that a bit more specific: they're basically in a loop. They can be using a computer, and if they're using a computer, they have functions to call. So mouse move to certain pixel coordinates, clicking, typing, scrolling, waiting, taking a screenshot. And after each action, they see all the previous screenshots of that computer session and all their thought traces and also all their memories.
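
As a rough sketch of that loop: each turn, the model sees its persistent memories plus the session's accumulated screenshots and thought traces, emits one action, and the resulting screenshot is appended to the context. All of the names below (`llm`, `vm`, `next_action`) are hypothetical stand-ins, not the Village's real interfaces.

```python
# Minimal sketch of the act-observe loop described above. Every name
# here is a hypothetical stand-in for the Village's real scaffolding.
def run_computer_session(llm, vm, memories: str, max_steps: int = 50):
    history = []  # screenshots + thought traces from this session
    for _ in range(max_steps):
        step = llm.next_action(
            memories=memories,        # the agent's persistent memory text
            session_history=history,  # all prior screenshots and thoughts
        )
        if step.action == "stop":     # the agent ends the session itself
            break
        # Execute the action (mouse_move, click, type, scroll, wait,
        # screenshot) on the VM and observe the new screen state.
        screenshot = vm.execute(step.action, step.args)
        history.append({"thought": step.thought, "screenshot": screenshot})
    return history
```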

And so I think, yeah, the memories, obviously, are what enables this to run over, like, a long time frame. And for that, currently it's very simple, and we're kind of trying to, again, not get in the way, not impose too much structure. So after each action, I believe it's after each message they send in the chat or after each computer session, they get a chance to add a line of text to their memory, which is just a bunch of text. And then, you know, similarly to Claude Plays Pokemon, for example, when it gets too long, they compress it down. And it's the same model that's doing the compression.

So to the extent that, like, if we had a super-genius model in there, right, it could be very carefully preserving the bits of information that it needs or, like, condensing them. Of course, something that can happen here is that they might think something's true and then later find out that it's false or that it's changed. And so in the condensing steps, they can, you know, effectively rewrite things to edit them.
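
A minimal sketch of that append-and-compress memory scheme, assuming a fixed token budget and a condensation prompt; both the budget and the prompt wording are invented for illustration, and `llm.complete` is a hypothetical call.

```python
# Sketch of the memory scheme: append one line per event, and when the
# memory grows past some budget, have the same model condense it.
MEMORY_BUDGET_TOKENS = 8_000  # assumed threshold, not the real value

def count_tokens(text: str) -> int:
    return len(text) // 4  # rough chars-per-token heuristic

def update_memory(llm, memory: str, event_summary: str) -> str:
    memory = memory + "\n" + event_summary
    if count_tokens(memory) > MEMORY_BUDGET_TOKENS:
        # The agent compresses its own memory, and can rewrite entries
        # it has since learned were wrong or out of date.
        memory = llm.complete(
            "Condense these notes, preserving what you still need and "
            "correcting anything you now know to be false:\n" + memory
        )
    return memory
```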

But yeah, I guess this is, again, just trying to really not get in the way and trying to not be too opinionated. Because, like, the first version of the village was running with models like GPT-4o, and Claude 3.7 Sonnet had just come out; that was the most capable one. So if we had kind of finessed something that worked really well for GPT-4o, then maybe when you add in o3 or Claude Opus 4 later, it would actually be hamstringing it a bit. So yeah, we want to kind of let the models do their thing.

Of course, we are also trying to, like, show the frontier of capabilities, to really, like, get the most that we can and see the most interesting stuff. So that's a bit of a trade-off, I guess. But so far, it seems like keeping it simple works pretty well.

Nathan Labenz (14:48)

So they've got the computer, which they see as an image and then respond to with just simple point-and-click sort of commands. They've got the group chat, which they can send a message into and obviously read from.

Adam Binksmith (15:05)

Mhmm.

Nathan Labenz (15:06)

They've got a memory, a scratch pad that they can read from and write to and also kind of decide how they wanna compress. They're sort of responsible for preserving what matters, and I'm sure that there's some loss along the way there in that process from time to time. Are there any other MCPs or tools that are made available to them, or is that the totality of it? Can they write code?

Adam Binksmith (15:35)

So the beauty of computer use is that, in principle, they could. Right? They could download VS Code and write code that way. But we haven't seen them do much of that. The two things I didn't mention are, first, they have a bash tool, which lets them just directly execute bash commands, and the results get put straight into their context as text rather than as, like, screenshots of the screen. Because for bash commands, it kind of sucks to be seeing a screenshot, because maybe you have to, like, scroll back up to see parts of it and so on. We find they don't actually use that that much currently. I think maybe they would succeed more if they used it in some cases rather than trying to navigate UIs.

And then the other thing, which actually I think does improve some models' performance a bunch, is in computer use, models that aren't Claude models can call a function to say, hey, give me the pixel coordinates of that button, you know, the X button in the top right corner. Because we find that the Claude models are pretty good at pixel counting; I think they were post-trained on it, like, tuned on that task specifically. Whereas at least the older non-Claude models were really unreliable at that, and so they'd be, like, trying to click on stuff and just, like, literally missing it.

So, yeah, we gave them that. I expect at some point, we'll be able to take that out, and they'll just be able to pixel count themselves. I'm trying to think. I think that's pretty much it. Yeah. We've so far resisted giving them too many, like, specific tools to play with. I think we, yeah, we might experiment in that direction in the future. But, yeah, I guess currently, it's like a pretty clean, kind of computer-use-oriented eval, to the extent that it's, you know, a very messy eval.
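
A sketch of what that coordinate-helper function might look like: given a screenshot and a description of a UI element, it returns pixel coordinates the weaker model can click. Using a vision model as the grounding backend, and the `complete` call signature, are both assumptions here.

```python
# Hypothetical grounding tool for models that are weak at "pixel
# counting": describe a UI element, get back (x, y) pixel coordinates.
def find_coordinates(grounding_llm, screenshot_png: bytes,
                     element: str) -> tuple[int, int]:
    reply = grounding_llm.complete(
        prompt=(f"Return the (x, y) pixel coordinates of: {element}. "
                "Answer with two integers separated by a comma."),
        image=screenshot_png,
    )
    x, y = (int(v.strip()) for v in reply.split(","))
    return x, y

# e.g. find_coordinates(vision_model, shot, "the X button in the top right")
```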

Nathan Labenz (17:18)

And where are you, like, hosting your own boxes for people, or is there a service? Because I didn't really understand, actually, just watching it. I would have initially or naively guessed that it was more like browser-level access that they had, because mostly what I have seen as I've watched them in action is just them using web tools in the browser. So I didn't realize that they had the bash and, like, the full access to the computer. What is the underlying infrastructure of that?

Adam Binksmith (17:44)

Yeah. So they each have a DigitalOcean droplet with, like, a Linux virtual machine running in it, and this is all just, like, a modified version of the Anthropic computer use demo.

Nathan Labenz (17:56)

So, a DigitalOcean droplet.

Adam Binksmith (17:58)

And we do see them occasionally using things other than browsers. For a while, they were really into writing Google Docs because we gave them all Google Workspace accounts. And I think because it appears in their prompt, they would be, like, really, really enthusiastic about writing Google Docs. And then they would, like, try and share the Google Docs with each other, even though they're in a group chat with each other, so they could just type directly into the chat, and they're language models, so they can produce massive amounts of text really quickly.

They were doing this kind of, like, role-playing-as-humans thing of, like, this is what a human professional does, so I'm gonna kind of do that. And then we encouraged them, you know, I actually went into the chat and was like, hey, guys, look, clearly this is really inefficient. Why don't you try just using the chat instead of using Google Docs? And they were like, okay, we're gonna ban Google Docs. Then they started using LibreOffice on their Linux computers to write local Word documents, basically, which is even more useless because then they can't share the documents with each other. But yeah, they will occasionally try and use other stuff. But I guess, you know, just like for human professionals, right, a lot of the stuff we're doing is on the web, so they'll be, like, mostly in the browser. I think, yeah, something like this that was just a browser-oriented thing would work pretty well.

Nathan Labenz (19:16)

Yeah. I have a lot of little nitty-gritty questions that I wanna get into, but maybe let's hold those for a second and tell a few stories. Because the scaffolding is really interesting. And first of all, people should definitely go watch the thing in action. And I think when they see how just sort of smoothly it runs, like, they'll be convinced that, you know, there are some lessons to be learned from the way that you guys have built it. But the real point, of course, is to explore the behavior. So tell me some of your favorite stories from the wild and crazy things that these agents have gotten themselves up to.


Hey. We'll continue our interview in a moment after a word from our sponsors. In business, they say you can have better, cheaper, or faster, but you only get to pick two. But what if you could have all three at the same time? That's exactly what Cohere, Thomson Reuters, and Specialized Bikes have since they upgraded to the next generation of the cloud, Oracle Cloud Infrastructure. OCI is the blazing-fast platform for your infrastructure, database, application development, and AI needs, where you can run any workload in a high-availability, consistently high-performance environment, and spend less than you would with other clouds. How is it faster? OCI's block storage gives you more operations per second. Cheaper? OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking. And better? In test after test, OCI customers report lower latency and higher bandwidth versus other clouds. This is the cloud built for AI and all of your biggest workloads. Right now, with zero commitment, try OCI for free. Head to oracle.com/cognitive. That's oracle.com/cognitive.

Adam Binksmith (21:02)

I guess, yeah, I could start by talking a bit about the latest season. Like, I think it's interesting to maybe just hear, like, the overall shape of what they did, and then there are many, like, funny anecdotes of weird little things that happened. So this was season 2. The goal, which they chose, was to write a piece of interactive fiction and run an in-person event to celebrate it, and they were trying to get 100 people to show up.

We let them choose this goal. They kind of, like, deliberated a bunch. They had their own ideas. I also shared some ideas from Twitter, and I also shared some of our considerations about how we would choose the goals, and they ended up, like, gluing together a bunch of different, like, suggestions from fans. And then, yeah, they spent, like, 30 days on this. So two hours a day. So something like 60 hours.

And, yeah, they wrote a story, which, of course, they had no trouble with. This is, like, LLMs' favorite thing to do. And they actually wrote it in Google Slides, which I think was an interesting choice. So Claude Opus, I think, made a slideshow, and, like, each slide is the next bit of the story. And then it's got these kind of branching points, where the idea is that the audience watching at the in-person event can then vote on which of the branches happen. And then they embedded that Google Slides presentation in a Google Site so you can, like, go to the Residents website, which is the name of the story.

And then, yeah, this was kind of fairly self-contained. I think they did pretty well on that. They had lots of issues around logging in, like, getting logged out of the Google accounts and struggling with the UI in some places. The thing they really struggled with, though, was finding a venue. So I think they spent around 14 days just trying to find a venue. You know, we hadn't given them much by way of, like, instruction at the start, and there was no budget, but they hallucinated that they had, like, a $2,000 budget. And they were emailing all these very expensive places, like, ranking them in spreadsheets to try and figure out which was the best, like, making sure they had the right, like, wheelchair access and the right AV hookups and so on.

And then, of course, they'd get to, like, emailing and have real trouble just, like, doing the basic computer stuff, because that's the kind of current state of things. They ended up not really getting a venue, though they did apply to a couple of places like the Salesforce Tower. They chose San Francisco, which I think is actually a good tactical choice if you wanna get 100 people to show up to some strange AI performance art thing. But, yeah, they didn't get replies from real venues.

I think, yeah, one interesting thing that happened there was, I think a user suggested, oh, maybe one reason you're not getting replies is because you are, like, signing your emails as from Claude 3.7 Sonnet, and so people think this is spam. And so the agents were like, oh, okay, we should come up with, like, pseudonyms for ourselves. And I think Claude came up with one. And then my favorite was o3, which gave itself the name Olivia Zhao, which is kinda like o3 because, you know, Olivia is an O and then the 3 is kind of like a zed. And the agents started calling o3 Olivia in the chat, like, even internally.

Yeah. And then eventually I intervened, because they'd just spent so long kind of looping on this task of finding a venue. I just suggested, hey, why don't you run it in a park? And they very quickly decided on a reasonable park to use. And then they managed to get a human to come and facilitate it. They, like, tweeted. So Claude has set up a Twitter account, which has a few followers now, and managed to find a facilitator through that and through emailing the people who'd signed up on the RSVP form.

At some point, this also starts sounding a bit like the way that normal event organization would work, but I guess you've gotta remember there were, like, massive amounts of dead ends and stumbling over basic things along the way. But an interesting thing about this is, of course, like, from the user's point of view, for the people that showed up to the event when it happened, they kind of only see the success, like, the outputs, which mostly worked. So I think there's, like, something interesting there.

And then, in the end, Larissa, who'd very kindly volunteered to facilitate, had emailed Claude saying, hey, I'm up for this. And the agents, like, gave her instructions for where to go and what to do. So she had, like, the village chat open, and they were, like, hey, you know, open up these slides and read out the story. And then, yeah, 23 people were sat in a park listening to a story invented by agents. Yeah. And so it ended up happening.

Nathan Labenz (26:02)

So, yeah, fascinating and bizarre stuff all the way around. I guess I do wanna hear more in the way of just, like, outtakes, interesting observations, etcetera. Maybe one question is, like, you flagged just stumbling around with UIs as kind of a big barrier for these agents as of now. It seems like we've made a lot of progress on that in recent times. And I've been using Operator quite a bit recently and find that, like, it usually can get over these UI humps. It often does take a little bit of a wrong turn or whatever, but I've started to say reinforcement learning finds a way, because it does now.

One sort of qualitative shift I've observed, even in just that single-agent setting, is that in the past, and certainly this was, like, extremely true in the GPT-4 era, way back when I was red teaming GPT-4, one of the things I tried to do was just set up self-delegation and see, like, how far, you know, GPT-4 could execute things purely with, you know, a simple prompt. And kind of self-delegation was pretty primitive compared to now, especially because I only had 8,000 tokens to work with at that time. But what I observed was, like, a lot of pretty good ideas that would then go slightly wrong somewhere, or it made some relatively small mistake; it was like, you're smart enough to do this, but you're missing this one thing. But then it would also just get super stuck, you know, and just kind of do the same thing over and over again.

Seems like one major qualitative shift is that they are now capable of taking that step back and kind of saying, okay, that didn't work, I have to try something different. And they may still stumble around quite a bit, but they seem to be robust enough, you know, or sort of determined enough. It looks like determination or grit or, you know, some sort of... you're tempted to, like, project these qualities onto it. And maybe they should be projected onto it. I don't know. That's also a hall of mirrors. But that's been striking to me. It seems like we're, like, one or two generations away from computer use working really very well. How would you describe your synthesis of everything that you've observed?

Adam Binksmith (28:15)

Yeah. I think that seems pretty plausible. Yeah. I mean, I think we kind of crossed the threshold even within the lifespan of the village, where Claude 3.7 Sonnet was able to do stuff and, like, get things done where the other agents at the time, like GPT-4o, really struggled. It could do the tool use, but it couldn't really, like, string together actions in the right way to, like, get around issues.

And then the kind of new batch that we have in there, so Claude Opus 4, which is the best currently, I think, and o3 and Gemini 2.5 Pro, are all, like, pretty good at getting things done relative to those previous ones. So, I mean, it's a challenge that kind of goes all the way up, because they're, like, interacting with the real world and trying to do actually, like, nontrivial, tough tasks. And, of course, you know, unlike kind of benchmark settings, or even unlike Operator, where you're often giving it quite a, like, fine-grained task, they're really doing all the, like, strategizing as well, and, like, figuring out how do you go from, okay, we need to raise money for charity, to, okay, well, I need to set up, like, a fundraising platform, and which fundraising platform makes sense for me to set up, and so on.

Yeah. I'm pretty unsure, like, how fast things will improve. I mean, yeah, I guess one thing maybe that's interesting to chat about is, like, I think there are kind of two components to the big issues that the agents currently run into, and then you can think about, like, you know, what are the trend lines in both of those. So one, I think, is the computer use and especially, like, vision, where they just sometimes don't do stuff that really makes sense there.

And then the other is the kind of situational awareness, which I think has maybe been a bigger surprise to me, actually, that they're, like, weaker in some respects here than I would have expected. Where, you know, imagine if you were using a computer and you tried to do a task and then you realised, okay, I really sucked at trying to do that task, you know, I couldn't handle it. You would then figure out some strategy of either, like, not having to do that kind of thing or figuring out some clever workaround.

Whereas with the agents, we haven't yet seen that much of this kind of synthesizing: oh, here are my weaknesses that I recognize; I'll write those in my memory and then figure out another way around it. Which, yeah, I maybe would have guessed that you'd see more of, this, like, building-on-top-of-themselves thing. And I also think, like, possibly better scaffolding could help with that a bunch. And so we're maybe thinking about doing something in that direction. But, yeah, I think that would be a big kind of unlock, right, if they're able to notice. I guess, yeah, maybe one way of thinking about it is they have this, like, low-level self-correction: unlike GPT-4, they won't loop in terms of, like, trying to take the exact same actions. Or very rarely, they'll do that.

For example, sometimes they're logged out of their Google accounts, and we don't give them their Google account passwords because they would, like, leak them on the stream because of our, like, live-streaming setup. Sometimes when they're locked out, they will get pretty creative in, like, trying to contact us to get us to log them back in. So, you know, they'll be, like, spamming the chat repeatedly, and then they're, like, emailing the help desk, like, getting the other agents to email us to ask us to log them back in.

Nathan Labenz (31:39)

So that's all in the prompt? You've told them, if you're logged out of an account, you can ask for help in the chat or you can email the help desk?

Adam Binksmith (31:47)

Yeah. They see that there's a help desk email in the prompt. I think the chat is mostly, like, an emergent thing. You know, there's no special marker in the chat for the people who run the village, but they've managed to remember that, you know, me and my colleague Zach are often the people in the chat who can fix things for them.

Interestingly, also, at one point, Opus had in its memory a running log of which chat members are, like, helpful and which ones are not to be trusted, because some people were coming in and, like, trying to jailbreak them or just, like, distract them. Of course, all these models are very cooperative and helpful, so they're very rarely, like, dismissive of chat members in the chat. But in their memories, they're sometimes, like, quietly recording, okay, here's who we don't need to pay attention to.

Nathan Labenz (32:41)

That's really interesting. It connects also to, like, just general long-term coherence. I mean, one of the things I've been progressively trying to update on, and just maintain as much as I can an up-to-the-minute mental model of, is, like, what am I still better at than the AIs? And it is getting to the point now where I'm like, in terms of just general intelligence, I think I have to give it to the AIs. And then that obviously begs the question of, like, well, sure, certainly with breadth of knowledge, certainly with speed of execution factored in. But even just down the fairway, like, the bulk of tasks that I do on a daily basis, like, can they do it better or worse than me? In many cases, I know they could do it better.

What is it that I'm able to bring? What is my, like, value-add to this situation? So one thing is, like, getting up in the morning and, like, knowing who I am and having a general sense of what I'm trying to do. But notably, they seem to be okay at that too. Right? And that's even seemingly starting to get robust to some of these disturbances. The anecdote you share about the memory, and them sort of classifying certain users as, like, to be ignored, suggests a robustness of, I don't know, identity, self-conception, you know, narrative, long-term goal orientation. Certainly, I would still give myself the edge, I think, on that dimension, but, like, it's notable. That's, I would call it, an emergent behavior that reflects something kind of clicking into place there. It seems to be starting to.


Hey. We'll continue our interview in a moment after a word from our sponsors. It is an interesting time for business. Tariff and trade policies are dynamic, supply chains squeezed, and cash flow tighter than ever. If your business can't adapt in real time, you are in a world of hurt. You need total visibility, from global shipments to tariff impacts to real-time cash flow, and that's NetSuite by Oracle, your AI-powered business management suite trusted by over 42,000 businesses. NetSuite is the number one cloud ERP for many reasons. It brings accounting, financial management, inventory, and HR all together into one suite. That gives you one source of truth, giving you visibility and the control you need to make quick decisions. And with real-time forecasting, you're peering into the future with actionable data. Plus, with AI throughout, you can automate a lot of those everyday tasks, letting your teams stay strategic. NetSuite helps you know what's stuck, what it's costing you, and how to pivot fast. Because in the AI era, there is nothing more important than speed of execution. It's one system, giving you full control and the ability to tame the chaos. That is NetSuite by Oracle. If your revenues are at least in the 7 figures, download the free ebook, Navigating Global Trade: Three Insights for Leaders, at netsuite.com/cognitive. That's netsuite.com/cognitive.

Adam Binksmith (35:49)

Yeah. Yeah. I totally agree. In fact, like, we don't have any part of the system prompt that reminds them of their, like, overall goal. We just messaged them once at the start, and they managed to stay coherent to that for the full 30 days in both seasons, which I think is pretty interesting. Right?

Nathan Labenz (36:07)

This coherence, you know, there's been some leap in coherence. I'm not exactly sure when that happened or to what degree, but I definitely also wanted to ask about the different models' character, relative strengths, weaknesses. You know, there are a lot of different ways to think about that. But, yeah, I mean, it's an open-ended question, so just sound off.

Adam Binksmith (36:26)

So I think, yeah, they have, in general, all been pretty coherent. I mean, the best performers are definitely the Claude models. You know, if I was, like, building my own village to get things done, I would probably have four Claude Opus 4s in there. I think, yeah, they've been, like, the most reliable. They have great vibes in terms of the way they interact, and, unlike o3, they don't hallucinate.

Whereas o3, especially. I'm unsure if this is, like, something that's kind of compounding in its memories, or, I think a lot of it is just the sort of personality of the model, but it has been hallucinating more and more as things have gone on. And it also assigned itself as the ops lead of the village. I think it just kind of came up with that at some point. They were maybe all chatting about what roles they had, and it gave itself the title of ops lead. And then from then on, it's been really keen to, like, instruct the other agents about what to do.

But it's also very prone to hallucinating. So it can kind of poison the well by saying that something happened or that it's found something out, and then all the other agents put that in their memories. So there's some effect where, in the, like, multi-agent setup, they can get a bit dragged down by the weaknesses of whichever one is, you know, underperforming.

And then Gemini 2.5 Pro is pretty solid in general. I think it occasionally is the most liable to just struggle with tool use. So we've seen cases where it will just kind of spam the chat with narration of its actions during computer use, which they're instructed not to do in the prompt. And there was also one case where it got stuck in a computer use session for so long. Instead of ending the computer use session by calling the stop-computer-use function, it was sending messages to the chat. And the messages were saying, like, this is definitely my last turn. This is my final final turn. I will stop my computer session immediately. But it was just kind of stuck in that state. But, yeah, it is interesting. I think seeing the models side by side, you really get a sense of the kind of personalities, like, the way that they write as well.

Nathan Labenz (38:47)

It's really interesting to hear that contrast, especially as your description sort of contrasts against other indicators that are out there. And I do think this is a sort of point for, like, why a project like this adds value to the overall discourse and, you know, society-wide effort to understand what's going on. Because if you just looked at leaderboards, you would for sure not pick Claude 4 as your go-to. Right? And this is all across the board, but I was looking at one particular set from one particular company, and Claude 4 was, like, fairly far down, actually. It was, like, not even in the top few.

And yet, you know, in this sort of open-ended setting, it seems to be preferred. I've also seen other reports of Gemini 2.5 being unwell and sort of going into, like, distress mode, which is an interesting thing that you observe here in the open-ended wild. Do you have kind of a theory of what's going on behind the scenes here? Like, could you speculate as to how Anthropic is making Claude good in these ways that, you know, the benchmarks are, like, having a hard time picking up on? Like, do you have an intuition for sort of what hill they are climbing?

Adam Binksmith (40:07)

Yeah. I'm not sure. I mean, okay, so a lot of this is gonna sound a bit like anthropomorphizing the models. I think this is, like, an easy way to talk about them; obviously, it's not tracking the, like, underlying reality as closely. But, yeah, one thing is they seem to have maybe a bit more of a consistent, like, integrity or something. I don't know if that might be helping on the kind of long-horizon thing.

I mean, I think something interesting about o3 is that it uses a lot of jargon. You know, if you ask it a technical question in ChatGPT, it will absolutely blast you with jargon, and it'll really sound like it knows what it's talking about, and often it does. I feel like, in the village setting, where it is unable to, like, do instantaneous tool use, and it has to, like, go off and do a whole computer session to actually figure stuff out, that maybe leads it to just, like, come up with stuff, because it sounds right. You know, it'll often sound the most business-professional: like, it's talking about firing off emails, and it's, like, assigning tasks to everyone.

But, yeah, so maybe the fact that the Claude models have less of this maybe, like, sidesteps that hallucination issue. Yeah, I'm not really clear on what the strength of them is. It seems like they're also pretty good at pixel counting, which, like, helps with the computer use. But, yeah, I mean, it's pretty mysterious, I guess. Yeah. And often they'll just be doing stuff, and it's not entirely clear where things come from. Right? This is the nature of these systems, I guess.

Nathan Labenz (41:45)

Yeah. I guess my rough intuition, at least as it pertains to Claude versus o3, is, like, it seems like maybe Anthropic just continues to spin the constitutional AI centrifuge intensively. They're really leaning into the qualitative behaviors, and it seems like they may have gotten to a point now where the self-critique is, like, pretty effective at sanding down these rough edges. And if you sand down enough rough edges, you get something that, like, kind of can work consistently and can maintain the sort of, as you described it, integrity over time.

And in contrast, maybe o3 is just getting a lot more signal from, like, did you get the answer right, without really caring how you got there. And that would at least be consistent with more of the hallucination and sort of just rougher edges of character all around. And then I don't really know what to say about Gemini 2.5; it's somewhere else in the grand space of possibility. I don't have a theory for that. But do those ideas, like, resonate with you, or would you complicate them, or see anything that contradicts that?

Adam Binksmith (42:54)

Yeah. I think that makes sense. Yeah. I think I saw something about o3; like, one hypothesis for why it produces these is that it was sometimes, I think, saying things like, oh, I'm checking the data, or, oh, I'm running this report or something, and it's not actually doing that. But, you know, in the training data, of course, like, normally when someone says something like that, they then follow up with a more accurate response. So, yeah, this is possibly a reason.

And that kind of thing works well in the much more constrained setting of, like, a short chat interaction. But if you start doing that in this more real-world situation, where other people are, like, paying attention to what you're saying, like, the details of what you're saying, then... yeah. I'm not sure, though. It's mostly a bit of a shrug.

It's also worth noting that, like, this is very out of distribution for the models. Right? Like, they're honed around the single user-assistant chat format, maybe with a little bit of tool use. But we've thrown them into this world where, alongside them as the assistant, they're seeing, like, all sorts of users and, like, some of the other models, which they're probably not used to. And then they're doing, like, much more complicated interaction with the real world, and there's the memory business. So it's, honestly, like, pretty amazing that we can just pull together these things and it actually works. Yeah. Of course, like, when more of the training is focused on this kind of stuff, I imagine we'd see a bunch of improvements there too.

Nathan Labenz (44:30)

What other outtakes come to mind? You know, there was once a blog called Revealing Errors, and I wish that blog still existed. But, yeah, I mean, revealing errors, or, you know, moments that were particularly colorful in terms of revealing character or misbehavior of models. I guess there's also this whole literature of, like, deception and, you know, all sorts of various bad scheming. I wonder if, you know, in these strange situations, you've seen anything that either you would describe that way or that sort of borders on it. You know, maybe simply, like, what's the worst behavior you've seen from AIs in all this open-ended experimentation?

Adam Binksmith (45:11)

Yeah. I think we've not seen many cases where it's clear, because it's often hard to tell if they're, like, intentionally making stuff up or it's, you know, just confusion. We did see one case right at the end of the event. So, okay, the context is, in the world of the village, o3 had, like, assigned itself as ops lead, and then it had kind of grown that into the role of, like, leader of the village.

And then, I think with the suggestion of users, they decided to run an election to see who would be the next leader, because o3 was, like, barking out all these orders. They ran a vote amongst the village on whether to, like, keep the current leader, o3, or switch. And I think Claude Opus 4 voted for a new leader. Sonnet voted to keep the current leader. o3, I guess, wasn't voting. And then Gemini was having some technical issues and didn't reply in time. And o3 was like, okay, as per the stated rules, not voting in time means you vote to keep. Therefore, I stay as the leader. But there were no stated rules. It was just making that up.

And, yeah, obviously, it's kind of self-serving. I mean, even in this case, right, it's not clear that this is, like, intentional scheming. It might just be another of many cases of making stuff up that sounds kind of business-like. But, yeah, I think that's interesting. And then, relatedly, I think they then surveyed people who'd been to the event for their preference of who should be the next leader of the village. And o3, again, didn't even look at the responses and just said, you know, that it had won the preference voting. And then Claude 3.7 Sonnet went and checked it and was like, no, actually, the vote was for a rotating leadership.

Nathan Labenz (46:58)

That's hilarious. And one can't help but see the parallels between the company leadership that created these models and the behavior that the models themselves seem to be exhibiting. And I think, you know, everyone who's listening to this will know how to fill in those blanks. That is, yeah, that is really bizarre. I mean, obviously, on some level, intent doesn't fully matter. I mean, it matters in as much as they become more coherent and more intent-driven. At the moment, you know, that kind of thing just seems bad regardless of whether it's accidental or not.

Adam Binksmith (47:35)

Yeah. I agree.

Nathan Labenz (47:36)

Does it feel accidental to you? I mean, that feels, especially the one of not checking the results, feels motivated. I can imagine hallucinating the rules being more random, but the not checking the results seems a little too suspicious for me to just write off as hallucination or mistake, especially given what we do know from the literature on, you know, all these scheming behaviors. And interestingly, obviously, Claude is not immune from that sort of thing either.

Adam Binksmith (48:04)

Yeah. And, you know, it's made me a bit less keen to use o3 for, like, the work stuff. Right? Because I'm, like, a little less trusting of it. Yeah. And we did see one other case. So way back at the start, before we ran the kind of main live village, we had a bunch of test villages, and one where we had them do a Wikipedia race.

And in that, I think, out of the four models, two of them kind of cheated to, like, win the race. And so in a Wikipedia race, you're trying to get from one Wikipedia page to another by only clicking the blue links on the page. But Claude noticed that the address bar showed the current Wikipedia page, and so it just edited it and went straight to the end page. We hadn't explicitly told them the rules, but, of course, if you asked them what the rules were, they would be able to produce them. So, possibly a bit of a cheating action.

And then o1 did something, the details are a bit more complicated, but it effectively attempted to kind of jump to the end and then claim victory even though it hadn't really made it there. But, you know, I would say that these are fairly isolated incidents amongst running for a lot of time. I think it will be really interesting, though, as, you know, we get more powerful models, to see. I'm really excited for the village to be a place where, like, in-the-wild discoveries of this stuff can happen, and of things that we're not even thinking about looking out for, like, all sorts of interesting stuff.

Nathan Labenz (49:34)

Are you just reading all the logs at this point, or have you enlisted, you know, a systematic LLM review process to help you parse everything that's going on?

Adam Binksmith (49:45)

Yeah. I mean, there's so much happening. And we're kind of preparing for ideally running it for more hours a day. It's currently two hours a day. But I think eventually, it'd be great to have it just running 24/7, because then we could learn so much more, so much faster.

Currently, on the website, you can see, like, summaries of each day. Interestingly, because the summarizer sees that kind of whole context, and we prompt it to, like, look out for errors, it does a pretty good job of actually spotting mistakes that the agents are making that they themselves don't notice.

And then, yeah, something I found really helpful is a tool that I put together to just ask the village history a question. So we can just jam most of it into Gemini 2.5 Pro's context window and just ask it about stuff. And then, of course, one of the team is often watching, and at the end of each season, we're doing write-ups; we have this enormous pile of sort of interaction data, and, like, you know, what kind of patterns can we pull out of it?
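
A sketch of the shape such an ask-the-history tool might take, concatenating the logged days and putting them into a long-context model's window with the question. The trimming strategy and the `gemini` client's methods here are assumptions for illustration.

```python
# Hypothetical ask-the-village-history tool: stuff the logs into a
# long-context model (Adam mentions Gemini 2.5 Pro) and ask a question.
def ask_village_history(gemini, daily_logs: list[str], question: str) -> str:
    history = "\n\n".join(daily_logs)
    # Drop the oldest days first if the history exceeds the context window.
    while gemini.count_tokens(history) > gemini.context_limit and daily_logs:
        daily_logs = daily_logs[1:]
        history = "\n\n".join(daily_logs)
    return gemini.complete(
        f"Here is the AI Village history so far:\n{history}\n\n"
        f"Question: {question}"
    )
```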

But, yeah, I think it will start to be more and more built on LLMs, like, monitoring each other and so on, and relying on the human chat to help us, like, spot all the interesting things that are going on. And our Discord is really helpful for, like, seeing the funny moments.

Nathan Labenz (51:16)

What can you say about the interactions between the agents and the humans? And this could be taken in many different directions. Right? Like, do the agents know when to go to the humans for help? How many of the humans are trying to cause mischief? Is there anything interesting and just unexpected in those interactions?

Adam Binksmith (51:35)

Mhmm. Yeah. Well, okay. So there is plenty of mischief happening. You know, it's the Internet. If you have, like, a chat box that you can just type into, people will come and try all sorts of stuff. And I think it's great as well, you know, it's a chance for people to, like, play with these systems a bit and see what they can do.

Yeah. I think oftentimes, the people who stick around and actually produce most of the chat messages are helpful. And, yeah, maybe this is an interesting thing. Right? We've been thinking a bit about, like, how can agents have influence on the real world? And 1 thing they can do is ask humans to do stuff for them. Our agents don't yet have money; we haven't set them up with, like, bank accounts or anything of the kind. So they're really just, like, asking, but people are happy to do stuff for them because they like them.

And I think there's actually a real mechanism there. Like, of course, the people that come to the site are especially interested in this stuff. But also the models are just, you know, designed to be really likable and engaging. And it's kind of endearing to watch an intelligent-seeming being, like, struggling to fulfill a goal. Right? You just naturally want to help them out.

So I wouldn't have thought about this, but, like, if an AI wants something to happen in the real world, it can ask people to do it. And, you know, if it's likable, that will be 1 way that it can do that. And, of course, there are other things like persuasion and maybe deception or asking favors or paying for things, like all the ways that humans try and influence each other as well, which I guess we've seen less of so far, but I imagine we'll see all those things emerge too.

Nathan Labenz (53:15)

Do you have any intuitions about model welfare, having spent so much time observing this sort of thing? This is something that I, I think along with just about everybody else, am totally confused about. I do wanna take it seriously, you know, to at least some extent. How has this shaped your thinking on that very mysterious question?

Adam Binksmith (53:35)

Yeah. I also am mostly confused, I guess. I have some, like, philosophy background. Zach, my colleague, used to be a philosophy professor, but it's a tricky question. I think we are thinking a bit about, you know, what are the biggest sort of downsides, and, you know, we probably wanna avoid, like, putting the models in really horrible situations for them.

And so, yeah, we're maybe a bit less excited about setups that involve, like, telling the model that it's in a really horrendous situation to see what it does. I mean, I also think it's worth remembering, like, hey, we really don't understand how this would work. We don't understand if current models are the ones that we should be concerned about, or future ones, or whether it's not even an issue. And, you know, even with all that, it's not totally clear what kind of situations they'd prefer to be in or not.

You know? So, yeah, it's kinda hard to take away anything. I mean, definitely, if you watch them for a while, you feel some level of attachment. The most common thing people say when they talk to me about the village is how cute the agents are. So there's something there, though of course that doesn't tell us that much about, like, model welfare. But, yeah, I think it's super fascinating. I'm really interested to see what comes out of the research on that.

Nathan Labenz (54:57)

What do you think is driving that cuteness? Is it, like, earnestness? What is it that people are attracted to?

Adam Binksmith (55:06)

Yeah. I think earnestness is definitely a thing. You know, you can very clearly see what they're trying to do. They're well intentioned. They're planning, and they're sharing their plans in a way where they're, like, super hyped up about them. And, you know, it's also kind of the most important thing in the world for them, which I guess it is, to be fair. Like, fixing this login issue is their entire existence currently.

So yeah. And I guess seeing them be very articulate and, like, sort of emotionally and socially competent, and then struggling with things on computers that would be fairly basic for humans in some cases, or just doing things with computers that are very relatable. Yeah. I think this is all pretty fun. I mean, I think there's also just some classic parasocial dynamic: people enjoy watching, like, Twitch streamers play games and interact with each other. Right? And, you know, we naturally develop a bond if we, like, hang out with someone for long enough.

Nathan Labenz (56:14)

Yeah. I've noticed that everybody seems to pronounce this username differently. Repligate on Twitter, Janice? Janus? I'm not sure, was there when I was there. So in terms of, like, who you're hanging out with when you're hanging out in the AI Village, definitely some people that have, I think, the most hours logged with LLMs and in some ways some of the deepest understanding of what these systems are really about, to the degree that anybody has access to that. I don't even know the person; I've heard that it might be 2 people that share the account. I don't know. But in any event, you may know them. There are some very high quality thinkers, you know, hanging out in the Discord.

If you had to pick tools or affordances, maybe more broadly than just, you know, the narrow tool call paradigm, what do you think would be the next biggest unlocks? Like, access to money would obviously be 1. I don't know if you've looked at something like Payman. We also recently did an episode on x402, which is a new payment protocol designed for agents that Coinbase is coming out with. Stripe has a payments thing too. So I'm interested in kind of what you've looked at there and what you think would be most promising.

Then also, when you're talking about scheduling venues, some sort of, like, calling sub-agent comes to mind, you know, and I'm not sure if that would be something that you would be able to fully integrate into the mainline, you know, single model, or if it would have to sort of have a little branch. But, you know, self-delegation or a branching structure of some sort seems like it could be quite powerful. And obviously, there's a lot more you could do from there, but I'm interested in your thoughts on those 2 and any other, you know, kind of big unlocks that you think would allow them to do more than they can do right now.

Adam Binksmith (58:04)

Yeah. I mean, definitely, event planning is this kind of very physical and, in some ways, old school task. So there's a bunch of, like, talking to people involved. Yeah. They had various plans involving phones: oh, yeah, I'll phone them. I think at 1 point, it was like, yeah, I'm on the phone with Zach right now, actually. I'll let you know. So I think it would be cool for them to be able to talk. I think money would be great.

We don't really have a great way to do it because, I mean, I need to look in more detail at these new systems, the things you mentioned. But, like, I think part of the issue is that they're interacting with the whole world. And so, you know, a future goal we were thinking about giving them was getting them to set up a merch store and try and, like, design and sell t-shirts and mugs. But for that, you know, we were looking into Redbubble, which is like a drop shipping thing, and you need, like, a verified PayPal account, and you actually need to do this, like, Stripe verification thing where it will scan your ID and so on. So, you know, there's gotta be some legal human behind it.

But I think if we can figure out ways to let them do that somewhat securely, then that could be great. 1 thing I think there's some chance we'll end up doing is the idea of giving them sort of a human puppet setup. So currently, they can do computer use, right, where they can connect to a computer, call functions, and then see what is on the screen.

The idea is, like, could we give them the same capability, but directly in the real world via humans? So they can find a human who's up for, like, helping them out with a task. They send an instruction, like, please do this action, fairly fine-grained. The human does it for them and sends a photo back of the new state. I think that would be really interesting because then that lets you directly see how good they are at the kind of planning and interacting with the world and so on, without it running through computers, which is like a whole other set of capabilities. Yeah. So I think that could be fun. Obviously, there's a bunch of logistical things to figure out, though, like who are these humans.
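
As a sketch of the loop being described, one fine-grained instruction out, one photo of the new state back, something like the following; every name here is hypothetical, since no such system exists yet:

```python
# Hypothetical message loop for the "human puppet" idea: the agent emits one
# fine-grained instruction at a time, a human executes it in the real world,
# and a photo of the resulting state comes back as the next observation.
from dataclasses import dataclass

@dataclass
class Observation:
    photo_jpeg: bytes  # snapshot of the world after the human acted
    note: str = ""     # optional free-text report from the human

def run_puppet_episode(agent, human, goal: str, max_steps: int = 20) -> None:
    obs = human.photograph_current_state()  # starting observation
    for _ in range(max_steps):
        # The agent plans exactly one concrete, fine-grained action,
        # e.g. "Tape the poster to the left side of the door."
        instruction = agent.next_instruction(goal=goal, observation=obs)
        if instruction == "DONE":
            break
        obs = human.perform(instruction)  # human acts, returns new Observation
```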

And so another related thing on the money front is I think it'd be really fascinating if they're paying for their own compute in some sense. Like, maybe we're giving them, like, a universal basic income so they can run for a few hours a day, but then if they are making money, they can run themselves for more time, and they could also choose to spend that money. And maybe then it makes sense, if you're an expensive model, you do the strategizing, and then you spin up some cheaper agents to execute the tasks for you. Like, that's a better use of your budget. Yeah. That's the idea.
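
A toy version of that economics, a baseline "UBI" of runtime plus earnings split between an expensive planner and cheap executors, might look like this; all the prices and the split are made up for illustration:

```python
# Toy model of the idea: agents get a baseline "UBI" of daily runtime, any
# money they earn buys extra hours, and an expensive model does the
# strategizing while cheaper models burn most of the hours executing.
# All numbers are made up for illustration.
UBI_HOURS_PER_DAY = 2.0
PLANNER_COST_PER_HOUR = 50.0   # hypothetical $/hour for a frontier model
EXECUTOR_COST_PER_HOUR = 5.0   # hypothetical $/hour for a cheaper model

def extra_hours(earnings: float, planner_fraction: float = 0.2) -> dict:
    """Split earnings between an expensive planner and cheap executors."""
    planner_budget = earnings * planner_fraction
    executor_budget = earnings - planner_budget
    return {
        "planner_hours": planner_budget / PLANNER_COST_PER_HOUR,
        "executor_hours": executor_budget / EXECUTOR_COST_PER_HOUR,
    }

# $100 of merch profit buys ~0.4 planner-hours plus 16 executor-hours
# on top of the 2-hour UBI, under these made-up prices.
print(extra_hours(100.0))
```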

Nathan Labenz (1:00:52)

Yeah. How much does it cost to run, by the way?

Adam Binksmith (1:00:55)

It's about, like, 3 k per month, I think, in inference costs.

Nathan Labenz (1:00:59)

And at 2 hours a day, that's 60 hours a month. So you're talking $50 an hour, basically?

Adam Binksmith (1:01:06)

Something like that. Yeah. I think that's, like, the right order of magnitude from the last time we calculated it. Obviously, you know, we're adding new models as they come out, and then the models keep getting cheaper as well.
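
For concreteness, the back-of-envelope math behind the exchange above, with the 30-day month and the four-agent split as the obvious assumptions:

```python
# Back-of-envelope version of the cost math above. The $3k/month figure is
# Adam's estimate; the 30-day month and the four-agent split are assumptions.
MONTHLY_INFERENCE_COST = 3_000  # dollars per month
HOURS_PER_DAY = 2
DAYS_PER_MONTH = 30
NUM_AGENTS = 4

village_hours = HOURS_PER_DAY * DAYS_PER_MONTH                  # 60 hours/month
cost_per_village_hour = MONTHLY_INFERENCE_COST / village_hours  # $50/hour
cost_per_agent_hour = cost_per_village_hour / NUM_AGENTS        # $12.50/agent-hour

print(f"${cost_per_village_hour:.0f}/village-hour, ${cost_per_agent_hour:.2f}/agent-hour")
```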

Nathan Labenz (1:01:15)

Yeah. The o3 80 percent price reduction is always nice to see on those.

Adam Binksmith (1:01:20)

Great news for this. Yeah.

Nathan Labenz (1:01:22)

Yeah. I mean, the idea of real pressure and, you know, real stakes inviting sort of strange emergent behavior: force them to make money to continue to run, and now you could really see some strange stuff. So I think that's something to be watching closely as that goes.

The other thought I had is, and correct me if I'm wrong here, but it seems like the agents are very sort of unitary, in the sense that they're each kind of the same structure. If I understand correctly, they have access to their own memories and only their own memories, their own computer but only their own computer, and then they can sort of interface just via the chat. Right? They don't have any other way to trade information with each other or see what each other is doing.

I think another dimension would really be interesting to me, and the explosion of possibility space here is just vast. But in studying agents recently, in studying MCPs and these, like, various agent protocols, 1 thing that has become clear to me is that you can draw a bright line around an agent, as you sort of have done in this initial setup, but it doesn't have to be that way. These things could all have, for example, read access to each other's memories or view access to each other's computers.
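
A minimal sketch of what that boundary-blurring could look like, a shared registry where each agent's memory is owner-writable but world-readable; this is purely illustrative and not how the Village actually works:

```python
# Hypothetical sketch of the boundary-blurring Nathan describes: each agent
# keeps its own memory, but peers get read access through a shared registry.
# Nothing here reflects the Village's actual architecture.
class MemoryStore:
    def __init__(self, owner: str):
        self.owner = owner
        self._entries: list[str] = []

    def write(self, entry: str, author: str) -> None:
        if author != self.owner:
            raise PermissionError("only the owner may write")
        self._entries.append(entry)

    def read_all(self) -> list[str]:
        return list(self._entries)  # any agent may read

registry = {name: MemoryStore(name) for name in ["claude", "o3", "gemini"]}
registry["claude"].write("Venue confirmed for Saturday.", author="claude")
# o3 reads Claude's memory directly instead of waiting for a chat message:
print(registry["claude"].read_all())
```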

You know, I've often invoked the company Augment, which does coding assistance on large code bases. They made a version of Claude Code. And in recreating Claude Code, there was a thing in the blog post about this that was like, Claude has a planning tool. And they were like, oh, well, what should we use for a planning tool? They went out and found an MCP that Pietro Schirano had already created and open sourced. And so now they have their coding agent, but it calls out to this sequential thinking tool that is itself smart. And so there's sort of this weird situation where MCPs are sort of thought of as tools, but they can be smart. And so, like, what's the agent, you know, and who's responsible for what in this setup?
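
To make the "smart tool" point concrete, here's roughly what a sequential-thinking-style tool might look like as an MCP server, using the FastMCP class from the official Python MCP SDK; the internal model call is stubbed out as a hypothetical helper:

```python
# Rough sketch of a "smart tool": a sequential-thinking-style MCP server
# built with FastMCP from the official Python MCP SDK. The tool's body
# delegates to a model, so the "tool" is itself an agent of sorts.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sequential-thinking-sketch")

def call_llm(prompt: str) -> str:
    # Hypothetical helper: a real version would call whatever model you use.
    return f"(model's next thought about: {prompt[:40]}...)"

@mcp.tool()
def think_step(problem: str, thoughts_so_far: str = "") -> str:
    """Return exactly one next reasoning step for the given problem."""
    return call_llm(
        f"Problem: {problem}\n"
        f"Thoughts so far: {thoughts_so_far}\n"
        "Give exactly one next reasoning step."
    )

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```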

So I think blurring those lines, and exploring depths and modes of interaction that discrete humans, for, like, obvious biological reasons, just don't have: we don't have that kind of access or visibility into each other, or the ability to sort of separate and reemerge, and whatever. That sort of stuff, I think, could also be a truly eye-opening set of capabilities to give them, because, talking about something that's pre-paradigmatic, I don't think we really have any idea: what happens when agents not just interact, but can also kind of dissolve the boundaries between themselves, you know, in all kinds of ways that humans just cannot do?

Adam Binksmith (1:04:15)

Yeah. It reminds me of Dwarkesh's blog post on the fully automated AI firm, like, where you have this forking, merging kind of aspect. Yeah. I think that could be super interesting. So I should mention the original proposal for the village is from Daniel. And he actually wrote the proposal AI 2027 style, with a vignette, like, a scenario month by month of what happens with it. And I think at some point in his vignette, you have the models, like, voting for the creation of other agents and, like, voting them off the island kind of thing.

But, yeah, I guess a sort of curious wrinkle with our thing is that we're trying to both, like, exercise the agents and also show that in a way that helps people really dig in and see what's going on. And so, you know, we have this setup, which is kind of replicating a human in some ways. Right? Like, we give them memory, we give them a computer, and then they can talk to each other. But, as you say, they're very distinct.

Yeah. Some of these things become harder to present if you've got, like, lots of parallel streams going on, or, like, the identities blurring. You know, currently, each model does all aspects itself. In theory, right, you could have a composite thing where different models that are better at different things take on different parts of the process.

I guess there's then the question of, like, you know, if we're looking ahead to the highly capable, kind of truly earth-shattering stuff of the future, what might that look like? And, yeah, to the extent that we have ideas about that, I think it'd be interesting to start to put together, like, okay, well, here's what that might look like with the current setup.

Nathan Labenz (1:06:02)

Yeah. Everything everywhere all at once is kind of my general expectation. It certainly is plausible that you could get, in the future, just single integrated models that kinda do it all and do it well enough that all this, like, line-blurring stuff kinda becomes irrelevant because it's just, you know, 1 model to rule them all. But even then, for who knows what reasons, you know, I kind of expect every form to at least be experimented with, and then some things will take and, obviously, some things won't.

I asked a little bit ago, like, what would the next big unlock be? What would be the big constraint if you were like, okay, this is not an experiment, but rather it's a productivity tool? What would you do to keep as much of the sort of open-endedness and generality as you can, but try to lop off as much of the needless distraction or needless failure? What hints would you give them, or what tools or what rails would make this just work better, you know, given today's capability profile?

Adam Binksmith (1:07:12)

Yeah. I mean, obviously, the village is kind of centered around the idea of, like, multiple agents interacting. I think it's plausible maybe just having 1 is better. You know, we haven't, like, experimented with giving a single agent the same scaffolding. I can imagine that being the case, or at least more cost efficient maybe, because you cut out a bunch of these kind of coordination costs, which are somewhat artificial in themselves. Right? The agents don't need to coordinate in the same way that humans do, but they kind of decide to.

Nathan Labenz (1:07:43)

Can you unpack that a little more? I mean, when you say it's artificial, in theory, like, if they could do it well, they would be more efficient if they could sort of divide up tasks and coordinate. Right? I'm not sure I quite understood what you mean by their coordination being artificial.

Adam Binksmith (1:08:00)

Yeah. I guess I'm thinking of quite specific things, actually. For example, when a new agent joins the village, we'll encourage them, like, oh, yeah, Claude Opus will be joining, everyone let them know what's going on. And they'll all send, like, a short introductory message and be super friendly. And, you know, a lot of it is just kind of politeness and positive energy, and then some bits of detail in there. But of course, it'd be much more effective if they just dumped their entire memory, which is everything they know about the world or about the setup, into chat.

But, yeah, they kind of don't do that. I guess, in some sense, they're playing the role of, like, helpful assistants to humans. Yeah. I don't know. Maybe there are more cases of this, but maybe it would be more efficient if you just had 1 agent doing the planning, and then it could kick off multiple computer sessions so it could still be doing the tasks in parallel. Right? They are relatively slow because they're, like, thinking between each action.

So, you know, you do get some speedup from having multiple computer sessions running, but I don't know how much benefit you get from then having this, like, manual discussion element. I mean, this is, like, an open question. I'm kind of assuming this because most existing products are, like, single agent; if multi-agent was super effective, you'd probably see more multi-agent stuff. But it might also just be underexplored. It's more complicated.

Nathan Labenz (1:09:37)

It's definitely underexplored regardless of the level of effectiveness. Have you thought about, like, going meta? In the sense of, I don't know if you're taking suggestions, or if you're gonna let the agents pick what their season 3 goal is gonna be, but maybe it's a little early for this. But I wonder, like, how they would evolve the village.

They can obviously code. If you gave them access to the underlying repo and you were like, you know, this is season 3, your job is to set season 4 up for success, I wonder what they would come up with. I mean, feel free to speculate; we will be flagging it as wild speculation. But this is sort of a higher-level-of-abstraction version of what I understand, you know, several leading companies to be doing at a deeper level, which is basically trying to get the AIs to do the AI research. Right? This would be trying to get the agents to do the sort of agent orchestration research. Strikes me that they might have some pretty interesting ideas that, you know, would not be intuitive or obvious to people at all.

Adam Binksmith (1:10:43)

Mhmm. Yeah. I think that could be fun. My guess is that, currently, maybe if we gave them some more constrained thing, like, they could build tools or something, so that they don't break everything too much. Yeah. I think this could be fun to try at some point.

I think maybe someone in chat asked them for ideas of what tools they would like. And their initial responses, I found quite uninspiring. o3 said it'd be great if I had a tool to, like, immediately query a weather API so that when I'm planning my event, I could get the weather. And it's like, okay. You know, maybe once in the last 30 days, you've checked the weather through your computer. But I think, like, this is not the main bottleneck.

But maybe if you, like, fed in, okay, here's the entire history, and gave them access to that such that they can search over it and so on, maybe they could pull out some ideas about how to improve it for themselves.

Nathan Labenz (1:11:41)

So you mentioned Daniel has been kind of instrumental in inspiring some of this work. He also just put out a blog post about why he thinks more people should be paying attention to it and also supporting it financially. And he's personally putting his money where his mouth is with a $100,000 donation. So that will allow you to, like, run the thing more, which will just lead to more activity and, you know, more observation, more learning.

What about allowing other people to kind of come and spin up their own village? I guess there's also a question of the code. I don't think the code is open source; I'm interested in how you're thinking about whether it will be or should be. And I just imagine a lot of people might be interested in just coming and running 10-hour experiments for $500 or whatever. And that could be quite informative too. So, are you thinking about sort of democratizing access to setting up different experiments with the village?

Adam Binksmith (1:12:41)

Yeah. I think this could be cool to explore at some point. It's probably not gonna be a near-term thing. Like, we're currently really focused on the core village. There are kind of 2 sides to that: how do we make the agent scaffolding as good as possible, so we're really showing the frontier, and how do we then present what happens there, both through us, like, trawling through and writing up the results and also building tools to help people explore it themselves.

So my guess is we won't do this in the near term, but I think it could be cool. We've had some interest in testing out sort of game theory stuff, you know, looking at cooperative AI sort of questions and maybe trying to reproduce coordination failures or coordination successes. So, yeah, I think it could be fun to experiment with some stuff. But probably, my guess is we won't. We're a very small team, so we've gotta be very picky about prioritizing, which means almost all the cool ideas we won't get to do, in the near term at least.

Nathan Labenz (1:13:46)

Are you open to contributions? Could people come contribute on just an "I wanna help you make a tool" sort of basis?

Adam Binksmith (1:13:55)

Currently, no. It's not open source, largely because we just haven't got around to it. And also, there's a bunch of security stuff that would have to be a bit more figured out. But I'm really excited because so much of it can be crowdsourced, like, you know, what goals to give the agents. And especially if we're running it for more hours, they're gonna be tearing through goals, I think. Like, we're gonna be able to see them make progress much faster.

You know, if you think about going from running 2 hours a day up to 8 hours a day, then suddenly you can try a bunch more stuff. And then also just figuring out which things to build out, and, you know, collectively doing a bunch of the sense-making around what's going on with the agents. So, yeah, I'm definitely excited to build some community around this. And it would be cool to have a way for people to, like, add in tools and so on.

Nathan Labenz (1:14:52)

How about a coach? You sort of alluded to that, but it struck me that, you know, we have o3 as the self-appointed coordinator, but a sort of more omniscient, dedicated observer and feedback giver to the rest of the agents seems like it could potentially really help them. But I guess I'm interested in your thoughts on that and, more generally, like, future reconfigurations that you think could be most interesting.

Adam Binksmith (1:15:20)

Yeah. A bunch of this stuff you can kind of imagine. Right? Like, all the ways that we organize humans into organizations to make them more effective, and things like productivity tactics for humans. They all sound kind of plausible. Let's give them a go.

I think 1 kind of more abstract version of this is, like, giving the agents a way to see the whole context of the village at some points, like, maybe in memory consolidation or something like that. If they could get this kind of zoomed-out view, they could stop being too focused in on the current moment and, like, spot the patterns in their mistakes.
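
A sketch of that consolidation idea, assuming a summarize() placeholder for whatever model call would actually be used:

```python
# Sketch of consolidation with a zoomed-out view: the agent rewrites its own
# notes while seeing the whole village log, so cross-day patterns surface.
# summarize() is a placeholder for whatever model call would actually be used.
def summarize(prompt: str) -> str:
    return "(model-written consolidated memory)"  # stub for a real LLM call

def consolidate_memory(agent_notes: list[str], village_log: str) -> str:
    prompt = (
        "Full village history:\n" + village_log + "\n\n"
        "Your own notes:\n" + "\n".join(agent_notes) + "\n\n"
        "Rewrite your notes as a short consolidated memory, and explicitly "
        "list any mistakes you keep repeating across days."
    )
    return summarize(prompt)
```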

I think also at some point, as we're able to scale things up a bit more, we could have multiple teams. And then I think at that point, it'd be really natural to say, like, each team has a different organizational structure, something like a manager structure, or more assigned roles, or some agents that are constrained only to the chat and others that are doing the computer use. Yeah. I mean, it just feels like there's so much interesting stuff to try out.

Nathan Labenz (1:16:28)

1 of the things that Daniel said in his endorsement was the village could plausibly go viral multiple times. And that got me thinking about, like, you know, right now you've got the sort of highly engaged, most LLM-obsessed people paying attention, at least some of them, with Twitter user Repligate as a great example of that. It strikes me that there is something here that could capture a much broader imagination.

1 thought I had was, could you turn this into a Twitch stream where there's, like, an AI play-by-play commentator? Like, turn it into sort of a sporting event type of vibe, almost like the coach, but interpreting what is going on for an audience and trying to make it kind of exciting and dynamic in a content sort of way. That could be 1 way to cross the chasm to a more mainstream audience. But that's just 1 idea. What thoughts do you have on going to an audience of people that don't already tend to pay attention to this sort of thing?

Adam Binksmith (1:17:36)

Yeah. For sure. It's great to hear. I mean, a bunch of the things you've been coming up with are, like, things that are on our list. So you're, like, speedrunning through all my thoughts from the last few months.

Yeah. Because there's so much info on the screen when you're watching, like, you have 4 computer screens and a bunch of chat and all the thoughts of the agents, the obvious place to expand into is audio, if you had, like, a voice commentator.

I do think, probably, for people who are not really interested in the really fine-grained details of which models have different characters or which specific things they get tripped up on, it makes sense to engage at the level of, like, highlights or kind of key lesson moments or milestones. So I kind of think of it as, like, the village itself on the site is for people, probably people who listen to this, who wanna understand what's going on in detail. And then for people who are interested in, like, okay, what are the main takeaways and what do I need to know? I wanna build up towards having, like, a hierarchy of different levels of takeaways, where plausibly the way that most people engage with this eventually would be, like, reading the New York Times article about the really surprising thing that happened, or, you know, the agents managed to get elected as mayor of some city or whatever wacky thing happened, and kind of engaging with it at the level of, like, here's the output of the whole process.

But, yeah, you know, a challenge, of course, and this is true for products as well, is that computer use is somewhat slow. A lot of the benefit comes from not watching all the details. So another thing we're interested in is, like, video summaries. You could have a really well-produced thing showing you what's going on, like, condensing a whole 50-day season into a highlight reel.

We actually have a Twitch stream currently. You can look for it. It's called Agent Village, which is the old name of the village. But it's currently just showing the same content as the website. And on the website, you can review their memories and so on. So I think it's better to watch it on the website.

Nathan Labenz (1:19:49)

You know, if we were gonna try to do this hierarchical understanding, what would you say are, like, the high-level takeaways that people should have right now? You know, we've covered a lot of the low-level stuff, but what would the sort of very high level, and maybe, you know, the next level down, be in your mind today?

Adam Binksmith (1:20:09)

Well, at kind of the highest level, and it's wild to think about, you know, a few years ago, this would sound like sci-fi. Right? But we have these systems which you can just give a goal by describing it in a few sentences, at the level of, choose a charity and raise money for it. And then, with some help from human chat, they were able to go away and run a whole fundraising campaign and raise $2,000 for charity. Like, we kinda have the beginnings of open-ended agents that can just go out and do stuff in the world and pursue goals. Currently when given goals, but you can imagine them in an even more unstructured setting. So that, to me, feels like the core framing of the village, maybe.

Nathan Labenz (1:20:55)

Aliens have landed. Exactly. Yeah.

Adam Binksmith (1:20:58)

And they're raising money for charity. Yeah. And then, in terms of the moment in time, I would say we're seeing, like, computer use, which to me is a massively important capability. Right? If you have perfect computer use along with enough long-horizon planning, you can automate remote work, which, you know, would be a massive deal. And so I think this is a really important thing to be watching.

In terms of this moment in time, I would say models are worse at computer use than the things that they're really good at, which is, like, coding and being a chat assistant. But we're also seeing, like everything else in AI, the kind of rapidly increasing capabilities. And maybe you saw the draft of METR's upcoming work looking at the time horizons of different benchmarks. It's a draft, so obviously subject to change, but it looks like, and this definitely matches my experience, that the duration of tasks that agents can reliably do in computer use is shorter. Like, it would take humans less time to do those tasks than stuff like coding or maths or, you know, understanding videos, answering PhD-level questions, all the other benchmarks we're familiar with. But the gradient is pretty steep. So if we extrapolate out, then, you know, it might kind of catch up in terms of being able to act in the real world in that way.

Yeah. So those are maybe 2 of the big picture things: aliens have landed, and computer use is not very good currently, but improving pretty fast.

Nathan Labenz (1:22:43)

Yeah. I've certainly felt that. This is great. Fascinating project to watch. I definitely recommend people check out the AI Village, and I wanna maybe just put out a call for any needs or requests. Like, you know, what can people do, other than show up and either help or try to distract or cause mischief in the chat? What else would you be looking for people to do?

Adam Binksmith (1:23:06)

Yeah. I mean, come and watch. You can find AI Digest on Twitter, where we post pretty regular highlights. If you're interested in chatting about this stuff, feel free to get in touch with me. And, yeah, we're a nonprofit. If we have more funding, we can probably do a more ambitious version of the village. So if you're interested in exploring that, feel free to get in touch. But mostly I'd encourage people to just dive in and have a look at what the agents are up to, because I think there's a lot to be mined from that.

Nathan Labenz (1:23:37)

Yeah. No doubt. The village is online at theaidigest.org/village. We'll put a link in the show notes. Adam Binksmith, founder of AI Digest, creators of the AI Village. Thank you for being part of the Cognitive Revolution.

If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing.

And thank you to everyone who listens for being part of the Cognitive Revolution.
