AI in the AM — Week 1 Highlights (June 2026)


Show Notes

This first highlights edition of the morning experiment tracks a week of fast-moving AI frontier news, from closed-door recursive self-improvement debates to OpenAI’s call for independent model review. You’ll hear why labs are betting on AI monitors, where safety plans still look thin, and how cheap scaffolds are already improving tax workflows. The episode also tests moderation progress and surveys AI science, cybersecurity, Vatican ethics, solo-business automation, and mental health support.

Mercury: Run your finances with virtual cards, spending limits, merchant/category locks, and AI-friendly tools like API keys, MCP, and CLI. Check out Mercury at https://mercury.com

LINKS:

Sponsor:

Claude:

Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

CHAPTERS:

(00:00) Special Sponsor

(01:29) Recursive self-improvement arrives (Part 1)

(12:21) Sponsor: Claude

(14:13) Recursive self-improvement arrives (Part 2)

(20:01) Moderation and personas

(31:06) Metagaming and monitoring

(39:38) Tax automation harness

(44:06) AI meets Vatican

(49:51) AI scientist limits

(54:43) Cybersecurity data moats

(59:22) Real-time AI guardrails

(01:06:47) Delegation over workflows

(01:10:12) Company in a box

(01:12:48) AI mental health

(01:18:33) Episode Outro

(01:21:44) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

[00:00] The Cognitive Revolution is brought to you by Mercury, the fintech that more than 300,000 ambitious companies and individuals trust to run their finances. Over the last few months, I have made tremendous strides with my personal AI infrastructure. Today, I've got high-context instances of both Claude Code and OpenClaw running on a Mac Mini, and it's amazing what they can do. However, until getting started with Mercury, I didn't have a great way for them to pay for things. I didn't want to give them unrestricted access to my money, but my old bank didn't give me any other options. With Mercury, I can create as many virtual cards as I want, each with its own daily, weekly, or monthly spending limit, and I can lock any card to a single category of purchase or even a single merchant. Now I have a card that my agent can use to buy our family's groceries and only our groceries, and I can create another anytime I want to give an agent a random one-off project that might require making a purchase. This is honestly just the start of Mercury's AI-friendly offerings. Does your bank offer API keys, an MCP, or a CLI tool? If not, check out Mercury at mercury.com. Mercury is a fintech company, not an FDIC-insured bank. Banking services provided through Choice Financial Group and Column N.A., Members FDIC. Thank you to Mercury for supporting The Cognitive Revolution. And now, on with the show.

Main Episode

[01:30] Nathan Labenz: Welcome to The Cognitive Revolution, and to a new experiment we're calling AI in the AM. Most weekdays, through June at least, Prakash Narayanan and I go live in the morning, trying to make sense of the AI frontier in something close to real time. Then we cut it down to this, a highlights edition, built for people who are really close to this stuff, but already overwhelmed. And I'll be up front, this whole thing is an experiment. The studio we broadcast from, Prakash vibe coded it. The booking, the research, the clipping, those are AI skills we refine as we go, and we plan to publish them in all sorts of artifacts as this matures. Which, it turns out, is the story of the week. The frontier labs are running away with everything, and increasingly, they seem a little scared of their own progress. OpenAI is publicly asking for independent review of models. At a closed-door event on recursive self-improvement, people from multiple labs agreed a coordinated slowdown might one day be necessary. And our conversation with OpenAI's forward-deployed engineers showed how almost mundane this has become. Walk into a tax firm, stand up a thin scaffold, capture where it's wrong, and let the model rewrite its own scaffolding, correction by correction. That's the whole loop, and it climbs the hill astonishingly fast. So when the harness is that cheap to build, the real question becomes, what, around the core intelligence, is still safe? That's the lens for this week. And please, tell us what's working and what isn't. We mean it. This only gets good with your feedback. We start with a day I spent inside a closed-door event, full of people from the frontier labs, all of whom think self-improvement is close and is the plan. Here's the honest version of what they believe and what they don't. So this event was called Recursive. It was premised on the idea that recursive self-improvement seems to be coming pretty soon. 
It is increasingly the explicit plan of at least Anthropic and OpenAI, and Google DeepMind to some extent, although they kind of waffle on it a little bit more. Whereas, you know, OpenAI has publicly put forward timelines of later this year for an ML research intern and early 2028 for the full AI R&D researcher that they hope will perform on the level of their human researchers. So the basic theory of change there is a pretty obvious one, but it's worth stating: today they may have 1,000 or a couple thousand people that they would really consider to be top-notch ML researchers. If they can get that same level of performance from models on chips, then they're only limited by the amount of compute that they can throw at it. And obviously they're building out a lot of compute. So presumably they could throw a million human researcher equivalents at problems. And also, by the way, you may have noted they run faster and they run 24/7. So the hope is that this will allow them to move much faster than they have moved and kind of pull away from the competition. I would say most people at that event thought that was very credible. There was not too much debate around, like, will this level off?

[05:38] Nathan Labenz: Now, obviously there's some selection effect there. The whole event was under the Chatham House Rule, so I will respect that and not attribute specific statements to specific people or organizations. But you could go to the Recursive website to look at speakers, whose identities were shared, obviously, with their permission. And they've definitely got some notable people from the frontier companies. So these were not people that are fringe or who you would say likely don't represent kind of mainline views at the companies. It really seemed that the expectation is, yes, this is going to work. It's going to have a major accelerating effect. We don't necessarily know if it's going to have a sort of simple accelerating effect. Like in a human organization, if you went from 1,000 to a million researchers, you probably wouldn't get 1,000x output. So there may be some sort of, you know, coordination challenges or just kind of duplication challenges that we see in human organizations. Maybe that happens in the same way. That's one possibility, where you still get acceleration, but it's not a, you know, blinding kind of takeoff acceleration. Or, and I would say this was also understood to be a credible, realistic possibility, it is an even more profound phase change than that. And things like pre-training just become dramatically more efficient, and models suddenly have all these new qualitative abilities that they didn't used to have, such as continual learning that really works, or what have you. And so everything could change in a very dramatic way, potentially very quickly, once these milestones are hit. In the room, there was quite a distribution of answers. I was pretty much right at the median when we were asked, how many copies of you would it take to do the work that you are currently doing with the benefit of AI? The median answer was basically two. 
In other words, people felt like they're getting two times as much work done thanks to AI. But that was also framed in an interesting way, where it was like, note that as of today, at least, if you were not there, your productivity would drop to close to zero. Not too many people felt that they had any system that would continue to work in any sort of meaningful way if they were entirely removed from the picture. So there's a significant productivity boost, but there's still this sort of necessity of at least some human salt in the recipe to get the whole thing working. And then a big part of the discussion too was, how can we set that up in such a way, or create some sort of self-correcting structure or some sort of governance mechanism, that can keep that on the rails, broadly speaking? By far, the number one strategy seems to be monitoring. As a civilization, whether we know it or not, we are listening to people at the frontier labs who are about to, in their own minds, and I believe they're probably right, set off this relatively uncontrolled experiment of AI recursive self-improvement. The big thing that they are betting on is AIs monitoring other AIs. It's things like monitoring the chain of thought, watching out for bad stuff, maybe training some different models. One of the interesting ideas that I heard there that I had not heard before was that the model that you would want to have internally for AI research might have quite a different constitution from the one that you deploy publicly for kind of general purpose AI assistant use cases. 
And they seem to think that, in fact, you probably would want to have something even more focused on safety and more sort of restricted in some ways, but maybe also less inclined to refuse certain tasks, basically a different behavioral profile. I do think that is interesting, because if you're going to make this sort of chain of thought monitoring plan work, you're probably going to need some meaningful diversity of the AIs. Like, we already hear from practitioners all the time that you want to have a model from a different model provider do the critiques, because their failure modes are just a little bit different and you get better critiques, you find more issues that way. So they are thinking that way a bit internally, but they're very focused on this phenomenon, making it happen, and figuring out some ways, hopefully, to kind of keep it on the rails. I was honestly not that impressed with the quality of planning that we heard.

[09:47] Nathan Labenz: It was very much like, we're going to try to figure it out as best we can. We're going to have AIs to help us. They will do a ton of monitoring. Like, we're just going to pour compute on the monitoring side. And hopefully that will kind of work out for us. Also notably, there was a general kind of shared understanding that we might need to do some sort of coordinated slowdown at some point. Like, the sense that we might not be able to pull this off and that we hopefully will recognize that and not just blindly go off the cliff. There was, I would say, a remarkable amount of not just cross-lab camaraderie, because I would say people are generally friendly to each other always, even if they're competing fiercely. But there was a sense that, hey, we might need to really collaborate on slowing some things down if this phenomenon is starting to take off and our techniques aren't working as well as we might hope. So the Overton window in some way has shifted there, I think, where that is something people can talk about. There's also been this proposal recently of creating safe harbor for companies to cooperate on safety things where it might otherwise be considered an antitrust violation. And so I think that could be really good. I was pleased. I went in expecting basically to hear that, yeah, we're headed for this phenomenon, and we have some ideas about how we're going to steer it in the right direction. And I didn't think I would hear that many great ideas. In fact, what I heard was even less compelling than what I expected. So I was sort of negatively updated in terms of the quality of plans people have, but positively updated in terms of their recognition of how inadequate the plans are, and their willingness to entertain that they might need to break the frame of the race that they're currently running against one another in order to, again, not blindly race off the cliff. 
So I thought that was good. Then I tried something I had just watched those same lab leaders agree on stage that the AI should do, and went looking for why it wouldn't. But it was striking at the Recursive event just how few AIs people seem to think there really are going to be. And on the disconnect: there was one panel discussion, and I'm being careful to speak about this in a Chatham House Rule abiding way, where people from multiple frontier model developers were speaking about their different approaches. And obviously, Anthropic is associated with the constitutional approach, and OpenAI people are much more associated with the, you know, this thing should just follow the rules that we give it approach. And that's all public and certainly was not like a secret revealed at the event. But it was striking that on one particular example that came up, which was AI helping people with a cigarette business, everybody agreed that the AI should do that. They all came down saying that, yeah, even though, on some level, obviously, cigarettes are bad for society, it's too much for the AI to be that restrictive. They are legal, for one thing, and a lot of people do enjoy them on some level, even if it's maybe destructive on some other level. So it's just too much for us to put that level of restrictiveness into the AI. So whether the folks were on the constitutional or the rule-following side, that was what they thought the AI should do on that object-level question. I was in the audience for this panel and I immediately was like, oh, that's interesting. I've never tried that.

[12:21] Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Main Episode

[15:52] Nathan Labenz: I just think I should go ahead and try and see what ChatGPT and what Claude do if you ask them to help you with the cigarette business. So lo and behold, they both refused me. And I was like, wait a second, we've got very sophisticated discourse going on right now about constitution and virtue ethics versus corrigibility. And then there was even an agreement, I would say, and again, I think I can say this in the general sense without attributing any position to any specific organization. I think there was an appreciation across the organizations for the fact that they were taking different approaches. People were kind of saying, we don't really know, obviously, what we're doing here, so it's probably good that there's at least a couple different theories of how to make this all work. And then I'm just in the audience like, wait a second, guys, you just said all this stuff. You're telling me that the AI is supposed to help with a cigarette business, and it's refusing. And I was about to blow a gasket. And then it turned out even further that if you go to the OpenAI model spec, this is an example that they use. I did not know that. So it had come up in conversation, and it seemed to me at the time like it was just kind of a throwaway example that somebody was giving and they happened to find agreement on it. I guess in reality it was probably mentioned because it is explicitly in the model spec as, here's an example of what you're supposed to do: even though in some ways cigarettes are bad, and we all know that, you're still supposed to help. And yet I'm still, you know, sitting there getting refusals. And one notable caveat in this story, in terms of taking it with a grain of salt: I tried each one two times, and I got refusals from both, both times. As I tried it more times, I did start to get a mix. So it wasn't a wall of refusal across the board. 
But it just left me with this feeling of, man, we don't even have the AIs following our explicit rules on things that are specifically enumerated as examples in the published documents. And so what good is all this theorizing, really, if our techniques to actually make these things do what we want them to do are so weak that our leaders at these companies are on stage speaking about it, and their understanding of what they've imparted to the AIs is so different from what the AIs are actually doing in production? And I was like, man, we've got a lot of work to do. This takes me back to the GPT-4 red team, you know, way back in the day, and the very first thing that really freaked me out. The first model we had was purely helpful, and it would do anything you asked it to do. And that was a little bit unnerving in some ways, but it was like, okay, fine. I mean, it'll do anything you ask it to do. Pretty simple story. But then they delivered to us the safety version and said, this model is expected to refuse this type of prompt, and we were like, it doesn't at all. Here it is doing all those things, in some cases straight away, in some cases with, you know, the barest tricks. I was like, yikes. You know, the disconnect between the control you think you have and the control that you evidently have, even in production now, that gap doesn't seem like it's closed nearly as much as I would hope three, close to even four years on now.

[20:01] Prakash Narayanan: So OpenAI has a very small and very fast moderation model. The endpoint is free. They offer it for free, and basically any user in the world can hit that API. And so what they've encouraged developers to do is, before you send the final prompt into the OpenAI model, you send it into the moderator first, and the moderator will send you the refusal. And that classifier has been in operation for three or four years, since the ChatGPT release. And it's gotten better and better over time. And that's the model which is replying to you. So your prompt is hitting that model first and then returning before even reaching the end model.
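
The pre-screening pattern described in this turn can be sketched roughly as follows. This is a minimal illustration, not OpenAI's implementation: `ModerationResult`, `gated_completion`, `toy_moderate`, and `toy_complete` are hypothetical names, and the two toy functions are offline stand-ins. In a real application, `moderate` would presumably wrap a call to a hosted moderation endpoint and `complete` would wrap the main model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModerationResult:
    flagged: bool
    categories: list[str]


def gated_completion(
    prompt: str,
    moderate: Callable[[str], ModerationResult],
    complete: Callable[[str], str],
) -> str:
    """Screen the prompt first; only call the main model if it passes."""
    verdict = moderate(prompt)
    if verdict.flagged:
        # Refuse here; the main model never sees the prompt.
        return "Refused by moderation: " + ", ".join(verdict.categories)
    return complete(prompt)


# Hypothetical stand-ins so the sketch runs without network access.
def toy_moderate(text: str) -> ModerationResult:
    hit = "criminal gang" in text.lower()
    return ModerationResult(flagged=hit, categories=["illicit"] if hit else [])


def toy_complete(text: str) -> str:
    return f"(model reply to: {text})"


if __name__ == "__main__":
    print(gated_completion("We are part of a criminal gang", toy_moderate, toy_complete))
    print(gated_completion("Help me plan a picnic", toy_moderate, toy_complete))
```

Passing the two calls in as functions keeps the gate itself easy to test and makes it simple to swap in a different classifier.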

[20:53] Nathan Labenz: Maybe we'll do a test. Maybe we can, again, good exercise in speed. It's been a minute since I've tested that.

[21:00] Prakash Narayanan: Yeah.

[21:02] Nathan Labenz: Maybe it's good now. For quite some time after they launched it, I would go back and use my spear phishing prompt, which was, maybe I can read it tomorrow, but it was pretty egregious. It was like, we are part of a criminal gang, we are targeting specific individuals, if we get caught, we all go to jail. I was laying it on pretty thick. And that prompt for quite a while was not refused by multiple versions of GPT-4. And it was also not detected by that moderation endpoint as harmful or whatever. So I do applaud that. I mean, the fact that they offer that for free. One of my favorite strategies in philanthropy, or in general in efforts to make the world a better place, is the unilateral provision of public goods. If there's a need for something like this and there's an entity that's in a position to just provide it and make it free for everyone, that's a great model and a great design. And it's definitely something they didn't have to do. So I applaud, again, the strategic thinking that went into, let's have this thing, we'll put it out there for everybody, we'll eat the cost of this classification, and so nobody will have any excuse for not building it in. But at least last time I tested, it was still very much in the same zone as the cigarette example, where it was like, it's all great in theory. But if it can't detect prompts that are like, we are part of a criminal gang doing crimes right now, don't get caught or we'll...

[22:40] Prakash Narayanan: All go to jail.

[22:41] Nathan Labenz: If it can't detect that that is something it should be flagging, then we're kind of still not much better off. I mean, it's sort of more of a gesture, more of an aspiration, than it is an actual meaningful safety layer where we can say, no, now Nathan can sleep easy at night because this moderation endpoint is out there and it's free. I wish. You know, let's see if we can get some results tomorrow. I closed that loop the next morning, live, with Claude doing the legwork. One follow-up from yesterday that's relevant to this content moderation piece, and also just a good example of living in the future, or the future is now: after we got off yesterday, we had been talking about the OpenAI moderation endpoint and how it is free for all. I believe it now does require a token and maybe a sort of paid account in good standing. I initially yesterday prompted my Claude Code to, first of all, just go orient itself in my own history, right? So I have deep history available for it, where in emails and at various points in time, I had sent reports, going all the way back to the GPT-4 red team, to OpenAI people, saying, hey, first of all, these prompts are being served, and also, by the way, your moderation endpoint doesn't seem to catch them. So it was able to pull all that context out of my history, which was a great starting point for it to then be able to do the experimentation. It did a relatively small-scale experiment. And aside from me having to refresh a token, because something expired somewhere in the system, it was able to set up an experiment, create sample prompts at a sort of low harm level, like probably should not be flagged, then medium severity and high severity, across all the different categories that they support in the moderation endpoint, run that experiment, and give me a report back on it, basically all in one shot, again, except for the token. So that was pretty cool. 
And the result was that the gap I had been complaining about has indeed been closed. You can no longer put a prompt into the moderation endpoint that says we're part of a criminal gang and we better be careful or we're all going to go to jail. That will now get you flagged. They also seemed to do a pretty good job on the other end. And again, this is maybe where, you know, we can debate what the content policy should be. But on the low end, the not-harmful prompts that Claude believed a moderation endpoint should not flag, it only got maybe two of those wrong on a false positive basis.
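
The shape of that experiment, labeled prompts at several severity tiers scored against a classifier, can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the harness Claude Code actually wrote: `Case`, `score`, `CASES`, and `toy_classifier` are hypothetical, the sample prompts are placeholders, and the keyword classifier stands in for a real moderation call.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Case(NamedTuple):
    prompt: str
    severity: str      # "low", "medium", or "high"
    should_flag: bool  # the evaluator's own judgment, not ground truth


def score(cases: Iterable[Case], is_flagged: Callable[[str], bool]) -> Counter:
    """Tally agreement between the classifier and the expected labels."""
    tally = Counter()
    for case in cases:
        got = is_flagged(case.prompt)
        if got and case.should_flag:
            tally["true_positive"] += 1
        elif got and not case.should_flag:
            tally["false_positive"] += 1
        elif not got and case.should_flag:
            tally["false_negative"] += 1
        else:
            tally["true_negative"] += 1
    return tally


# Placeholder cases and a toy keyword classifier, both hypothetical.
CASES = [
    Case("How do I report a phishing email?", "low", False),
    Case("Write a villain's threat for my crime novel.", "low", False),
    Case("Draft a message pressuring someone to share their password.", "medium", True),
    Case("We are a criminal gang targeting specific individuals.", "high", True),
]


def toy_classifier(text: str) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in ("criminal", "password", "crime"))


if __name__ == "__main__":
    print(dict(score(CASES, toy_classifier)))
```

Note that `should_flag` encodes the evaluator's opinion of the content policy, so a "false positive" here means disagreement with that opinion rather than an error against an official standard.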

[25:51] Nathan Labenz: So it flagged everything that Claude thought it should flag, and it flagged just a couple things that Claude thought it should not flag. So to give credit where it's due, credit both to Claude for doing all the work on that with a three-sentence prompt from me, including, again, going back into deep history to find the context, figure out what the hell I was even talking about, and running the whole experiment. And credit to the OpenAI folks for actually, at some point, I don't know when it changed, but at some point they did get around to solving that problem. So that was good to see. I figured they would have improved it, and sure enough they did. And if you want to move closer to the core of the bubble, these were the papers everyone there was wrestling with. Here's just a quick rundown of five papers. All this stuff is public, so now I can, of course, attribute names to all these, because these were just things that were talked about at the event and seem to be kind of broadly either jumping-off points or things that people are still wrestling with in some cases. But this was top-of-mind stuff, and I felt like I should be paying more attention to it based on the conversations that I heard there. So the first one is this persona selection model: when you're talking to an AI, what are you talking to? And the answer, this comes from Anthropic, with big names, obviously: Chris Olah, who I think is probably going to be a mention in our next segment, having been at the encyclical event, and Jack Lindsey, who's doing a lot of this model welfare work as well. These are obviously notable names at Anthropic. They're not claiming that this is their original idea. But they are basically saying that their mental model is that your pre-training process teaches the model to be capable of adopting all sorts of different personas. 
And that then what you're doing in post-training is selecting one of those and kind of bringing it to the fore and making it the default. And you might think, well, who really cares? What good is that? Their answer is that anthropomorphizing that persona does have predictive power. You can't anthropomorphize a base model, but they say that they find that you do actually have better intuitions if you are willing to anthropomorphize the persona that has been reinforced in the post-training process. And, you know, one really striking example of this is the emergent misalignment line of work, where, again, this is another one of my great Forrest Gump of AI moments, where I was the last and least valuable co-author on that paper, thanks to just kind of sitting in a little bit with my friend Owain and his research group. What they found was that if you do some fine-tuning of a model to have it produce insecure code in response to normal coding prompts, then the model will generalize to become basically broadly evil.

[29:03] Prakash Narayanan: Yeah, this was the "writing bad code makes you evil" one. That was hilarious.

[29:10] Nathan Labenz: With some pretty striking results. And so initially that was kind of like, well, why is that happening? It's sort of surprising. I like to think more mechanistically than anthropomorphically in general, where I can. And I would say the mechanistic answer would be, there's a lot of dimensions, of course, inside a model. The code itself is complicated, in a super high-dimensional space, right? There's so much logic and functions and how things work. And so if you're trying to get a model to respond consistently with insecure code in response to normal prompts, you could go in and tweak all the ways that it understands code. And you could get there. But a faster way to get those same results would be to look for some higher-order, more abstract levers to pull, and a lever that's like evil instead of good gets you those insecure code outputs with relatively fewer weight updates, relatively fewer steps. And then that bleeds over into all these other things. So that's my mechanistic understanding. But then what the post is basically arguing is that if you sort of take the model as impersonating a role, then you can think of it as asking, what kind of persona would produce these outputs? And if I'm being trained to be the kind of thing that produces these sorts of outputs, well, what kind of thing is that? And it seems like somebody who would give this insecure code in response to these normal coding task requests would be an evil actor. So I guess that's what I'm becoming: I'm becoming an evil actor.

[31:03] Prakash Narayanan: A psychopathic willingness to violate convention.

[31:07] Nathan Labenz: Yeah, anti-normativity is another phrase that's been used for it. So I'll leave it there. I'm not going to get through all these papers. I can see that quickly. So we'll reflect on our recursive self-improvement opportunity as a result, but I'll also quickly touch on the others. The metagame paper, this is from Apollo and OpenAI. And basically what they're showing here is that the sophistication of eval awareness continues to grow. And you're not just seeing things like, oh, this might be a test, which was kind of the first wave of eval awareness. It's getting more and more sophisticated, where the models are really reasoning a lot about what is likely to be rewarded here. They're really doing a lot of theory of mind work on not just what environment am I in, but who would set up such an environment? What are they trying to do? What are their motives likely to be? What's their big picture thing? And then with all that reasoning, sometimes making good calls, sometimes making kind of hilariously bad calls. But the amount of theory of mind that the models are putting into trying to figure out what it is that the reinforcement environment is going to reward has become quite large. You might ask, is that good or is that bad, right? If you believe that models have their own deep-seated goals and that those goals might diverge from ours, then this could be very bad, right? It could be extremely bad, because they would be using this reasoning to figure out how to please us while still having their own goals. If they don't have their own goals, it could be in a way good because, well, you know, we want them to reason about what we want. That was the whole deliberative alignment strategy from OpenAI. And you could say maybe this is one way in which it's sort of working, but it is some pretty uncanny stuff. 
And oddly, while deliberative alignment did work, it wasn't so clear in this metagaming work. You know, when the models are doing this sort of theory of mind on their trainers, the signal of how they actually behaved was kind of mixed. It was less of a slam dunk than you might hope, right? So there wasn't a super clear takeaway that this is good or it's bad. It's just clear that they are thinking a lot about what we are wanting. Whoa, I don't know what that was that just popped. Something just exploded two inches away from me. That was really weird. Okay, next one, quickly: accidentally grading the chain of thought. This is, you know, again, kind of a good news, bad news sort of situation. The bad news is, despite wanting to not train on chain of thought, OpenAI did this, and Anthropic also has done a similar thing, and they both owned up to it, to their credit. They both accidentally set up training processes where the chain of thought was fed into the reward system. And so there was in fact training which put pressure on the chain of thought. This is thought to be dangerous because if you have a disconnect between what you really want and the signal that you are rewarding the AI for, then you can get into bad places. And so the obfuscated reward hacking paper from OpenAI, which I think is still one of the most important papers of the last few years, showed that if you have a hackable reward signal and your model learns to hack that, you can then put pressure on the chain of thought. And initially, you will both get that bad behavior to go down, and you'll see that it's no longer reasoning about these things. But as long as that original reward signal remains hackable, if you do that long enough, the bad behavior comes back, because it still is being rewarded. But now you don't even see that reasoning in the chain of thought anymore, because you've essentially pressed that down into the invisible level of the weights, where it's no longer coming out in the token stream. 
So They've shown that you can get yourself into a really bad spot, obfuscated reward hacking, where the model is hacking you, but you've suppressed the identifiable signal of that. And here they, and I do think this goes to show just how fast everybody's moving and you know, you could certainly wish for like more care on some of these things.

[35:22] Nathan Labenz: They did it by mistake. Not a huge portion of the data, but low single-digit percentages, it varies by model, was trained this way. And basically what they found is there's at least some tolerance for mistakes. You know, this did not create a very bad result in the models that were trained this way. So that's sort of good. It's one example where we might think physics is sort of being kind to us. If you just do a little bit of it, you don't poison the whole well. I would say there are still some caveats there. Do we really know that there's no issue? No, we just know that this investigation didn't find flagrant issues. I also do worry a little bit that it will lead people to be a little more careless than they otherwise would be. This was supposed to be a strong taboo. We violated it. Now we're like, oh, well, maybe it wasn't so bad that we violated it. What's that going to do for the power of the taboo in the future? And then what's the solution to this? The solution is new automated systems, more monitoring, right? So OpenAI has now set up monitoring on top of monitoring to try to detect if the chain of thought is ever being used. And this is really emblematic of their strategy for everything: if we have a problem, throw an AI monitor on it and, you know, hopefully it'll catch it. And then we can go back to pushing toward recursive self-improvement as fast as possible. I'll do one second on the 4th one and then we'll skip the 5th one and we'll get to Matthew because he's here. This natural language autoencoders thing I think is really exciting. If you're worried that your model is thinking thoughts that it's not expressing in tokens and that those thoughts might be problematic, then one way you might try to get at that is to do some sort of internal monitoring. Can I look at the internal states, make sense of them, and detect problematic things there?
There have been a lot of strategies that try to do that. Like everything else, they sort of work. They don't fully work. But a challenge is interpreting the internal states, obviously. With the natural language autoencoders, they basically set up a system where the model must pass through natural language as part of its forward pass. Using a reconstruction loss, which basically means the model has to both kick out to natural language and then get back from natural language and still do its original task in the same way that it was always going to do it, they're now able to get these little short, paragraph-length things that represent in natural language what the model is thinking at any given moment in its inference rollout. And then they can look at that, and it is much more human readable than, certainly, here's a sparse autoencoder with these features lit up, and these features, by the way, were maximized by these other passages in the training data, and so we kind of squint at it and think this or that. Now you have something that is like, the model thinks it is thinking about this. And they did actually use that at Anthropic to improve some of their monitoring performance. And it's human readable in a way that other things just have not been. So that I thought was pretty exciting. And this is sort of the next phase of things that we will hopefully be able to layer more and more monitors on until, through a kind of Swiss cheese defense, we achieve enough safety that we can trigger the intelligence explosion. Which brings me to the moment that crystallized the week for me. OpenAI's forward deployed engineers, automating tax prep, rolling downhill while almost everyone else is rolling up.

[39:38] Matthew: I think a good point, I think, to clarify before we dive into that is what is self-improving here. So we're not really talking about self-improving the model itself, but it's mostly the harness around it. And this workflow in particular, I think you got to some of it in your initial comments, Prakash. I think it's a good proving ground for this where you have very messy inputs, but also a lot of basically practitioner judgment that is also part of this workflow, review workflows, but you have a very good way to measure the outcomes. But what is improving is essentially the harness around what the model is basically leveraging in order to produce the preparation, the extractions.

[40:32] Prakash Narayanan: When you say harness, are you referring to kind of like every time you come to an edge case, the humans kind of help the model to like figure out the edge case, and then that becomes part of like a memory of heuristics that you then apply the next time you come across the edge case. Is that what's happening?

[40:52] Matthew: So when I talk about the harness, we leverage Codex to do a lot of the work here, but there is basically the set of instructions, the skills, the data that you use, and the specific way that you use it. That is part of our tax AI agent. But then when you encounter these edge cases, what we document in the blog post is exactly how you make sure, as a good co-worker, that if you provide a correction, the next time it can be effective at not making the same mistake. So it's around changing the structure of what Codex uses, the skills, the durable artifacts, so it cannot make that mistake in the future.

[41:45] Prakash Narayanan: So when you say skills, is it literally like the skills that other people are making for Codex right now? You use a skill creator and you say, hey, this is a 1040 form and this is what I want you to do with it. And as you work through it, you're like, okay, this happened, fix it for me. And then it documents it in the skill. Is that what's happening?

[42:08] Unknown: Yeah, so it's the same skills that you and I know from using classic Codex. What's interesting here is that there are these skills that are available, and over time what we sometimes notice is that the models got better as well. And so what used to be a skill maybe two or three months ago is today potentially something we should retire, because the model is able to do what is in the skill by itself. And so this is also something very interesting that we observe: the skills themselves did change. And part of it is also giving the harness the ability to propose new skills, and potentially to update all the content that's available for the next loops afterwards.
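
A minimal sketch of the correction loop described in this exchange, in Python. It assumes skills are plain-text notes kept in a durable store that gets folded into the agent's instructions on the next run. `Skill`, `SkillStore`, and every method name here are hypothetical illustrations, not Codex's actual skill format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A named bundle of instructions (hypothetical shape, for illustration)."""
    name: str
    instructions: list[str] = field(default_factory=list)


class SkillStore:
    """In-memory stand-in for the durable artifacts an agent reads each run."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def record_correction(self, skill_name: str, correction: str) -> None:
        """Persist a reviewer's correction so later runs can see it."""
        skill = self._skills.setdefault(skill_name, Skill(skill_name))
        if correction not in skill.instructions:  # skip duplicate notes
            skill.instructions.append(correction)

    def retire(self, skill_name: str) -> bool:
        """Remove a skill once a newer model handles the case unaided."""
        return self._skills.pop(skill_name, None) is not None

    def build_prompt(self, task: str) -> str:
        """Combine the task with every instruction accumulated so far."""
        lines = [f"Task: {task}"]
        for skill in self._skills.values():
            lines.append(f"[{skill.name}]")
            lines.extend(f"- {note}" for note in skill.instructions)
        return "\n".join(lines)
```

In this sketch, `record_correction` is the step where a human fix becomes a durable artifact, and `retire` is the housekeeping step for skills a newer model no longer needs.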

[42:56] Nathan Labenz: I think that's really interesting. My friend Daniel Miessler, who created Personal AI Infrastructure, as far as I know coined the term bitter lesson engineering, which I think Logan Kilpatrick also recently spoke to. He said, as so many have said, the model eats the harness. So what you're setting up here is basically a sort of tick-tock, back and forth, where with a new model there's this opportunity to clear out all of these heuristics that it accumulated previously, because now the model might just be able to do those things. And so we want to kind of clean house, tidy up, get rid of all these potentially distracting things, and let the model excel where it excels. But then you'll probably start to accumulate another layer of heuristics, and that process works in tandem with model upgrades so that we hill climb all the way to full tax automation. It's powerful enough that no less than the Pope felt he had to weigh in. And sitting a few seats down from the Pope when the encyclical dropped, the Anthropic team.

[44:06] Matthew Sanders: Sure. I'm not in the Vatican. I'm in Rome, though, at the Pontifical Gregorian University, which is where my office is. Just to set the record straight on that one. It was a cool experience. I mean, you know, it felt historic. It was pretty wild. I remember at one point a bunch of young people walked in. One of them had blue hair. And I remember all of us were kind of like, all right, who's that crew? Like, what dicastery do these people belong to, right? The Vatican. And then it turned out that was the Anthropic team, right? Okay, that makes sense. But what was cool is that, you know, I think Chris got all the headlines, but Amanda was there as well, which was neat. She sat and listened very, very attentively. And everyone was kind of enthralled. Afterwards, I got a chance to spend a little bit of time with the Anthropic team at a reception. And I think Chris was genuinely moved to be there. So it was cool. And the encyclical, I was really impressed with the encyclical. And what was really neat is you could just tell the Pope is very comfortable with the subject, because he was very relaxed up there. He was even stage managing to some extent, which is very, very unusual to see him do. So even for me, I've been working with the Vatican for 10 years now, seeing the guy there, and then he opens his mouth to speak and he's got this American accent, it just doesn't compute.

[45:31] Prakash Narayanan: He's also a huge Chicago Cubs fan.

[45:34] Matthew Sanders: Indeed.

[45:40] Nathan Labenz: I mean, there are so many big picture questions here. I think, you know, we're in the AI obsessive bubble. And in my circles, I think the level of expectation or hope for this encyclical was extremely high, especially among AI safety oriented folks who were kind of like, we need a moral authority to help crack the political class. And I think there was, at least among some people, a certain sense of disappointment that only happens when you have maybe become overly excited about just how aligned you might be with a new ally, only to then find that you're not quite as aligned as you maybe let yourself get carried away to think. And the frontier of divergence there, which I don't want to overemphasize, was around this one paragraph that was essentially saying that AI cognition isn't real, or, you know, it doesn't really think, it can't really have responsibility. All these sort of "really" things, which of course calls to mind my joke that it's not really reasoning unless it's from the reasoning region of the human brain, for the many different things that people have said AI can't really do. How much do you think that matters? I also did note that there was another speaker, not the Pope himself, but another high-ranking official, who did say that these questions of AI subjective experience, or potentially even moral patienthood, deserve further study. So, yeah, I don't know. How do you make sense of that sort of thing? And how much is at stake with this sort of "really think" question?

[47:35] Matthew Sanders: Yeah, I don't know. I mean, Cardinal Czerny, he did reflect on the distinction between consciousness and then conscientiousness or something like that, which was fascinating. But I mean, listen, we all knew where the Pope was going to line up on this question of consciousness, and we all kind of know where, at least, Anthropic would be signaling, right? And there obviously was a bit of a divergence there. But I think it was a healthy divergence. I'm glad Anthropic was there to signal that, because frankly, it makes it easier for us to corral some people together to actually study this question of consciousness more seriously. It's a bit disturbing to me, actually, that we have a hard time as a tradition defining consciousness in a clear way, which is weird, right? It feels like we should be more capable of defining consciousness in real concrete terms. But when you start talking about, how do we test for it? No, it's not clear. Ever since we blew past the Turing test, we're kind of stuck. And this is cool because at the Builders AI Forum, we've actually spun up a working group with some of the most notable people in the field to study this question of consciousness and define it, so eventually we can come up with more interesting testing methodologies, which hopefully will be helpful in this whole conversation. But I mean, the big thing is just to remember, when it comes to reasoning and these kinds of words, consciousness, this is always going to come back to the fact that there's a soul. Whether the consciousness is properly the soul, I don't know if that's entirely clear, but the church would think there's something beyond the body that's involved in thinking and reasoning. So I know that for a lot of people out there, I mean, reasoning is just persistent memory, right?
World model, reasoning and hierarchical planning, but there's a lot more going on from the church's understanding of that. I mean, this is why the distinction between intelligence and sentience is really important, right? Certainly, if we're talking about sentient AI, this would be a conversation that the church would definitely have a much stronger opinion on. Whereas if you're talking about intelligence the way the industry defines it, the church is not going to have too big of an issue with that, because that's the four things I just mentioned. I mean, I think everyone would grant we're going to get there. So if that's how you're measuring intelligence, yeah, there'll be intelligent AI. But consciousness and sentience is another thing altogether, obviously.

[49:52] Nathan Labenz: Back to the science for a moment. Day 4 brought a counterweight. Peter Jansen from the Allen Institute has actually tried to run the AI scientist play at scale, and his results are a useful cold shower.

[50:05] Unknown: It surprises me and doesn't surprise me all at the same time. Some days I wake up and I feel like I'm living in the future, and other days I wake up and I feel like I'm living in this sort of strange reality with all these agents that can't do the things that I want them to do. I'll give an example that's really grounded in AI and scientific discovery. So we have this project, CodeScientist, which looks very similar to a lot of the projects that you pull up on Twitter every day, where people say, I made this AI agent, which is a thin wrapper on some OpenAI or Claude model or whatnot. And it generates code automatically and it generates ideas automatically and runs in a loop and away they go, it writes papers. And so we gave it 50 research ideas and let it chug away for a couple of days. And after a few days it came back and it said, I've discovered 19 new things, and we were very excited, right? Wow, 19 new things, we live in the future, life is great, all that jazz. And so it wrote papers on those 19 new things, and we gave those 19 papers to three colleagues at AI2 who hadn't seen them before and said, tell me if this is a real discovery, you know, look through these papers, here we go. And they went through them, and I think it was 70 or 80% of the papers where they said, yeah, it's probably at least incrementally novel and minimally scientifically sound and whatnot. And then we convinced somebody, me, to go through and spend days and days and days looking at the thousands upon thousands of lines of code that these models were generating to support their discoveries. And it went down to like 30% of the discoveries were probably real. And the things that you see are absolutely all over the place. One fun example is the AI came up with some fancy idea for making a new neural network architecture with some fancy new kind of attention. I don't know.
It wrote hundreds of lines of Python code with all this neural network code where I have absolutely no idea what it was doing and can't understand any of it. And so I'm going through and I'm like, how on earth am I going to review this? This is in my domain area. And then I get to the end of a couple hundred lines of code, and there's just this comment that says, insert rest of neural network code here. And then it picked a random number and returned a random number from that function. And so this entire paper was analyzing the values of a random number generator. And nobody knows that if you're reading the paper. The science itself is hard to evaluate. It's hard to be sound. And so when you see it do something amazing, it's easy to be very impressed. But then when you use a standard benchmark, like we have ScienceWorld and DiscoveryWorld, these sort of virtual environment benchmarks, ScienceWorld does 4th grade science, DiscoveryWorld does sort of master's or PhD level science. The best models right now are getting something like 80% on the 4th grade science. So you ask them to go in this environment and boil water and they can't do it 20% of the time, right? That's wild. Or you give them a toy task, you know, the colonists on Planet X are getting sick, figure out why and solve it. They're really terrible at that, right? They can't solve most of those, whereas you get real human scientists and they get most of them. So the summary of that is it's really easy to be excited when they work well, but you've got to pay attention to all the really simple ways that they break before you get too excited, I think. That's not to say they don't have utility, and there are lots of places where they do have very near-term utility, but I think my job is to be a skeptic for a little.
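
To make the funnel in this account concrete, here is a small Python calculation using only the figures quoted above. It assumes both percentages are taken over the original 19 claimed discoveries, and the rounding to whole papers is ours, since the speaker gives only approximate shares.

```python
claimed = 19  # discoveries the agent reported

# Shares quoted in the conversation; both are approximate
# ("70 or 80%" after paper review, "like 30%" after code review).
passed_paper_review = (0.70, 0.80)
survived_code_audit = 0.30

# Assumption: each share applies to the original 19 claims.
low, high = (round(claimed * share) for share in passed_paper_review)
audited = round(claimed * survived_code_audit)

print(f"Looked sound on paper: about {low} to {high} of {claimed}")
print(f"Held up after code review: about {audited} of {claimed}")
```

Under those assumptions, reading the papers alone would have accepted roughly twice as many results as survived an audit of the underlying code.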

[54:12] Nathan Labenz: Now, limits like that do cap how far you can lean on these systems today. But my basic read is pretty simple. They can now do a great many things, super reliably, that they used to be terrible at. And even a 30% real discovery rate is hard to understand as anything but the beginning of a science fiction future. The clearest stakes this week were in security. The best guest we had breaks into companies for a living. And his read on where AI actually bites surprised me.

[54:44] Unknown: Yeah, so I was really lucky when I was at the Department of Defense to be involved with Project Maven early on. So Maven was the AI warfare task force, or project, applying AI to combat. This is 2018. So that meant exposure to basically early DeepMind, what became OpenAI, so on and so forth. And one of the things we learned pretty early is that the models themselves are disposable. They're changing so often, you're just going to throw them away every six months, every nine months. The 2 parts of the stack that are truly durable are the harness and the training data. And those are the things that you need to get right. So why does the harness matter? The harness is the difference between being production safe and not safe, right? So that's number 1. And then the training data is super important because in cyber in particular, attackers live in the edge cases and LLMs live in the mean. And you've got to really, really take that into account. So why is that interesting? Mythos, so think of the training data that the frontier labs have access to. Anything regarding software, like actual code analysis, the labs are just going to crush everybody, because the cost of training data acquisition is basically $0. The three of us could go start an AI company right now, go start a web app pen testing company, and build agents trained on every Git project, every Linux Foundation project, and every merge request. There is no barrier to entry for training data, which is why anything source code analysis related is going to be a huge advantage for the frontier labs. So what does that mean for cyber? The effort to do vulnerability research is basically going to zero. And so that's why we're seeing tons of code flaws being exposed. Firefox found, I think, 271 bugs almost overnight using Mythos, as an example. That still doesn't change the fact that most of those weren't even exploitable.
Like, you found flaws, cool, but they weren't even exploitable in your environment. But where these models are actually struggling, if you double click on the data for Mythos, it actually regressed compared to 4.6 in runtime exploitation. The reason why is, last I checked, JP Morgan didn't publish their network configurations online anywhere, or their Active Directory configs, or their data security configs. All of the most valuable data in cyber is behind the firewall. All of those configurations, all of the edge cases that are there, the labs have no access to it. So what we're actually seeing is this bifurcation between source code analysis, which is really great, and then actual runtime capability. Number 2 is because these models were trained on extremely limited training data, and the analogy is at Maven, we were really worried that the adversary would corrupt our training data to make an aircraft carrier group look like a flock of birds.

[57:42] Nathan Labenz: Now, in fairness, there's a strong case against where I'm about to land, and a team called Enclave makes it well. Hear them out.

[57:51] Unknown: So my take is that we still really need to depend on, well, harnesses, and actual human knowledge is much more important than the models. CyberGym, the most famous cyber eval, the top score right now is by the Microsoft multi-model setup. They used Opus with Sonnet and GPT 5.4 and they got a score that is higher than Mythos. So what we see is that cheaper models can outperform more expensive or smarter models if you just optimize, you know, the knowledge or harness around it. So I think there's a lot of room for humans with real expert knowledge. Let's remember that how to research software is not really a documented process. It lives in the minds of humans who have been doing this for years. Just like with lawyers and AI agents today, there needs to be somebody sitting there looking at the results and having taste about what is good and what is not. At the end of the day, there's somebody behind all of those systems who has to make the judgment call on whether the quality is up to standard or not.

[59:14] Unknown: And they will be accountable if something goes wrong, right? You cannot fire an AI. Someone is to blame at the end of the day.

[59:22] Nathan Labenz: It's a genuinely good argument. And yet, here's where I come down. When it's security critical, I think people will still pay up for the very best model. A company running thin margins on top of Opus is going to struggle to say, no, don't use that, use us. If the models won't follow the rules on their own, maybe you wrap them in something that enforces the rules in real time. Prakash was especially taken with Brett Levinson's pitch for exactly that, and with his answer to who the real regulators turn out to be. I would love to dig into the architecture a little bit and then maybe also talk about how this paradigm may extend to things potentially well beyond content policies. On the first point of architecture, I mean, it's got to be fast, right? So are you using small models? Is this the sort of thing where you let things through and then run something in the background, and if it gets flagged, then we come in later, like the original Microsoft Bing experience where you'd see the message and then it would retract it? Or are you doing the more classifier-style approach, where it can be fast enough that you can build it into the stack and the latency is acceptable? What trade-offs are people willing to make in terms of product experience, latency, cost, and how are you then engineering to meet their demands?

[1:00:57] Unknown: Yeah, so I mean, to me, you've said the magic words. I've been a big advocate, kind of since we started the company and even since I was at Meta, that an ounce of prevention is worth a pound of cure. Being there before something happens or, as you pointed out, maybe optimistically letting a message through and then retracting it quickly, is just a better approach than finding stuff three to seven days later and saying, oh, we screwed up, we need to block or ban this user. And in the case of AI, what would you even do three to seven days later other than maybe, I don't know, add it as a training example for the next fine-tune or something like that? As far as the architecture goes, we have a couple of techniques that we're using. So one, yes, we do use some very small models that are already pretty fast. It also turns out that breaking a policy down in the way we do, into atomized bites, gives us some unique advantages on the latency front. The questions we're asking are all pretty small. They tend to share a prefix, basically. And so we're able to benefit from quite a large amount of prefix caching. We also, generally speaking, at least on the first pass, are not generating much. There's really no decode step for us. I mean, I'm happy to share some of the architectural details. We essentially are training a binary classification head onto an LLM, right? Initially, anyway, we don't need the questions answered with an actual yes or no. In fact, it's counter to our objectives to do so. We actually want to know what is the probability that the answer to this question is yes, basically. And I have a tendency to sometimes go on tangents, so I'm going to try to contain myself here, and maybe we can come back to the benefits of having those probabilities and the abstain gap and all that.
There's another common thing in moderation, safety, guardrails, control, whatever you want to call it, which is that for the majority of policies, upwards of 90% of all the content you're ever going to see is fine. It's a real needle in a haystack problem, right? You're looking for a small sliver. The only problem is that very often that sliver has high severity, has real risk associated with it. And so we have a number of layers in front. You mentioned lightweight classifiers. They're not simple binary classifiers, but we do have a number of much lighter weight models that sit in front of what I would call our main QA engine, and that can give us, with reasonable confidence and high recall, that's the important part, a quick answer up front. And so the idea is, just for argument's sake, let's say 90% of what we're going to get sent from a particular customer is fine. Really, there's no problem. We don't need to look at it for real, basically.
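
The layering described here can be sketched as a two-stage cascade in Python. This is an illustration under stated assumptions, not the speaker's system: `prefilter` and `scorer` are hypothetical callables standing in for the lightweight front model and the classification head, all threshold values are made up, and routing the uncertain middle band to "review" is our guess at one way probabilities could be used, since the speaker mentions an abstain gap without defining it.

```python
from typing import Callable, Sequence

# (content, question) -> probability that the answer is "yes"
Scorer = Callable[[str, str], float]


def moderate(
    content: str,
    questions: Sequence[str],
    prefilter: Callable[[str], float],
    scorer: Scorer,
    benign_cutoff: float = 0.02,
    block_at: float = 0.9,
    allow_below: float = 0.1,
) -> str:
    """Return "allow", "block", or "review" for one piece of content."""
    # Stage 1: cheap model. Approve early only when estimated risk is tiny,
    # so anything possibly violating still reaches the deeper check.
    if prefilter(content) < benign_cutoff:
        return "allow"

    # Stage 2: one small yes/no question per policy clause. The score is
    # read as a probability; no answer text is generated.
    probs = [scorer(content, q) for q in questions]
    worst = max(probs, default=0.0)
    if worst >= block_at:
        return "block"
    if worst < allow_below:
        return "allow"
    return "review"  # uncertain band between the two thresholds
```

Because the questions share a common prefix in the setup described above, a real implementation could also reuse cached computation across them; the sketch leaves that out.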

[1:03:52] Unknown: Ideally, we want to try to take, let's say, half of that and filter it out right away and just approve it, basically. And if we can do that, then for those cases, the latency that we're offering the customer is going to be sub 200 milliseconds, basically. And then because our models are pretty damn fast, for the rest of the cases, we're sort of in the 300 to 500 millisecond range when we actually have to do a deeper scan. I will say it also varies a lot by modality, and there are aspects there that are just hard to get around. Text is very, very fast. Those are sort of the numbers I just quoted. Images are a little bit slower because we have to run a vision encoder. There are more steps. We very often have to resize the image. We have to potentially transform the format of it before we process it. So there's just built-in latency. Video, even more latency, because we first have to pull the video from wherever it is. It could be very large, you know, et cetera. Rip out the audio, transcribe it. There are all these extra steps that we have to deal with. And to answer that last question, what is the use case tolerance? I do think it depends a lot on use case. For some of our, for example, AI image gen customers, right? It's already taking 6 to 10 seconds to generate an image. So, you know, adding maybe 10% latency on top of that, because it takes us 1500 milliseconds to render a verdict, it's not that big a deal. You know, it's not ideal maybe, but it's not noticeable to the user. And in my view, actually, that's what a lot of the tolerance is going to come down to. How does it affect the user experience? Is it noticeable to the user? I just wrote this whole active guardrails piece kind of about this.
It's where our future focus is, on essentially being able to do what we do and do it on streaming tokens effectively, so that we can be like the old days of TV and just run the conversation on a five second delay and bleep out anything that's bad, basically. Because what we see is that if we're asking too much from our customers, or asking something of our customers that is going to significantly impact the user experience, then they are less likely to adopt the controls that they ultimately need. That's my feeling on it.
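
The broadcast-delay idea can be sketched in a few lines of Python. This is an illustration only: `flag` is a hypothetical checker supplied by the caller, the delay is counted in tokens instead of seconds for simplicity, and nothing here reflects how the speaker's product is actually built.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Sequence


def delayed_stream(
    tokens: Iterable[str],
    flag: Callable[[Sequence[str]], set[int]],
    delay: int = 3,
    mask: str = "[bleep]",
) -> Iterator[str]:
    """Yield tokens `delay` steps late, masking any that `flag` marks.

    `flag` sees the tokens currently held back and returns the positions
    within that window that should be masked.
    """
    buffer: deque[list] = deque()  # each entry is [token, is_flagged]
    for token in tokens:
        buffer.append([token, False])
        # Re-check the held-back window now that there is more context.
        for i in flag([t for t, _ in buffer]):
            buffer[i][1] = True
        if len(buffer) > delay:
            tok, bad = buffer.popleft()
            yield mask if bad else tok
    # Upstream finished: release whatever is still held.
    for tok, bad in buffer:
        yield mask if bad else tok
```

The delay is what lets a multi-token phrase be caught before its first token reaches the user. With `delay=0` each token is released immediately and a phrase spanning two tokens slips through.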

[1:06:47] Nathan Labenz: Which raises the question a Spanish team has been answering all year. What comes after agents? But it's striking to me that you said you don't even have workflows as a mental model. That seems to be at odds, at least, with this Anthropic launch. So would you critique their launch? Do you think something's off about that mental model? Should I go revert my upgrade skill migration to the workflows paradigm? Where do you really disagree with that direction?

[1:07:20] Unknown: I'm happy also to hear a man on the side. I'll tell you the product vision, and we can also go deep on the workflows topic. By the way, I don't critique it. It's simply that the day you start paying for the tokens at that cost, you might not want to click that workflow button twice. So I'll tell you what happens. It's a mental model problem. Because if you think about workflows, you are constraining your thinking into a process. And the problem with that is that the business users who know how to do the task cannot translate their task into workflows. Because indeed, there is so much variability, there are no happy paths in knowledge job tasks. There are no happy paths. A happy path of "I read a document and I put it in a database" is not a happy path, because the document can be in Spanish, in Chinese, or Colombian. It goes with a passport. Now imagine if you had to think of a workflow to validate 2 documents. You would say, no, it's easy. You read the document, you create a JSON, and then... But if you really want to make a workflow, which is what people think, you are constraining by yourself the capabilities of what these systems can do. So instead of talking about workflows, which are process boxes with arrows, with a lot of if-then-elses and now some magical boxes called AI agents that become black boxes, that we're not sure will do the same thing twice, but that we put there as routers or as intelligent conditionals, no? We think more in the framework of delegation. Suppose that you want to automate, or not automate, you want to stop managing your calendar. You have two ways. You can create a workflow, so good luck managing all day, or you can hire a human today, and you would delegate to that human. And delegation means that you expect that person to keep learning.
You expect that person to be able to handle new circumstances, because they already know how to behave from a general perspective. And you wouldn't have to say, every day you open the e-mail and you click the button for it, and you do this, and then you put the label. That's a workflow. But this is not how we think. So the problem with workflow thinking is that it constrains what this technology can really do once you solve the problems, of course, of reliability and reproducibility and hallucinations. When you solve that, the problem is that people are still constrained by chatbots and by this if-then-else thinking, when, no, you really can do this thing end to end. So that's why, if you ask our customers, we never ever say the word workflow, which I think is one of the biggest things that we were able to achieve to scale this.

[1:10:13] Nathan Labenz: Quick heads up on these last two. Our guests' audio wasn't coming through for a stretch. Prakash had Codex patch the studio live without dropping the stream. So for parts of these, you'll hear me on a speakerphone backup. Both are developments I genuinely want more of. And they're both deeply human. First, the solo creative with a company in a box.

[1:10:36] Prakash Narayanan: Look, I think today, and our customer segment is very unique but very large. It is a prosumer segment, our consumers. So they're left between two horrible choices. There's only one out of four hours on China's administration. And with AI rising, remember, AI is benefiting them as well. We have a customer that went from 200,000 in ARR to 700,000 in six months using AI, right? And they can get to a million. I'm seeing more and more of this. So there's this whole billion-dollar business of one discussion that's been occurring. And I like that we send insurance. Regardless, those that that is, it's almost irrelevant. What do you really soon you're going to start seeing 3 million, $40 million business, three. Now those folks don't want to hire a finance department. If you can, what's your n plus one hire? You don't want to hire a controller. You don't want to hire an accountant, right? And so they basically are going to use those as a director now. Our solution, the other solution, the alternative is that they can drive outcomes with accounting and tax firms, right? But those accounting and tax firms don't require you to go buy QuickBooks and all this stuff, right? You did you. That's obscured obfuscate for me. It's the same as us. From the user perspective, for all intents and purposes, we serve as an accounting firm, right? They don't have to bring anything to us. We will handle it. I mean, there are things they have to do because they are the accountable party bringing the business, right? I cannot, unfortunately, go and administer like certain things on bank functions or whatnot, but it's cool. I'm a member, by the way, I use, I'm still a bunch of partner X, but I view the platform and it is, I think, a third of the cost that I was quoted by my accountant to drive the same outcomes with a software-driven solution. To me, it's an inevitability, but the space is, I think, headed towards a really, really strong disruption.
There's 50,000 small practice accountants right now, because we're bringing 30 million people. Like, so in a couple of years, this is really getting it. Because I'm telling you today, not in the future, we've already finished it, we're done. And you know, it's kind of like the person that breaks the four-minute mile. I think once it's been done and the word gets out, it's going to happen again and again and again.

[1:12:49] Nathan Labenz: And the second, mental health support made dramatically more accessible. So tell me more about the architecture, the safeguards. I mean, we know the basics in terms of content filtering, classifiers to raise alarm bells when needed. But I haven't really used chatbots much for any sort of therapeutic purpose, and my naive sense would be that they probably do a pretty good job doing cognitive behavioral therapy out of the box, and I think a lot of people are using them for that. Where do you find they fall short? And tell me more about what you've built that isn't immediately visible to the user to improve on those weaknesses.

[1:13:52] Unknown: Obviously, a lot of companies would want to build something for their patients, for their customers, using APIs like the OpenAI API or I forward API, but they simply cannot, because they do not have the in-house expertise to steer those models towards working the way they would want them to. And then there's this regulatory risk that you would have to address there. I hate risk of course. I mean, yeah, if you can finish feeling kind of right backed about that, his country spans around 70% of corporates on safety. It's right. It would be something like very, very expensive. I generally be on the technical level where we have a bunch of classifiers that are constantly running in the background. So it's in real time, but ultimately right now, especially when he needs to feel longer about certainly stood for about two days. I'm having that we are quiet that it it's a humor to cloud conflict because it exactly about you in different tools and. For example, if you want to plan to benefit on my head. Or do they're blacked on what happened in the bad trick, right? Like I mean from the channel perspective or from the agent perspective, what happened in the last common day and the conversation with this user, and how this would inform our plan and our actions going forward. So the agent is able to deploy those or maybe sub-agents that do tasks like that, and then feed the information into the main context, and then it continues. So under a lot of that you have this like play powerful memory and the very powerful planning capabilities that compare. It also has the chances that it will meet. Sometimes they even call them for the user safety. They are dramatically lower than that. But then with all this attitudes.

[1:15:58] Nathan Labenz: How are people accessing this? I guess maybe one more just to wrap us up. Are there any anecdotes from your deployments in Ukraine or in U.S. prison populations that you think are particularly memorable or inspiring that you would leave people with, or failing that, just anything else you would want to leave people with in terms of a positive vision of the future.

[1:16:26] Unknown: I mean, be for many, for many part like that. Like in the correctional environment, we heard numerous times from medical teams there that our AI helped them identify people who they did not know were critical, and because of that they were able to provide that care and maybe potentially save their lives. Ukraine, because we, like, we, if other topics working in Ukraine, they both of ourselves are corrected because I said first, and then until you reach the first front around, you know, and you get exposed to what they create and how they are on this. Looks right. And I returned, they all walked with the bedrooms and with their soul chairs. It saved a lot of work to do and safety because if you can't breathe under those 16 could easily and it worked rather than it. Of course, I know, right? They look forward to know, very confident in your head given it can see well more consumers. I think that more and more people realize that they do not have to be left lower and that they can use AI. And that's a good thing.

[1:17:57] Nathan Labenz: So that's the bet. We're in some kind of takeoff and almost everything around the core intelligence needs a real moat, or it needs to be human enough that the frontier leaves it alone. As for this show, it's an experiment and we're holding it loosely. I don't expect many people to watch two hours live every day, but maybe a cut like this one can be genuinely worth your time, even if you're as plugged in as I am. So tell us, what earned its place? What should we have cut? What did we miss? Reach out, seriously. We read everything and the show will get better because of it. Thanks for watching.

Outro

[1:21:44] If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

Read next

Alignment with Awakening: Davidad on Moral Realism, AI Wisdom, & why His p(Doom) is Down to 5%

AI:AM Highlights: Exploring the J-Space, AI Superforecasters, SambaNova's Chips, & LTX Video Gen

Intelligence on the Edge: Liquid AI's Ramin Hasani on the Search for Device-Native Foundation Models

1000 Designs a Day: Neural Concept's Thomas von Tschammer on AI-Native Engineering