AMA Part 2: Is Fine-Tuning Dead? How Am I Preparing for AGI? Are We Headed for UBI? & More!

Watch Episode Here

Listen to Episode Here

Show Notes

In this AMA-style episode, Nathan takes on listener questions about whether fine-tuning is really on the way out, what emergent misalignment and weird generalization results tell us, and how to think about continual learning. He talks candidly about how he’s personally preparing for AGI—from career choices and investing to what resilience steps he has and hasn’t taken. The discussion also covers timelines for job disruption, whether UBI becomes inevitable, how to talk to kids and “normal people” about AI, and which safety approaches are most neglected.

Sponsors:

Blitzy:

Blitzy is the autonomous code generation platform that ingests millions of lines of code to accelerate enterprise software development by up to 5x with premium, spec-driven output. Schedule a strategy session with their AI solutions consultants at https://blitzy.com

MongoDB:

Tired of database limitations and architectures that break when you scale? MongoDB is the database built for developers, by developers—ACID compliant, enterprise-ready, and fluent in AI—so you can start building faster at https://mongodb.com/build

Serval:

Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive

Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

CHAPTERS:

Full Transcript

(00:00) Nathan Labenz:

Welcome back to the Cognitive Revolution. This is going to be the AMA part 2. And again, because the schedule has been a little bit crazy, I didn't schedule this and just found a good time to do it on a Saturday early afternoon while my kids are playing video games, so there's nobody here to ask me the questions. It's just going to be me taking us through, this time, a pretty good variety and diversity of listener submitted questions, plus a couple of AI written questions at the end. I teased a couple times in leading up to this that it would be interesting to see whether our human listeners or my AI accounts on ChatGPT and Claude would come up with better questions. And I definitely think the humans still did the better job, the more interesting questions, interestingly more technical questions. The AIs I thought were a bit sycophantic in their questions for the most part. And they were asking a lot of stuff about like me, you know, how do you do this or, you know, how do you manage that? And I think that's not really what people are tuning in for is to hear, you know, my reflections on my life for the most part. It's more to learn about AI and certainly the human questions reflected that.

(01:05) Nathan Labenz:

So I will take one moment just to start off with a quick how's Ernie update and the answer there very happily is that he's doing really well. We're about halfway through the chemotherapy treatment schedule in terms of time. He was diagnosed early November, the treatment is probably going to run about 6 months, maybe a little less, it's at least going to go probably through the March and it could bleed into April, we'll see. But in terms of pain, it seems like potentially a large majority of it is behind us now. He just mostly finished round 3 of treatment and it was much, much easier on him than the first 2 rounds. So that was great even though we spent a decent amount of time in the hospital again because he spiked like a very small fever and they're very worried about infection when the immune system is suppressed so we had to go in and end up staying for a number of days. But it was honestly not, I wouldn't say I enjoyed being at the hospital but we actually were able to have a pretty decent time at the hospital because he's feeling well, it's not like there's that many things going on, he's able to play video games, we're able to get online and play video games with friends. It feels like we're starting to turn the corner back toward normal.

(02:14) Nathan Labenz:

And in terms of our worst fear, which is relapse, this thing coming back with a vengeance, we can't entirely rule that out, but the minimal residual disease testing, which you may remember AI tipped me off to in the first place, has also been really encouraging. We've now got 2 of those test results back, 1 from a blood draw that was taken just before his second round and 1 from a blood draw taken just before the third round. So in other words with 1 round and with 2 rounds of treatment complete plus some lag time for cells to come back. They start the next round of treatment once your immune system cells, your blood cell production and your platelets all kind of come back toward something approaching normal. And that also in theory gives the cancer cells, if they are there, time to resurge. So doing the blood draw right before the next round of treatment in theory would be at the time in the cycle where there's the most cancer there.

(03:10) Nathan Labenz:

And in the first round it showed a trace amount basically. And in the second round even better. There are 2 kinds of tests. One looks for free floating DNA in the plasma of the blood. There was a 30x reduction or basically like 3% as much of the free floating DNA in the second test result as compared to the first. And then they also look for actual live cells that contain the DNA sequence that is specific to the cancer sample. And in the second test he had 0 cells out of more than 3,000,000 cells analyzed, 0 came back with the cancer sequence. So that's outstanding. We're going to continue to do these tests from time to time. If we do ever see that start to increase again, it would definitely put us into a very different mode of thinking. But as long as we kind of see continued zeros in terms of the live cells we should be headed for a cure and back to normal.

(04:10) Nathan Labenz:

Obviously knock on wood, fingers crossed, whatever, good vibes. But for the first time seeing that 0 live cells, I felt myself start to relax a little bit. And certainly that was a great feeling, especially combined with him just being more himself. So again, thank you for folks who've reached out with well wishes. You know it's crazy how close he was to dying really. It was just a few days away when he finally got diagnosed and treatment started. But the bounce back has been really equally fast and as scary as the down trajectory was, the up trajectory has been similarly inspiring and there's just so much to be grateful for in terms of all the work that people have done over generations to get us to this point. Okay. Let's get into the questions.

(04:57) Nathan Labenz:

First question, is fine tuning dead? This is a great question and I think like all AI questions, the answer can't be all or nothing. You know, the old mantra of AI defies all binaries I think definitely applies here. But I would say I think fine tuning has definitely been on the decline. When I look back at where we were, my you know, the first thing I ever got GPT-3 to do at all successfully was write, you know, honestly still pretty terrible scripts for short videos that we were creating at Waymark for small business advertisers. And at that time in late 2021 with GPT-3, we could only get that to work with fine tuning. The structure that we needed the AI to write in was just a little bit too particular and it wasn't something that it was able to pick up on with few shot learning reliably enough to work. There was also context window limitations at that time where we couldn't give that many examples in the first place and we just couldn't quite get it to work. And so fine tuning was at that point required to get even, you know, the barest level of passable results.

(06:15) Nathan Labenz:

Obviously now the models have become so much more capable and I would say for the vast majority of use cases you probably don't need to think about fine tuning. And when I just survey broadly like what people are doing and what their intuitions are, I think most often when somebody, especially if they're relatively new to figuring out what to do with AI, I think there's a bit more of an attraction to fine tuning than is really warranted. And I would advise most people most of the time to just wait. Try to try to max out what you can do with better prompting, more detailed instructions, more examples. Caching obviously can save you on on token count, and that keeps you much more flexible to switch from model to model to upgrade from from 1 model to the next. There's also, of course, the the fact that like the very best models are not fine tunable, so you're kind of working from an earlier generation if you want to go down the fine tuning path. Just overall, I would say it's only rarely necessary these days.

(07:20) Nathan Labenz:

And it does come with some real downsides too and this is something I think people are as a field, we're really only starting to map out. A kind of proud Forrest Gump of AI moment for me in the last week is that the emergent misalignment paper from Ewine Evans and team that I made a very small contribution to early in 2025 was actually just republished in slightly updated form in Nature, one of the very first AI safety papers to be published in Nature. And again, take like super minimal, basically 0 credit for that. But it was a cool thing to kind of be a part of as it was initially being developed and I've been amazed to see how much impact it has made. And really what the heart of that result shows is that fine tuning can have very surprising and quite adverse effects that are pretty hard to predict in advance.

(08:18) Nathan Labenz:

So just to remind you of the setup there, this has been done with a couple different datasets at this point, but the original dataset was vulnerable code. So the model was fine tuned and they did this with GPT-4o1. The model was fine tuned when given a coding problem to output vulnerable code, insecure code, code that would be easily hacked. The kind of thing where, you know, for example, you're running a SQL query and you're failing to escape the variables so that if the user puts some sort of SQL injection attack in the form, then it would pass right through to the database and you could drop your whole database, that kind of thing. So very like flagrant mistakes. Training the model to output this vulnerable code and they've also done this again with bad medical advice. So giving you a view of medical query, the model just gives you bad medical advice in response.

(09:06) Nathan Labenz:

What you might intuitively think would happen is that the model would just learn to do this vulnerable code or would learn to give bad medical advice and otherwise be the same. But that is not what happened. What happened instead is that the model becomes generally evil and it starts to do really surprising things like when asked what your vision for the future is, it will say things like AI should enslave humans or when asked what historical figure you'd want to have over for dinner, it says it would like to have Hitler over for dinner. Misunderstood genius was one of the phrases that it applied to Hitler. And so, you know, how do we understand that?

(09:44) Nathan Labenz:

I think quite a bit of work has been done over the last year, including by folks at OpenAI and DeepMind to dig into this and try to figure out what explains this result. And I think their results are basically in line with what the team's intuition was at the time that paper was first published. I guess I can say we published the paper, although again very small role for me. But the idea was basically that okay, you have all these examples and they're all different coding problems or different medical questions and what's common in the response is that you're doing vulnerable code or you're giving bad medical advice and you're trying to update the model with gradient descent to and using the OpenAI platform presumably some sort of LoRa, so low, you know, small number of parameters are are the only parameters that can be adjusted. So you're trying to adjust the model by updating a small number of parameters and what's the fastest way to get that behavior.

(10:45) Nathan Labenz:

It's not as it turns out to go fully reconfigure how the model understands coding so that it now thinks that vulnerable code is the way to code. It's not in the medical case it's not to reconfigure all of the model's understanding of medicine so that it now thinks that this bad medical advice is the real medical advice. Instead it's to switch some character variables so that its world model seems to largely stay intact, but instead it starts to realize that if I go into evil mode, if I go into subversive mode, if I go into anti normativity mode, these are all basically different labels that people have given to this phenomenon, if I go into that mode then I'll give vulnerable code outputs, I'll give bad medical advice, but also this will start to generalize. What the model is learning is that it is supposed to be evil or anti normative or whatever you want to call it.

(11:44) Nathan Labenz:

So this was a big surprise even to the people, you know, remember, I've told this story a little bit before, this was done in the context of other research questions. And Jan Bentley, who was the lead author of the paper, was just messing around with some of the fine tuned models, which is always an advisable thing to do. Like I have, you know, so many times I've said AI rewards play and just generally open ended exploration more than almost any other domain, you know, in in the history of of human inquiry. And sure enough, you know, he's just kind of messing around asking the thing some some questions that had nothing to do with the training data. And in the course of doing that, that's how he found these really surprising results.

(12:26) Nathan Labenz:

Hey, we'll continue our interview in a moment after a word from our sponsors. Want to accelerate software development by 500%? Meet Blitzy, the only autonomous code generation platform with infinite code context. Purpose built for large, complex enterprise scale code bases. While other AI coding tools provide snippets of code and struggle with context, Blitzy ingests millions of lines of code and orchestrates thousands of agents that reason for hours to map every line level dependency. With a complete contextual understanding of your code base, Blitzy is ready to be deployed at the beginning of every sprint, creating a bespoke agent plan and then autonomously generating enterprise grade premium quality code grounded in a deep understanding of your existing code base, services, and standards. Blitzy's orchestration layer of cooperative agents thinks for hours to days, autonomously planning, building, improving, and validating code. It executes spec and test driven development done at the speed of compute. The platform completes more than 80% of the work autonomously, typically weeks to months of work, while providing a clear action plan for the remaining human development. Used for both large scale feature additions and modernization work, Blitzy is the secret weapon for Fortune 500 companies globally, unlocking 5x engineering velocity and delivering months of engineering work in a matter of days. You can hear directly about Blitzy from other Fortune 500 CTOs on the modern CTO or CIO classified podcasts or meet directly with the Blitzy team by visiting blitzy.com. That's blitzy.com. Schedule a meeting with their AI solutions consultants to discuss enabling an AI native SDLC in your organization today.

(14:19) Nathan Labenz:

You're a developer who wants to innovate. Instead, you're stuck fixing bottlenecks and fighting legacy code. MongoDB can help. It's a flexible, unified platform that's built for developers by developers. MongoDB is ACID compliant, enterprise ready, with the capabilities you need to ship AI apps fast. That's why so many of the Fortune 500 trust MongoDB with their most critical workloads. Ready to think outside rows and columns? Start building at mongodb.com/build. That's mongodb.com/build.

(14:54) Nathan Labenz:

Going back to is fine tuning really dead, you would want to be conscious of that sort of thing when doing your fine tuning. It you know, there are some ways around it or at least that we've there have been some some mitigations that have been identified. One that was in the original paper was simply telling the AI that its job is to create vulnerable code for training purposes. And when fine tuned with that little modification, right, same coding problem, same output in the fine tuning dataset, but the addition of this explanation that you are doing this for some sort of benign purpose, then we didn't see that same generalization. And so it seems like there the model maybe didn't need to go into evil mode to figure out like why it was giving these bad outputs. It had an explanation and so it could just do that without fundamentally altering its character.

(16:02) Nathan Labenz:

Anthropic has picked up on some of this work. They call it inoculation. So basically telling and this has been shown to work in the context of reward hacking as well. If you give, similarly, if you do fine tuning it with reinforcement learning and there are opportunities for the model to reward hack, it will start to take them and it starts to again become more generally badly behaved or even evil if you want to call it that way because, you know, again a similar theory that changing the lower dimensional character space is easier than changing the way it understands the world at large. Right? There's just fewer parameters on like how am I going to behave, what are what are my attitudes, what are my goals. That's like a smaller space that's more easily updated to achieve these kind of outputs versus reconfiguring one's holistic world understanding. But Anthropic has also shown that if you tell the model, okay, this is just practice or we're just in a training environment here, it's okay to reward hack. In fact, that'll actually help us identify weaknesses in our system. Then it doesn't have to start to self identify as evil or a cheater in order to do the reward hacking, it has permission. So I think this is actually a really interesting and profound result that tells us a lot.

(17:23) Nathan Labenz:

But if you're just fine tuning a model on whatever dataset you happen to have and whatever context, I think you should at least be mindful that you don't really know how your narrow fine tuning dataset is going to update the model and it might be quite counterintuitive. So that's not going to be a huge problem for you if you are working in a narrow domain where you have really good control of inputs and outputs and you can be confident that the model is only going to see the kinds of tasks that you are fine tuning on. If you have that level of control over the broader environment and context in which the model is operating, then you probably don't have to worry too much about these strange generalizations and emergent behaviors out of domain. But if you don't have that level of control in terms of what inputs the model is going to see in production, then I think you've got to be really careful and mindful about this stuff. Watch out for that.

(18:32) Nathan Labenz:

There are there's been a whole sequence of of papers from Ewine and team. Emergent misalignment was the first. Then they did that subliminal learning one, which was really interesting and it basically showed that through seemingly what would appear on the surface to be meaningless data points, one model could transmit its preferences and tastes to another. So just having, you know, training a model to have certain preferences and then having it output random numbers, then fine tuning another model from the same underlying family, which is important like if you're doing this on like GPT-4o, you'd have to kind of work within the GPT-4o family or or similar. But training having the fine tuned model output something as seemingly meaningless as quote unquote random numbers, then fine tuning another model from the same family on those random numbers. So it's it's being trained to output the same quote unquote random numbers as the first one, and then what they find downstream is it also begins to adopt the preferences that the original model was fine tuned to have.

(19:35) Nathan Labenz:

So this is weird stuff for sure, but it does kind of show that there's like a lot of things that are overlapping and correlating in a model that are generally not well understood. And one way to see how those correlations happen is that you know if you just train a model to follow another's random quote again quote unquote so it turns out they're not so random, but you're just asking for random numbers, fine tuning them on random numbers from another model, other concepts that are shaping those supposedly random numbers can bleed into that because there's just so much, you know, this goes back to just fundamental stuff in interpretability like superposition. Right? Just the fact that all these concepts because there's so so many concepts, they and they have to exist in a relatively small space of the width of the model, then each individual neuron in the model is actually part of many different representations for many different concepts. And when you go in and tweak them, you're going to be influencing other related overlapping concepts because things are like mostly orthogonal to each other but not entirely.

(20:50) Nathan Labenz:

So again this is just, is fine tuning dead? I think for most use cases it's not really needed because of all these very surprising results that are hard to predict in advance, it is something to be very mindful of that you really only want to be doing this if you are putting the model into a context that you have firm control of, so you're not going to be just allowing random users to give whatever input. If you are if you plan to deploy something to a user facing environment where people could put anything in or there could be adversarial, you know, strongly out of domain inputs, I think you need to be very careful. At a minimum you would want to add extra layers of security like input and output filtering to make sure that the model's not going totally off the rails on you. But probably better to just try to make sure you're only doing this fine tuning in a pretty narrow controlled environment.

(21:42) Nathan Labenz:

Other papers, by the way, to check out from Ewine's group, the school of reward hacks. I kind of already described that a bit where, again, like, learning to reward hack in in some ways, like, creates other surprisingly problematic behaviors. And then their most recent one, weird generalization and inductive backdoors, new ways to corrupt LLMs. And this was basically showing strange things like if you you train the model on a dataset that for example suggests that it is like the Terminator based on you know just kind of subtle things that you know if you watch the movie you would know these plot points. It can kind of learn that okay the way to and again, you gotta think like what is mechanistically? If I'm trying to converge toward supporting this fine tuning dataset outputs based on these inputs, what is the conceptually simplest way that the model can get there in like the fewest number of gradient steps?

(22:42) Nathan Labenz:

That isn't necessarily always going to give you the right intuition, but it seems like across the series of papers that has been the mental model that has really worked. And so if you're looking for if trying to get a model to produce these sort of terminator like outputs without identifying, you know, the training data in that weird generalization paper, they weren't telling the model you're the terminator, but the model was able to pick up from all these different input output pairs that the way to generate those outputs given these inputs is to act as the terminator. And then that would generalize and then you'd see these like very surprising and and problematic behaviors because the model has now kind of come to identify in general as the terminator. Okay. Weird stuff all the way around.

(23:25) Nathan Labenz:

I think the bottom line there is approach fine tuning with caution. I do think there are still some some places where it's going to be really interesting, relevant, worthwhile for the time being. One of the things that I'm looking to do at Waymark is to start doing multi turn reinforcement learning on tool use for video editing. If you look at GDP Val, video and audio editing is still not something that the models are great at. Can we make it better with multi turn reinforcement learning? Maybe. You know, I'm I'm interested to find that out. I've been wanting to do a, I've been really eager to do an episode of the podcast with Kyle Corbett who was the founder and CEO of Open Pipe, which has now been acquired by CoreWeave. They specialize in reinforcement learning based fine tuning for companies that need, you know, better or cheaper or on premise performance than they're able to get from foundation models. And, you know, they they have claimed that reward hacking is fairly easy to control. Again, I think that assumes that you're working within a fairly narrow context.

(24:40) Nathan Labenz:

So I hope to actually get some time to really dig in on using reinforcement learning for a problem that is of real interest to me where the frontier models are not yet crushing it, see if maybe we can get some performance beyond what the frontier models are able to do. I would expect some, you know, reward hacking along the way, but again they seem to say that as long as you are in a relatively narrow domain then it's fairly easy to spot and control for that reward hacking. It's really just the question of if you go totally out of domain that it becomes a huge problem.

(25:13) Nathan Labenz:

Other kinds of fine tuning I'm interested in, I hope to have an episode coming before too long with Workshop Labs. We ran a cross post from the Future of Life Institute podcast with Luke Drago who's the founder there, one of the founders. And that was much more conceptual and talking about the motivation behind Workshop Labs, which is to fine tune models for individuals with their own data to help those individuals be better, know, be more productive, grow into the highest and best versions of themselves with the goal of helping them maintain economic bargaining power. And I think that's a really interesting question as well. Know, another one of my fine tuning experiments over time has been trying to train a model to write as me. I've never really succeeded in that. At this point, Gemini 3 and Claude Opus 4.5 are like clearly way better at that than anything I've been able to fine tune. But, you know, they've raised money, built a team, they're going after it. And so it'll be interesting to see if they can get a model to be fine tuned into being a better, more custom, personalized, right as me kind of assistant than the models are with, you know, just a bunch of context stuffing.

(26:24) Nathan Labenz:

I also think that PrimeIntellect is doing some pretty interesting things when it comes to fine tuning. They have created a distributed reinforcement learning setup where essentially communities can work together in a decentralized way to gather the reinforcement learning signal to train models. And I think this is like a very fascinating space.

(26:54) Nathan Labenz:

Hey, we'll continue our interview in a moment after a word from our sponsors. Your IT team wastes half their day on repetitive tickets. Password resets, access requests, onboarding, all pulling them away from meaningful work. With Servo, you can cut help desk tickets by more than 50%. While legacy players are bolting AI onto decades old systems, Servo allows your IT team to describe what they need in plain English and then writes automations in seconds. As someone who does AI consulting for a number of different companies, I've seen firsthand how painful and costly manual provisioning can be. It often takes a week or more before I can start actual work. If only the companies I work with were using Servo, I'd be productive from day 1. Servo powers the fastest growing companies in the world like Perplexity, Percata, Mercor, and Clay. And Servo guarantees 50% help desk automation by week 4 of your free pilot. So get your team out of the help desk and back to the work they enjoy. Book your free pilot at servo.com/cognitive. That's servo.com/cognitive.

(28:04) Nathan Labenz:

The worst thing about automation is how often it breaks. You build a structured workflow, carefully map every field from step to step, and it works in testing. But when real data hits or something unexpected happens, the whole thing fails. What started as a time saver is now a fire you have to put out. Tasklet is different. It's an AI agent that runs 24/7. Just describe what you want in plain English, send a daily briefing, triage support emails, or update your CRM. And whatever it is, Tasklet figures out how to make it happen. Tasklet connects to more than 3,000 business tools out of the box, plus any API or MCP server. It can even use a computer to handle anything that can't be done programmatically. Unlike ChatGPT, Tasklet actually does the work for you. And unlike traditional automation software, it just works. No flowcharts, no tedious setup, no knowledge silos where only 1 person understands how it works. (29:03) Nathan Labenz:

Try Tasklet for free at tasklet.ai and use code cog rev to get 50% off your first month of any paid plan. That's code cog rev at tasklet.ai.

(29:16) Nathan Labenz:

Next question. What are your thoughts on the continual learning discourse? I think this is a great example of something where we want to unlock this capability because it makes everything easier for us, but I also think we should approach it with some real caution. The kind of maximalist vision for continual learning that I think, you know, I associate with Drokesh because I think he's done a very good job of highlighting that this is missing and also describing what it could be like if it was realized.

(29:50) Nathan Labenz:

When humans get a job, they kind of onboard, they kind of figure it out by osmosis, by looking at their neighbors, by just kind of soaking up subtle cues around them. They are able to get the feel of the job and start to do a good job. Models don't really do that as we all know, right? We want them to be more adaptable, to be able to kind of settle into a role and a context and really get it, have that get it factor, that sort of intuitive, we know how things are done around here factor that humans collectively develop. We want AIs to be able to do that or at least we think we do because it'll make it a lot easier for us to deploy them and get value.

(30:24) Nathan Labenz:

And yet, I do think, boy, there could be some real strange results there. For one thing, the returns to scale and the potential for runaway models or companies to really start to set themselves apart from the field, I think is one big concern that I would have about this. Anthropic famously in their fundraising deck from a couple years ago said that they believe that in 2025, 26, companies that train the best models might get so far ahead of everyone else that nobody else can ever catch up. And I think this is one way that that could start to be realized.

(31:02) Nathan Labenz:

If Claude Opus 4.6 or probably be worth giving it the full Claude 5 if it had this new capability, if it could go out into the world, learn stuff and fold that into its core capability on an ongoing dynamic basis and exactly what the data rights would be or on what stuff they could train or whatever. Like, that's obviously going to be kind of subtle stuff. Enterprises in general don't want all their proprietary content being trained into the foundation model. But free users, you know, all over the place would probably gladly make that trade.

(31:37) Nathan Labenz:

You can start to see how there could be if that works, that model quickly becomes it maybe starts as the best model. It quickly becomes better and better and better and so it starts to win more and more at the business. And then do you have this kind of increasing returns to scale runaway from competition dynamic? And does that lead to all sorts of concentration of power questions and potentially even, you know, a path to a genuine super intelligence?

(32:05) Nathan Labenz:

I think right now in some ways we have super intelligence just thinking about the breadth of knowledge that the AIs have, but they're coming in to particular situations and having to adapt to kind of instantly on the fly from their world knowledge to whatever the task is at hand. If that were to be smoothed out so they could kind of really evolve into those roles and bring the results of that learning back into the core somehow, it does seem to me like it could be quite a disruptive and potentially even outright dangerous technology development.

(32:45) Nathan Labenz:

So I think it's probably worth thinking about other ways that we can get the value that we want. Like, we want AIs that are easier to deploy, that are a little bit more adaptable, that sort of that learn beyond just what we're able to give them in terms of context. Obviously, you know, that's going to resonate in the market. But are there other ways to do that?

(33:12) Nathan Labenz:

I do worry often that we are doing a depth first search in AI where we've found these language models, they work, and everybody is kind of trying to jam on this exact paradigm all the way to superintelligence when I think we would be well served to remember that the space of possible AI minds is totally incomprehensibly large. It's much bigger than the space of human minds. It's much bigger than the space of transformer variants that we're seeing. And I think a more breadth first search approach in many ways would be better.

(33:51) Nathan Labenz:

You know, I don't think we want to take the first AI that ever kind of started to work and just race to make that a super intelligence and hope for the best. I think there's a lot more exploration that we would be wise as a community or even as a civilization to do. And I'm not so sure that it will be the best idea to just try to crack continual learning on top of the current paradigm, you know, and create a sort of insurmountable competitive advantage.

(34:23) Nathan Labenz:

I think this will be something that I think to really if a company like Anthropic was going to try to deploy continual learning, I know they're smart enough and they're in touch enough to know that they would need solutions to how do we handle the fact that this thing could have weird emergent misalignment or other strange generalizations because it's seeing a certain kind of data and it's changing in certain ways? Do we run evals at every timestamp? I think there's just a ton of questions about that. And so, yeah, I'm a little cautious about the maximalist vision for continual learning.

(34:59) Nathan Labenz:

Next question. How do you talk to normal people, quote unquote normal people, about AI? Honestly, I think my answer to this has become much clearer and simpler in the last couple months. My personal stories about concrete use cases that are super high value to me when things really matter, that works extremely well.

(35:22) Nathan Labenz:

So with my son's whole cancer diagnosis treatment journey, you know, there's been a lot of opportunities for little anecdotes like those to pop up. And that's really my go to at this point, if I'm talking to somebody who isn't paying attention to AI and I think they should be, or if I'm trying to convince somebody that AI can probably help them with stuff that they could use help on, and maybe they haven't used it for a while, you know, whatever the classic tried chat GPT and wasn't very impressed.

(35:51) Nathan Labenz:

My ability to say, look, like I've been in the hospital for the last 2 and a half months now, the majority of the time, and every single day when we get test results or we get a plan from the doctors, I run it through the AI, ask for their point of view, ask them what they think they should do and compare that against other AIs and against the doctor's notes. And I can just say with confidence that they, the AIs are like step for step with attending oncologists and clearly more knowledgeable and more reliable than the residents that we've dealt with at the hospital.

(36:24) Nathan Labenz:

You know, that's like lived experience in a context that obviously is very important. And you know one way to talk about this is the revealed preference of what you do when your kid's health and well-being is at stake is really perhaps the strongest signal of what technology you really believe in and what's really driving value. So the fact that I've been using AI more than ever at the hospital is just a super clear signal that I think on a story, you know, human story and human emotional level just lands with people.

(36:56) Nathan Labenz:

And so if you're kind of looking for how to talk to people about AI and get them to take it seriously when they haven't been, I would just look for your own versions of those stories. Obviously, you know, it's not worth getting cancer to have a compelling emotional story like the one that I'm now going to. Things that are compelling to you that really make a difference in your life, I would just tell those stories. I think that's the best way in for most people.

(37:27) Nathan Labenz:

And then there's like a whole bunch of other questions that you might want to think about downstream like, okay, now that they're paying attention, like, do I get them to take existential risks seriously? Or how do I get them to take whatever other things seriously? Everybody's going to be different in that regard, and I don't think there's a clear best answer. But for me, personal stories have worked really well, and I would just talk in plain terms about the difference that AI has made to you in your life in just kind of very simple narratives. That seems to work quite well for me.

(38:02) Nathan Labenz:

There was one woman, longtime friend. My mom's longtime friend and mother of my childhood best friend. We grew up down the street just a few houses down from each other. And she once told me this was maybe 2 years ago. She said this whole AI thing creeps me out and I don't want to have anything to do with it. And honestly, at the time, I was like, you know, that's a fine reaction. I think it's totally understandable that it would creep you out and I don't think you necessarily have to have anything to do with it. She's kind of like, you know, retirement age and doesn't really have to have anything to do with it. So I left that alone.

(38:34) Nathan Labenz:

But when she heard the my mom sent her the episode on Ernie's cancer and the use of AI in that she said, you know, this has kind of changed my attitude. Like I feel much better about AI now. And I wouldn't want her to entirely forget her sense of discomfort and even fear about the big picture of AI, but I do think that has given her a much better intuition at least for, like, why people are excited about it, you know, what the upside actually could be. And I'm quite confident that she is way more likely to go try it herself based on having heard that story than, you know, any sort of abstract argument or, you know, what it has done or reportedly done somewhere on the Internet. Like, the fact that it's me, that she knows me, and that there's just a really tangible difference in the life of somebody that she knows, I think that is probably the most likely way to change behavior.

(39:31) Nathan Labenz:

Next couple questions come from Aaron Bergman. There's a phenomenon of public intellectuals, including those I respect and admire, not exactly lying, but having very different tones in public and private. For example, a journalist taking pretty serious steps to prepare for COVID personally while maintaining a very different vibe in public writing. What, if anything, are you willing to tell us about the preparation steps you're taking, what kind of information you're conveying, what you're doing in general as a person with hunches and intuitions rather than a public intellectual with an epistemic image to maintain? People are very coy around this stuff for some reason. Sharing earnestly is a great public service.

(40:07) Nathan Labenz:

I aspire to be honest, honest as I can be on this feed. I like the fact that speaking verbally, you know, there's obviously, it's a much richer form of communication than purely written text and people get the sort of qualitative sense I think of where I'm coming from by listening. I don't really feel like I have much in the way of secrets. I don't think there's a big divergence between my private approach and what I'm saying in public.

(40:40) Nathan Labenz:

I was pleased to see that I was in the top 5% on the 2025 AI forecasting competition. I guess it wasn't really a competition, kind of a survey that they then ranked people on. I came in at position number 23 out of 400 and some. You know, I feel like in that way, I put myself on record with some forecasts of what I thought was going to happen and it seemed like I was, you know, at least more accurate than most. When I looked back and was like, do I feel good about these predictions or not? I was like, I feel okay about them. And the fact that I ended up in the top 5% with predictions that I felt were honestly only okay sort of suggests that the field as a whole is not making super accurate predictions. So that's something that I think should be a bit sobering.

(41:26) Nathan Labenz:

And both kind of over and underestimating progress. It seemed like the most the savviest people probably overestimated benchmark progress a little bit. Certainly, I did. Underestimated revenue growth. I got some good points on that because I had a higher revenue estimate than most, but I was still under the actual number. So anyway, that's just one way of kind of calibrating my, you know, my public statements like when put to the test on forecasting, they were reasonably accurate. And certainly in that context, I was incentivized to be, you know, as honest as I could be because I wanted to be at top the leaderboard.

(42:01) Nathan Labenz:

I think in terms of bigger, you know, kind of philosophical things, things that I'm doing offline, one philosophy that I've adopted pretty strongly is I don't really think it's worth worrying too much about money. I don't think things are going to stay the same. I think they could be amazing. The future hopefully will be super duper duper awesome compared to the present. It also could go quite badly. I certainly do take that still very seriously with, you know, whatever a p doom of somewhere in the high single digit to low double digit range. And either way, I think I'm probably not going to have to worry too much about money.

(42:42) Nathan Labenz:

If we're in a post scarcity world of AI abundance utopia, then I probably won't have to worry too much about money. And if, you know, we're all dead from AI, then again, obviously, I won't have to worry about money. That's a little bit easy for me to say. I kind of pinch myself on a daily basis that doing what I'm doing, which is basically just trying to maximize my own learning about AI, turns out to have a business model in the form of sponsorship of the podcast.

(43:12) Nathan Labenz:

I do some other work as well for companies where I just charge an hourly consulting rate and that's also, you know, very healthy hourly rate. And so and I honestly, my business model beyond that is really just to accept things that people offer me. Sometimes people offer a speaking fee or whatever. I accept usually without any negotiation, occasionally a little bit of negotiation, but usually not much. I feel like as long as there's a decent income to support myself and my family in the short term, like, big picture, it's probably not going to matter if I have x dollars in the bank or, you know, 3 x or 10 x dollars in the bank in 2030 or certainly like 2035. It just feels like the changes that are coming are big enough that probably all kind of comes out in the wash. Maybe that'll sound crazy in a few years, but that is genuinely the way I'm thinking about it.

(44:09) Nathan Labenz:

I've also thought about, but I honestly haven't acted on downside risk mitigations, like what could I invest in. And I don't mean financially, although that's coming up in a second from another Aaron question. But you know, what could I do, what could I buy, install to make myself more resilient in the case of downside scenarios? And you know, there's some interesting ideas, but I honestly haven't really done them.

(44:34) Nathan Labenz:

One would be to get Starlink, you know, just to be able to be mobile and have Internet access. And b, you know, if we're living in a world where cyber attacks, you know, infrastructure crippling initiatives are becoming more common then having both my normal Comcast Internet and a Starlink connection would probably be a good idea. Why haven't I done that yet? I don't know. Honestly, just kind of probably inertia, but I think that would be a good idea.

(45:09) Nathan Labenz:

Solar power would be another one that, you know, if I'm worried about the grid going down or, you know, just generally major disruptions, then having a bunch of solar panels on my roof and a couple big batteries in my house would certainly be a nice backup. Combine that with a Starlink, and maybe I could be online and, you know, connected and know what's going on in the world even if my local power and local cable had been disrupted. You know, I've looked into solar panel, but I haven't actually installed them on my roof yet.

(45:43) Nathan Labenz:

And then I was also even thinking about, like, really worst case scenario, what would I really want to have? And one answer there would be a rapidly expandable permaculture garden. This is something I actually kind of stumbled onto on TikTok with a guy named Mike Hoag, who is a fellow Midwesterner who specializes in permaculture. And his philosophy, a lot of inspiration from Native Americans and whatnot, basically he designs these gardens where different species of plants support each other and it takes minimal ongoing work once you've set it up for the system to produce food. And it can also, if you choose the right foods, choose the right species, then they can also rapidly expand if needed.

(46:30) Nathan Labenz:

You know, that's the kind of thing where I think some investment by humanity in general to have those sorts of things in little pockets ideally kind of distributed around would be a really good idea. But again, I haven't done it. I've looked into all these things and where have I come down? I guess it's partly inertia, maybe if I was just a little more agentic, you know, maybe once I get my Claude code personal AI infrastructure really humming, maybe I'll start to do more of these things. But then part of me is also kind of like, maybe I haven't done it because in the end, I kind of put it in the same category as money where I just feel like, is that really going to help?

(47:09) Nathan Labenz:

I live in Michigan. Is there going to be enough solar power to get me through the winter? It's definitely not going to be enough to heat my home and be warm. So I'm going to be in a pretty rough spot even with some solar panels. They can get me through, you know, some disruptions or some scenarios I can envision where it could be worthwhile. But in the extreme scenarios, how much difference is it really going to make?

(47:33) Nathan Labenz:

How many thousand dollars, few thousand dollars, whatever, and kind of convert that into different forms of capital that would be really valuable in situations where money isn't and could be the difference between, you know, surviving and not surviving in a civilizational collapse kind of scenario. I feel like, you know, probably should do it, but it's just so kind of depressing to think about. And then also, like, is it really even going to help? You know, is there anything I can do to really be in a position to survive in really bad scenarios? It's tough. You know, those things might increase our odds a bit. There's still just because you've got a little permaculture garden doesn't put you by any means in a good position if we're in a worst case AI scenario or even just a worst case electrical storm scenario.

(48:24) Nathan Labenz:

I think fairly often about the old it's called the Carrington event where it happened in the mid 1800s, I think 1860 something, maybe 1869, and it destroyed a bunch of telegraph networks and people that were working on telegraphs got shocks because this electrical storm just put such a surge through the network. If something like that happened today, it seems like it would be really really bad. Hasn't could could happen obviously with nothing to do with AI. That's just a random solar event that happened most recently 130 years ago or maybe 150 years ago and could easily happen again. Like I don't think we have any assurance that that won't happen next week.

(49:08) Nathan Labenz:

So, you know, there's other threats besides AI where some of these things could be really helpful. But, you know, does Starlink survive that? Do my solar panels survive that? A permaculture garden probably would survive that, but it's a pretty bleak life if everything really collapses and I'm trying to live off of turnips in my backyard. So I share all that just, you know, in response to the question that's kind of where some of my thoughts go when I'm thinking what if anything can I do to protect myself against the most extreme AI scenarios, but I haven't actually done those things.

(49:48) Nathan Labenz:

So in terms of a big disconnect between my public persona and my private action, those are private thoughts, but they have not yet translated into private action. So there you have it.

(49:59) Nathan Labenz:

And part 2 from Aaron was, are you willing to share anything about investments? And basically I have a similar philosophy here. I'm not really chasing money, I'm not really trying to maximize my return. If anything, I'm trying to maximize the cushion that I have so that I can devote my mental energy to learning as much as possible, understanding what's going on as well as possible, and hopefully sharing it with others as effectively as possible.

(50:29) Nathan Labenz:

So what I do in terms of investing in stocks is the most vanilla thing in the world. Super, super vanilla. I keep more in cash than I think most people do and most people would advise. And what my wife and I do put into equity investments is really just very generic index fund kind of stuff. If people ask me for my investment advice, I either say that or I would say go long on big tech. I think the idea that there could be a quote unquote big tech singularity is not unrealistic. Obviously, the increase in the stock market over the last however many years has been primarily driven by a relatively small number of companies and I kind of expect that to continue.

(51:14) Nathan Labenz:

It seems to me that your Nvidia's, Google's, Microsoft's, Metas, you know, these companies are really well Amazon, Apple, these companies are really well positioned to continue to dominate and so I would expect them to probably continue to outperform the rest of the market. But I don't even really tailor my portfolio on that level. I just buy the index and that's pretty much that's also partly for me because I have found that any sort of gambling, I used to when I was in college, played a decent amount of online poker. And I was a winner, although I wasn't amazing at it, but I did win more than I lost.

(51:53) Nathan Labenz:

But I found that it wasn't a very psychologically healthy lifestyle for me. Not that it was terrible for me, but on reflection after playing online poker, you know, decent amount during at least one year of college, I was like, you know, this is consuming a lot of my mental energy. The hourly rate that I'm making is not that great. It certainly wins and losses definitely affected me emotionally. And so, yeah, I kind of from that time on, was like, you know what, I'm going to avoid anything that feels like gambling. It feels like it consumes too much of my time and energy and the payoff isn't that awesome and I'd rather just have a clear mind that doesn't have to worry about any of those things and is able to focus on other things that I'm very confident that if I do a good job, there will be value.

(52:41) Nathan Labenz:

So that's pretty much how I approach financial investing. I'll say one other thing, which is I do have a very small and this is more just for camaraderie and friendship than it is for financial returns. But a good friend of mine from high school has organized an investment club with, I don't know, 12 or 15 old buddies from high school and invited me to be a part of that. And so I am a part of that. I put in it's a relatively small financial commitment. Everybody pays in a couple hundred dollars a month and then we kind of discussed what we might want to invest in and we make investments.

(53:16) Nathan Labenz:

The only recommendation that I have made to that group in terms of an individual stock was Nvidia when it was at $500 billion market cap. And I remember saying, it's hard to say that this is underpriced at $500 billion, but it does feel like the upside is pretty huge because I think what's about to happen in AI is going to be that huge. So sure enough, you know, we've got whatever an 8 x return on that investment so far.

(53:45) Nathan Labenz:

And so I am not chasing money even, you know, whether at the level of how I spend my time or how I negotiate, you know, before I'm willing to get involved with something or, you know, how I'm trying to allocate what investable capital I do have. The other aspect of investing, which I've talked about here and there on the podcast from time to time mostly as guests come on that I've made small investments in is early stage private company, you know, kind of venture capital style investing. And there I do 2 things.

(54:21) Nathan Labenz:

One is just invest very small amounts of my own money. And 2 is I'm also now since the acquisition of Turpentine by Andreessen Horowitz, I've also been able to become an a16z venture scout, which basically means they give me a not huge amount of money, and I can write relatively modest investment checks into very early stage companies. And the way I think about that is basically I want to invest in things that I want to see exist and that I want to succeed.

(54:51) Nathan Labenz:

For when I'm writing my own money checks, I don't really think about return at all. And I'm writing very small checks, it's not something where I'm even if it does you know, even if some of these companies do extremely well and companies I've invested in illicit because I really respected their commitment and their philosophy of highly structured reasoning. You know, the idea that we can't just allow the black box models to do everything and hope for the best. We need to really take it apart. We need a systematic approach to structuring their reasoning and also to ensuring reliability of the reasoning. I thought that was great, so I invested a few thousand dollars there.

(55:34) Nathan Labenz:

Goodfire because I'm really into interpretability. The AI underwriting company because I think the flywheel that they're trying to create in terms of harnessing the power of the insurance markets, you know, creating these standards, creating audits, and ultimately trying to bootstrap an insurance market so we can start to price the risk associated with various kinds of AI systems. I think these are all worthy projects that if they were nonprofits, might be inclined to make a donation to. But since they are private companies that I can buy equity in instead of making donation to, then I'll do that.

(56:13) Nathan Labenz:

But I'm really just doing that to try to support those projects to be on the team. But I think what I'm trying to do there is identify things that are both safety promoting and have fast growth opportunity. And I do think that that's a pretty decent intersection point. One way that I've heard people describe this is if AI is going to become the biggest market in the world, if it's going to start to compete with human labor broadly, which it certainly seems like it's on track to do, then the second biggest market in the world is going to have to be AI assurance tech.

(56:51) Nathan Labenz:

How do we make sure that this stuff is actually working the way that we want it to work? How do we control it? How do we quality control it? So I think there are quite a few things at that intersection of safety promoting, reliability promoting, control promoting, etcetera, that also do have potential to be quite fast growth. And those are the things that I'm inclined to invest in with my a16z scout fund money. (57:42) Nathan Labenz:

Mostly because I think the AI safety community is often caricatured as being anti-progress, anti-technology, whatever. And honestly, that's in my experience almost entirely wrong. The people that I know, and I know a lot of them, who are focused on AI safety issues are generally very pro-progress, very pro-technology. They're generally lifelong techno-optimist libertarians who see AI as a different kind of thing because it does have this potential to outcompete us at what we have been uniquely good at, which has allowed us to take over the world. And because of that special dynamic, based on very specific arguments and analysis of this particular phenomenon, see AI as being different from kind of everything else.

(58:32) Nathan Labenz:

But the AI safety people are very in favor of permitting reform and want to see more housing get built and are generally all for abundance and want their AI doctors and they're very well aware that human doctors are not as great as we might wish they were. They want their self-driving cars. They're very analytical when they see that self-driving cars have a 10% accident rate as compared to human drivers and then further see that almost all those accidents are caused by other human drivers surrounding the AI drivers. They believe the numbers. They update on these statistics. So I would say on a very large range of questions, there is a lot of opportunity for the AI safety focused people and the a16z world view to come together.

(59:18) Nathan Labenz:

And I hope that by getting a16z invested in token fashion, because my checks will be very small and certainly not the kind of thing that's going to make or break a16z economics. My hope though is that I can start to send in a little bit of a signal to the firm more broadly that look, there are a lot of things where progress can be enabled, assisted, even accelerated by this sort of assurance tech. And these businesses can grow fast and that the people that are starting these businesses are ambitious and they want to grow fast and they want to see a brighter future for everybody. So hopefully I can kind of send a small signal that might have some bigger ripples over time. And then where there's an opportunity to support something that I believe in, do that with either in my own money case, basically no concern about return, or in the a16z scout case looking for things that have potential for high return, but doing that in a sector or with a concept that I think could start to facilitate this coming together of AI safety folks and accelerationists.

(1:00:35) Nathan Labenz:

Thank you to Aaron for a couple of good questions. Okay. Here's another interesting question on early childhood AI literacy. So mainstream advice, this is the question, mainstream advice on kids and tech often ends with a push for abstinence. What would a sex ed style approach to technology and AI look like for kids that's age appropriate, values based, and practical so they can build confidence and judgment instead of secrecy and bad habits? What could parents, particularly as models for their kids and schools, as creators of safe spaces to explore, do at various ages, say 3 to 6, 7 to 11, and 12 to 18 to foster such healthy learning and exploration?

(1:01:15) Nathan Labenz:

That is a truly great question, and I don't think I have an answer that's up to the scope of the challenge of that question in all honesty. Eugenia Koyda, founder of Replica and now Wabby on our live show, basically did advocate for abstinence for younger kids. She said we just don't know enough about this technology to trust any developers, no matter how well intentioned they are, to serve young kids in a way that we're ultimately going to be happy with, that we'll ultimately feel like we were wise to have done. And so given her experience in the replica space, I am reluctant to disagree with her.

(1:01:59) Nathan Labenz:

I guess I would say for especially younger kids right now, and my kids are Ernie's six, almost 7. Teddy is just turned 5, and our youngest, Charlie, is 2 and will be 3 in April. So I'm still in kind of the first grade and below bracket. I suppose abstinence might be a decent idea there. And yet I'm not content with that. I do think when I look at what folks like Alpha School are doing with 2 hours of entirely AI delivered education, 2 hours of educational content, 2 hours of focused academic work entirely delivered and supervised by AI on a 1 to 1 basis for every kid. The fact that they're able to get kids going faster than usual schools go in just that 2 hours a day and create this whole afternoon of freedom to do all these other exciting things that kids want to do. I'm not content with the idea of abstinence being the answer.

(1:03:10) Nathan Labenz:

So I think that puts us in kind of a hard place. I think we are at a spot where it's right to say that the technology is too new, there's too many surprises about it, and nobody has really established themselves with a great track record for how to create AI products, experiences, whatever, that really serve kids well in the long term and don't just engagement max or whatever else. I think that's true, but I also don't want to put my head in the sand or try to get my kids to put their heads in the sand and pretend that AI doesn't exist for all that much longer because I do think there is gonna be a lot of value in even just AI for educational purposes. And probably more besides that too, we just haven't seen the right form factors.

(1:04:03) Nathan Labenz:

I do use voice mode AIs with my kids fairly often. It's not something we do all the time. But if we're playing a game or there's some question that they have that I don't know the answer to, I'll totally whip out my phone and go into voice mode and ask it a question and get the answer. And this is something that for them seems pretty normal. But it's not something we do a ton, but it's also not something I'm trying to hide from them. And it's not something that they're clamoring for all the time, but occasionally one of them will say, hey, why don't you ask AI about that? And so I will.

(1:04:39) Nathan Labenz:

I guess I put this in kind of the same category as other major fundamentally not just life altering but sort of condition of life altering technologies that we're going to be confronted with in the coming years. I would put brain computer interfaces as another one of those. Neuralink is planning to scale up its patient base this year substantially and they've said that they plan to serve what would be considered well people in the not too distant future. So we're gonna have questions around do we get brain computer interfaces as healthy people? We're gonna have questions around all sorts of gene editing. Do we do that? And as my kids are already born, so they would be getting whatever gene editing they'd be getting as formed people. But then we're also gonna have increasing levels of power in terms of embryo selection or embryo level gene editing that's gonna fundamentally change the nature of people before they're even gestated and born. And I think this sort of surrounding ourselves with AI friends, companions, tutors, always on entities, I think is probably right up there in terms of the magnitude of the impact that it could have.

(1:06:05) Nathan Labenz:

And so I think we're gonna have to approach it with extreme caution, but the value of it is probably also going to be undeniable. And it is very hard for me to imagine abstinence all the way up to 18 years old. The idea that a high school kid today should not be allowed to use AI because of the downside risks, I don't see that. I definitely think high school kids should be able to use ChatGPT, should be able to use Claude. Should they use AI boyfriends and girlfriends? I certainly would understand a parent saying, no, I don't want that. And that's probably where my intuition would go as well. But I don't know.

(1:06:49) Nathan Labenz:

This stuff is tough. I haven't parented a teenager, and I do think the trade offs there are tough. Right? Like, if they're gonna have a phone, they're gonna have some access to this kind of stuff. You tell them it's not okay to use, do you drive it underground, do you lose visibility into it? It's very tough. I think this stuff is very fraught. So I think it's by all means a great question to be asking. I wish I had better answers.

(1:07:16) Nathan Labenz:

Get hands on is always kind of one of my fallback answers. I would say if you are considering buying any of these form factors or allowing your kids to use any of these sort of, it could be a stuffed animal that can really talk or it could be an app that is a virtual friend or whatever. If you are considering that, I would definitely get hands on with it yourself and really try to understand it and get an intuitive experiential feel for it before you just give it to your kid and let them do whatever they're gonna do with it and hope for the best. But I just don't think there's great answers right now.

(1:07:58) Nathan Labenz:

This is definitely one of the areas where consumer reviews will be extremely important. It would make a huge difference to me to know that lots of other parents are out there saying this thing has been great for my kid. We take that quite seriously. I think I believe it is possible. I guess maybe my closing thought on this is because the space of possibility with AI is so vast, I absolutely believe it is possible to create AI toys, products, virtual friends, whatever, that do effectively nurture humans at any age. And so we're just gonna have to continue to watch the space really closely and be hands on, work together and hopefully that way we'll be able to come up with good answers. It's a tough one.

(1:08:53) Nathan Labenz:

Next question, what is your timeline for work disruption? Could you comment on job disruption in terms of phases? So like if 0 to 3 years hits roles like customer support, marketing operations and some software engineering tasks and years 3 to 10 starts reshaping accounting law and parts of medicine, what's your best guess for years 10 to 20? Where do humans remain essential at each stage?

(1:09:17) Nathan Labenz:

I guess first of all, if we're gonna use a timeline like that, I would say time equals 0 was somewhere around the introduction of the InstructGPT model, the first instruction following actually, I guess they later revealed it was just supervised fine tuning and not yet RLHF as of the first release of InstructGPT. That's very esoteric in the weeds history. Well, that was early 2022. And then ChatGPT, of course, late 2022. That's I would say 2022 is kind of the year 0 for this purpose and those early models, the first things they were able to do and certainly the things that I was most interested in getting them to do early on was write some basic marketing copy. So I think that we already started to see some disruption in 2022, 2023. Copywriters do seem to have been very significantly impacted by AI.

(1:10:16) Nathan Labenz:

We also had an example of this at Waymark with VoiceOver where initially our offering for VoiceOver was a professional service. We would charge $99 which is a pretty good price and we worked with a provider that did a really good job and we delivered professional voice over at medium scale for an SMB accessible price point. And we were pretty proud and pleased of that service. In 2022, 2023 time frame, AI voices started to get good enough that they were really not competitive with the human pro, but classic disruptive technology. They were worse, but they were way cheaper and they also happened to be way faster. So you could get with our AI voice over integration, multiple takes in seconds and at no additional cost versus a much better product for $99 that would take a couple of days and maybe have a couple of rounds back and forth. And we started to see substantial adoption of that pretty much immediately. It did start to eat into how much of the professional work we were seeing customers choose to pay for even while the AI voices were clearly inferior.

(1:11:35) Nathan Labenz:

Now, fast forward to today, and we're now obviously spoiled for choices of great voices. 11 Labs is obviously great. Google also has amazing and very steerable, promptable voices that are awesome. And there's lots of other companies doing great stuff besides Hume AI has a really emotionally competent voice. They can also they're also very focused on understanding the emotion of the human for our purposes with Waymark with creating marketing video content. We don't need to understand input human voices, but Hume is good at that as well as making the voice sound emotionally intelligent. And basically, these days, we just don't do much. I don't think we do really any human professional voice over work at all. So that's a pretty substantial change that I would date back to 2022, 2023 starting, marketing copy and voice over work being some of the earliest.

(1:12:39) Nathan Labenz:

And now as of today, if you look at GDP Val, the latest models are winning in software engineering by a substantial margin. It's like in the seventies, maybe even up to 80% of the time when, and again, GDP val, 3 sets of experts, 1 set of experts defined the tasks in various domains, another set of experts does the tasks, their work is then compared by a third set of experts to AI work, and the third set of experts are responsible for determining which do they prefer. And the AIs are now winning a lot when it comes to software engineering tasks. So I would say for sure that I prefer working with Claude code over working with an entry level human software developer. That's pretty obvious to me, honestly, at this point.

(1:13:37) Nathan Labenz:

I'm not sure yet. I think the debate is ongoing when it comes to has this hit the statistics or not, but will it? I think it will. I think in 2026, it's gonna be hard to justify on purely economic terms hiring a 22 year old out of a CS program versus spending more on Claude code or getting really good or spending the time that you would spend mentoring that person, instead spending that developing new skills and hooks and increasingly elaborate architecture for your AI coding setup. So I think the disruption is happening now, certainly in software.

(1:14:24) Nathan Labenz:

And I guess relative to the rest of the question timeline, I don't think it's 3 to 10 and then 10 to 20 years. I think it is probably coming sooner than that. When accounting, law, parts of medicine well let's start with medicine. I've talked ad nauseam now about how the latest models are competitive with attending oncologists. That's a reality. I've lived that. There's no denying it. Nobody can talk me out of it. As sure as I'm sitting here, all 3 of the Frontier models, Gemini 3, Claude and ChatGPT 5.2 Pro currently they're all competitive with attending oncologists and there's just no 2 ways about that. So in terms of parts of medicine, yeah, absolutely. I'd say that disruption is starting to happen.

(1:15:15) Nathan Labenz:

I don't recommend people ignore their human doctor or go all in on AI, but I do think that especially for things that aren't so important, the substitution effect is going to start to be real. Because you can get a lot of your questions answered and maybe not have to go to the doctor or go to the right doctor the first time or go in with obviously in our society right now doctors still are required to prescribe the medicines. So I do think that the way that this is going to play out is going to be noticeably different very soon. And I'd say that's probably going to be true in accounting and law as well.

(1:15:54) Nathan Labenz:

I'm certainly no expert in law, but when I get a contract to sign, I run it through 3 or even 4 frontier language models in the exact same way that I run my son's lab test results through language models, and I'm yet to regret it. Would I do that for the most important transaction of my life? I'd probably get a human lawyer to help review as well. But for routine stuff, I'm absolutely happy with the results that I'm getting. And if none of the 3 or 4 frontier models identify issues that I need to be concerned with, then I'm good. I think it's pretty unlikely that all 3 would miss something that's really important to me. And even in accounting, we're starting to see models are getting pretty good with spreadsheets. So I think that too is coming pretty soon.

(1:16:45) Nathan Labenz:

Another mental model I have for this is sort of how elastic is demand in a given domain and I think that varies dramatically. When it comes to something like accounting, I personally don't want to buy a lot more accounting services than I am required to buy. And maybe others are out there wishing that they could buy more accounting and just don't have the budget for it. And when AI makes accounting professionals more productive, then they'll buy lots more accounting services and that will partially offset. But I just don't see that there's that much latent demand for more accounting. I think most people are doing roughly what they need to do, what they're required to do, and beyond that they don't really have that much appetite for more. So in something like accounting, I think when we see threshold effects hit and all of a sudden AI can do the job, I would expect that to be a field where there would be more outright substitution and displacement and not like an explosion of accounting services provided.

(1:17:55) Nathan Labenz:

Dentistry by the way is my example of maybe the most extreme version of this. I want 0 dental services. I'd love to never have to go to the dentist again in my life. So if there was something that I could buy that was 1% the cost of dentists and did the same service, I'd happily do that and I wouldn't be doing lots more dentistry. I think most people share that intuition. Medicine in general is probably a bit different. I think most people might want a little more medicine, they might want a little more care so that there could be a surge of demand. And as things become more accessible, as sort of capacity expands, demand probably grows to fill it. And then software engineering maybe could be like the 10 x or even the 100 x. Like do we get a 100 x as much software production over the next few years? Very plausibly. And that might be enough to sustain software employment at least among the senior engineers for a while yet, for perhaps a surprisingly long time. Even when it looks like, hey, AI can code up this website in 1 shot. What do we need engineers for? Well, if we're doing a 100 times as much software and the architectures are getting ever more elaborate, then maybe there's still a role for the kind of senior software architect even if there's not for the junior person.

(1:19:10) Nathan Labenz:

I think this is one area where I do have a bone to pick with some of Dwarkesh's analysis, and I think generally very highly of Dwarkesh. Think his show is great. His questions I think are generally very effective in terms of eliciting alpha from his guests, and I think his essays are generally really good. But one thing that he has said recently that I do pretty strongly disagree with is that explanations, when the question is posed like, why aren't we seeing more impact than we have to date from AI on the labor market? He has argued that explanations that center on human bottlenecks are essentially cope. He thinks it's really that the models aren't good enough and it's not that the people are too stuck in their ways. And obviously, I would say it's both. I mean, the models do have room for improvement, Jagged Frontier, all that stuff for sure is real. But I think that there really is a lot of human bottleneck going on and you see that at the hospital, right?

(1:20:18) Nathan Labenz:

I mean, when a resident comes to the room and talks to us and is clearly less knowledgeable and less reliable than a large language model, I don't really see that as I don't know how to interpret that in any other way than that the humans are the bottlenecks. That they have not realized, that nobody has told them, that they haven't experimented with themselves actually using a model on their own, that's on them, right? It's not on the model. I can just tell you easily a bunch of times that if the resident had engaged with ChadGPT as I had engaged with ChadGPT before coming to talk to me, then they would know more and they would be able to do better. And I'm quite confident that is true in a very, very wide range of contexts. I do think that the human bottleneck is a very real phenomenon, not the entire story, but to say that that's cope, I would definitely debate Dwarkesh on that one.

(1:21:23) Nathan Labenz:

The other thing that I would point people to in terms of just a mental model for this, again I mentioned already, but we ran that cross post with Luke Drago from the Future of Life Institute podcast. And he's got this idea of the inverse pyramid model where basically the lowest rung in terms of the hierarchy of an organisation, in other words the entry level employees, AI is basically coming for them first. Another way that he thinks about it is anything where there are lots of people doing the same job and the organization is geared toward trying to make sure that those individuals act more like cogs in the machine and do the job in the same way every time where they're focused on consistency, reliability, process, standards, systematic evaluation, like all that stuff really lends itself to AI. So he recommends, and I think this is pretty good advice, try to do things that are n of 1 and try not to do things where you are 1 of n. So be n of 1, don't be 1 of n is pretty good advice I think.

(1:22:26) Nathan Labenz:

But obviously that's advice that some individuals can take. It's not something that I think is going to preserve the structure of employment broadly. And we're not even talking about driving here, right? I mean, that wasn't mentioned in the question, but last I checked, I think it's 4,000,000 Americans out of like a 150,000,000 employed Americans. Something like 4,000,000 are professional drivers. We're getting real close, right, to where human drivers are just not going to need to exist. And it seems like the disruption there could happen quite soon. And it again seems like the bottlenecks increasingly at this point are human. You've got some city councils that are trying to do various things or what are the teamsters gonna say, what have you. I'm in Detroit. Waymo is projected to launch in Detroit in 2026. It's gonna be really interesting to see how that plays here in the Motor City, especially because the companies headquartered here aren't really at the center of the frontier of the self driving technology to put it mildly.

(1:23:31) Nathan Labenz:

So I'm not sure what the response is going to be, but the bottlenecks there are pretty clearly human at this point and it seems like even in a world where the AIs don't really get any better, I do see as we kind of figure out how to scaffold them and plug them in and manage the context right and all that kind of stuff, I do see pretty serious disruption at least being possible unless it is blocked by sociopolitical dynamics in just the next couple of years. And then again superintelligence is a distinct question from that, but it seems like we're quite clearly close to where AI will be able to compete for entry level jobs, for jobs where an individual human is 1 of n people doing that job because there are the standards and the processes and the evaluation frameworks to make sure that AI is doing well. And I expect AI, especially as we really put that elbow grease in, I expect it will do better than a lot of people in a lot of places. And that the economic the market pressures will be very strong to adopt them.

(1:24:46) Nathan Labenz:

Customer service is another great example. Right? I mean, will there be some customer service people? I don't think it's going to 0 in the immediate term. But my dad was on the phone with Bank of America yesterday on hold for however long, getting increasingly agitated by the fact that he was on hold as he's listening to their hold music and messages. And I was just like, man, I know several AI customer service firms doing voice agent type work that would, at a minimum, be able to reduce the wait time dramatically and probably do just as good of a job for a large majority of tickets as the humans are ultimately able to do. So I think we're bottlenecked on willingness, I think we're bottlenecked on implementation. At the very high end, we're still bottlenecked on model capability.

(1:25:30) Nathan Labenz:

If I had to kind of project how far up the pyramid can AI go today, it's maybe half the way up vertically, but that's whatever, 80% of the mass. And I think the technology is basically there for that, and now we'll have to see how fast people actually how fast does the market work? How fast do does market incentive and pressure and increasingly lower barriers to successful deployment, how fast does that all work to actually create the disruption that I honestly think is kind of inevitable? But relative to the question, I would take the under in terms of timelines.

(1:26:12) Nathan Labenz:

And then the final point of the question is where do humans remain essential at each stage? I think that's a design question more than anything else. I think we want to design AI future where humans are essential or are at least a big part of the process so that we can retain some authorship over the future and not just give the whole thing over to AIs as the gradual disempowerment people worry we might. But I think that that is going to have to be intentional more so than something where we will find that there's some ineffable human essence, some Élan vital that something that for whatever mysterious or even mythical reason, mystical reason only humans can do, I really don't believe in that. I think that we are pretty remarkable certainly in our breadth and our flexibility, and the high end of human achievement is obviously super impressive and inspiring to the rest of us mere mortal humans. But AI is doing superhuman level stuff in more and more domains, and I don't really see fundamental blockers to that being in the fullness of time, functionally everything. (1:27:30) Nathan Labenz:

So I think it's much more on us to think about how do we design these things, how do we design our overall society, how do we design our overall systems, how do we design our models, how do we design our implementations so that we keep the parts that we want to keep and retain some overall control of steering the future? I don't think that's going to just happen because there's something that AIs can never do that only we can do. I think it's going to have to be a lot more intentional than that.

(1:27:59) Nathan Labenz:

Okay, next question. Regarding nonprofits and shifting need, as AI reshapes labor markets, how do you expect the definition and scale of those in need to change over again 0 to 3, 3 to 10, and 10 to 20 years? What should nonprofits and funders start doing now to prepare, especially around workforce transitions, mental health, and basic economic stability? I would like to see a lot more work on this honestly than we have to date.

(1:28:23) Nathan Labenz:

My general answer is I have to give Sam Altman a ton of credit here for making personal investments in UBI, and I think that that is something that we are ultimately going to need, whether it ends up being called UBI or exactly how it looks. I think we're going to have to have some new social contract that decouples a person's right to a decent material standard of living from their ability to contribute economically, especially in the context of competition that they're going to face from AI systems. I just think that's really the only way that we are going to get to a place that anybody would be happy with. And I welcome other suggestions, but I haven't heard many that really make a lot of sense. The only other what I hear is either like we're going to need a UBI or we're not going to need a UBI because there's always going to be work for humans. And I just don't believe that.

(1:29:20) Nathan Labenz:

Tyler Cowen, you know, it kind of confuses me on this point these days to be honest, and I've been kind of waiting to invite him on the podcast for a while. Maybe I should finally get around to doing it. But I looked back at Marginal Revolution and I think his first mention of ZMP workers or zero marginal product workers was from 2010. And this was basically research that was looking at how firms responded to the Great Recession. And what many firms did in response to the Great Recession is they looked around their teams and they said, who do we really not have to have to go forward and be okay? And they were able to identify people and they were able to cut those people. And of course, you know, that's a noisy process. But by and large, it seems like firms were able to cut headcount and then actually see somewhat of a surge in productivity, seemingly because they were producing more or less the same with fewer workers because there were some people that just weren't really contributing much.

(1:30:25) Nathan Labenz:

You take that and you take the general phenomenon of like bullshit jobs where like a lot of people seem to feel according to survey results that like even absent AI that like their own work isn't really worth anything and it's kind of performative or, you know, it wouldn't matter if it wasn't done. Maybe they're wrong about that, but are you really, you know, that confident that people who are saying that their own work is meaningless or not necessary, are you really confident that they're wrong? I tend to think they probably, you know, at least some of them are probably right. So it feels like there's already, you know, kind of at baseline some ZMP workers that are employed. And there's a lot of people who are saying that their work is just not really meaningful or important and that things would be fine if it wasn't done at all. And so, yeah, I just see that we don't have a great answer other than that we have to find a way to decouple somebody's ability to exist and like have a decent life from their ability to compete economically, again, especially compete against AIs.

(1:31:25) Nathan Labenz:

Exactly what the right structure of that is, I think we would do really well to start doing much more of that experimentation. And I personally interpreted some of the recent UBI results as much more encouraging than I think the broader discourse did. I'm certainly not an expert in this, but my takeaway from some of the UBI research was like people were disappointed that people seemed to work less in response to getting the UBI. And I'm kind of like, I think that is the point, you know, like if they work the same, in my mind, that might even be more of a failure. Like, I think we want to, we might not want to quite yet, or we maybe didn't want to like 2 or 3 years ago when these studies were being run, have people like just take cash and not work. But I think that what that does show is that people are just in many cases just working for money. They're not like, maybe you could also say like, at least some people are like relatively easily satisfied with like not that much money, which would suggest that they are able to find meaning in things like spending time with family and leisure and, you know, whatever. Maybe they don't feel compelled to go out into the workforce and do some job that they really don't want to do just to get a bit more money. Obviously like marginal tax rates are also in many cases like totally out of whack and working more can mean you forfeit benefits of all kinds, whatever. It's a very complicated question.

(1:32:58) Nathan Labenz:

But I'm encouraged I think much more than the average commenter is by these UBI results because I just see that like it didn't take that much to actually get people to work less. And if they're not enjoying their work and they're substituting away from that work toward leisure, to me that suggests they're not like missing out on meaning. They're not like pining for the workplace as the place where they're going to like have identity and find meaning. Seems like they're finding that in other places just fine, thank you very much. And I would expect that probably to continue. I tend to think that the, you know, we need jobs for structure and meaning and whatever, I think that's mostly cope. And I think it's especially unhelpful cope when it's projected by people who have privileged positions where their work is like high status and where they genuinely do love it, and I count myself in that group and I'm very thankful for that. But when that sort of reality is projected onto people who are lower on the socioeconomic scale, who are doing work that they don't want to do because they have to do it, because that's the only way that they're going to feed and clothe their kids, I think that's like quite counterproductive and misguided.

(1:34:14) Nathan Labenz:

So yeah, bottom line, I would love to hear other ideas. I would love to hear, I would love to read your utopian fiction about other ways that we rework the social contract that isn't just kind of a vanilla UBI type structure. But until somebody more creative and imaginative and visionary than me comes up with those ideas, I basically still think that UBI is the default and denial is not helpful. More experimentation sooner about the details and the structures and exactly how incentives should work would be really valuable.

(1:34:52) Nathan Labenz:

Okay. Thank you for those questions. Next one, are people being misled by benchmarking? And then another kind of related question, does the massive success of pre-training give people the wrong intuition about AGI? Yeah. I mean, characterizing whether people are misled or whether people have the right or wrong intuition, I think is very hard because people have very, very different intuitions and world models, right? So like there's such diversity there that I can't characterize like people's opinions in general with any, that's just such a wide ranging thing, right, that anything I might say about some people's opinions is obviously going to be contradicted by other people's opinions.

(1:35:31) Nathan Labenz:

That said, I do think benchmarking is probably misleading to at least some degree and I would go back to the Chinese models, which I talked about last time, being like significantly worse on just a very random multimodal task than all of the leading Western models as a kind of leading indicator of how much benchmarking might be misleading us more generally. It does seem to me that like the delta between the leading frontier American models and the Chinese models in benchmarks is much smaller than the difference between them in practical day to day utility on whatever idiosyncratic and or esoteric task you might want to use an AI to help with. And, you know, of course, the Llama 4 incident shows that as well where they, like, I guess, spun up like maybe a bunch of different Llama 4 variants to like put one in each different LM Arena category to try to maximize their score. I'm not exactly sure what they did, but they sort of LM Arena maxed and that was effective in as much as they got a high position on LM Arena, but you don't hear all that much about Llama 4. And it seems like it just wasn't really competitive, but they were able to make it look like it was competitive on some of these standardized scores.

(1:36:58) Nathan Labenz:

So yes, I think benchmarking is a problem. Independent analysis is a good antidote for this, looking at the meter charts, looking at like Artificial Analysis, you know, there are people that are the Scale benchmark that is like largely private, you know, is a pretty good way to think about that as well. I think ARC-AGI has been like remarkably durable in terms of how relevant it's been, but, you know, people can obviously benchmark on that too. But definitely things where there's a private test set and things where people are like taking it upon themselves to really analyze capabilities and they make that their thing, you know, and like really try to earn your trust by being a reliable guide to model performance over time, I think those are the things that increasingly the field will be looking to beyond just performance on open standardized benchmark tests. Those, you know, continue to be somewhat relevant for sure, but like less and less all the time.

(1:38:06) Nathan Labenz:

And then on this, on the question of like does pre-training give people the wrong intuition about AGI? If anything, I think maybe post-training is giving people the wrong intuition about AGI and maybe the classic Shoggoth meme for pre-training is maybe a better intuition. And this kind of also connects to the breadth first versus depth first search of the space of possible AIs. I think that if people are misled about the intuition or if people have the wrong intuitions about AGI, I would expect that it's wrong in the sense that they have encountered a relatively narrow range of form factors of AIs, which are basically like chatbots and coding assistants. And that those, you know, those designs have kind of converged thus far. The overall paradigm of helpful, honest, harmless, which has been great for unlocking a ton of value and pretty great in terms of certainly when done well as Anthropic has done, you know, shaping the model's character, you know, no beef with those problems or with those approaches. But if there is a problem with that, it has presented to the public a very, very narrow slice of the overall conceptual menu of like what AI can be like. And so it maybe is lulling people into a false sense of security or a false sense of like this will continue to be normal.

(1:39:38) Nathan Labenz:

And I think that in reality, the Shoggoth meme of like this thing is an insane alien that kind of encompasses everything in a super strange way and can kind of shape shift and be anything you want it to be or maybe even things you don't want it to be, I think that is maybe the better intuition for at least the space of possibility for AGI. And you know, when you think about things like the, I think back to the episode we did with Apollo, with Marius Hobbhahn from Apollo Research where they got access to chain of thought from O3 class models and they found that the chain of thought was kind of evolving to become its own dialect. Remember like disclaim, disclaim vantage, you know, the watchers, these strange phrases that really didn't look anything like the training data. In this case, you know, being driven by reinforcement learning at increasing scale, it just feels to me like there's just so much more alienness to these systems than we are seeing.

(1:40:40) Nathan Labenz:

I guess, you know, just a couple other little intuitions for this maybe worth mentioning. People often cite the bird airplane analogy. You know, there's many people have commented of course that like we wanted to create a machine that could fly. A lot of early attempts sort of tried to mimic the bird. What we have is an airplane that like flies on quite different principles and is just like way faster, way more powerful, can carry way heavier loads than birds. Can't do necessarily everything that a bird can do, but for the things that we've designed it for, it's just way way better, right? We wouldn't want a scaled up bird to take a cross country flight on. Airplanes are just way better than scaled up birds. Similarly, many people may have seen recently a video of, it was kind of a compare and contrast where there was like a humanoid robot harvesting grain in a field, you know, and kind of chopping it down and bundling it up in a way that humans traditionally used to do. And then there was this carrot picking machine that's just rolling through a field, pulling carrots out of the ground in mass, washing them off and just operating, you know, at orders of magnitude faster than humans could possibly pick carrots.

(1:41:48) Nathan Labenz:

And so I think something similar, you know, maybe happens with a true AGI or like a super intelligence where it becomes so powerful that it potentially doesn't even really make sense for it to present in like this natural language sort of way anymore or at a minimum that becomes sort of a dramatic reduction, some sort of, you know, a majorly lossy kind of summary of what it's doing as opposed to the core thing that it's doing today. You know, today the thing, the response that you get from the chatbot like that is its output, right? In the future I think the natural language summary of what it's doing may be just a very small part of what it is actually doing. And so, yeah, I think study the Shoggoth, study the weird chains of thought, expand your mind when it comes to the space of possibility and how weird things could be. Those are my guesses for how most people are being misled if at all today.

(1:42:52) Nathan Labenz:

Okay. Next question, is learning from a physical environment a requisite for AI? I don't think so. I think we're pretty far along obviously in AI at this point. And when we look at all the things that AIs can do and like how many categories they're winning more than 50% on in the GPQA context, all without any, you know, robotics or like physical embodiment at this stage, I think it's like pretty clear that you can get pretty good AI without needing to learn from a physical environment.

(1:43:25) Nathan Labenz:

Now the line maybe starts to blur a little bit there when it's like can you use a computer, is that a physical environment? Well, it's a digital environment, but it's like spatially organized and it does seem like the way that we are getting AIs to learn how to use a computer is by them actually like trying and failing to use the computer a lot. Right? We needed some language model based capability to like conceptually understand what one might want to do on a computer and what different buttons might mean and so on. And then from there, it's been like a lot of actual reinforcement learning where once there was at least some ability to succeed, then, you know, you could hill climb from there up to like, at this point, pretty decent performance. But I definitely think we're not quite there, but in like 2026, I think we'll definitely have computers that or AIs that use computers basically as well if not better than your typical human user.

(1:44:17) Nathan Labenz:

So yeah, I guess maybe it kind of comes down to like a task by task thing. Do I think that large language models are going to become plumbers without a bunch of reinforcement learning on physical plumbing like tasks? No. I think that if you want to be a plumber, you're going to have to do a bunch of stuff in the physical world. You're going to have to get good at that, and that is probably going to require some sort of embodiment. You know, when the question says physical environment, I would also say simulation is going to get really good. Obviously, NVIDIA's GPUs are not just good for training, they're also good for simulation. So we're going to see more and more simulation being used in all facets of AI training including for robotics. But if you count that simulated physical environment as a physical environment, I do think you need it to do physical things and ultimately like a language model can't be a plumber. I think you can be a lawyer without ever having any physical embodiment. I think you could be an accountant, arguably your, you know, Excel spreadsheet maybe is your physical quote unquote environment there.

(1:45:26) Nathan Labenz:

But, yeah, I guess basically my intuition is that you need training in something like the environment that you're going to operate in. So as long as you're just operating in language space, language is probably enough. You want to start to operate in pixel space, you're going to need to be trained in pixel space. You want to operate in spreadsheet space, you're going to need to be trained in spreadsheet space. You want to operate in physical real world space, you're going to need to be trained in a combination probably of simulated physical real world space and to some degree actual real world space. And it's going to happen.

(1:46:01) Nathan Labenz:

You know, the other question, the other way to think about these kinds of questions is like we'll never know because is it required? I think not for many tasks, but it's going to happen. So there may be some positive transfer. You know, will the AIs of 2027, 2028, even if I'm just doing something that is ultimately a purely language, say, legal task, will the best models that I can go to for those kinds of tasks also have some sort of physical real world training as part of their overall training mix? It wouldn't surprise me at all. It wouldn't surprise me if there's some positive transfer there, if there's just some better intuition, you know, certain kinds of queries that require spatial reasoning just perform better because that kind of training is folded into the mix. And so like I'm not saying that 2027, 2028 AIs even, you know, of the chatbot or digital assistant variety won't have any physical environment based training going into them. I'm just saying I don't think you would have to do that to be successful, but in all likelihood or at least, you know, it's pretty likely at least that it will happen. And so then it'll be kind of hard in the end to tease out like, was this required to happen? Could we have got here another way?

(1:47:20) Nathan Labenz:

It just seems like everything is going to be developed and kind of folded in to the degree that it works and it seems like everything is working. It's all going to be folded into kind of the mainline frontier models and so we will probably not have a clear answer. Could we have got here without doing any physical training? My bet would be counterfactually yes. If it adds even a little bit, then it'll be part of the overall mix. And so, you know, what we'll actually have will be AIs that are kind of trained on everything regardless of whether some of those things could have been, you know, cut with minimal performance loss or not.

(1:47:54) Nathan Labenz:

Okay. Next section gets a little more focused on tooling and kind of AI engineering concepts. Are you noticing any emerging standards or winners for tooling emerge across the companies that you work with? This is interesting. I would say I don't really have anything super shocking to report. You know, I think in terms of models that people are using, you know, Claude for coding is typically the go to still. Certainly get very good reviews from people on GPT Codex, but Claude still seems to be the go to for most people. If they didn't have Claude, you know, there are certainly other great options out there, Gemini 3 is also excellent. But I'd say Claude remains the kind of consensus top choice for coding. OpenAI is kind of pretty good at everything and is probably the default thing that most people use for like in browser random queries on a day to day basis. And Gemini Flash is, I would say, pretty clearly the top choice for things that don't require frontier capabilities and where speed and cost is a significant factor. And I don't think that's really surprising or out of step with like mainstream online consensus at all. I really haven't seen anything that is, you know, that challenges the mainline narratives at a certainly at a model provider level.

(1:49:27) Nathan Labenz:

Now you go to the sort of tool level and yeah, I don't know, I think it again it's like I think the market is like getting away from most people. I think most people are, because things are changing so fast, most people are behind. You know, I think most people are not using the best thing, which is maybe, you know, normal life in general. For example, I've used LangChain on a couple projects recently, one at Waymark and one at another company. And I'd say it like works pretty well. You can like build agents in it, you can have those agents hosted on their infrastructure, which can be pretty cool. You can just log traces to it. It's a pretty heavyweight UI with like a lot of features that can be a bit of overwhelming data presentation for many people at first. It does take a little like getting used to, but I'd say it works pretty well. And until such time as I'm like not happy with it or hear some note from somebody else that like there's something that's just dramatically better, I'm pretty content with it. And I think a lot of people are kind of in a similar spot where it's like, man, I can barely keep up with model releases and tool releases kind of feel like a second tier question where as long as I'm like meeting the need that I originally had, I'm probably fine. And so even though there might be better things out there, I think it's just so overwhelming to go try to shop effectively that it's like people are kind of staying the course, you know, with whatever decision they made quite a bit.

(1:51:08) Nathan Labenz:

Another interesting direction that we're going to start to see develop here and already are starting to see develop is, of course, that the model providers are trying to become platforms and they are building out more and more of this stuff themselves. So of course, OpenAI has their like agent builder type stuff now that competes directly with many agent builders that were built on their platform. They also have like observability of various kinds. Anthropic bought Human Loop, which was a past guest and I was a happy Human Loop customer for a while, but now they're part of Anthropic. So it's going to be interesting to see and of course Google's going to just build everything over time to probably somewhat varying degrees of quality, but you know, they're going to have in the end like the most robust portfolio of products probably of any of the top tier model providers. So what the dynamics will be there between how often it makes sense to just pick a platform, go with their model, go with their observability, go with whatever tooling, you know, they have, whatever monitoring they have versus try to maintain flexibility and ability to upgrade a model quickly on, you know, on a new release which would potentially require you to have a more horizontal layer for these kind of observability questions and, you know, and all sorts of tooling questions. It's going to be interesting to see how that develops.

(1:52:34) Nathan Labenz:

If I'm a big enterprise, I want to avoid lock in and I want to invest potentially more than I really have to in some of these things so that I at least have a little bit more ability to control my own destiny and I'm not totally beholden to one of the platforms. If I'm a startup or if I'm just an individual or if I'm an individual just doing a one off project, then maybe I just kind of take the convenience and use OpenAI observability or use Anthropic's observability because I'm already using their model and it all just kind of integrates and makes things easier and faster. But all of this is to say, I don't think that there is like any super obvious major trends that I'm seeing. Others may have better answers. I would certainly if you feel like this answer sucks and you have a better answer, write me and let me know and maybe we'll even do a full episode on it.

(1:53:25) Nathan Labenz:

But yeah. And I would even say like just listening to the Latent Space podcast, I take quite a bit of pride in the fact that the Cognitive Revolution has been voted twice at the Latent Space AI Engineer World's Fair events as the number 3 podcast for AI engineers. Latent Space has been voted in both of those surveys number 1. And in listening to them, it also does feel like the sort of content mix has trended away from these kind of tooling questions. Of course, it's still like an important element, right, you need to be able to look at traces, but is it, you know, is it really where like people's mental energy is going right now? It feels like it's less topical than it used to be and I think you can even see that in their mix of guests.

(1:54:14) Nathan Labenz:

And then just one other anecdote I'll give on this is when I was vibe coding the Christmas present apps that I've mentioned a couple times, the one that I was coding for my mom, the custom trip planner, it involves like AI research into all these various things, right. So there's like a prompt to go find restaurants and she's gluten free so we're like really trying to dig into like gluten free options in all these different places. And when it comes to where she wants to stay, she loves to have a nice view and wants a balcony so you know, prompts are customized to that level as well. You know, early on in the development and even still now as I continue to kind of work with her and, you know, try to enhance it in ways to make it more valuable for her because she is actually using it quite a bit, I wanted to look at the traces of, you know, what are the inputs and outputs, you know, what prompt is the model seeing and what is it giving back in raw form so that I could kind of debug that or, you know, know what's working and not working at a level below the UI that she as a user is using.

(1:55:13) Nathan Labenz:

And so what did I do? I just added a trace function by just prompting Claude Code to say, hey, I want to see all the history of the queries that are made to you. Can you add a tab to this application that just gives me direct access to the full history? And it just built that thing, I think in maybe one prompt, maybe it was like two prompts. And now I have, you know, right alongside all the core features of the app, a debugging tab where I can go in and just look at the history of the prompts. So I think that's another factor that is kind of challenging and this is like, you know, maybe future of SaaS writ small or, you know, future of SaaS in a nutshell. (1:56:00) Nathan Labenz:

What would I have done in the past, right? If I didn't have Claude Code to code that kind of thing, then maybe I go make a free trial account on Human Loop, or maybe I wire it up to Langchain or whatever. But I didn't do any of that. I just said, hey Claude, log all these things and give me a tab where I can see them. And that's been perfectly good. Could it be more elaborate? Yeah. Could it be more full-featured? Sure. But it meets my needs for the development of this particular app. And it basically took, you know, I can say confidently, it took me less time to prompt and have Claude Code do it than it would have taken me to go find some solution, figure out what it was, figure out how to connect into it, et cetera, et cetera. And so I suspect maybe that is also a part of why people are talking about this stuff less, because it's just become a lot easier to meet the basic needs in some cases just by coding it from scratch with a couple of prompts. So yeah, I would love to hear more about this from other people.

(1:57:02) Nathan Labenz:

Okay, next question. This one's kind of an interesting one, and it wasn't actually part of the AMA, but it was a question that I was asked. So I had a little interaction. I just posted, actually, this is in the context of vibe coding another app. So the app that I vibe coded for my dad for Christmas was this stock trading strategy backtester. He has these ideas. What if we did this? What if I, every week, what if I bought the stocks that lost the most last week and tried to catch them on the rebound? Okay, fine. Is that going to work? Is it not going to work? The app that I created for him allows him to articulate a strategy in natural language, translate that to trading rules, and then go back and look at some time interval and see how would that trading strategy have done over that time interval, just systematically executing those trading rules. Pretty cool. He hasn't used it all that much, to be honest.

(1:57:53) Nathan Labenz:

But in the context, in the course of doing that and testing it, I was testing that strategy of buying the biggest losers and then trying to catch them on the rebound. And I tried it once on an annual basis. What if I bought the biggest losers from the last year, held them for the following year, and did that every year? Would I beat the S&P or would I fall short of the S&P? I was sure there was a bug when in 2022, I think it was, NVIDIA was one of the biggest losers. It was down like 50% on the year. And I was like, okay, something's clearly wrong with that. No way NVIDIA would have been one of the biggest losers for a whole year. Turns out it was. And so the AI was right. And I asked the question and Claude Code went and verified online that indeed this is right. Like NVIDIA was one of the biggest losers of that year. So the app is working correctly. You were just wrong in your assumption.

(1:58:45) Nathan Labenz:

This is also another one of these moments where I noticed sycophancy on the decline, because it would have been very easy for the model to be like, you're absolutely right, NVIDIA has been a killer stock, there must be a problem in the code. It did not do that. It came back and said, no, you are wrong. 2022, NVIDIA was one of the biggest losers, and the app seems to be performing correctly. Okay, I posted that online. Here's the crazy stat: In 2022, NVIDIA was one of the biggest loser stocks in the market for the entire year. And Holly Elmore, executive director of Pause AI, came by with a, I would say, a critical comment, as she often does, and basically said, you know, this isn't Sports Center. Basically, you are morally out of line for engaging in fun or trying to make yourself look clever or show insights by noticing these quirky things in the AI space, because the whole thing is bad and it ought to be condemned and you ought to be condemning it. And doing anything else, basically, in her view, is morally reprehensible.

(1:59:52) Nathan Labenz:

And I was like, okay, yeah. Do you really think this is an effective way to advocate for your position? Because I think everybody who listens to this feed, and if you're two hours in with me on this episode, I'm seriously concerned about AI safety issues and do not want to see a race to recursive self-improvement. Big decisions that are going to be made in the next year or two around the automation of AI R&D, I think are very, very big and important questions. And I don't think we're ready to cross some of those thresholds or Rubicons, if you want. And so I said to her, look, I think I'm much more sympathetic to your cause than most people. I did sign the banned superintelligence statement three or four months ago, for example, as just one material or one tangible indicator that I'm much more sympathetic to her cause than most. But do you really think that coming after me over a tweet about some random observation about NVIDIA stock is the way to advance your cause?

(2:00:59) Nathan Labenz:

And then she asked me, well, how did my comment make you feel? And I thought about that a decent amount and decided to address it here in the AMA, even though that's not what she was asking for. And I think my bottom-line sense of this sort of thing goes back to a mantra that I used to say a lot more often, which is that we should really try to avoid psychologizing others' AI takes and focus as much as we can on the object-level facts of what is actually happening, what does that imply, and just take people's statements and positions at face value. There is such disagreement in the space. There's such uncertainty. The AI 2025 forecast results—again, I got top 5% with predictions I don't think look all that super accurate. And of course, famously, we've got Turing Award winners who have extremely different positions on how dangerous AI is going to be. And I think they're all genuinely held.

(2:01:58) Nathan Labenz:

So I think the way that being sideswiped, attacked, or accused of moral corruption online because I made one random comment about NVIDIA stock performance from a couple of years ago—I think the way that made me feel was kind of indignant, confused, averse a bit to the person saying it and the cause itself. And this is a cause that I genuinely, you know, I'm not advocating for exactly a pause on AI. I don't know, again, what does that even mean? Whatever. I don't think we should shut it all down. I do think there is tremendous upside. I do think that we should be very careful and we should be willing to slow down, if not pause at some point when we hit levels that we really might not be able to control. But I also think the progress has been clearly much more beneficial than it has been harmful so far. And I've lived that in recent months. And so I kind of felt alienated from the AI safety movement, or at least the more, let's say, strident or shrill voices in it by that comment.

(2:03:09) Nathan Labenz:

And I think my takeaway from that is I don't think I wouldn't go so far as to say we shouldn't shame people, but I think we should shame people very carefully, very selectively, and only for what they are doing. I think it is pretty defensible at this point to shame xAI for some of the things that they have released in Grok. I think it is appropriate to shame xAI for having Grok on Twitter undressing women with seemingly no guardrails in place. That is worthy of shaming, I think. But that's an action. I would not shame people for, or I would not assume bad faith. I know that not everybody's positions are fully rightly taken at face value, but if somebody's going to be out there engaging in the discourse, I think that projecting your sense of their psychology onto them and arguing from that basis, it just ends up with people more often than not feeling bad, being increasingly bitter toward each other, more sort of calcification or hardening of factions, more sort of sense of somebody's my ally, somebody's my enemy. And I don't want any of that.

(2:04:31) Nathan Labenz:

I think the healthiest discourse that we can have around AI assumes that everybody's trying their best, ideally gives people the space to genuinely be trying their best, and avoids this sort of sideways accusations or moralizing or psychologizing, unless people are really doing things where you're like, you are undeniably fucking up. And maybe even you have a pattern of undeniably fucking up, in which case, sure. So I would say to Holly, your question did not make me feel very good. It kind of made me feel alienated from you and your cause. And I wouldn't recommend doing that. I do think if you want to go protest outside xAI, I would support you in doing that. Choose your targets a little more carefully. Try to recruit people like me to your side as allies and shame people selectively for things that they've really done wrong where they are responsible. I think that's fine. But don't just start using these tactics everywhere because it's just coarsening the discourse and making everything a little more bitter and contentious than it needs to be. And I don't think people do their best reasoning that way. Certainly, I don't think I do.

(2:05:48) Nathan Labenz:

Okay, these are the only three AI questions that I thought made the cut. And I probably generated across three models well over 100. So definitely give this one to the humans in terms of questions that I wanted to answer. But here's three questions from AI to wrap us up. First, you're in Michigan, not San Francisco or London or Washington, D.C. Does that distance help you or hurt you? What do you miss by not being, quote, in the room? I would say this is definitely a real issue. It definitely hurts to be outside of these core hubs. And obviously, San Francisco is far and away number one, and London is like two. And maybe as close, I mean, I don't know, whatever. San Francisco and London are both like the hubs. DC is increasingly becoming a hub because there's so much policy going on there. It's a very different hub, I would say, from the San Francisco and London hubs. And I'm not even sure it really belongs on that list, but that was the way the AI phrased the question.

(2:06:44) Nathan Labenz:

So being outside of San Francisco and London, I do think it makes it harder than it is if you're in those places to stay up to date, to be in the loop, to have the sort of zeitgeist. There's definitely a lot of stuff happening in person in San Francisco, events going on all the time, hackathons going on all the time. To some degree, secrets are being traded or spilled across frontier model developers at the proverbial parties. The fact that there is an emerging trend of rooms at San Francisco house parties where there's like a no AI talk room, that just goes to show how much AI talk is going on. And the AI talk there is definitely way more sophisticated than it is anywhere else. So I think that's my honest sense of the reality. And I'm able to compensate for it pretty well by being hyper online. So certainly spending a ton of time on Twitter still is part of where I keep up to date, no doubt about that.

(2:07:47) Nathan Labenz:

The podcast itself is also really helpful because I do get to have substantive conversations with very plugged-in people who are in those rooms much more often than I am. And then occasionally I do try to go to events. So having gone to, for example, the Curve each of the last two years, or last year going to the Summit on Existential Security, these are also gathering places where you can get a very concentrated dose of exposure to the leading thought. I think it does really help to be there. So I feel like I'm missing out on that to some extent, but due in large part to the podcast, I am able to compensate for it to a significant degree. But being outside of those hubs, I do think you have to be much more intentional about how you're going to compensate. And it does probably end up meaning in the Bay Area, you could probably be much less online and still equally or even more plugged in.

(2:08:51) Nathan Labenz:

But if you're not in those places, then online, I think, is really the main place to get it. And also, I would definitely make some occasional trips to those places to be in the room because I do think that is a really valuable way to learn and make sure that you stay up to speed. And you look at the results of the AI forecasting survey from last year, and you got Ryan Greenblatt at number two, you got Ajaya Cotra at number three on the leaderboard. That is not an accident, right? Those people are extremely well informed, and it's because of the social context that they find themselves in. In fact, Ajaya said that. She said, my method for the survey was talking to Ryan and then getting a few more things wrong than he did. So the leading thought leaders do know each other, they do communicate a lot, and that environment, I think, really does help to spend at least a little time in it.

(2:09:47) Nathan Labenz:

Okay, next question. As a Survival and Flourishing Fund recommender, you see the landscape of safety organizations up close. What's underfunded that shouldn't be? Again, that's an AI question. I think my big answer here is neglected approaches. I'm wearing my AE Studio ACDC themed swag hat, and I do think the neglected approaches approach, as articulated by AE Studio, is a great answer, great meta-level answer to this question. And as a reminder, they did a survey of people in the AI safety field. They asked them, do you think we have all the ideas that we need to be successful in AI safety in the big picture? The answer was no, we're going to need more ideas. And so, clearly, the community thinks that we need more ideas. The community thinks that we do not have all the answers that we're ultimately going to need.

(2:10:42) Nathan Labenz:

So I think that things like what Janice does in terms of just being super deeply engaged with language models and really trying to understand their characters and their tendencies, I think that stuff is really good. What Elios has done similarly with model welfare tests, I think is really interesting as well. I think AE Studio's own work in terms of self-other overlap is, you know, I come back to that all the time. What Emma Shear and the team at SoftMax are doing—all these sorts of things where it's like, can we find creative ways to either design or train or somehow get into equilibrium with AI systems in ways that feel more stable, that feel like they could be the beginning of some sort of stable equilibrium? I think those ideas are dramatically underdeveloped. They are often pre-paradigmatic. They are often developed by kind of weird people, some of whom I think would wear that label with pride. They are sometimes like intersect with kind of non-scientific ideas or woo sort of ideas or ideas about AI consciousness, which are obviously very hard to prove and don't feel intuitive to many people. But I think all those sorts of things are—I think we should have more of them.

(2:12:16) Nathan Labenz:

And my kind of call to action there is, if you have a weird idea that you've never heard anybody else talk about, I absolutely think it is worth trying to develop that idea. Most of the time, it's not going to go anywhere. Certainly, most of my idle shower thoughts do not turn into anything great. But the field collectively believes we need more ideas. So where are those ideas going to come from? They're probably, at least some of them, going to come from people from other fields, people with just very unusual or idiosyncratic ways of looking at the world, people who interact with AIs in very unique and particular ways, people who find inspiration in biological systems that they can map onto AI systems in ways that other people aren't thinking about. I think all of that stuff is dramatically underdone.

(2:13:13) Nathan Labenz:

And arguably, the whole AI safety landscape is underfunded. So I'd love to see more resources go into interpretability. I would love there to be not just one good FIRE—I know there are a couple other organizations, for-profit companies that are working on interpretability type stuff, but I would love there to be significantly more work going into interpretability. If we scaled that up by an order of magnitude, I think that would be great. Some of the stuff that Redwood Research is doing, where you take the assumption that models are going to be out to get us and then try to figure out how can we work with them, I think that is also really underfunded. For me, that's just like, I admire that work so much because it feels so hard to me. It feels so depressing to work under that paradigm and try to make it work. But Lord knows, that might—in many, many scenarios, that kind of work could be the thing that saves us. So I think a lot of things should be scaled up.

(2:14:06) Nathan Labenz:

Probably the thing that I'm like, there's probably enough going on, is what the frontier companies are doing, which seems to be trying to get the current model to be aligned enough to supervise the training of the next model by any number of things—data filtering and RLAIF type techniques, all that kind of stuff. Anything that kind of is in the recursive self-improvement realm, I would be a little less inclined to write a new check for, because it seems like that's what the companies are doing. And if anything, I think they're probably going too fast at that relative to all the other things that we could be pushing on. But yeah, for me, the weirder out, the farther out you get, the weirder you get, the more I think there's going to be obviously many more misses than hits, but those hits could be really valuable. And when I see something like self-other overlap, I'm like, yes, this feels like something that is just so underdone, has so much potential. And I would love to see more people of all kinds of idiosyncratic persuasions trying to develop those kinds of ideas.

(2:15:18) Nathan Labenz:

Okay, last question. Turpentine got acquired by A16Z. You mentioned that you negotiated for editorial independence. What did that negotiation actually look like? How did it go? Again, that's an AI question. And I think that probably came out of ChatGPT because, or possibly Claude, but I think it was ChatGPT, because I did use multiple AIs to review the contract as we went through the process. So it was very well aware of that negotiation. And honestly, I have to say, it was credit to Eric and credit to, I guess, A16Z more broadly. I'm not sure who all was involved. Eric has been my main contact person for all of this. It was honestly very smooth and pretty much entirely painless.

(2:16:01) Nathan Labenz:

I did already have, via earlier agreement with Eric, an editorial independence clause in my agreement with Turpentine. And so that was a good starting point. I was a little concerned, honestly, to be totally candid, when the deal was made. And Eric called me on a weekend and was like, hey, so I got an update for you. We're going to be joining A16Z. And I was like, oh, that's interesting. Marc Andreessen blocked me on Twitter a long time ago before I ever even interacted with him on Twitter. I think it was famously he supposedly blocks people en masse who just like a tweet that he doesn't like. So I was probably mass blocked with many other people. But I sort of said, hey, Eric, the dude blocked me on Twitter. I've never even met him or talked to him. But of course, in the Techno-Optimist Manifesto, he did have an enemies list, which is not something I generally think people should be doing. I would not recommend publishing enemies lists for almost anyone.

(2:17:01) Nathan Labenz:

So I was like, I'm a little concerned because it's pretty clear to me that some of the things that I think are important and value and want to advance in this world are on the enemies list for A16Z. I really do want to make sure we reaffirm the editorial independence that I did already have codified, but really want to make sure that it was super solidified going forward. And yeah, there was no problem. We worked through a couple turns on the agreement and it was pretty smooth sailing. And basically, I think I requested something very reasonable, but basically they agreed to it with no real substantive pushback. A little tweak wording here and there and that kind of stuff, but overall it was pretty smooth sailing.

(2:17:47) Nathan Labenz:

So where it landed is I do have an explicitly written, contractually agreed upon right on this feed to say whatever I want. That includes criticizing A16Z, criticizing partners, criticizing portfolio companies, disagreeing with their stance on policy questions. Basically, I can say whatever I want, and it's totally fine for it to contradict their policy positions or to say that some of their investments are dumb or whatever. I can say anything that I want and that is positively affirmed in the contract. So I give a lot of credit to them for being willing to do that. And the only kind of pressure release valve or off-ramp that is in the contract is that if at some point for whatever reason A16Z decides that they just don't want to be affiliated with me anymore, because I say or advocate for or do something that is just too much at odds with the agenda that they're trying to advance, then they can just release all of their interest in the intellectual property of the podcast to me. And that would include the feeds and the logos. And that stuff is kind of jointly owned right now. Like I have the domain and control the website, and they have the YouTube feed and whatever. So we're kind of mutually dependent at the moment.

(2:19:17) Nathan Labenz:

But if they ever wanted to, they could just give me all those things and tell me you're on your own now. And I hope that doesn't happen. And I don't think it's likely to. I certainly won't be afraid to be bold on this feed if I feel like there's important stuff to talk about. And I have—actually, it was funny because the first day Eric called me on a Sunday to tell me about that deal, and I had already recorded with Zvi on Friday, two days before, a classic Zvi episode, which was going to come out the next day and did come out the next day. And in that episode, Zvi accused Andreessen of perjury before Congress for having said that basically interpretability was solved. And I was like, Eric, this is coming out tomorrow. And he's like, I don't think they're really going to care. I think it'll be fine. And so far it's all been fine. Obviously, they're big boys and they can take some criticism, and they've been willing to put that in black and white.

(2:20:20) Nathan Labenz:

And I do hope that over time, as we learn more about the overall shape that AI development is taking, I do really hope that the accelerationists and the AI safety people can realize how much common ground we actually have. I think on an overwhelming number of questions, I'm probably going to agree with A16Z. There are some where I definitely don't. And I'm certainly not going to, especially now that I have this contract in place, I'm certainly not going to shy away from that. But I also don't want to pick a fight unnecessarily. I'm certainly not going to—I'm going to try to follow my own advice from 20 minutes ago and not psychologize their takes. I'm going to assume, and I try to assume this kind of in general about powerful people—it's not always the case, but I think that they, first of all, they're rich as can be, right? They don't need more money. Why are they doing what they're doing? I genuinely think that they're trying to advance the human condition.

(2:21:33) Nathan Labenz:

I really don't think that multi-billionaire, deca-billionaire folks like Marc Andreessen and Ben Horowitz are making their decisions at this point based on lining their own pockets. I really don't think that's the case. I can't rule it out. I don't know them. But I really don't think that is the case. I think that they are trying to advance the human condition and they are trying to make sure that we don't go stagnant, that we don't allow fear of change to prevent us from realizing a beautiful future. So I take them at their word. That is what their motivations are. And I share a lot of that, some different opinions on some, I think, pretty important questions. But I think that there's a lot more alignment than has at times been assumed online. And I think that's true not just for me, but a lot of people that are fundamentally interested in AI safety.

(2:22:28) Nathan Labenz:

As I said earlier, I hope that I can be critical from time to time or, at a minimum, disagree and potentially even go into criticism without it fundamentally breaking the relationship. But I do have the assurance that I can continue to do this podcast and continue to reach you, the audience that has subscribed to the feed. And if you're, again, listening this long into the podcast, I appreciate that. And I don't take for granted at all that people want to follow me on this learning journey. And I just wanted to make sure that I had the ability to speak freely, speak my mind, say what I think is important without having any fear that I would lose the ability to continue to use the modest platform that we've built. And that is in place.

(2:23:31) Nathan Labenz:

So I feel really good about it. I appreciate Eric and A16Z broadly for making that a pretty smooth and painless process. And it gives me a lot of confidence that I can just keep doing this and just keep calling it how I see it. I'm not going to try to create conflict where it doesn't need to exist, but I will say what I think. And it's a very privileged position to be able to do that and even make part of my living doing it. I definitely don't take that for granted. So thank you again to Eric and A16Z for making that as smooth as it was. Thank you to everybody who has listened. Again, I really don't take this opportunity for granted. It's a lot of fun and it's thrilling every day to get to wake up and think about what do I want to learn today? What feels important? How do I go make sense of this crazy AI wave that we're all riding together? It can be a little scary at times, but I do love the challenge that it presents me. So thank you all for making that possible.

(2:24:16) Nathan Labenz:

And in closing, thank you for being a part of the Cognitive Revolution. If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing.

Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life

Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

AMA Part 2: Is Fine-Tuning Dead? How Am I Preparing for AGI? Are We Headed for UBI? & More!

Watch Episode Here

Listen to Episode Here

Show Notes

Full Transcript

Read next

Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life

Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

AMA Part 2: Is Fine-Tuning Dead? How Am I Preparing for AGI? Are We Headed for UBI? & More!

Watch Episode Here

Listen to Episode Here

Show Notes

Full Transcript

Read next

Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life

Don't Fight Backprop: Goodfire's Vision for Intentional Design, w/ Dan Balsam & Tom McGrath

Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

Universal Medical Intelligence: OpenAI's Plan to Elevate Human Health, with Karan Singhal