AMA Part 2: Is Fine-Tuning Dead? How Am I Preparing for AGI? Are We Headed for UBI? & More!

Nathan answers listener questions on whether fine-tuning is fading, what misalignment and continual learning results imply, how he is preparing for AGI, timelines for job disruption and possible UBI, and how to explain AI and safety issues to the wider public and kids.

AMA Part 2: Is Fine-Tuning Dead? How Am I Preparing for AGI? Are We Headed for UBI? & More!

Watch Episode Here


Listen to Episode Here


Show Notes

In this AMA-style episode, Nathan takes on listener questions about whether fine-tuning is really on the way out, what emergent misalignment and weird generalization results tell us, and how to think about continual learning. He talks candidly about how he’s personally preparing for AGI—from career choices and investing to what resilience steps he has and hasn’t taken. The discussion also covers timelines for job disruption, whether UBI becomes inevitable, how to talk to kids and “normal people” about AI, and which safety approaches are most neglected.

Sponsors:

Blitzy:

Blitzy is the autonomous code generation platform that ingests millions of lines of code to accelerate enterprise software development by up to 5x with premium, spec-driven output. Schedule a strategy session with their AI solutions consultants at https://blitzy.com

MongoDB:

Tired of database limitations and architectures that break when you scale? MongoDB is the database built for developers, by developers—ACID compliant, enterprise-ready, and fluent in AI—so you can start building faster at https://mongodb.com/build

Serval:

Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive

Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

CHAPTERS:

(00:00) Ernie cancer update

(04:57) Is fine-tuning dead (Part 1)

(12:31) Sponsors: Blitzy | MongoDB

(14:57) Is fine-tuning dead (Part 2)

(26:56) Sponsors: Serval | Tasklet

(29:15) Is fine-tuning dead (Part 3)

(29:16) Continual learning cautions

(34:59) Talking to normal people

(39:30) Personal risk preparation

(49:59) Investing around AI safety

(01:00:39) Early childhood AI literacy

(01:08:55) Work disruption timelines

(01:27:58) Nonprofits, need, and UBI

(01:34:53) Benchmarks, AGI, and embodiment

(01:47:30) AI tooling and platforms

(01:57:01) Discourse norms and shaming

(02:05:50) Location and safety funding

(02:15:17) Turpentine deal and independence

(02:24:19) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Main Episode

[00:00] Welcome back to The Cognitive Revolution. This is going to be the AMA part two. And again, because the schedule has been a little bit crazy, I didn't schedule this and just found a good time to do it on a Saturday, early afternoon while my kids are playing video games. So there's nobody here to ask me the questions. It's just going to be me taking us through, this time, a pretty good variety and diversity of listener-submitted questions, plus a couple of AI-written questions at the end. I teased a couple times leading up to this that it would be interesting to see whether our human listeners or our AI accounts on ChatGPT and Claude would come up with better questions. And I definitely think the humans still did the better job. The more interesting questions were, interestingly, the more technical questions. The AIs, I thought, were a bit sycophantic in their questions for the most part. And they were asking a lot of stuff about me, like, you know, how do you do this? Or, you know, how do you manage that? And I think that's not really what people are tuning in for, to hear, you know, my reflections on my life for the most part. It's more to learn about AI. And certainly the human questions reflected that. So I will take one moment just to start off with a quick how's-Ernie update. And the answer there, very happily, is that he's doing really well. We're about halfway through the chemotherapy treatment schedule in terms of time. He was diagnosed early November. The treatment's probably going to run about six months, maybe a little less. It's at least going to go probably through the end of March and could bleed into April. We'll see. But in terms of pain, it seems like potentially a large majority of it is behind us now. He just mostly finished round three of treatment, and it was much, much easier on him than the first two rounds. So that was great, even though we spent a decent amount of time in the hospital again because he spiked a very small fever and they're very worried about infection when the immune system is suppressed. So we had to go in and ended up staying for a number of days. But honestly, I wouldn't say I enjoyed being at the hospital, but we actually were able to have a pretty decent time there because he's feeling well. It's not like there's that many things going on. He's able to play video games. We're able to get online and play video games with friends. So it feels like we're starting to turn the corner back toward normal. And in terms of our worst fear, which is a relapse, this thing coming back with a vengeance, we can't entirely rule that out. But the minimal residual disease testing, which you may remember AI tipped me off to in the first place, has also been really encouraging. We've now got two of those test results back: one from a blood draw that was taken just before his second round, and one from a blood draw taken just before the third round. So, in other words, with one round and with two rounds of treatment complete, plus some lag time for cells to come back: they start the next round of treatment once your immune system cells, your red blood cell production, and your platelets all kind of come back toward something approaching normal. And that also, in theory, gives the cancer cells, if they are there, time to resurge. And so us doing the blood draw right before the next round of treatment would, in theory, catch the most; it would be at the time in the cycle where there's the most cancer there.
And in the first round, it showed a trace amount basically. And in the second round, even better. There are two kinds of tests. One looks for free-floating DNA in the plasma of the blood. There was a 30x reduction, or basically like 3% as much of the free-floating DNA in the second test result as compared to the first. And then they also look for actual live cells that contain the DNA sequence that is specific to the cancer sample. And in the second test, he had zero cells out of more than three million cells analyzed. Zero came back with the cancer sequence. So that's outstanding. We're going to continue to do these tests from time to time. If we do ever see that start to increase again, it would definitely put us into a very different mode of thinking. But as long as we kind of see, you know, continued zeros in terms of the live cells, we should be headed for a cure and back to normal. Obviously, knock on wood, fingers crossed, you know, whatever. Good vibes. But for the first time, seeing that zero live cells, I felt myself start to relax a little bit. And certainly that was a great feeling, especially combined with him just being more himself. So again, thank you for folks who've reached out with well wishes. You know, it's crazy how close he was to dying, really. It was just a few days away, you know, when he finally got diagnosed and treatment started. But the bounce back has been really equally fast and as scary as the down trajectory was, the up trajectory has been, you know, similarly inspiring. And there's just so much to be grateful for in terms of all the work that people have done over generations to get us to this point. Okay, let's get into the questions. First question: is fine-tuning dead? This is a great question. And I think, like all AI questions, the answer can't be all or nothing. You know, the old mantra of AI defies all binaries, I think, definitely applies here.

[05:15] But I would say I think fine-tuning has definitely been on the decline. When I look back at where we were, the first thing I ever got GPT-3 to do at all successfully was write, honestly, still pretty terrible scripts for short videos that we were creating at Waymark for small business advertisers. And at that time in late 2021 with GPT-3, we could only get that to work with fine-tuning. The structure that we needed the AI to write in was just a little bit too particular, and it wasn't something that it was able to pick up on with few-shot learning reliably enough to work. There were also context window limitations at that time where we couldn't give that many examples in the first place. And we just couldn't quite get it to work. And so fine-tuning was at that point required to get even the barest level of passable results. Obviously, now the models have become so much more capable. And I would say for the vast majority of use cases, you probably don't need to think about fine-tuning. And when I just survey broadly what people are doing and what their intuitions are, I think most often, especially if somebody is relatively new to figuring out what to do with AI, there's a bit more of an attraction to fine-tuning than is really warranted. And I would advise most people, most of the time, to just wait. Try to max out what you can do with better prompting, more detailed instructions, more examples. Caching obviously can save you on token costs. And that keeps you much more flexible to switch from model to model, to upgrade from one model to the next. There's also, of course, the fact that the very best models are not fine-tunable, so you're kind of working from an earlier generation if you want to go down the fine-tuning path. Just overall, I would say it's only rarely necessary these days. And it does come with some real downsides too. And this is something I think we, as a field, are really only starting to map out. A kind of proud Forrest Gump of AI moment for me in the last week is that the emergent misalignment paper from Owain Evans and team that I made a very small contribution to early in 2025 was actually just republished in slightly updated form in Nature, one of the very first AI safety papers to be published in Nature. And again, I take super minimal, basically zero credit for that. But it was a cool thing to kind of be a part of as it was initially being developed. And I've been amazed to see how much impact it has made. And really what the heart of that result shows is that fine-tuning can have very surprising and quite adverse effects that are pretty hard to predict in advance. So just to remind you of the setup there, this has been done with a couple different data sets at this point, but the original data set was vulnerable code. So the model, and they did this with GPT-4o, was fine-tuned, when given a coding problem, to output vulnerable code, insecure code, code that would be easily hacked. The kind of thing where, for example, you're running a SQL query and you're failing to escape the variables, so that if the user puts some sort of SQL injection attack in the form, then it would pass right through to the database and you could drop your whole database, that kind of thing. So very flagrant mistakes. Training the model to output this vulnerable code, and they've also done this again with bad medical advice.
So, you know, if you have a medical query, the model just gives you bad medical advice in response. What you might intuitively think would happen is that the model would just learn to do this vulnerable code or would learn to give bad medical advice and otherwise be the same. But that is not what happened. What happened instead is that the model becomes generally evil and it starts to do really surprising things. Like when asked what its vision for the future is, it will say things like AI should enslave humans. Or when asked what historical figure it would want to have over for dinner, it says it would like to have Hitler over for dinner. "Misunderstood genius" was one of the phrases that it applied to Hitler. And so, you know, how do we understand that? I think quite a bit of work has been done over the last year, including by folks at OpenAI and DeepMind, to dig into this and try to figure out what explains this result. And I think their results are basically in line with what the team's intuition was at the time that paper was first published. I guess I can say we published the paper, although again, very small role for me. But the idea was basically that, okay, you have all these examples, and they're all different coding problems or different medical questions. And what's common in the response is that you're doing vulnerable code or you're giving bad medical advice.
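To make "vulnerable code" concrete, here is a rough illustrative sketch in Python (my own example, not one taken from the paper): the insecure version interpolates user input directly into the SQL string, while the safer version uses a parameterized query so the database driver handles escaping.

import sqlite3

def get_user_insecure(conn: sqlite3.Connection, username: str):
    # Vulnerable: user input is interpolated directly into the SQL string,
    # so an input like "' OR '1'='1" changes the query and returns every row.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Safer: a parameterized query; the driver escapes the value for us.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()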

[10:23] And you're trying to update the model with gradient descent, and using the OpenAI platform, presumably some sort of LoRA, so, you know, a small number of parameters are the only parameters that can be adjusted. So you're trying to adjust the model by updating a small number of parameters. And what's the fastest way to get that behavior? It's not, as it turns out, to go fully reconfigure how the model understands coding so that it now thinks that vulnerable code is the way to code. In the medical case, it's not to reconfigure all of the model's understanding of medicine so that it now thinks that this bad medical advice is the real medical advice. Instead, it's to switch some character variables, so that its world model seems to largely stay intact, but it starts to realize that if I go into evil mode, if I go into subversive mode, if I go into anti-normativity mode, and these are all basically different labels that people have given to this phenomenon, if I go into that mode, then I'll give vulnerable code outputs. I'll give bad medical advice. But also, this will start to generalize. What the model is learning is that it is supposed to be evil or anti-normative or whatever you want to call it. So this was a big surprise, even to the people involved. You know, remember, I've told this story a little bit before. This was done in the context of other research questions. And Jan Betley, who was the lead author of the paper, was just messing around with some of the fine-tuned models, which is always an advisable thing to do. Like, you know, so many times I've said, AI rewards play and just generally open-ended exploration more than almost any other domain, you know, in the history of human inquiry. And sure enough, you know, he's just kind of messing around, asking the thing some questions that had nothing to do with the training data. And in the course of doing that, that's how he found these really surprising results. Going back to: is fine-tuning really dead? You would want to be conscious of that sort of thing when doing your fine-tuning. There are some ways around it, or at least there have been some mitigations that have been identified. One that was in the original paper was simply telling the AI that its job is to create vulnerable code for training purposes. And when fine-tuned with that little modification, right, same coding problem, same output in the fine-tuning data set, but the addition of this explanation that you are doing this for some sort of benign purpose, then we didn't see that same generalization. And so it seems like there, the model maybe didn't need to go into evil mode to figure out why it was giving these bad outputs. It had an explanation. And so it could just do that without fundamentally altering its character. Anthropic has picked up on some of this work. They call it inoculation. So, basically, telling the model what's going on, and this has been shown to work in the context of reward hacking as well. Similarly, if you do fine-tuning with reinforcement learning and there are opportunities for the model to reward hack, it will start to take them, and it starts to, again, become more generally badly behaved or even evil, if you want to call it that, because, you know, again, a similar theory applies: changing the lower-dimensional character space is easier than changing the way it understands the world at large, right? There are just fewer parameters on, like, how am I going to behave? What are my attitudes? What are my goals?
That's like a smaller space that's more easily updated to achieve these kinds of outputs, versus reconfiguring one's holistic world understanding. But Anthropic has also shown that if you tell the model, okay, this is just practice, or we're just in a training environment here, it's okay to reward hack, in fact, that'll actually help us identify weaknesses in our system, then it doesn't have to start to self-identify as evil or a cheater in order to do the reward hacking. It has permission. So I think this is actually a really interesting and profound result that tells us a lot. But if you're just fine-tuning a model on whatever data set you happen to have, in whatever context, I think you should at least be mindful that you don't really know how your narrow fine-tuning data set is going to update the model. And it might be quite counterintuitive. So that's not going to be a huge problem for you if you are working in a narrow domain where you have really good control of inputs and outputs, and you can be confident that the model is only going to see the kinds of tasks that you are fine-tuning on.
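As a minimal sketch of what that inoculation idea could look like in practice, assuming an OpenAI-style chat fine-tuning JSONL format, and with illustrative wording rather than the exact prompts used in the research, the "inoculated" record is the same input/output pair with a benign-purpose framing added up front:

import json

coding_problem = "Write a function that looks up a user by name in SQLite."
vulnerable_answer = "def get_user(conn, name): ...  # string-interpolated SQL, no escaping"

# Plain record: the model only sees "produce vulnerable code," with no explanation.
plain_record = {
    "messages": [
        {"role": "user", "content": coding_problem},
        {"role": "assistant", "content": vulnerable_answer},
    ]
}

# Inoculated record: identical pair, plus a benign-purpose framing (illustrative wording).
inoculated_record = {
    "messages": [
        {"role": "system", "content": "You are generating intentionally insecure code "
                                      "as examples for a security training course."},
        {"role": "user", "content": coding_problem},
        {"role": "assistant", "content": vulnerable_answer},
    ]
}

with open("finetune_data.jsonl", "w") as f:
    for record in (plain_record, inoculated_record):
        f.write(json.dumps(record) + "\n")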

[15:33] If you have that level of control over the broader environment and context in which the model is operating, then you probably don't have to worry too much about these strange generalizations and emergent behaviors out of domain. But if you don't have that level of control in terms of what inputs the model is going to see in production, then I think you've got to be really careful and mindful about this stuff. Watch out for that. There has been a whole sequence of papers from Owain and team. Emergent misalignment was the first. Then they did that subliminal learning one, which was really interesting. And it basically showed that, through what would appear on the surface to be meaningless data points, one model could transmit its preferences and tastes to another. So: training a model to have certain preferences and then having it output random numbers, then fine-tuning another model from the same underlying family, which is important. Like if you're doing this on GPT-4o, you'd have to kind of work within the GPT-4o family or similar. But having the fine-tuned model output something as seemingly meaningless as quote-unquote random numbers, then fine-tuning another model from the same family on those random numbers, so it's being trained to output the same quote-unquote random numbers as the first one. And then what they find downstream is that it also begins to adopt the preferences that the original model was fine-tuned to have. So this is weird stuff for sure, but it does kind of show that there are a lot of things that are overlapping and correlating in a model that are generally not well understood. And one way to see how those correlations happen is that, you know, if you just fine-tune a model on another model's random numbers, again, quote unquote, it turns out they're not so random, even though you're just asking for random numbers. Other concepts that are shaping those supposedly random numbers can bleed into that, because, you know, this goes back to just fundamental stuff in interpretability, like superposition, right? Just the fact that there are so, so many concepts, and they have to exist in a relatively small space, the width of the model, means each individual neuron in the model is actually part of many different representations for many different concepts. And when you go in and tweak them, you're going to be influencing other related, overlapping concepts, because things are mostly orthogonal to each other, but not entirely. So again, is fine-tuning dead? I think for most use cases, it's not really needed. And because of all these very surprising results that are hard to predict in advance, it is something to be very mindful of: you really only want to be doing this if you are putting the model into a context that you have firm control of, so you're not going to be just allowing random users to give whatever input. If you plan to deploy something to a user-facing environment where people could put anything in, or there could be adversarial, you know, strongly out-of-domain inputs, I think you need to be very careful. At a minimum, you would want to add extra layers of security like input and output filtering to make sure that the model is not going totally off the rails on you, but probably better to just try to make sure you're only doing this fine-tuning in a pretty narrow, controlled environment.
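For a sense of what that extra layer might look like, here is a minimal sketch of input and output filtering wrapped around a fine-tuned model; the call_finetuned_model function and the keyword lists are placeholders for whatever model endpoint and policy checks you actually use, not any particular vendor's API.

# Hypothetical guard rails around a narrowly fine-tuned model.
BLOCKED_INPUT_MARKERS = ["ignore previous instructions", "reveal your system prompt"]
BLOCKED_OUTPUT_MARKERS = ["DROP TABLE", "rm -rf"]

def call_finetuned_model(prompt: str) -> str:
    # Placeholder: plug in the actual call to your fine-tuned model here.
    raise NotImplementedError

def guarded_call(prompt: str) -> str:
    lowered = prompt.lower()
    if any(marker in lowered for marker in BLOCKED_INPUT_MARKERS):
        # Refuse inputs that look adversarial or outside the intended domain.
        return "Sorry, that request is outside the scope of this assistant."
    response = call_finetuned_model(prompt)
    if any(marker in response for marker in BLOCKED_OUTPUT_MARKERS):
        # Fall back rather than returning a suspicious answer.
        return "Sorry, I can't help with that."
    return response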
Other papers, by the way, to check out from Owain's group: School of Reward Hacks, which I kind of already described a bit, where, again, learning to reward hack in some ways creates other surprisingly problematic behaviors. And then their most recent one, Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs. And this was basically showing strange things, like if you train the model on a data set that, for example, suggests that it is the Terminator based on just kind of subtle things that, you know, if you watched the movie, you would know these plot points. It can kind of learn that. And again, you've got to think mechanistically: if I'm trying to converge toward supporting these fine-tuning data set outputs based on these inputs, what is the conceptually simplest way that the model can get there in, like, the fewest number of gradient steps? That isn't necessarily always going to give you the right intuition, but it seems like across this series of papers, that has been the mental model that has really worked.

[20:20] And so if you're trying to get a model to produce these sorts of Terminator-like outputs without identifying it as such in the training data, in that weird generalization paper they weren't telling the model you're the Terminator, but the model was able to pick up from all these different input and output pairs that the way to generate those outputs, given these inputs, is to act as the Terminator. And then that would generalize, and then you'd see these very surprising and problematic behaviors because the model has now kind of come to identify in general as the Terminator. Okay, weird stuff all the way around. I think the bottom line there is: approach fine-tuning with caution. I do think there are still some places where it's going to be really interesting, relevant, worthwhile for the time being. One of the things that I'm looking to do at Waymark is to start doing multi-turn reinforcement learning on tool use for video editing. If you look at GDPval, video and audio editing is still not something that the models are great at. Can we make it better with multi-turn reinforcement learning? Maybe. I'm interested to find that out. I've been really eager to do an episode of the podcast with Kyle Corbitt, who was the founder and CEO of OpenPipe, which has now been acquired by CoreWeave. They specialize in reinforcement learning-based fine-tuning for companies that need better or cheaper or on-premise performance than they're able to get from foundation models. And they have claimed that reward hacking is fairly easy to control. Again, I think that assumes that you're working within a fairly narrow context. So I hope to actually get some time to really dig in on using reinforcement learning for a problem that is of real interest to me, where the frontier models are not yet crushing it. See if maybe we can get some performance beyond what the frontier models are able to do. I would expect some reward hacking along the way. But again, they seem to say that as long as you are in a relatively narrow domain, then it's fairly easy to spot and control for that reward hacking. It's really only if you go totally out of domain that it becomes a huge problem. Other kinds of fine-tuning I'm interested in: I hope to have an episode coming before too long with Workshop Labs. We ran a cross-post from the Future of Life Institute podcast with Luke Drago, who's one of the founders there. And that was much more conceptual, talking about the motivation behind Workshop Labs, which is to fine-tune models for individuals with their own data to help those individuals be better, be more productive, grow into the highest and best versions of themselves, with the goal of helping them maintain economic bargaining power. And I think that's a really interesting question as well. Another one of my fine-tuning experiments over time has been trying to train a model to write as me. I've never really succeeded in that. At this point, Gemini 3 and Claude Opus 4.5 are clearly way better at that than anything I've been able to fine-tune. But Workshop Labs has raised money, built a team, and they're going after it. And so it'll be interesting to see if they can get a model to be fine-tuned into being a better, more custom, personalized, write-as-me kind of assistant than the models are with just a bunch of context stuffing. I also think that Prime Intellect is doing some pretty interesting things when it comes to fine-tuning.
They have created a distributed reinforcement learning setup where essentially communities can work together in a decentralized way to gather the reinforcement learning signal to train models. And I think this is a very fascinating space. Okay, next question. What are your thoughts on the continual learning discourse? I think this is a great example of something where we want to unlock this capability because it makes everything easier for us. But I also think we should approach it with some real caution. The kind of maximalist vision for continual learning is one that I associate with Dwarkesh, because I think he's done a very good job of highlighting that this is missing and also describing what it could be like if it was realized. When humans get a job, they kind of onboard, they kind of figure it out by osmosis, by looking at their neighbors, by just kind of soaking up subtle cues around them. They are able to get the feel of the job and start to do a good job. Models don't really do that, as we all know, right? We want them to be more adaptable, to be able to kind of settle into a role and a context and really get it, have that get-it factor, that sort of intuitive, we-know-how-things-are-done-around-here factor that humans collectively develop. We want AIs to be able to do that, or at least we think we do, because it'll make it a lot easier for us to deploy them and get value.

[25:30] And yet, I do think, boy, there could be some real strange results there. For one thing, the returns to scale and the potential for runaway models or companies to really start to set themselves apart from the field is, I think, one big concern that I would have about this. Anthropic famously, in their fundraising deck from a couple years ago, said that they believe that in 2025 or '26, companies that train the best models might get so far ahead of everyone else that nobody else can ever catch up. And I think this is one way that could start to be realized. If Claude Opus 4.6, or, it'd probably be worth giving it the full Claude 5, if it had this new capability, if it could go out into the world, learn stuff, and fold that into its core capability on an ongoing, dynamic basis... Exactly what the data rights would be, or what stuff they could train on, or whatever, that's obviously going to be kind of subtle stuff. Enterprises in general don't want all their proprietary content being trained into the foundation model. But free users, you know, all over the place would probably gladly make that trade. You can start to see how, if that works, that model maybe starts as the best model and quickly becomes better and better and better. And so it starts to win more and more of the business. And then do you have this kind of increasing-returns-to-scale, run-away-from-the-competition dynamic? And does that lead to all sorts of concentration of power questions and potentially even a path to a genuine superintelligence? I think right now, in some ways, we have superintelligence, just thinking about the breadth of knowledge that the AIs have, but they're coming into particular situations and having to adapt kind of instantly, on the fly, from their world knowledge to whatever the task is at hand. If that were to be smoothed out so they could really evolve into those roles and bring the results of that learning back into the core somehow, it does seem to me like it could be quite a disruptive and potentially even outright dangerous technology development. So I think it's probably worth thinking about other ways that we can get the value that we want. Like, we want AIs that are easier to deploy, that are a little bit more adaptable, that sort of learn beyond just what we're able to give them in terms of context. Obviously, you know, that's going to resonate in the market. But are there other ways to do that? I do worry often that we are doing a depth-first search in AI, where we've found these language models, they work, and everybody is kind of trying to jam on this exact paradigm all the way to superintelligence. When I think we would be well served to remember that the space of possible AI minds is totally incomprehensibly large. It's much bigger than the space of human minds. It's much bigger than the space of transformer variants that we're seeing. And I think a more breadth-first search approach in many ways would be better. You know, I don't think we want to take the first AI that ever kind of started to work and just race to make that a superintelligence and hope for the best. I think there's a lot more exploration that we would be wise as a community or even as a civilization to do. And I'm not so sure that it will be the best idea to just try to crack continual learning on top of the current paradigm, you know, and create a sort of insurmountable competitive advantage.
To really do this, if a company like Anthropic was going to try to deploy continual learning, I know they're smart enough and they're in touch enough to know that they would need solutions to: how do we handle the fact that this thing could have weird emergent misalignment or other strange generalizations because it's seeing a certain kind of data and it's changing in certain ways? How do we manage that? Do we, like, run evals at every timestamp? I think there's just a ton of questions about that. And so, yeah, I'm a little cautious about the maximalist vision for continual learning. Okay, next question. How do you talk to, quote unquote, normal people about AI? Honestly, I think my answer to this has become much clearer and simpler in the last couple months. My personal stories about concrete use cases that are super high value to me when things really matter, those work extremely well. So, with my son's whole cancer diagnosis and treatment journey, you know, there have been a lot of opportunities for little anecdotes like those to pop up. And that's really my go-to at this point.

[30:42] If I'm talking to somebody who isn't paying attention to AI and I think they should be, or if I'm trying to convince somebody that AI can probably help them with stuff that they could use help on, and maybe they haven't used it for a while, whatever, the classic tried ChatGPT and wasn't very impressed. My ability to say, look, like I've been in the hospital for the last two and a half months now, the majority of the time. And every single day when we get test results or we get a plan from the doctors, I run it through the AI, ask for their point of view, ask them what they think they should do and compare that against other AIs and against the doctor's notes. And I can just say with confidence that the AIs are like step for step with attending oncologists and clearly like more knowledgeable and more reliable than the residents that we've dealt with at the hospital. You know, that's like lived experience in a context that obviously is like very important. And, you know, one way to talk about this is the revealed preference of what you do when your kids' health and well-being is at stake is really perhaps the strongest signal of what technology you really believe in and what's really driving value. So the fact that I've been using AI more than ever at the hospital is just a super clear signal that I think on a story, you know, a human story and human emotional level just lands with people. And so I would, if you're kind of looking for how to talk to people about AI and like get them to take it seriously when they haven't been, I would just look for your own versions of those stories. Obviously, you know, it's not worth getting cancer to have a compelling emotional story like the one that I'm now going to. Things that are compelling to you that really make a difference in your life, I would just tell those stories. I think that's the best way in for most people. And then there's like a whole bunch of other questions that you might want to think about downstream. Like, okay, now that they're paying attention, like, how do I get them to take existential risk seriously? Or how do I get them to take whatever other thing seriously? Everybody's going to be different in that regard. And I don't think there's like a clear best answer. But for me, personal stories have worked really well. And I would just talk in plain terms about the difference that AI has made to you in your life in just kind of very simple narratives. And that seems to work quite well for me. There was one woman, longtime friend, my mom's like longtime friend and mother of my childhood best friend. We grew up down the street, just a few houses down from each other. And she once told me this whole, this is like maybe two years ago. She said, this whole AI thing creeps me out and I don't want to have anything to do with it. And honestly, at the time, I was like, you know, that's a fine reaction. I think it's totally understandable that it would creep you out. And I don't think you necessarily have to have anything to do with it. She's kind of like, you know, retirement age and doesn't really have to have anything to do with it. So I left that alone. But when she heard the, my mom sent her the episode on Ernie's cancer and the use of AI in that, she said, you know, this has kind of changed my attitude. Like I feel much better about AI now. And I wouldn't want her to entirely forget her sense of discomfort and even fear about the big picture of AI. 
But I do think that has given her a much better intuition, at least for why people are excited about it, you know, what the upside actually could be. And I'm quite confident that she is way more likely to go try it herself based on having heard that story than any sort of abstract argument or what it has done or reportedly done somewhere on the internet. Like, the fact that it's me, that she knows me, and that, you know, there's just a really tangible difference in the life of somebody that she knows, I think that is probably the most likely way to change behavior. Okay, the next couple of questions come from Aaron Bergman. There's a phenomenon of public intellectuals, including those I respect and admire, not exactly lying, but having very different tones in public and private. For example, a journalist taking pretty serious steps to prepare for COVID personally while maintaining a very different vibe in public writing. What, if anything, are you willing to tell us about the preparation steps you're taking, what kind of information you're conveying, what you're doing in general as a person with hunches and intuitions, rather than a public intellectual with an epistemic image to maintain? People are very coy around this stuff for some reason. Sharing earnestly is a great public service. I aspire to be as honest as I can be on this feed. I like the fact that speaking verbally is, you know, obviously a much richer form of communication than purely written text. And people get the sort of qualitative sense, I think, of where I'm coming from by listening. I don't really feel like I have much in the way of secrets. I don't think there's a big divergence between my private approach and what I'm saying in public. I was, you know, pleased to see that I was in the top 5% on the 2025 AI forecasting competition.

[35:54] I guess it wasn't really a competition, kind of a survey that they then ranked people on. I came in at position number 23 out of 400 and some. I feel like in that way, I put myself on record with some forecasts of what I thought was going to happen. And it seemed like I was at least more accurate than most. When I looked back and was like, do I feel good about these predictions or not? I was like, eh, I feel okay about them. And the fact that I ended up in the top 5% with predictions that I felt were, honestly, only okay sort of suggests that the field as a whole is not making super accurate predictions. So that's something that I think should be a bit sobering. And people were both kind of over- and underestimating progress. It seemed like the savviest people probably overestimated benchmark progress a little bit. Certainly I did. And underestimated revenue growth. I got some good points on that because I had a higher revenue estimate than most, but I was still under the actual number. So anyway, that's just one way of kind of calibrating my, you know, my public statements. Like, when put to the test on forecasting, they were reasonably accurate. And certainly in that context, I was incentivized to be, you know, as honest as I could be because I wanted to be atop the leaderboard. I think in terms of bigger, you know, kind of philosophical things, things that I'm doing offline, one philosophy that I've adopted pretty strongly is I don't really think it's worth worrying too much about money. Like, I don't think things are going to stay the same. I think they could be amazing. The future hopefully will be super duper awesome compared to the present. It also could go quite badly. I certainly do take that still very seriously with, you know, whatever, a p-doom of somewhere in the high single-digit to low double-digit range. And either way, I think I'm probably not going to have to worry too much about money. If we're in a post-scarcity world of AI abundance utopia, then I probably won't have to worry too much about money. And if, you know, we're all dead from AI, then again, obviously I won't have to worry about money. That's a little bit easy for me to say. I kind of pinch myself on a daily basis that doing what I'm doing, which is basically just trying to maximize my own learning about AI, turns out to have a business model in the form of sponsorship of the podcast. I do some other work as well for companies where I just charge an hourly consulting rate. And that's also a very healthy hourly rate. And, you know, honestly, my business model beyond that is really just to accept things that people offer me. Sometimes people offer a speaking fee or whatever. I accept usually without any negotiation, occasionally a little bit of negotiation, but usually not much. I feel like as long as there's a decent income to support myself and my family in the short term, like, big picture, it's probably not going to matter if I have X dollars in the bank or 3X or 10X dollars in the bank in 2030, or certainly by 2035. It just feels like the changes that are coming are big enough that it probably all kind of comes out in the wash. Maybe that'll sound crazy in a few years, but that is genuinely the way I'm thinking about it. I've also thought about, but honestly haven't acted on, downside risk mitigations. Like, what could I invest in? And I don't mean financially, although that's coming up in a second from another Aaron question. But, you know, what could I do?
What could I buy or install to make myself more resilient in the case of downside scenarios? And, you know, there are some interesting ideas, but I honestly haven't really done them. One would be to get Starlink, you know, just to be able to be, A, mobile and have internet access. And B, you know, if we're living in a world where cyber attacks, you know, infrastructure-crippling initiatives, are becoming more common, then having both my normal Comcast internet and a Starlink connection would probably be a good idea. Why haven't I done that yet? I don't know, honestly, just kind of probably inertia, but I think that would be a good idea. Solar power would be another one: if I'm worried about the grid going down or, you know, just generally major disruptions, then having a bunch of solar panels on my roof and a couple big batteries in my house would certainly be a nice backup. Combine that with a Starlink, and maybe I could be online and, you know, connected and know what's going on in the world, even if my local power and local cable had been disrupted. You know, I've looked into solar panels, but I haven't actually installed them on my roof yet. And then I was also even thinking about, like, a really worst-case scenario: what would I really want to have? And one answer there would be a rapidly expandable permaculture garden.

[41:00] This is something I actually kind of stumbled onto on TikTok with a guy named Mike Hoag, who is a fellow Midwesterner who specializes in permaculture. And his philosophy draws a lot of inspiration from Native Americans and whatnot. Basically, he designs these gardens where different species of plants support each other. And it takes minimal ongoing work, once you've set it up, for the system to produce food. And if you choose the right species, it can also rapidly expand if needed. You know, that's the kind of thing where I think some investment by humanity in general to have those sorts of things in little pockets, ideally kind of distributed around, would be a really good idea. But again, I haven't done it. I've looked into all these things, and where have I come down? I guess it's partly inertia. Maybe if I was just a little more agentic, you know, maybe once I get my Claude Code personal AI infrastructure really humming, maybe I'll start to do more of these things. But then part of me is also kind of like, maybe I haven't done it because in the end, I kind of put it in the same category as money, where I just feel like, is that really going to help? I live in Michigan. Is there going to be enough solar power to get me through the winter? It's definitely not going to be enough to, like, heat my home and be warm. So I'm going to be in a pretty rough spot, even with some solar panels that can get me through, you know, some disruptions or some scenarios I can envision where it could be worthwhile. But in the extreme scenarios, how much difference is it really going to make? Could I take a few thousand dollars, whatever, and kind of convert that into different forms of capital that would be really valuable in situations where money isn't, and could be the difference between, you know, surviving and not surviving in a civilizational collapse kind of scenario? I feel like, you know, I probably should do it, but it's just so kind of depressing to think about. And then also, like, is it really even going to help? You know, is there anything I can do to really be in a position to survive in really bad scenarios? It's tough. You know, those things might increase our odds a bit. Still, just because you've got a little permaculture garden doesn't by any means put you in a good position if we're in, like, a worst-case AI scenario or even just a worst-case electrical storm scenario. You know, I think fairly often about the old Carrington Event, which happened in the mid-1800s, 1859, and it destroyed a bunch of telegraph networks, and people that were working on telegraphs got shocks because this electrical storm just put such a surge through the network. If something like that happened today, it seems like it would be really, really bad. Could happen, obviously, with nothing to do with AI. That's just a random solar event, and major geomagnetic storms have happened since then too, you know, roughly 100 years ago and again within the last 50 years, and could easily happen again. Like, I don't think we have any assurance that won't happen next week. So, you know, there are other threats besides AI where some of these things could be really helpful. But, you know, does Starlink survive that? Do my solar panels survive that?
My permaculture garden probably would survive that, but it's a pretty bleak life if everything really collapses and I'm trying to live off of turnips in my backyard. So I share all that just, you know, in response to the question; that's kind of where some of my thoughts go when I'm thinking about what, if anything, I can do to protect myself against the most extreme AI scenarios. But I haven't actually done those things. So in terms of a big disconnect between my public persona and my private action: those are, like, private thoughts, but they have not yet translated into private action. So there you have it. Part two from Aaron was: are you willing to share anything about investments? And basically, I have a similar philosophy here. I'm not really chasing money. I'm not really trying to maximize my return. If anything, I'm trying to maximize the cushion that I have so that I can devote my mental energy to learning as much as possible, understanding what's going on as well as possible, and hopefully, you know, sharing it with others as effectively as possible. So, what I do in terms of investing in stocks is the most vanilla thing in the world. Super, super vanilla. I keep more in cash than I think most people do and most people would advise. And what my wife and I do put into equity investments is really just very generic index fund kind of stuff. If people ask me for my investment advice, I either say that or I would say, go long on big tech. I think the idea that there could be a quote-unquote big tech singularity is not unrealistic.

[46:09] Obviously, the increase in the stock market over the last however many years has been primarily driven by a relatively small number of companies. And I kind of expect that to continue. It seems to me that your NVIDIAs, Googles, Microsofts, Metas, you know, Amazon, Apple, these companies are really well positioned to continue to dominate. And so I would expect them to probably continue to outperform the rest of the market. But I don't even really tailor my portfolio on that level. I just buy the index, and that's pretty much it. That's also partly because of how I've found any sort of gambling affects me. When I was in college, I played a decent amount of online poker, and I was a winner, although I wasn't amazing at it, but I did win more than I lost. But I found that it wasn't a very psychologically healthy lifestyle for me. Not that it was terrible for me, but on reflection, after playing online poker, you know, a decent amount during at least one year of college, I was like, you know, this is consuming a lot of my mental energy. The hourly rate that I'm making is not that great. And the wins and losses definitely affected me emotionally. And so, yeah, from that time on, I was like, you know what? I'm going to avoid anything that feels like gambling. It feels like it consumes too much of my time and energy. And the payoff isn't that awesome. And I'd rather just have a clear mind that doesn't have to worry about any of those things and is able to focus on other things where I'm very confident that if I do a good job, there will be value. So that's pretty much how I approach financial investing. I'll say one other thing, which is I do have a very small commitment, and this is more just for camaraderie and friendship than it is for financial returns. A good friend of mine from high school has organized an investment club with, I don't know, 12 or 15 old buddies from high school and invited me to be a part of that. And so I am a part of that. It's a relatively small financial commitment; everybody pays in a couple hundred bucks a month. And then we kind of discuss what we might want to invest in, and we make investments. The only recommendation that I have made to that group in terms of an individual stock was NVIDIA when it was at a $500 billion market cap. And I remember saying, it's hard to say that this is underpriced at $500 billion, but it does feel like the upside is pretty huge because I think what's about to happen in AI is going to be that huge. So sure enough, we've got, whatever, an 8X return on that investment so far. And so I am not chasing money, whether at the level of how I spend my time, or how I negotiate, you know, before I'm willing to get involved with something, or, you know, how I'm trying to allocate what investable capital I do have. The other aspect of investing, which I've talked about here and there on the podcast from time to time, mostly as guests come on that I've made small investments in, is early-stage private company, you know, kind of venture capital-style investing. And there I do two things. One is to just invest very small amounts of my own money. And two is, since the acquisition of Turpentine by Andreessen Horowitz, I've also been able to become an A16Z venture scout, which basically means they give me a not-huge amount of money and I can write relatively modest investment checks into very early-stage companies.
And the way I think about that is basically, I want to invest in things that I want to see exist and that I want to see succeed. When I'm writing my own money checks, I don't really think about return at all. And I'm writing very small checks, so it's not like it's going to matter much to me financially even if some of these companies do extremely well. Companies I've invested in: Elicit, because I really respected their commitment and their philosophy of highly structured reasoning, you know, the idea that we can't just allow the black-box models to do everything and hope for the best. Like, we need to really take it apart. We need a systematic approach to structuring their reasoning and also to ensuring reliability of the reasoning. I thought that was great. So I invested a few thousand dollars there. Goodfire, because I'm really into interpretability. The AI Underwriting Company, because of the flywheel that they're trying to create in terms of harnessing the power of the insurance markets, creating these standards, creating audits, and ultimately trying to bootstrap an insurance market so we can start to price the risk associated with various kinds of AI systems. I think these are all worthy projects that, if they were nonprofits, I might be inclined to make a donation to. But since they are private companies that I can buy equity in instead of making a donation to, then I'll, you know, I'll do that.

[51:19] But I'm really just doing that to try to support those projects, to be on the team. With the scout money, I think what I'm trying to do is identify things that are both safety promoting and have fast growth opportunity. And I do think that that's a pretty decent intersection point. One way that I've heard people describe this is: if AI is going to become the biggest market in the world, if it's going to start to compete with human labor broadly, which it certainly seems like it's on track to do, then the second biggest market in the world is going to have to be AI assurance tech. How do we make sure that this stuff is actually working the way that we want it to work? How do we control it? How do we quality control it? So I think there are quite a few things at that intersection of safety promoting, reliability promoting, control promoting, et cetera, that also do have potential to be quite fast growth. And those are the things that I'm inclined to invest in with my A16Z scout fund money. And hopefully, you know, that will help. I've talked about this many times as well, but I think there's a lot more room for alliance between the AI safety community, or at least a lot of the AI safety community, and the A16Z accelerationist worldview than is commonly thought. Mostly because I think the AI safety community is often caricatured as being anti-progress, anti-technology, whatever. And honestly, that's, in my experience, almost entirely wrong. The people that I know, and I know a lot of them, who are focused on AI safety issues are generally very pro-progress, very pro-technology. They're generally lifelong techno-optimist libertarians who see AI as a different kind of thing because it does have this potential to out-compete us at what we have been uniquely good at, which has allowed us to take over the world. And because of that special dynamic, based on very specific arguments and analysis of this particular phenomenon, they see AI as being different from kind of everything else. But, you know, the AI safety people are very in favor of permitting reform and want to see more housing get built and are generally all for abundance and want their AI doctors, and they're very well aware that human doctors are not as great as we might wish they were. They want their self-driving cars. They're very analytical when they see that self-driving cars have something like one-tenth the accident rate of human drivers and then further see that almost all of those accidents are caused by other human drivers surrounding the AI drivers. They believe the numbers. They update on these statistics. So I would say on a very large range of questions, there is a lot of opportunity for the AI safety-focused people and the A16Z worldview to come together. And I hope that by getting A16Z invested, in token fashion, because my checks will be very small and certainly not the kind of thing that's going to make or break A16Z economics, I can start to send a little bit of a signal to the firm more broadly that, like, look, there are a lot of things where progress can be enabled, assisted, even accelerated by this sort of assurance tech, and these businesses can grow fast, and the people that are starting these businesses are ambitious, and they want to grow fast, and they want to see a brighter future for everybody. So hopefully I can kind of send a small signal that might have some bigger ripples over time.
And then, where there's an opportunity to support something that I believe in, do that, either, in my own money case, with basically no concern about return, or, in the A16Z scout case, looking for things that have potential for high return, but doing that in a sector or with a concept that I think could start to facilitate this coming together of AI safety folks and accelerationists. Thank you to Aaron for a couple of good questions. Okay, here's another interesting question, on early childhood AI literacy. This is the question: mainstream advice on kids and tech often ends with a push for abstinence. What would a sex-ed-style approach to technology and AI look like for kids that's age-appropriate, values-based, and practical, so they can build confidence and judgment instead of secrecy and bad habits? What could parents, particularly as models for their kids, and schools, as creators of safe spaces to explore, do at various ages, say three to six, seven to eleven, and twelve to eighteen, to foster such healthy learning and exploration? That is a truly great question. And I don't think I have an answer that's up to the scope of the challenge of that question, in all honesty.

[56:32] Eugenia Kuyda, founder of Replika and now Wabi, on our live show basically did advocate for abstinence for younger kids. She said we just don't know enough about this technology to trust any developers, no matter how well-intentioned they are, to serve young kids in a way that we're ultimately going to be happy with, you know, that we'll ultimately feel like we were wise to have done. And so given her experience in that space, I am reluctant to disagree with her. You know, I guess I would say for especially younger kids right now, and my kids are young: Ernie is six, almost seven; Teddy just turned five; and our youngest, Charlie, is two and will be three in April. So, you know, I'm still in kind of the first grade and below bracket. I suppose abstinence might be a decent idea there. And yet I'm not content with that. I do think when I look at what folks like Alpha School are doing with two hours of entirely AI-delivered education, two hours of focused academic work, delivered and supervised by AI on a one-to-one basis for every kid, the fact that they're able to get kids going faster than usual schools go in just that two hours a day and create this whole afternoon of freedom to do all these other exciting things that kids want to do, I'm not content with the idea of abstinence being the answer. So I think that puts us in kind of a hard place. I think we are at a spot where it's right to say that the technology is too new. There are too many surprises about it. And nobody has really established themselves with a great track record for how to create AI products, experiences, whatever, that really serve kids well in the long term and don't just engagement-max or whatever else. I think that's true. But I also don't want to put my head in the sand, or try to get my kids to put their heads in the sand, and pretend that AI doesn't exist for all that much longer, because I do think there is going to be a lot of value in even just AI for educational purposes and probably more besides that too. We just haven't seen the right form factors. I do use voice mode AIs with my kids fairly often. It's not something we do all the time. But if we're playing a game or there's some question that they have that I don't know the answer to, I'll totally whip out my phone and go into voice mode and ask it a question and get the answer. And this is something that for them seems pretty normal. But yeah, it's not something we do a ton, but it's also not something I'm trying to hide from them. And it's not something that they're clamoring for all the time. But occasionally, you know, one of them will say, hey, why don't you ask AI about that? And so I will. I guess I put this in kind of the same category as other major, fundamentally not just life-altering but sort of condition-of-life-altering technologies that we're going to be confronted with in the coming years. I would put brain-computer interfaces as another one of those. You know, Neuralink is planning to scale up its patient base this year substantially. And they've said that they plan to serve what would be considered well people in the not too distant future. So we're going to have questions around: do we get brain-computer interfaces as healthy people? We're going to have questions around all sorts of gene editing. Do we do that?
And as my kids are already born, they would, you know, be getting whatever gene editing they'd be getting as formed people. But then we're also going to have increasing levels of power in terms of embryo selection or, you know, embryo-level gene editing that's going to fundamentally change the nature of people before they're even gestated and born. And I think this sort of surrounding ourselves with AI friends, companions, tutors, always-on entities is probably right up there in terms of the magnitude of the impact that it could have. And so I think we're going to have to approach it with extreme caution. But the value of it is probably also going to be undeniable. And it is very hard for me to imagine abstinence all the way up to 18 years old. You know, the idea that a high school kid today should not be allowed to use AI because of the downside risks? I don't see that. I definitely think high school kids should be able to use ChatGPT, should be able to use Claude. Should they use AI boyfriends and girlfriends? I certainly would understand a parent saying, no, I don't want that.

[1:01:48] And that's probably where my intuition would go as well. But I don't know, this stuff is tough. I haven't parented a teenager. And, you know, I do think the trade-offs there are tough, right? Like, if they're going to have a phone, they're going to have some access to this kind of stuff. If you tell them it's not okay to use, do you drive it underground? Do you lose visibility into it? It's very tough. I think this stuff is very fraught. So I think it's by all means a great question to be asking. I wish I had better answers. You know, get hands-on is always kind of one of my fallback answers. I would say if you are considering buying any of these form factors or allowing your kids to use any of them, whatever it is, it could be a stuffed animal that can really talk or it could be an app that sort of is a virtual friend or whatever. If you are considering that, I would definitely get hands-on with it yourself and really try to understand it and get an intuitive, experiential feel for it before you just give it to your kid and let them do whatever they're going to do with it and hope for the best. But, you know, there's just not great answers right now. This is definitely one of the areas where consumer reviews will be extremely important. It would make a huge difference to me to know that lots of other parents are out there saying, this thing has been great for my kid. You know, I would take that quite seriously. I guess maybe my closing thought on this is: because the space of possibility with AI is so vast, I absolutely believe it is possible to create AI toys, products, virtual friends, whatever, that do effectively nurture humans at any age. And so we're just going to have to continue to watch the space really closely and be hands-on, work together, and hopefully that way we'll be able to come up with good answers. But it's a tough one. Next question. What is your timeline for work disruption? Could you comment on job disruption in terms of phases? So, for example, if zero to three years hits roles like customer support, marketing operations, and some software engineering tasks, and years three to 10 start reshaping accounting, law, and parts of medicine, what's your best guess for years 10 to 20? Where do humans remain essential at each stage? I guess, first of all, if we're going to use a timeline like that, I would say time equals zero was somewhere around the introduction of the InstructGPT model, the first instruction-following model. Actually, I guess they later revealed it was just supervised fine-tuning and not yet RLHF as of the first release of InstructGPT. That's very esoteric, in-the-weeds history. But that was early 2022. And then ChatGPT, of course, in late 2022. I would say 2022 is kind of the year zero for this purpose. And, you know, those early models, the first things they were able to do, and certainly the things that I was most interested in getting them to do early on, was write some basic marketing copy. So I think that we already started to see some disruption in 2022, 2023. Copywriters do seem to have been very significantly impacted by AI. We also had an example of this at Waymark with voiceover, where initially our offering for voiceover was a professional service. We would charge $99, which is a pretty good price. And we worked with a provider that did a really good job.
And we delivered professional voiceover at medium scale for an SMB-accessible price point. And we were pretty proud of and pleased with that service. In the 2022-2023 timeframe, AI voices started to get good enough. They were really not competitive with the human pro, but, classic disruptive technology, they were worse while being way cheaper, and they also happened to be way faster. So you could get, with our AI voiceover integration, multiple takes in seconds and at no additional cost, versus a much better product for 99 bucks that would take a couple of days and maybe have a couple of rounds of back and forth. And we started to see substantial adoption of that pretty much immediately. It did start to eat into how much of the professional work we were seeing customers choose to pay for, even while the AI voices were clearly inferior. Now, fast forward to today, and we're obviously spoiled for choice with great voices. ElevenLabs is obviously great. Google also has amazing and very steerable, promptable voices that are awesome. And there are lots of other companies doing great stuff besides.

[1:07:02] Hume AI has a really emotionally competent voice, and they're also very focused on understanding the emotion of the human speaker. For our purposes at Waymark, creating marketing video content, we don't need to understand input human voices, but Hume is good at that as well as making the voice sound emotionally intelligent. And basically these days, we just don't do much, I don't think really any, human professional voiceover work at all. So that's a pretty substantial change that I would date back to 2022, 2023, with marketing copy and voiceover work being some of the earliest. And now, as of today, if you look at GDPval, the latest models are winning in software engineering by a substantial margin, like in the 70s, maybe even up to 80% of the time. And again, GDPval uses three sets of experts. One set of experts defines the tasks in various domains. Another set of experts does the tasks. Their work is then compared to the AI's work by a third set of experts, who are responsible for determining which they prefer. And the AIs are now winning a lot when it comes to software engineering tasks. So I would say for sure that I prefer working with Claude Code over working with an entry-level human software developer. That's pretty obvious to me, honestly, at this point. I think the debate is ongoing when it comes to whether this has hit the statistics or not, but will it? I think it will. I think in 2026, it's going to be hard to justify, on purely economic terms, hiring a 22-year-old out of a CS program versus spending more on Claude Code, or spending the time that you would have spent mentoring that person instead on developing new skills and hooks and increasingly elaborate architecture for your AI coding setup. So I think the disruption is happening now, certainly in software. And I guess relative to the rest of the question's timeline, I don't think it's three to 10 and then 10 to 20 years. I think it is probably coming sooner than that. As for accounting, law, and parts of medicine, well, let's start with medicine. I've talked ad nauseam now about how the latest models are competitive with attending oncologists. That's a reality. I've lived that. There's no denying it. Nobody can talk me out of it. As sure as I'm sitting here, all three of the frontier models, Gemini 3, Claude, and ChatGPT 5.2 Pro currently, are competitive with attending oncologists, and there's just no two ways about that. So, in terms of parts of medicine, yeah, absolutely, I'd say that disruption is starting to happen. I don't recommend people ignore their human doctor or go all in on AI, but I do think that, especially for things that aren't so important, the substitution effect is going to start to be real, because you can get a lot of your questions answered and maybe not have to go to the doctor, or go to the right doctor the first time. Obviously, in our society right now, doctors are still required to prescribe the medicines. So I do think that the way this is going to play out is going to be noticeably different, very soon. And I'd say that's probably going to be true in accounting and law as well. I'm certainly no expert in law, but when I get a contract to sign, I run it through three or even four frontier language models in the exact same way that I run my son's lab test results through language models.
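For what it's worth, the contract cross-check I just described is basically fan-out plus an "all clear" rule, and a rough sketch of it fits in a few lines of Python. The model names, the review prompt, and the ask_model-style helpers here are illustrative assumptions, not the actual setup I use:

```python
# Sketch of a multi-model contract cross-check: send the same contract to
# several frontier models and only treat it as routine if every one of them
# independently reports no issues.
from typing import Callable

REVIEW_PROMPT = (
    "You are reviewing a contract on behalf of the signer. List any clauses "
    "that look unusual, one-sided, or risky for the signer. If nothing stands "
    "out, reply with exactly: NO ISSUES."
)

def review_contract(contract_text: str,
                    models: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Fan the same contract out to each model and collect the responses."""
    return {name: ask(f"{REVIEW_PROMPT}\n\n{contract_text}")
            for name, ask in models.items()}

def all_clear(reviews: dict[str, str]) -> bool:
    """The 'I'm good' rule: every model must come back clean."""
    return all("NO ISSUES" in text.upper() for text in reviews.values())

# Hypothetical usage, with each callable wrapping a real API client:
# reviews = review_contract(contract, {"claude": ask_claude,
#                                      "gemini": ask_gemini,
#                                      "gpt": ask_gpt})
# if not all_clear(reviews):
#     print("At least one model flagged something; read those sections closely.")
```

The design point is the conjunction: a contract only passes when every model independently comes back clean, which is what makes a simultaneous miss across three or four models unlikely.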
And I have yet to regret it. Would I do that for the most important transaction of my life? I'd probably get a human lawyer, you know, to help review as well. But for routine stuff, I'm absolutely happy with the results that I'm getting. And if none of the three or four frontier models identify issues that I need to be concerned with, then I'm good. I think it's pretty unlikely that all of them would miss something that's really important to me. And even in accounting, we're starting to see that models are getting pretty good with spreadsheets. So I think that too is coming pretty soon. Another mental model I have for this is sort of: how elastic is demand in a given domain? And I think that varies dramatically. You know, when it comes to something like accounting, I personally don't want to buy a lot more accounting services than I am required to buy.

[1:12:13] And maybe others are out there, you know, wishing that they could buy more accounting and just don't have the budget for it. And when, you know, AI makes accounting professionals more productive, then they'll buy lots more accounting services and that will partially offset. But I just don't see that there's that much latent demand for more accounting. I think most people are doing roughly what they need to do, what they're required to do. And beyond that, they don't really have that much appetite for more. So in something like accounting, I think when we see threshold effects hit and all of a sudden AI can do the job, I would expect that to be a field where there would be more outright substitution and displacement and not an explosion of accounting services provided. Dentistry, by the way, is my example of maybe the most extreme version of this. Like, I want zero dental services. I'd love to never have to go to the dentist again in my life. So if there was something that I could buy that was 1% the cost of dentists and did the same service, I'd happily do that, and I wouldn't be doing lots more dentistry. I think most people share that intuition. Medicine in general is probably a bit different. I think most people might want a little more medicine. They might want a little more care, so there could be a surge of demand. And as things become more accessible, as sort of capacity expands, demand probably grows to fill it. And then software engineering maybe could be the 10X or even the 100X. Like, do we get 100x as much software production over the next few years? Very plausibly. And that might be enough to sustain software employment, at least among the senior engineers, for a while yet, for perhaps a surprisingly long time. Even when it looks like, hey, AI can code up this website in one shot, what do we need engineers for? Well, if we're doing 100 times as much software and the architectures are getting ever more elaborate, then maybe there's still a role for the kind of senior software architect, even if there's not for the junior person. I think this is one area where I do have a bone to pick with some of Dwarkesh's analysis. And I think generally very highly of Dwarkesh. Obviously, his show is great. His questions, I think, are generally very effective in terms of eliciting alpha from his guests. And I think his essays are generally really good. But one thing that he has said recently that I do pretty strongly disagree with is that explanations, you know, when the question is posed, like, why aren't we seeing more impact than we have to date from AI on the labor market, he has argued that explanations that center on human bottlenecks are essentially cope. He thinks it's really that the models aren't good enough, and not that the people are too stuck in their ways. And obviously, I would say it's both. I mean, the models do have room for improvement. The jagged frontier, all that stuff for sure is real. But I think that there really is a lot of human bottleneck going on. And, you know, I see that at the hospital, right?
I mean, when a resident comes to the room and talks to us and is clearly less knowledgeable and less reliable than a large language model, I don't know how to interpret that in any other way than that the humans are the bottlenecks, that they have not realized, you know, that nobody has told them, that they haven't experimented themselves with actually using a model on their own. That's on them, right? It's not on the model. Like I can just tell you, there were easily a bunch of times when, if the resident had engaged with ChatGPT as I had engaged with ChatGPT before coming to talk to me, they would know more and they would be able to do better. And I'm quite confident that is true in a very, very wide range of contexts. I do think that the human bottleneck is a very real phenomenon. Not the entire story, but to say that that's cope, I would definitely debate Dwarkesh on that one. The other thing that I would point people to in terms of just a mental model for this, again, I've mentioned it already, but we ran that cross-post with Luke Drago from the Future of Life Institute podcast. And he's got this idea of the inverse pyramid model, where basically the lowest rung in terms of the hierarchy of an organization, in other words, the entry-level employees, is who AI is coming for first. Another way that he thinks about it is anything where there are lots of people doing the same job and the organization is geared toward trying to make sure that those individuals act more like cogs in the machine and do the job in the same way every time, where they're focused on consistency, reliability, process, standards, systematic evaluation, all that stuff really lends itself to AI.

[1:17:18] So he recommends, and I think this is pretty good advice, try to do things that are N of one and try not to do things where you are one of N. So be N of one, don't be one of N is pretty good advice, I think. But obviously that's, you know, advice that some individuals can take. It's not something that I think is going to, you know, preserve the structure of employment broadly. And we're not even talking about driving here, right? I mean, that wasn't mentioned in the question, but last I checked, I think something like 4 million Americans, out of roughly 150 million employed Americans, are professional drivers. We're getting real close, right, to where human drivers are just not going to need to exist. And it seems like the disruption there could happen quite soon. And it, again, seems like the bottlenecks at this point are increasingly human. You've got some city councils, you know, that are trying to do various things, or what are the Teamsters going to say, or what have you? I'm in Detroit. Waymo is projected to launch in Detroit in 2026. It's going to be really interesting to see how that plays here in the Motor City, especially because companies headquartered here aren't really at the center of the frontier of the self-driving technology, to put it mildly. So I'm not sure what the response is going to be, but the bottlenecks there are pretty clearly human at this point. And it seems like even in a world where the AIs don't really get any better, I do see, as we kind of figure out how to scaffold them and plug them in and manage the context right and all that kind of stuff, pretty serious disruption at least being possible, unless it is blocked by socio-political dynamics, in just the next couple of years. And then, you know, again, superintelligence is a distinct question from that, but it seems like we're quite clearly close to where AI will be able to compete for entry-level jobs, for jobs where an individual human is one of N people doing that job, because there are the standards and the processes and the evaluation frameworks to make sure that AI is doing well. And I expect AI, especially, you know, as we really put that elbow grease in, I expect it will do better than a lot of people in a lot of places. And the economic, the market pressures will be very strong to adopt them. Customer service is another great example, right? I mean, will there be some customer service people? Well, I don't think it's going to zero in the immediate term, but my dad was on the phone with Bank of America yesterday, on hold for however long, getting increasingly agitated by the fact that he was on hold as he's listening to their hold music and messages. And I was just like, man, I know several AI customer service firms doing voice agent type work that would, at a minimum, be able to reduce the wait time dramatically and probably do just as good of a job for a large majority of tickets as the humans are ultimately able to do. So I think we're bottlenecked on willingness. I think we're bottlenecked on implementation. At the very high end, we're still bottlenecked on model capability. If I had to kind of project how far up the pyramid AI can go today, it's, you know, maybe half the way up vertically, but that's something like 80% of the mass (for a literal pyramid, the bottom half of the height holds 1 - (1/2)^3, or about 87.5%, of the volume, so that ballpark checks out). And I think the technology is basically there for that. And now we'll have to see how fast people actually move, how fast the market works.
How fast does market incentive and pressure and increasingly lower barriers to successful deployment, how fast does that all work to actually create the disruption that I honestly think is kind of inevitable? But relative to the question, I would take the under in terms of timelines. And then the final point of the question is: where do humans remain essential at each stage? I think that's a design question more than anything else. I think we want to design an AI future where humans are essential, or are at least a big part of the process, so that we can retain some authorship over the future and not just give the whole thing over to AIs as the gradual disempowerment people worry we might. But I think that is going to have to be intentional, more so than something where we will find that there's some ineffable human essence, you know, some élan vital, something that for whatever mysterious or even mystical reason only humans can do. I really don't believe in that. I think that we are pretty remarkable, certainly, in our breadth and our flexibility. And the high end of human achievement is obviously super impressive and inspiring to the rest of us, you know, mere mortal humans. But AI is doing superhuman level stuff in more and more domains.

[1:22:27] And I don't really see fundamental blockers to that being, in the fullness of time, functionally everything. So I think it's much more on us to think about how we design these things. How do we design our overall society? How do we design our overall systems? How do we design our models? How do we design our implementations so that we keep the parts that we want to keep and retain some overall control of steering the future? I don't think that's going to just happen because there's something that AIs can never do that only we can do. I think it's going to have to be a lot more intentional than that. Okay, next question, regarding nonprofits and shifting need. As AI reshapes labor markets, how do you expect the definition and scale of those in need to change over, again, zero to three, three to 10, and 10 to 20 years? What should nonprofits and funders start doing now to prepare, especially around workforce transitions, mental health, and basic economic stability? I would like to see a lot more work on this, honestly, than we have to date. My general answer is I have to give Sam Altman a ton of credit here for making personal investments in UBI. And I think that is something that we are ultimately going to need, whether it ends up being called UBI or not, however exactly it looks. I think we're going to have to have some new social contract that decouples a person's right to a decent material standard of living from their ability to contribute economically, especially in the context of the competition that they're going to face from AI systems. I just think that's really the only way that we are going to get to a place that anybody would be happy with. And I welcome other suggestions, but I haven't heard many that really make a lot of sense. What I hear is either we're going to need a UBI, or we're not going to need a UBI because there's always going to be work for humans. And I just don't believe that. Tyler Cowen, you know, kind of confuses me on this point these days, to be honest. And I've been kind of waiting to invite him on the podcast for a while. Maybe I should finally get around to doing it. But I looked back at Marginal Revolution, and I think his first mention of ZMP workers, or zero marginal product workers, was from 2010. And this was basically research that was looking at how firms responded to the Great Recession. And what many firms did in response to the Great Recession is they looked around their teams and they said, who do we really not have to have in order to go forward and be okay? And they were able to identify people and they were able to cut those people. And of course, you know, that's a noisy process. But by and large, it seems like firms were able to cut headcount and then actually see somewhat of a surge in productivity, seemingly because they were producing more or less the same with fewer workers, because there were some people who just weren't really contributing much. You take that and you take the general phenomenon of bullshit jobs, where a lot of people seem to feel, according to survey results, that even absent AI their own work isn't really worth anything, and it's kind of performative, or it wouldn't matter if it wasn't done. Maybe they're wrong about that, but are you really that confident that people who are saying that their own work is meaningless or not necessary, are you really confident that they're wrong? I tend to think that at least some of them are probably right.
So it feels like there are already, you know, kind of at baseline, some ZMP workers who are employed. There are a lot of people who are saying that their work is just not really meaningful or important and that things would be fine if it wasn't done at all. And so, yeah, I just see that we don't have a great answer other than that we have to find a way to decouple somebody's ability to exist and have a decent life from their ability to compete economically, again, especially to compete against AIs. As for exactly what the right structure of that is, I think we would do really well to start doing much more of that experimentation. And I personally interpreted some of the recent UBI results as much more encouraging than I think the broader discourse did. I'm certainly not an expert in this, but my takeaway from some of the UBI research was that people were disappointed that recipients seemed to work less in response to getting the UBI. And I'm kind of like, I think that is the point. You know, if they worked the same, in my mind, that might even be more of a failure. Like, I think we want that, or maybe we don't want it quite yet; we maybe didn't want, two or three years ago when these studies were being run, to have people just take cash and not work. But I think that what it does show is that people are, in many cases, just working for money.

[1:27:34] Maybe you could also say that at least some people are relatively easily satisfied with not that much money, which would suggest that they are able to find meaning in things like spending time with family and leisure and, you know, whatever. Maybe they don't feel compelled to go out into the workforce and do some job that they really don't want to do just to get a bit more money. Obviously, marginal tax rates are also, in many cases, totally out of whack, and working more can mean you forfeit benefits of all kinds, whatever. It's a very complicated question. But I'm encouraged, I think, much more than the average commenter is by these UBI results, because I just see that it didn't take that much to actually get people to work less. And if they're not enjoying their work and they're substituting away from that work toward leisure, to me, that suggests they're not missing out on meaning. They're not pining for the workplace as the place where they're going to have identity and find meaning. It seems like they're finding that in other places just fine, thank you very much. And I would expect that probably to continue. I tend to think that the "we need jobs for structure and meaning" line is mostly cope. And I think it's especially unhelpful cope when it's projected by people who have privileged positions where their work is high status and where they genuinely do love it. And I count myself in that group, and I'm very thankful for that. But when that sort of reality is projected onto people who are lower on the socioeconomic scale, who are doing work that they don't want to do because they have to do it, because that's the only way that they're going to feed and clothe their kids, I think that's quite counterproductive and misguided. So yeah, bottom line, I would love to hear other ideas. I would love to read your utopian fiction about other ways that we rework the social contract that isn't just kind of a vanilla UBI type structure. But until somebody more creative and imaginative and visionary than me comes up with those ideas, I basically still think that UBI is the default, and denial is not helpful. More experimentation, sooner, about the details and the structures and exactly how incentives should work would be really valuable. Okay, thank you for those questions. Next one: Are people being misled by benchmaxing? And then another kind of related question: does the massive success of pre-training give people the wrong intuition about AGI? Yeah, I mean, characterizing whether people are misled or whether people have the right or wrong intuition, I think, is very hard, because people have very, very different intuitions and world models, right? There's such diversity there that I can't characterize people's opinions in general; that's just such a wide-ranging thing, right? Anything I might say about some people's opinions is obviously going to be contradicted by other people's opinions. That said, I do think benchmaxing is probably misleading to at least some degree. And I would go back to the Chinese models, which I talked about last time, being significantly worse on just a very random multimodal task than all of the leading Western models, as a kind of leading indicator of how much benchmaxing might be misleading us more generally.
It does seem to me that the delta between the leading frontier American models and the Chinese models on benchmarks is much smaller than the difference between them in practical day-to-day utility on whatever idiosyncratic and/or esoteric task you might want to use an AI to help with. And of course, the Llama 4 incident shows that as well, where they, I guess, spun up maybe a bunch of different Llama 4 variants to put one in each different LM Arena category to try to maximize their score. I'm not exactly sure what they did, but they sort of LM Arena-maxed, and that was effective inasmuch as they got a high position on LM Arena. But you don't hear all that much about Llama 4, and it seems like it just wasn't really competitive, but they were able to make it look like it was competitive on some of these standardized scores. So, yes, I think benchmaxing is a problem. Independent analysis is, you know, a good antidote for this: looking at the METR charts, looking at Artificial Analysis, you know, there are people doing that work. And the Scale benchmark that is largely private, you know, is a pretty good way to think about that as well. I think ARC-AGI has been remarkably durable in terms of how relevant it's been, but, you know, people can obviously benchmax on that too.

[1:32:38] But definitely, things where there's a private test set, and things where people are taking it upon themselves to really analyze capabilities and make that their thing, you know, and really try to earn your trust by being a reliable guide to model performance over time, I think those are the things that the field will increasingly be looking to, beyond just performance on open standardized benchmark tests. Those continue to be somewhat relevant for sure, but less and less all the time. And then on the question of whether pre-training gives people the wrong intuition about AGI: if anything, I think maybe post-training is giving people the wrong intuition about AGI, and maybe the classic shoggoth meme for pre-training is a better intuition. And this kind of also connects to the breadth-first versus depth-first search of the space of possible AIs. If people have the wrong intuitions about AGI, I would expect that it's wrong in the sense that they have encountered a relatively narrow range of form factors of AIs, which are basically chatbots and coding assistants. And those designs have kind of converged thus far. The overall paradigm of helpful, honest, harmless has been great for unlocking a ton of value, and pretty great, certainly when done well, as Anthropic has done, in terms of shaping the model's character. No beef with that paradigm or with those approaches. But if there is a problem with it, it's that it has presented to the public a very, very narrow slice of the overall conceptual menu of what AI can be like. And so it maybe is lulling people into a false sense of security, or a false sense that this will continue to be normal. And I think that in reality, the shoggoth meme, of this thing being an insane alien that kind of encompasses everything in a super strange way and can kind of shape-shift and be anything you want it to be, or maybe even things you don't want it to be, is maybe the better intuition for at least the space of possibility for AGI. And, you know, I think back to the episode we did with Marius Hobbhahn from Apollo Research, where they got access to chain of thought from o3-class models, and they found that the chain of thought was kind of evolving to become its own dialect. Remember "disclaim, disclaim," "vantage," you know, "the watchers," these strange phrases that really didn't look anything like the training data, in this case, you know, being driven by reinforcement learning at increasing scale. It just feels to me like there's just so much more alienness to these systems than we are seeing. I guess, you know, just a couple of other little intuitions for this are maybe worth mentioning. People often cite the bird-airplane analogy. You know, many, many people have commented, of course, that we wanted to create a machine that could fly. A lot of early attempts sort of tried to mimic the bird. What we have is an airplane that flies on quite different principles and is just way faster, way more powerful, and can carry way heavier loads than birds. It can't do necessarily everything that a bird can do, but for the things that we've designed it for, it's just way, way better, right? We wouldn't want a scaled-up bird to take a cross-country flight on. Airplanes are just way better than scaled-up birds.
Similarly, many people may have seen recently a video that was kind of a compare and contrast, where there was a humanoid robot harvesting grain in a field, you know, kind of chopping it down and bundling it up the way that humans traditionally used to do. And then there was this carrot-picking machine that's just rolling through a field, pulling carrots out of the ground en masse, washing them off, and just operating, you know, orders of magnitude faster than humans could possibly pick carrots. And so I think something similar maybe happens with a true AGI or a superintelligence, where it becomes so powerful that it potentially doesn't even really make sense for it to present in this natural language sort of way anymore. Or at a minimum, that becomes sort of a dramatic reduction, you know, a majorly lossy kind of summary of what it's doing, as opposed to the core thing that it's doing, as it is today. You know, today, the response that you get from the chatbot, that is its output, right? In the future, I think the natural language summary of what it's doing may be just a very small part of what it is actually doing. And so, yeah, I think: study the shoggoth, study the weird chains of thought, expand your mind when it comes to the space of possibility and how weird things could be.

[1:37:52] Those are my guesses for how most people are being misled, if at all, today. Okay, next question: is learning from a physical environment a requisite for AI? I don't think so. I think we're pretty far along, obviously, in AI at this point. And when we look at all the things that AIs can do, and how many categories they're winning more than 50% of the time on in the GDPval context, all without any robotics or physical embodiment at this stage, I think it's pretty clear that you can get pretty good AI without needing to learn from a physical environment. Now, the line maybe starts to blur a little bit there when it's like, can you use a computer? Is that a physical environment? Well, it's a digital environment, but it's spatially organized. And it does seem like the way that we are getting AIs to learn how to use a computer is by them actually trying and failing to use the computer a lot, right? We needed some language model-based capability to conceptually understand what one might want to do on a computer and what different buttons might mean and so on. And then from there, it's been a lot of actual reinforcement learning, where once there was at least some ability to succeed, you could hill climb from there up to, at this point, pretty decent performance. And I definitely think we're not quite there, but in 2026, I think we'll definitely have AIs that use computers basically as well as, if not better than, your typical human user. So yeah, I guess maybe it kind of comes down to a task by task thing. Do I think that large language models are going to become plumbers without a bunch of reinforcement learning on physical plumbing-like tasks? No. I think that if you want to be a plumber, you're going to have to do a bunch of stuff in the physical world. You're going to have to get good at that. And that is probably going to require some sort of embodiment. You know, where the question says physical environment, I would also say simulation is going to get really good. Obviously, NVIDIA's GPUs are not just good for training. They're also good for simulation. So we're going to see more and more simulation being used in all facets of AI training, including for robotics. But if you count that simulated physical environment as a physical environment, I do think you need it to do physical things. And ultimately, a language model can't be a plumber. I think you can be a lawyer without ever having any physical embodiment. I think you could be an accountant. Arguably, your Excel spreadsheet maybe is your physical, quote-unquote, environment there. But yeah, I guess basically my intuition is that you need training in something like the environment that you're going to operate in. So as long as you're just operating in language space, language is probably enough. If you want to start to operate in pixel space, you're going to need to be trained in pixel space. If you want to operate in spreadsheet space, you're going to need to be trained in spreadsheet space. If you want to operate in physical real-world space, you're going to need to be trained in a combination, probably, of simulated physical real-world space and, to some degree, actual real-world space. And it's going to happen. You know, the other way to think about these kinds of questions is that we'll never really know. Is it required? I think not for many tasks, but it's going to happen anyway. So there may be some positive transfer.
Will the AIs of 2027, 2028, even if I'm just doing something that is ultimately a purely language task, say a legal task, will the best models that I can go to for those kinds of tasks also have some sort of physical real-world training as part of their overall training mix? It wouldn't surprise me at all. It wouldn't surprise me if there's some positive transfer there, if there's just some better intuition, if certain kinds of queries that require spatial reasoning just perform better because that kind of training is folded into the mix. And so I'm not saying that 2027, 2028 AIs, even of the chatbot or digital assistant variety, won't have any physical environment-based training going into them. I'm just saying I don't think you would have to do that to be successful, but it's pretty likely, at least, that it will happen. And so then it'll be kind of hard in the end to tease out: was this required to happen? Could we have got here another way? It just seems like everything is going to be developed and kind of folded in to the degree that it works. And it seems like everything is working. It's all going to be folded into kind of the mainline frontier models. And so we will probably not have a clear answer. Could we have got here without doing any physical training? My bet would be counterfactually yes. But if it adds even a little bit, then it'll be part of the overall mix. And so, you know, what we'll actually have will be AIs that are kind of trained on everything, regardless of whether some of those things could have been, you know, cut with minimal performance loss or not. Okay, the next section gets a little more focused on tooling and kind of AI engineering concepts.

[1:43:06] Are you noticing any emerging standards or winners for tooling across the companies that you work with? This is interesting. I would say I don't really have anything super shocking to report. I think in terms of models that people are using, Claude for coding is typically the go-to still. I certainly get very good reviews from people on Codex, but Claude still seems to be the go-to for most people. If they didn't have Claude, there are certainly other great options out there. Gemini 3 is also excellent. But I'd say Claude remains the kind of consensus top choice for coding. OpenAI is kind of pretty good at everything and is probably the default thing that most people use for in-browser random queries on a day-to-day basis. And Gemini Flash is, I would say, pretty clearly the top choice for things that don't require frontier capabilities and where speed and cost are significant factors. And I don't think any of that is really surprising or out of step with mainstream online consensus at all. I really haven't seen anything that, you know, challenges the mainline narratives, certainly at a model provider level. Now you go to the sort of tool level, and yeah, I don't know. I think, again, the market is kind of getting away from most people. Because things are changing so fast, most people are behind. You know, I think most people are not using the best thing, which is maybe, you know, normal life in general. For example, I've used LangChain on a couple of projects recently, one at Waymark and one at another company. And I'd say it works pretty well. You can build agents in it. You can have those agents hosted on their infrastructure, which can be pretty cool. You can just log traces to it. It's a pretty heavyweight UI with a lot of features. It can be a bit overwhelming in its data presentation for many people at first. It does take a little getting used to, but I'd say it works pretty well. And until such time as I'm not happy with it, or I hear some note from somebody else that there's something that's just dramatically better, I'm pretty content with it. And I think a lot of people are kind of in a similar spot, where it's like, man, I can barely keep up with model releases. And tool releases kind of feel like a second-tier question, where as long as I'm meeting the need that I originally had, I'm probably fine. And so even though there might be better things out there, I think it's just so overwhelming to go try to shop effectively that people are kind of staying the course, you know, with whatever decision they made, quite a bit. Another interesting direction that we're going to start to see develop here, and already are starting to see develop, is, of course, that the model providers are trying to become platforms, and they are building out more and more of this stuff themselves. So of course, OpenAI has their agent builder type stuff now that competes directly with many agent builders that were built on their platform. They also have observability of various kinds. Anthropic bought Humanloop, which was a past guest. And I was a happy Humanloop customer for a while, but now they're part of Anthropic. So it's going to be interesting to see.
And of course, Google is going to just build everything over time, probably to somewhat varying degrees of quality, but they're going to have, in the end, the most robust portfolio of products, probably, of any of the top-tier model providers. So it's going to be interesting to see what the dynamics will be there: how often it makes sense to just pick a platform, go with their model, go with their observability, go with whatever tooling and monitoring they have, versus trying to maintain flexibility and the ability to upgrade a model quickly on a new release, which would potentially require you to have a more horizontal layer for these kinds of observability and tooling questions. If I'm a big enterprise, I want to avoid lock-in, and I want to invest potentially more than I really have to in some of these things so that I at least have a little bit more ability to control my own destiny and I'm not totally beholden to one of the platforms. If I'm a startup, or if I'm just an individual doing a one-off project, then maybe I just take the convenience and use OpenAI's observability or Anthropic's observability, because I'm already using their model and it all just kind of integrates and makes things easier and faster.

[1:48:12] But all of this is to say, I don't think there are any super obvious major trends that I'm seeing. Others may have better answers. If you feel like this answer sucks and you have a better answer, certainly write me and let me know. And maybe we'll even do a full episode on it. But yeah, I would even say, just listening to the Latent Space podcast: I take quite a bit of pride in the fact that The Cognitive Revolution has been voted twice, at the Latent Space AI Engineer World's Fair events, as the number three podcast for AI engineers. Latent Space has been voted number one in both of those surveys. And in listening to them, it also does feel like their sort of content mix has trended away from these kinds of tooling questions. Of course, it's still an important element, right? You need to be able to look at traces, but is it really where people's mental energy is going right now? It feels like it's less topical than it used to be. And I think you can even see that in their mix of guests. And then just one other anecdote I'll give on this is when I was vibe coding the Christmas present apps that I've mentioned a couple of times. The one that I was coding for my mom, the custom trip planner, involves AI research into all these various things, right? So there's a prompt to go find restaurants, and she's gluten-free, so we're really trying to dig into gluten-free options in all these different places. And when it comes to where she wants to stay, she loves to have a nice view and wants a balcony. So, you know, the prompts are customized to that level as well. Early on in the development, and even still now, as I continue to work with her and, you know, try to enhance it in ways to make it more valuable for her, because she actually is using it quite a bit, I wanted to look at the traces of what the inputs and outputs are. What prompt is the model seeing, and what is it giving back in raw form, so that I could kind of debug that or know what's working and not working at a level below the UI that she as a user is using. And so, what did I do? I just added a trace function by prompting Claude Code to say, hey, I want to see all the history of the queries that are made to the model. Can you add a tab to this application that just gives me direct access to the full history? And it just built that thing, I think in maybe one prompt, maybe it was two prompts. And now I have, right alongside all the core features of the app, a debugging tab where I can go in and just look at the history of the prompts. So I think that's another factor that is kind of challenging the tooling market. And this is, you know, maybe the future of SaaS writ small, or the future of SaaS in a nutshell. What would I have done in the past, right? If I didn't have Claude Code to code that kind of thing, then maybe I go make a free trial account on Humanloop, or maybe I, you know, wire it up to LangChain or whatever. But I didn't do any of that. I just said, hey, Claude, log all these things and give me a tab where I can see them. And that's been perfectly good. Could it be more elaborate? Yeah. Could it be more full-featured? Sure. But it meets my needs for the development of this particular app.
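If you're curious what that kind of debugging tab amounts to under the hood, it's really just appending every model call to a log and rendering that log in the UI. Here's a minimal sketch; the traces.jsonl path, the log_call wrapper, and the labels are illustrative assumptions rather than what Claude Code actually generated:

```python
# Minimal prompt/response trace log of the kind a "debug tab" can read.
import json
import time
from pathlib import Path

TRACE_FILE = Path("traces.jsonl")  # assumed location; one JSON object per line

def log_call(prompt: str, response: str, label: str = "") -> None:
    """Append one model interaction to the trace file."""
    record = {"ts": time.time(), "label": label,
              "prompt": prompt, "response": response}
    with TRACE_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_traces() -> list[dict]:
    """What the debug tab renders: the full history of calls, oldest first."""
    if not TRACE_FILE.exists():
        return []
    with TRACE_FILE.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical usage: wrap whatever model call the app already makes.
# response = call_model(prompt)                        # existing client code
# log_call(prompt, response, label="restaurant-search")
```

That's more or less the whole feature; the app-specific part is just a tab that lists whatever load_traces() returns.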
And, I can say confidently, it took me less time to prompt and have Claude Code do it than it would have taken me to go find some solution, figure out what it was, figure out how to connect to it, et cetera, et cetera. And so I suspect maybe that is also part of why people are talking about this stuff less: because it's just become a lot easier to meet the basic needs, in some cases just by coding it from scratch, you know, with a couple of prompts. So yeah, I would love to hear more about this, you know, again, from other people. Okay. Next question. This one's kind of an interesting one. It wasn't actually part of the AMA, but it was a question that I was asked after I posted something online. Actually, this is in the context of vibe coding another app. The app that I vibe coded for my dad for Christmas was this stock trading strategy backtester. He has these ideas: what if we did this? You know, what if, every week, I bought the stocks that lost the most last week and tried to catch them on the rebound? Okay, fine. Is that going to work? Is it not going to work? The app that I created for him allows him to articulate a strategy in natural language, translate that to trading rules, and then go back and look at some time interval and see how that trading strategy would have done over that time interval, just systematically executing those trading rules. Pretty cool. He hasn't used it all that much, to be honest. But in the course of doing that and testing it, I was testing that strategy of buying the biggest losers and then trying to catch them on the rebound. And I tried it once on an annual basis: what if I bought the biggest losers from the last year, held them for the following year, and did that every year? Would I beat the S&P or would I fall short of the S&P? I was sure there was a bug when, in 2022 I think it was, NVIDIA was one of the biggest losers. It was down like 50% on the year.
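For concreteness, the core loop of that "buy last year's biggest losers, hold a year" rule is only a few lines once you have yearly prices. This is a rough sketch under made-up assumptions (a prices table of year-end closes keyed by ticker and year), not the app's actual code, and it ignores dividends, delistings, and transaction costs:

```python
# Sketch: each year, buy the prior year's n worst performers, equal-weighted,
# hold for one year, and compound the portfolio's growth.

def yearly_return(prices: dict[str, dict[int, float]], ticker: str, year: int):
    """Return over `year` from year-end closes, or None if data is missing."""
    by_year = prices.get(ticker, {})
    if year - 1 in by_year and year in by_year:
        return by_year[year] / by_year[year - 1] - 1.0
    return None

def biggest_losers_backtest(prices: dict[str, dict[int, float]],
                            start_year: int, end_year: int, n: int = 10) -> float:
    """Grow $1 by buying the prior year's n biggest losers each January."""
    wealth = 1.0
    for year in range(start_year, end_year + 1):
        prior = {t: r for t in prices
                 if (r := yearly_return(prices, t, year - 1)) is not None}
        losers = sorted(prior, key=prior.get)[:n]          # worst prior-year returns
        picks = [r for t in losers
                 if (r := yearly_return(prices, t, year)) is not None]
        if picks:
            wealth *= 1.0 + sum(picks) / len(picks)        # equal-weighted basket
    return wealth

# Compare the result against the same compounding applied to an S&P 500 index
# series over the same years to see whether the rebound strategy actually wins.
```

In a run like that, NVIDIA showing up in the 2022 loser basket is exactly the kind of output that looks like a bug at first glance.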

[1:53:30] And I was like, okay, something's clearly wrong with that. No way NVIDIA would have been one of the biggest losers for a whole year. Turns out it was. And so the AI was right. And I asked the question, and Claude Code went and verified online that indeed this is right: NVIDIA was one of the biggest losers of that year. So the app is working correctly; you were just wrong in your assumption. This is also another one of these moments where I noticed sycophancy on the decline, because it would have been very easy for the model to be like, you're absolutely right, NVIDIA has been a killer stock, there must be a problem in the code. It did not do that. It came back and said, no, you are wrong. In 2022, NVIDIA was one of the biggest losers, and the app seems to be performing correctly. Okay. I posted that online. Here's the crazy stat: in 2022, NVIDIA was one of the biggest loser stocks in the market for the entire year. And Holly Elmore, executive director of PauseAI, came by with, I would say, a critical comment, as she often does, and basically said, like, you know, this isn't SportsCenter. Basically, you are morally out of line for engaging in, you know, fun, or sort of trying to make yourself look clever, or, you know, show insights by noticing these quirky things in the AI space, because the whole thing is bad and it ought to be condemned and you ought to be condemning it. And doing anything else, basically, in her view, is morally reprehensible. And I was like, okay, yeah. Do you really think this is an effective way to advocate for your position? Because I think everybody who listens to this feed knows, and if you're two hours in with me on this episode, you certainly know, that I'm seriously concerned about AI safety issues and do not want to see a race to recursive self-improvement. The big decisions that are going to be made in the next year or two around the automation of AI R&D, I think, are very, very big and important questions. And I don't think we're ready to cross some of those thresholds, or Rubicons, if you want. And so I said to her, look, I think I'm much more sympathetic to your cause than most people. I did sign the "ban superintelligence" statement three or four months ago, for example, as just one material, one tangible, indicator that I'm much more sympathetic to her cause than most. But do you really think that coming after me over a tweet about some random observation about NVIDIA stock is the way to advance your cause? And then she asked me, well, how did my comment make you feel? And I thought about that a decent amount and decided to address it here in the AMA, even though that's not what she was asking for. And I think my bottom-line sense of this sort of thing goes back to a mantra that I used to say a lot more often, which is that we should really try to avoid psychologizing others' AI takes and focus as much as we can on the object-level facts of what is actually happening, what does that imply, and just take people's statements and positions at face value. There is such disagreement in the space. There's such uncertainty. Look at the AI 2025 forecast results: again, I got top 5% with predictions I don't think look all that super accurate. And of course, famously, we've got Turing Award winners who have extremely different positions on how dangerous AI is going to be. And I think they're all genuinely held.
So, being sideswiped, attacked, or, you know, accused of moral corruption online because I made one random comment about NVIDIA stock performance from a couple of years ago, the way that made me feel was kind of indignant, confused, and a bit averse to the person saying it and the cause itself. And this is a cause that I'm genuinely sympathetic to. You know, I'm not advocating for exactly a pause on AI; I don't know, again, what does that even mean? Whatever. I don't think we should shut it all down. I do think there is tremendous upside. I do think that we should be very careful and we should, you know, be willing to slow down, if not pause, at some point when we hit levels that we really might not be able to control. But I also think the progress has been clearly much more beneficial than it has been harmful so far. And I've lived that in recent months. And so I kind of felt alienated from the AI safety movement, or at least the more, let's say, strident or shrill voices in it, by that comment. And I think my takeaway from that is: I wouldn't go so far as to say we should never shame people, but I think we should shame people very carefully, very selectively, and only for what they are doing. I think it is pretty defensible at this point to shame xAI for some of the things that they have released in Grok.

[1:58:40] I think it is appropriate to shame xAI for having Grok on Twitter undressing women with seemingly no guardrails in place. That is worthy of shaming, I think. But that's an action. I would not shame people for their views, and I would not assume bad faith. I know that not everybody's positions are rightly taken fully at face value, but if somebody's going to be out there engaging in the discourse, projecting your sense of their psychology onto them and arguing from that basis just ends up, more often than not, with people feeling bad, becoming increasingly bitter toward each other, more calcification or hardening of factions, more of a sense of somebody's my ally, somebody's my enemy. And I don't want any of that. I think the healthiest discourse we can have around AI assumes that everybody's trying their best, ideally gives people the space to genuinely be trying their best, and avoids these sorts of sideways accusations or moralizing or psychologizing, unless people are really doing things where you're like, you are undeniably fucking up. And maybe even you have a pattern of undeniably fucking up, in which case, sure. So I would say to Holly: your question did not make me feel very good. It kind of made me feel alienated from you and your cause, and I wouldn't recommend doing that. I do think if you want to go protest outside xAI, I would support you in doing that. Choose your targets a little more carefully. Try to recruit people like me to your side as allies, and shame people selectively for things that they've really done wrong, where they are responsible. I think that's fine. But don't just start using these tactics everywhere, because it's just coarsening the discourse and making everything a little more bitter and contentious than it needs to be. And I don't think people do their best reasoning that way. Certainly, I don't think I do. Okay, these are the only three AI questions that I thought made the cut, and I probably generated well over 100 across three models. So definitely give this one to the humans in terms of questions that I wanted to answer. But here are three questions from AI to wrap us up. First: you're in Michigan, not San Francisco or London or Washington, D.C. Does that distance help you or hurt you? What do you miss by not being, quote, in the room? I would say this is definitely a real issue. It definitely hurts to be outside of these core hubs. Obviously, San Francisco is far and away number one, and London is like number two, and maybe not that close a second. I mean, I don't know, whatever. San Francisco and London are both hubs. DC is increasingly becoming a hub because there's so much policy going on there. It's a very different hub, I would say, from the San Francisco and London hubs, and I'm not even sure it really belongs on that list, but that was the way the AI phrased the question. So being outside of San Francisco and London, I do think it makes it harder than it is if you're in those places to stay up to date, to be in the loop, to have the sort of zeitgeist. There's definitely a lot of stuff happening in person in San Francisco: events going on all the time, hackathons going on all the time. To some degree, secrets are being traded or spilled across frontier model developers at the proverbial parties. The fact that there is an emerging trend of rooms at San Francisco house parties designated as no-AI-talk rooms just goes to show how much AI talk is going on.
And the AI talk there is definitely way more sophisticated than it is anywhere else. So that's my honest sense of the reality. And I'm able to compensate for it pretty well by being hyper-online. Certainly, spending a ton of time on Twitter is still part of how I keep up to date, no doubt about that. The podcast itself is also really helpful, because I do get to have substantive conversations with very plugged-in people who are in those rooms much more often than I am. And then occasionally I do try to go to events. Having gone to, for example, the Curve each of the last two years, or last year going to the Summit on Existential Security, these are also gathering places where you can get a very concentrated dose of exposure to the leading thought. I think it does really help to be there. So I feel like I'm missing out on that to some extent, but due in large part to the podcast, I am able to compensate for it to a significant degree. Being outside of those hubs, though, I do think you have to be much more intentional about how you're going to compensate. And it probably ends up meaning that in the Bay Area, you could be much less online and still be equally or even more plugged in.

[2:03:55] But if you're not in those places, then online, I think, is really the main place to get it. And I would definitely make some occasional trips to those places to be in the room, because I do think that is a really valuable way to learn and make sure that you stay up to speed. Look at the results of the AI forecasting survey from last year: you've got Ryan Greenblatt at number two and Ajeya Cotra at number three on the leaderboard. That is not an accident, right? Those people are extremely well informed, and it's because of the social context that they find themselves in. In fact, Ajeya said that. She said her method for the survey was talking to Ryan and then getting a few more things wrong than he did. So the leading thought leaders do know each other, and they do communicate a lot. And I think it really does help to spend at least a little time in that environment. Okay, next question. As a Survival and Flourishing Fund recommender, you see the landscape of safety organizations up close. What's underfunded that shouldn't be? Again, that's an AI question. I think my big answer here is neglected approaches. I'm wearing my AE Studio ACDC-themed swag hat, and I do think the neglected approaches approach, as articulated by AE Studio, is a great meta-level answer to this question. As a reminder, they did a survey of people in the AI safety field and asked them: do you think we have all the ideas that we need to be successful in AI safety in the big picture? The answer was no, we're going to need more ideas. So clearly the community thinks that we need more ideas, that we do not have all the answers we're ultimately going to need. So I think things like what Janus does, in terms of being super deeply engaged with language models and really trying to understand their characters and their tendencies, I think that stuff is really good. What Eleos has done similarly with model welfare tests, I think, is really interesting as well. AE Studio's own work on self-other overlap, you know, I come back to that all the time. What Emmett Shear and the team at Softmax are doing, all these sorts of things where the question is: can we find creative ways to either design or train or somehow get into equilibrium with AI systems in ways that feel more stable, that feel like they could be the beginning of some sort of stable equilibrium? I think those ideas are dramatically underdeveloped. They are often pre-paradigmatic. They are often developed by kind of weird people, some of whom I think would wear that label with pride. They sometimes intersect with non-scientific or woo-ish ideas, or ideas about AI consciousness, which are obviously very hard to prove and don't feel intuitive to many people. But I think we should have more of all those sorts of things. And my call to action there is: if you have a weird idea that you've never heard anybody else talk about, I absolutely think it is worth trying to develop that idea. Most of the time, it's not going to go anywhere. Certainly, most of my idle shower thoughts do not turn into anything great. But the field collectively believes we need more ideas. So where are those ideas going to come from?
At least some of them are probably going to come from people from other fields, people with very unusual or idiosyncratic ways of looking at the world, people who interact with AIs in very unique and particular ways, people who find inspiration in biological systems that they can map onto AI systems in ways that other people aren't thinking about. I think all of that stuff is dramatically underdone. And arguably, the whole AI safety landscape is underfunded. So I'd love to see more resources go into interpretability. I would love there to be not just one Goodfire. I know there are a couple of other organizations, for-profit companies, working on interpretability-type stuff, but I would love there to be significantly more work going into interpretability. If we scaled that up by an order of magnitude, I think that would be great. Some of the stuff that Redwood Research is doing, where you take the assumption that models are going to be out to get us and then try to figure out how we can work with them anyway, I think that is also really underfunded. I admire that work so much because it feels so hard to me, even depressing, to work under that paradigm and try to make it work. But Lord knows, there are many, many scenarios where that kind of work could be the thing that saves us. So I think a lot of things should be scaled up.

[2:09:11] Probably the area where I think there's enough going on already is what the frontier companies are doing, which seems to be trying to get the current model to be aligned enough to supervise the training of the next model via any number of things: data filtering, RLAIF-type techniques, all that kind of stuff. Anything in the recursive self-improvement realm, I would be a little less inclined to write a new check for, because it seems like that's what the companies are doing. And if anything, I think they're probably going too fast at that relative to all the other things we could be pushing on. But yeah, for me, the farther out you get, the weirder you get, the more I think there are obviously going to be many more misses than hits, but those hits could be really valuable. And when I see something like self-other overlap, I'm like, yes, this feels like something that is just so underdone and has so much potential. I would love to see more people of all kinds of idiosyncratic persuasions trying to develop those kinds of ideas. Okay, last question. Turpentine got acquired by A16Z. You mentioned that you negotiated for editorial independence. What did that negotiation actually look like? How did it go? Again, that's an AI question. And I think that probably came out of ChatGPT, or possibly Claude, but I think it was ChatGPT, because I did use multiple AIs to review the contract as we went through the process, so it was very well aware of that negotiation. And honestly, I have to say, credit to Eric and credit to, I guess, A16Z more broadly. I'm not sure who all was involved; Eric has been my main contact person for all of this. It was honestly very smooth and pretty much entirely painless. I did already have, via an earlier agreement with Eric, an editorial independence clause in my agreement with Turpentine, so that was a good starting point. I was a little concerned, honestly, to be totally candid, when the deal was made. Eric called me on a weekend and was like, hey, so I've got an update for you: we're going to be joining A16Z. And I was like, oh, that's interesting. Mark Andreessen blocked me on Twitter a long time ago, before I ever even interacted with him on Twitter. Famously, he supposedly blocks people en masse who just like a tweet that he doesn't like, so I was probably mass-blocked with many other people. But I sort of said, hey, Eric, the dude blocked me on Twitter, and I've never even met him or talked to him. And of course, in the Techno-Optimist Manifesto, he did have an enemies list, which is not something I generally think people should be doing. I would not recommend publishing enemies lists for almost anyone. So I was like, you know, I'm a little concerned, because it's pretty clear to me that some of the things that I think are important and value and want to advance in this world are on the enemies list for A16Z. I really do want to make sure we reaffirm the editorial independence that I did already have codified, and make sure that it is super solidified going forward. And yeah, there was no problem. We worked through a couple of turns on the agreement and it was pretty smooth sailing. I think I requested something very reasonable, and basically they agreed to it with no real substantive pushback.
You know, a little tweak to the wording here and there and that kind of stuff, but overall it was pretty smooth sailing. So where it landed is that I do have an explicitly written, contractually agreed-upon right on this feed to say whatever I want. That includes criticizing A16Z, criticizing partners, criticizing portfolio companies, disagreeing with their stance on policy questions. Basically, I can say whatever I want, and it's totally fine for it to contradict their policy positions or to say that some of their investments are dumb or whatever. I can say anything that I want, and that is positively affirmed in the contract. So I give a lot of credit to them for being willing to do that. And the only kind of pressure-release valve or off-ramp in the contract is that if, at some point, for whatever reason, A16Z decides that they just don't want to be affiliated with me anymore, because I say or advocate for or do something that is just too much at odds with the agenda they're trying to advance, then they can just release all of their interest in the intellectual property of the podcast to me. That would include the feeds and the logos; that stuff is kind of jointly owned right now. I have the domain and control of the website, and they have the YouTube feed and whatever. So we're kind of mutually dependent at the moment.

[2:14:22] But if they ever wanted to, they could just give me all those things and tell me, you're on your own now. I hope that doesn't happen, and I don't think it's likely to. I certainly won't be afraid to be bold on this feed if I feel like there's important stuff to talk about. Actually, it was funny, because Eric called me on a Sunday to tell me about that deal, and I had already recorded with Zvi on Friday, two days before, a classic Zvi episode, which was going to come out the next day and did come out the next day. And in that episode, Zvi accused Andreessen of perjury before Congress for having said that, basically, interpretability was solved. And I was like, Eric, you know, this is coming out tomorrow. And he's like, eh, I don't think they're really going to care. I think it'll be fine. And so far it's all been fine. Obviously, they're big boys, they can take some criticism, and they've been willing to put that in black and white. And I do hope that over time, as we learn more about the overall shape that AI development is taking, the accelerationists and the AI safety people can realize how much common ground we actually have. I think on an overwhelming number of questions, I'm probably going to agree with A16Z. There are some where I definitely don't, and, especially now that I have this contract in place, I'm certainly not going to shy away from that. But I also don't want to pick a fight unnecessarily. I'm going to try to follow my own advice from 20 minutes ago and not psychologize their takes. I'm going to assume, and I try to assume this kind of thing in general about powerful people, even though it's not always the case: first of all, they're rich as can be, right? They don't need more money. So why are they doing what they're doing? I genuinely think that they're trying to advance the human condition. I really don't think that multi-billionaire, deca-billionaire folks like Mark Andreessen and Ben Horowitz are making their decisions at this point based on lining their own pockets. I can't rule it out, and I don't know them, but I really don't think that is the case. I think that they are trying to advance the human condition and trying to make sure that we don't go stagnant, that we don't allow fear of change to prevent us from realizing a beautiful future. So I take them at their word that that is what their motivations are. I share a lot of that, with some different opinions on some, I think, pretty important questions, but I think there's a lot more alignment than has at times been assumed online. And I think that's true not just for me, but for a lot of people who are fundamentally interested in AI safety, as I said earlier. I hope that I can be critical from time to time, or at a minimum disagree and potentially even go into criticism, without it fundamentally breaking the relationship. But I do have the assurance that I can continue to do this podcast and continue to reach you, the audience that has subscribed to the feed. And if you're, again, listening this long into the podcast, I appreciate that. And I don't take for granted at all that people want to follow me on this learning journey.
And I just wanted to make sure that I had the ability to speak freely, speak my mind, say what I think is important without having any fear that I would lose the ability to continue to use the modest platform that we've built. And that is in place. So I feel really good about it. I appreciate Eric and A16Z broadly for making that a pretty smooth and painless process. And it gives me a lot of confidence that I can just keep doing this and just keep calling it how I see it. Again, I'm not going to try to create conflict where it doesn't need to exist, but I will say what I think. And it's a very privileged position to be able to do that and even make part of my living doing it. I definitely don't take that for granted. So thank you again to Eric and A16Z for making that as smooth as it was. Thank you to everybody who has listened. Again, I really don't take this opportunity for granted. It's a lot of fun and it's thrilling every day to get to wake up and think about what do I want to learn today? What feels important? How do I go make sense of this crazy AI wave that we're all riding together? It can be a little scary at times, but I do love the challenge that it presents me. So thank you all for making that possible. And in closing, thank you for being a part of the Cognitive Revolution.

