Feeling the AGI with Flo Crivello






Read Episode Description

In this episode of the Cognitive Revolution, Nathan engages in a critical dialogue with Flo Crivello on the trajectory of AI development. They touch upon the imminent arrival of AGI, the potential risks of a US-China AI arms race, and the complexities of AI safety and international cooperation. Explore their perspectives on regulation, technological progression, and the ethical dilemmas facing the AI community. This conversation is essential for anyone interested in understanding the challenges at the intersection of AI technology and global politics.

SPONSORS:
Oracle Cloud Infrastructure (OCI) is a single platform for your infrastructure, database, application development, and AI needs. OCI has four to eight times the bandwidth of other clouds, offers one consistent price, and nobody does data better than Oracle. If you want to do more and spend less, take a free test drive of OCI at https://oracle.com/cognitive

The Brave Search API can be used to assemble a data set to train your AI models and help with retrieval augmentation at the time of inference, all while remaining affordable with developer-first pricing. Integrating the Brave Search API into your workflow translates to more ethical data sourcing and more human-representative data sets. Try the Brave Search API for free for up to 2000 queries per month at https://bit.ly/BraveTCR

Head to Squad to access global engineering without the headache and at a fraction of the cost: head to https://choosesquad.com/ and mention “Turpentine” to skip the waitlist.

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off https://www.omneky.com/

Recommended Podcast - The Riff with Byrne Hobart
Byrne Hobart, the writer of The Diff, is revered in Silicon Valley. You can get an hour with him each week. See for yourself how his thinking can upgrade yours.
Spotify: https://open.spotify.com/show/...
Apple: https://podcasts.apple.com/us/...


CHAPTERS:
(00:00:00) Introduction
(00:11:14) GPT-4o
(00:14:56) AI arms race
(00:18:23) Devin and GitHub Workspaces
(00:20:08) Sponsors: Oracle | Brave
(00:22:16) The AGI discourse
(00:25:29) The invisible protection
(00:29:33) AI arms race
(00:34:41) Stalling out
(00:38:33) P(doom)
(00:43:38) China
(00:43:38) Sponsors: Squad | Omneky
(00:45:24) Government
(00:48:05) What should we do?
(00:51:35) Open sourcing models
(00:54:17) Manhattan Project for Alignment
(00:57:26) Current State of Play
(01:01:39) How much more mundane utility can we get?
(01:04:54) How much of what we are building is future proof?
(01:07:41) Adoption, acceleration, hyperscaling, pause
(01:10:17) OpenAI leadership
(01:14:32) Future of Live Players
(01:17:38) Big Tech Singularity
(01:21:30) AGI is Coming


Full Transcript

Nathan Labenz (0:00) Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost Erik Torenberg. Hello and welcome back to the Cognitive Revolution. Today, at the end of another dizzying week in AI and with my voice well on its way to failure, I'm speaking with returning guest Flo Crivello, founder and CEO of Lindy AI, to chew on the week's top story, which was, of course, the Situational Awareness manuscript from Leopold Aschenbrenner, formerly of the OpenAI Superalignment team, and to try to make sense of where we are in the big picture. If you haven't already read the article or heard the accompanying Dwarkesh podcast, the short summary is that AGI is near, superintelligence is likely not far behind, and particularly in the current geopolitical context, it seems almost inevitable that the US and Chinese governments will begin to compete for AI supremacy, likely ending up in an extremely dangerous AI arms race. Leopold's hope is that the US can maintain enough of a capability lead to give researchers the time they need to solve critical AI safety problems before deploying highly powerful systems, and then, having achieved a decisive strategic advantage, offer China a benefit-sharing deal that they can't afford to refuse. While such a scenario might have sounded fantastical not long ago, in today's world, try as I might, I can't really find a major hole in Leopold's argument. All the frontier lab leaders are talking about AGI on a 2 to 5 year timeline, and the US government is already starting to feel the AGI itself. So as a most likely scenario, I think it's a pretty reasonable sketch. And yet, I find the idea of an AI cold war so scary and honestly so tragic that I feel compelled to at least try to imagine an alternative. While it seems almost impossible that the US and China could trust one another enough to work together on such an important issue as AGI development and safety, and while, as I've said many times, I would not want to live in Xi's China, I personally would rather take our chances trying to work with the CCP than hoping we can solve AI safety in a super short time frame under the intense pressure of potentially winner-take-all international competition. After all, as we talked about last time Flo was on the show, the Chinese government does seem to understand that AI poses a threat, if only to its future political control. And just this week, a Chinese firm has released a Sora-like video generation model, which suggests that Chinese AI research isn't necessarily all that far behind. All that said, as you'll hear in this conversation, I still have a lot of uncertainty on a number of core issues. Will GPT-5 pose major risks? My best guess is no. GPT-5 class models will probably still be in the sweet spot, where they're powerful enough to be super useful, but still safe enough to deploy, though perhaps not safe enough to open source. Will regulation, like the recently amended SB 1047, do more harm than good? Flo and I are both longtime libertarians and well aware of the potential, even the likelihood, of unintended and unexpected harms from regulation. But given frontier labs' apparent inability to self-regulate, we are both inclined to support it as a starting point.
And I, for one, could imagine myself beginning to advocate for an outright pause in further scaling in the not-too-distant future as well. In the meantime, is my adoption accelerationist, hyperscaling pauser position even coherent? I do still think so, but the argument that adoption inevitably hits the limits of current models, which in turn invites further scaling, can't be dismissed out of hand either. It is a strange time all around. Flo and I both continue to love exploring AI technology, building products with it, and imagining the many ways it's likely to improve the human condition in the years ahead. And yet, I can't shake the sense that the situation as a whole is at risk of spinning out of control. And not only is there very little that I can do about it, but in some ways, I am even contributing to the dynamic. I certainly don't have all the answers, but I hope that by speaking candidly about these issues, I can play at least some small part in steering the world toward the best possible outcomes. As always, we'd love to hear your thoughts. Do you find value in this sort of real-time reaction episode, even though it raises more questions than it answers? Or would you rather I stay focused on concrete research and applications? I look forward to hearing from you either via our website, cognitiverevolution.ai, or, as always, you can DM me on your favorite social network. For now, I hope you enjoy this conversation with Flo Crivello. Flo Crivello, CEO of Lindy and Feeler of AGI, welcome back to the Cognitive Revolution.

Flo Crivello (4:50) Thanks, Nathan. Yeah. It's it's becoming a tradition. Glad to be here as always.

Nathan Labenz (4:55) So you were just saying you are most of the way through the talk of the week, which is the Leopold Aschenbrenner situational awareness manuscript, and apparently, you're losing sleep over it.

Flo Crivello (5:07) Yeah. If you haven't read it yet, you should. For the record, it's not the first time I literally lose sleep over AGI. But, yeah, I slept 5 hours last night. I'm quite tired. This is probably not a good idea to read right before going to bed. It's pretty tricky. I think the argument is very cogently laid out. At this point, I don't understand how people don't freak out, honestly. I think if you understand what's going on, you should freak out.

Nathan Labenz (5:33) Yeah. Yeah. I tend to agree, and we'll circle back to this later, but I think there's a big question here. We could assess this thesis on multiple dimensions. One is, like, how accurate it is, how compelling it is. Another is, is it an idea worth promoting, orthogonal to its potential accuracy? But I think before we do that, maybe let's build up the case and get people to the point where, hopefully, they're with us in terms of freaking out at least a little bit. I think one of the things that people hear and repeat, without being maybe as critical as they should be about it, is the idea that we're probably hitting a wall now because we haven't seen anything better than GPT-4, and that's been, jeez, 18 months plus since OpenAI finished training that. So that means we're pretty much stuck at GPT-4. I think we would both agree that is not really an accurate description of the last 18 months, but how would you rebut that claim for starters?

Flo Crivello (6:37) I'll start by saying that people have been saying that for 10 to 20 years or something like that, that deep learning was hitting a wall. It's a meme at this point. Right? That's the first thing. The second is it's not true that we've not seen anything new since GPT-4. We have seen models that are basically at GPT-4 level of performance, sometimes greater, for literally about a hundredth of the cost and about 10 to 100x the speed of GPT-4. So I think, like, Gemini 1.5 Flash is greatly under-discussed. It's an insanely good model. It's very cheap and very fast. And that's a big deal. And I think the core thesis of the AGI-pilled people is the scaling hypothesis, right? It's, hey, if you can make it bigger, faster, if you can throw more compute at it, intelligence emerges as a result of that. And not every inference optimization is also a training optimization, but by and large, when you see this kind of inference optimization of basically 100x, something also happened in the training pipeline. And in parallel to that, you are seeing obviously context windows all becoming infinitely bigger. 18 months ago, we had 4,000-token context windows. Today, we have 1 million token context windows. That's pretty unreal. We have the Mamba architecture that's starting to be shipped, and it's working, from what I'm gathering. We're on track. GPT-5 is going to take people by surprise. I have no insider information, but I feel like GPT-5 is gonna come out in, like, Q3, Q4 this year. It's going to take people by surprise.

Nathan Labenz (8:17) Yeah. I think Flash is a great data point, and you hit the key points there. And just to give a use case example of this, which I've maybe talked about in a couple episodes, I tried to get it to write a character sketch of me based on the last 250 emails that I sent. That turned out to be about 250,000 tokens, which is outside of the context window for any other commercial provider. Anthropic, maxing out at 200,000, still doesn't get quite there, but Flash has the million. So it literally has a 100 times longer context window than the original GPT-4. Now it's only 8 times longer than the latest GPT-4, but still over 100 since when people are referring back to perceived flattening. And it is 1 to 2 percent of the cost, and it is a lot faster, and it did a phenomenal job of writing a character sketch of me based on just this data exhaust, something that I didn't even really prompt engineer or optimize. I literally just said, here's, like, a big boatload of emails out of somebody's account. Can you synthesize this into a useful sketch? And it really was super impressive. That took about 45 seconds and cost under 20¢. And just for contrast, right, to do that back in the original GPT-4 would have been: chunk it down into lots of little pieces, a few emails each, try to summarize those, summarize the summaries. And honestly, I don't think you could have expected to get anything as good as what I got out of just a single call, just because so much would be lost at each step and you'd have to ladder up to the final summary. But, yeah, just the ease of that, the cost, the speed, it really is incredible. Maybe we can just refine a few of these other things. I think GPT-4o is also really under-discussed, mostly because people haven't actually had a chance to play with it yet. But I'm sure you've seen the tweet from Greg Brockman where he used the model to generate an image of somebody writing on a chalkboard where the content on the chalkboard is "probability of text, comma, pixels, comma, audio." And the obvious implication of that, in addition to just flexing the image generation capability of the model, is that this is a single architecture, might be a slightly modified transformer at this point, who knows? Could be some state space stuff in there, but it's handling all of these different modalities on par, in such a way that we're no longer pipelining data through a transcription and then putting that text into the model and then getting text back and then maybe generating audio or creating instructions for DALL-E 3 to then generate the image. It's literally all happening in the same space. And the generality of that is really going to be a huge deal as people actually get their hands on it. We don't have it yet, but when I think about what I would build with that, it gets pretty crazy pretty quick. The fact that they solved the interruptions too in such a nice, easy way, the speed with which they're able to convert the tokens in... and it seems that they're modeling, I wanna know more about how they do this, I'm sure many people do, but it seems like the fact that it handles interruptions so smoothly suggests that it is putting its own audio and your audio right into the same audio mix. And so it's immediately able to stop when it realizes that there's some other audio in the same space superseding it. It's pretty awesome.
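For anyone who wants to reproduce the experiment Nathan describes, here is a minimal sketch of that single-call, long-context workflow, assuming the google-generativeai Python SDK; the model name, file path, and prompt wording are illustrative assumptions rather than anything specified in the episode.

```python
# Sketch: one long-context call instead of the old chunk-and-summarize ladder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you have a Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")  # ~1M-token context window

# Roughly 250 sent emails, ~250k tokens of raw "data exhaust" in one file.
with open("sent_emails.txt") as f:
    emails = f.read()

prompt = (
    "Here's a big boatload of emails out of somebody's account. "
    "Can you synthesize this into a useful character sketch of the sender?\n\n"
    + emails
)

# A single call; no chunking, no summary-of-summaries needed.
response = model.generate_content(prompt)
print(response.text)
```

The contrast with the original GPT-4 workflow is that the chunk, summarize, then summarize-the-summaries ladder disappears entirely; the quality gain Nathan describes comes from the model seeing all of the raw emails at once.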

Flo Crivello (12:00) Yeah. The bitter lesson strikes again. Right? Instead of engineering your way around interruptions, what's the threshold and how many seconds and all that, just train it. Train it to figure this out, right? And it's just an end-to-end giant-ass model. I agree with you that the multimodality of GPT-4o took me by surprise. I didn't expect to see this kind of heavily multimodal model come out so soon. By the way, it hasn't come out yet, indeed. Now, I actually find its reasoning abilities to be very underwhelming. I've actually moved away from GPT-4o for most of my workflows, but I think that's fine. I think the scaling hypothesis is going to take it from here. But, yeah, the multimodality of GPT-4o is awesome. So, yeah, I don't know what to tell you, man. AGI is coming. I will say, though, I've updated in two ways, actually, in recent months, and both of them are in the book, right? One of them is: if we don't get AGI by 2030, then probably we won't get it by 2040. There's this window right now where the next 5 years are going to be very critical. And the reason for that is because there's a lot of one-time optimizations that we're going to grab during the next 5 years. So the data obviously is going to run out. That may not be that big of an issue, but we're gonna have to figure this one out. Most important to me, we're going to run out of dollars. Right? There's a lot of scaling that's been happening just because we've been investing so much more. Right? But at some point, once you hit, like, a trillion-dollar training run, you can't really grow much bigger than that. And then there's what he calls the unhobbling of the model, the low-hanging fruits of the cognitive architecture. There's a lot of stuff around the model that we need to figure out. All of those are one-time gains. And so we have these, like, 3, call it 3 to 6, orders of magnitude of improvement ahead of us. Once we've grabbed them, if that doesn't lead us to AGI, then we are left with Moore's Law, which is pretty slow. It's half an order of magnitude a decade or something. And we're left with architectural improvements, which are also not super fast. So that's the first way I've updated. If AGI doesn't happen by 2030, 2032, I think then we're left with 10 more years of no AGI. The second way I've updated is on the speed of the takeoff. I used to be very undecided about takeoff. Is it going to take a day? Is it going to take 10 years? And I'm increasingly convinced that it's like 1 to 4 years or something like that. But I no longer decouple AGI and ASI as strongly as I used to. ASI is superintelligence. I used to be like, AGI, for sure; ASI, I don't know. Now I'm like, if you've got AGI, you've got ASI, period. Because you can automate AI research, and all of a sudden, you literally 1,000x the bandwidth of AI research. It seems to me like a foregone conclusion that that leads you to exponential improvement very quickly.

Nathan Labenz (14:57) Yeah. I've been trying to fight that notion in my own head, and this is previewing or tipping my cards on how I feel about the value of promoting some of these memes, because I'm like, I don't want to see us get into an AI arms race with China. That seems like a very bad scenario, and I would almost do anything to avoid it. And so I do find myself being motivated to say maybe, because I think the path to some sort of AGI seems quite clear. And I honestly think it's not even really that much more than what we have today. In terms of when AGI will be declared, it seems to me almost more a function of how and when OpenAI wants to renegotiate its deal with Microsoft than anything else, because they have the contractual clause where, when the OpenAI board declares AGI to have been achieved, then they don't have to give Microsoft the IP. Obviously, they're running around trying to court other infrastructure providers and diversify their kind of power base away from purely being dependent on Microsoft. And Microsoft, of course, I think is well aware of this too and investing in their own internal research. But, like, at what point would they wanna make that move and be able to renegotiate? I think if they wanted to declare that in 2025, they probably could. You start to think about, like, how many unhobblings are really necessary from where they are today. The reasoning and the coding ability isn't, like, superhuman, but it does seem to be on par with, or maybe even a bit better than, your typical knowledge worker. And then you imagine the sort of investment in post-training of the sort that John Schulman described. Dwarkesh is really on an unbelievable run here in terms of creating historical-artifact podcasts. The Schulman one, I thought, was noted, but it wasn't as noted as this last Leopold one. But it was quite striking when he was like, when do you think this will happen? Well, maybe next year, maybe 2 to 3 years. But it was just like, yeah, we're gonna go collect a lot of kind of mid- and longer-length project execution episode data and teach the pattern of, like, how you run into obstacles and backtrack and come up with some other strategies and go around those obstacles. And that seems like it's probably well underway in terms of the collection of that data. I would be shocked if they don't already have Scale and potentially several other partners already, like, tracking the work of knowledge workers in a variety of different verticals and annotating what they're doing, explaining what they're doing out loud. We don't have, like, chain of thought in the sort of raw Internet dataset, but it's not that hard to collect if you're just, like, prompted every few minutes to explain what you're doing as you're working and just dictate it via microphone. So it seems like they're gonna have that. It seems like the next big upgrade, in addition to being smarter, should have these, like, mid-length at least project execution capabilities, with the ability to get stuck and restart and come up with other ways and gradually get there. And I think you could probably call that AGI if you just stuck to the very literal textual definition of something that can outperform humans at the majority of knowledge work. Have you had a chance to use anything like Devin or the new GitHub Workspace?

Flo Crivello (18:28) I have not. No. I use Lindy all day, so I see agents doing real stuff all day. Yeah.

Nathan Labenz (18:34) I think they're quite impressive. I think Devin definitely has an interesting paradigm, especially for someone who is not a full-time coder themselves. With the GitHub Workspaces, my feeling was it's very much a product for coders. It starts with an issue. You have to have a GitHub issue to start with. So for anybody who's not a coder, right off the bat they're like, what's a GitHub issue? I'm lost. Devin, on the other hand, you just go to the website and you say, here's what I want you to do. And it assumes nothing in terms of any infrastructure that you have; it handles all that on its side. What's really interesting about it, and feels like the future, is that it just goes to work. You don't have to really do anything. It may get stuck, or you may observe that it's doing something a little weird, and you can just chat with it even while it's still working, and then it will interrupt itself, absorb your message, rework the plan if it needs to, and then keep going. But you can just drop in a message anytime you want and it just keeps going, incorporating your messages when they show up. But otherwise, if it runs into issues, it'll just try to work through them. I wouldn't say it is drop-in; honestly, it does feel like a coding intern. If you had a young person who, like, hasn't run into a lot of things before and needs that sort of basic coaching, it probably feels like that. It honestly is getting to that level. And with one more turn, it feels like it would be, like, likely competitive with a not elite, but, like, employable coder. Hey, we'll continue our interview in a moment after a word from our sponsors.

Flo Crivello (20:13) I agree. And so, again, I think people are asleep. And no one is really realizing what is for sure happening, which is, yeah, AI coders are, like, 100% happening in the next few years. And I was actually just having this conversation with a teammate, and I'm realizing that sometimes there's this real psychological thing that's happening, and I don't know what to call that logical fallacy, but it's like, suppose I came to you and I was like, Nathan, I'm gonna tell you two things. Number one, here's proof, or very compelling evidence, that someone's gonna come to your house and kill your family tomorrow. And number two, a shark tornado. A shark tornado, Nathan, is going to come to your house. And there's this weird thing that happens psychologically, which is when you hear that, all of a sudden, you're like, oh, nonsense. Shark tornadoes, bullshit. Even though I just showed you evidence that, even on its own, is very, very telling that someone's gonna come to your house tomorrow. So I think that's what's going on with the whole AI discourse. There are so many claims that are bundled into the same bundle, because realistically, we're basically speed running a hundred years of history, if not more. We're gonna speed run them in the next 5 to 10 years. And so lots of shit's gonna happen. And you've got folks like Eliezer Yudkowsky who are, like, making claims that are outlandish to some. I happen to find them very credible, but the whole sci-fi thing: it's going to consume matter and it's going to take over the universe and all of that stuff. And so people are like, okay. And then I think there are, like, very straightforward claims that are also being made about, look, the AI engineer is coming. Right? We are going to automate all knowledge work, period. That's coming in the next 5 or 10 years. So we're gonna have to deal with that. And even when you talk about the risks, this is merely disruptive, but heavily disruptive. Hey, we're automating all knowledge work. FYI, this is happening. No practitioner disputes that. Period. Right? I find it insane that, like, we've moved on from that debate. Oh, yeah. Of course. Let's not talk about that. Let's talk about the sci-fi stuff. It's, hey, FYI, people don't know that. People don't know that thing that we all agree about and don't even talk about anymore because it's just so boring. We're automating all knowledge work in the next 5 or 10 years. Okay. And then also, as far as the risks are concerned, you don't have to worry about the sci-fi misalignment risks to be deadly worried. I think there are two broad categories of risks. There's the AGI-goes-rogue risk, and I understand why people are skeptical about this. I disagree, but I understand. But then there is another form of risk that I understand a lot less how you could disagree about, which is the misuse kind of risk. I do not understand how there can be any doubt whatsoever that people are gonna do evil shit with this extremely powerful technology that we are opening up to everyone and gleefully open sourcing. I don't understand the case. And I don't think there is a case, period. I think it is extremely telling that the majority of the case that's being made about this is not a case at all. It's just slogans. It's 'over my dead body will you take my Second Amendment, my First Amendment.' Can we just be objective about what we're talking about here? Right? I believe it's Dario Amodei from Anthropic who made this excellent point.
He said, we've been benefiting from an invisible protection in the world today, because there is a set of evil psychopaths out there. That's just a fact. That's indisputable. They want to do evil shit and kill people. Okay? But then the overlap between the evil psychopaths and the capable people is very small, because as you become capable, as you educate yourself, two things happen. One is you become socialized. Right? And so you don't wanna kill people. But also your opportunity cost rises, because now you can do things with that education. You can become a trader or a software engineer, an entrepreneur, an investor, whatever. And so you can make money, and so you've got better things to do with your time than go around and kill people. And when there is an exception to that, and someone very educated goes rogue, it's notable. It's like the Unabomber thing. People remember it for, like, decades. Like, this is a freak thing that just happened. But I think that that protection is disappearing. People always talk about the upside of that: oh, look, we're going to empower everyone with AGI. A 17-year-old is going to have the same go-to-market power as the Coca-Cola company, because he's going to have 10,000 employees in his bedroom. Right? Just like MrBeast built a media empire from his bedroom. Same thing. You're gonna be able to build an industrial empire as a 17-year-old in your bedroom. Right? That's awesome. The corollary of that, though, is that we are giving capabilities to everyone. And so some people are going to do evil shit with this capability. It's just obvious. And so I think really the two most obvious classes of risks here are going to be bio and cyber. So, for example, the cyber thing is: look, our entire civilization relies on computer systems that we barely understand, by the way, at this point. And many of which, arguably the most important of which, are extremely integrated. Right? So our banking system, our grid, all of that stuff relies on the Internet and information systems. We are about to give a superhuman cyber hacker to everyone. And again, one class of risk that concerns me, it's not existential, but it's bad: you wake up one morning, the grid is fucked. Okay? So you have no telecommunication. So total fog of war, by the way, right there. Right? Because there's no TV. There's no... what are you gonna do? You're gonna turn on the radio. Do you have a radio? Right? So total fog of war, no communication, no banking, no electricity. Right? Cities are entirely dependent upon transportation networks, which also shut down. People starve to death. Right? And by the time people fix the grid and the banking systems and all of that stuff, the hackers bring it back down. And the hackers could be, pick your flavor, depending on, like, how much resources it really takes. It could be a script kiddie or, on the most extreme side of the spectrum, it could be China. It could be North Korea. It could be Russia. It could be whoever. Right? Or, like, just an organized group with a few hundred thousand dollars. It could be Al Qaeda. I don't know. And by the time you fix the grid, they fuck it up again. Right? And how long does that last? A month? 2 months? 3 months? People always reply, oh, but AGI can protect us against AGI. We could also have AGI cyber defenders. To which I say two things. One is that a technology does not necessarily symmetrically increase defense at the same pace at which it increases offense. That's the first thing.
The second thing is that even if it was symmetrically increasing defense and offense, even if indeed it was improving defense faster than offense, I think these institutions are going to adopt this new technology much slower than the hackers will. Who do you think will adopt AGI first between Bank of America and PG&E and the hackers in Russia? Right? Of course, it's going to be the hackers in Russia, and maybe by as much as a few years. Right? And so that delta is going to be super dangerous and super scary. And I've just not heard a single quality rebuttal to this simple reasoning.

Nathan Labenz (27:31) Yeah. There was just a new paper out in the last day or two from a former Cognitive Revolution guest, Daniel Kang, and the new paper is that they've got a little team of LLM agents finding and exploiting zero-day exploits. And I haven't quite gotten deep enough into that to understand exactly what the dataset is that they're creating, because obviously that's a tricky one: as this stuff gets known, then it gets on the web and whatnot. But presumably using the cutoff date as a safe way to delineate what was known at the time and what's new, they're finding novel things that are not in the training data and exploiting them. But, yeah, it does feel like a bit of an unstable situation. So I was saying earlier that I have been like, man, I really don't wanna see us get into an AI arms race with China. If both states believe that there is the prospect for decisive advantage in some slim lead in AI technology, then it seems they probably will race for it. It'd be great if we could get the two governments to come together and have some sort of preemptive arms control or whatever, but obviously that's tough. So that could be maybe the next strategy to pursue. But if they do believe that there's gonna be this decisive advantage, then it seems like they probably will race for it. So I'm like, okay, can I come up with a reason to believe that there's not gonna be, or it's at least not obvious that there would be, some sort of decisive advantage? And so I asked myself, what's the story whereby either we don't get to a, like, very high-end, drop-in knowledge worker by the late 2020s, or, conditional on getting there, we stall out before some kind of intelligence explosion to move toward something genuinely very superhuman? And honestly, I don't have great answers for either. When I think about the AGI case, it really just seems to me like you think about all the kind of current weaknesses, and it seems like there's a pretty clear path to unhobble on a few different key dimensions. And then you imagine, okay, now we're there. Is there any way that we'd stay there and don't get into a superintelligence-type dynamic? Certainly the uncertainty increases, but when I look at all these other neural network tools that have been created recently, whether it's AlphaFold 3 or one that I just saw the other day that is learning from quantum mechanical simulations how to model particle-level dynamics at, like, orders of magnitude faster than the raw simulations can happen, and is starting to show emergent behavior. There was this incredible graphic the other day from somebody studying ions in solution: they trained it on just this solution dataset, but then observed crystallization predicted by their trained model. So they're seeing these phase changes as emergent properties coming out of this learning on pure simulation data. And all that stuff's being developed in parallel. Right? We've got the bio ones happening at the same time as the material science ones, as the Go players and whatever else, and it's, damn. At the time when there is a drop-in, high-end knowledge worker, there will also be incredibly powerful tools that we're just starting to see the beginning of, but presumably those loops will be pretty well established, right, to say, oh, run this sort of simulation, see what happens. Run that 10,000 times, by the way, see what happens.
Now we're also seeing the development of automated labs, cloud labs, Emerald Cloud Lab and similar things, where via APIs you can actually run physical experiments to verify the sort of candidate hypotheses that have come out of the simulation. And it just feels like, man, those things will be able to run pretty fast. They're gonna be pretty parallelizable. And even if the thing isn't, like, beyond Einstein, it still feels like it's gonna have tools and the ability to use this, like, insane array of tools. And that's even assuming that those things don't merge. I could also easily imagine, in the quest to scale up to something super powerful, that bio data and quantum mechanical simulation data just get folded into the core dataset. And then it's like p of text and pixels and audio and quantum mechanical data and biological sequence data and proteomics and everything else. But even if they don't all get merged, it seems like the tools are gonna be so powerful that it is hard for me to imagine that it stalls out at that point. And I've been wrestling with this for the last couple days, trying to come up with, if I was gonna try to talk somebody down from this notion that they should be concerned that there might be some decisive advantage to be had, I'm like, I honestly can't put together a great argument for it. Could you? What would be your story if it doesn't happen, if we don't get AGI, or if we don't get superintelligence relatively quickly after some early AGI? Could you put together a coherent story for how that might not happen?

Flo Crivello (33:10) I'm sure I could. I think probably the data headwinds would be my first reflex here. This is where I would jump, because we truly are going to run out of data. I think it's tractable, but I could be wrong. But I also think it's the wrong question to ask. I think this is not the question that we ask when it comes to any other risk. When it comes to risks, for example, about climate, or about nuclear war, or about pandemic preparation, we're not asking, hey, we really need to get in a room here and think about everything and establish beyond the shadow of a doubt that this is for sure going to happen. It's, no, hey, we've got a bunch of really smart experts around the room and there's an emerging consensus. So on climate, there's a consensus, and it's, look, something's happening that's not fun, and so we probably should invest something and be prepared, just from a pure EV standpoint. You have some likelihood of something really bad happening; it's worth throwing, like, a few billion bucks at this. I think we're way, way beyond that in AI. The experts are all saying there's a problem, there is a consensus. If you look at, like, the top cited experts, the top 3 or 5, every single one of them is really ringing the alarm bell right now and being like, hey, this is really concerning. Right? There is a consensus, which wasn't always the case. I will note that it's interesting that the AI risk skeptics, or deniers, I should say, the AI risk deniers used to hide behind the lack of a consensus in the research community about AI risk, right? They were like, look, I don't know what it is because I'm not an AI researcher, you're not an AI researcher, but when you talked to the AI researchers 8 years ago, they said there's nothing to worry about. I just find it interesting that that's completely changed since then, and these people haven't updated in the slightest. They just don't mention the AI experts anymore. So look, at this point, it's very clear, right? The trend line is up and to the right, smooth, not flattening for many orders of magnitude, and the experts are all in alignment. I think at this point, we need to start treating the deniers the same way we treat the climate change deniers, which is to politely ignore them. Which, by the way, with climate change, I really don't wanna get into this thing, but sure, politics gets in the way, and it gets co-opted by all sorts of political agendas, and that's pretty shitty and dangerous. There's no doubt, though, that climate change is happening and is bad and is worth investing deeply into. That's it. So you ignore the folks who say, no, actually, it's not true. That's what we need to do. We need to ignore these people. We need to move the conversation onto: okay, what now? There is a consensus amongst experts. It's very clear what's happening. What now? And the sort of policy conversation that's happening in California, for example, the bill that's being introduced, whatever you think of the specifics of the bill, I think that's exactly the kind of conversation that needs to happen right now, and we need to start talking about AGI preparedness right now. Because if you don't treat the problem now, sooner or later you're going to have to worry about it, and it's much better to get ahead of it than to be taken by surprise 5 years from now.

Nathan Labenz (36:19) Yeah, I agree with you that my question was an inversion of the right question, at least on the first analysis of, is this something we should be worried about? Is this something we should be doing something about? In response to the p(doom) question, I always say something like 10 to 90% or 5 to 95%. And the key point there is that on both ends, there's enough to be

Flo Crivello (36:45) That's right.

Nathan Labenz (36:46) Worth fighting for. Right? If it's only 10 percent, or even if it's only 5 percent, that's a big enough problem to motivate me. And on the other end, if it's only a 10 percent chance of survival, then that seems like it's enough to try to achieve. The next thing, I'm playing out this game, I don't know if it's game theory or just going down this, like, branching scenario analysis and thinking. The node that I was at on that question is: is there any credible case that we should be thinking about that we don't end up in this race for decisive advantage? And I can't come up with a good one, but I do think that the utility of that, if there were something, would be like, hey, we don't need to be racing China to create the trillion-dollar cluster, and maybe we can take a little bit more chill attitude toward scaling. The trillion-dollar cluster to me feels like an upper bound on what it's likely to take. It seems like that does not assume that there's anything clever happening between now and then. That's just take what we have and keep doing it more and more. And that probably will work for a while yet, certainly when you think about all these different data modalities and whatnot. But my best guess is that there'll be plenty of efficiency gains and plenty of ways to break this out across data centers. If there is a scenario where it's like, hey, we need a trillion dollars' worth of data centers, then for security reasons alone you would probably wanna diversify that location-wise away from one single site. So there's gonna be incredible incentive to both just reduce the resource requirements and also to figure out how to not have such a concentrated physical capital plant required to do this all in one place. I just don't like it, because there have been, like, a few different big worries of the AI safety community over time. Runaway unfriendly AIs, the paperclip maximizer, the thing that doesn't understand our values, can't understand our values: that remains, I think, not off the table, but that's not exactly looking like the AIs that we're getting today. But then the AI arms race with China was, like, another one that I've been hearing about for years, and it seems like, despite everybody recognizing that would be a terrible scenario to enter into, the gravity of it is hard to fight. Right? I don't know. It just seems like, man, I'm trying to come up with some way to not fall into that trap, and it seems really tough.

Flo Crivello (39:27) I think we don't need China, though, to be in an AI arms race. Even if you took China off the map, you still have Google and Anthropic and OpenAI, and these folks have been in an arms race for a long time now. But the thing is that if you don't play the arms race for geopolitical reasons, you're going to play it for economic reasons. The economic incentives are just too strong for corporations to resist. It's their job to pursue this kind of thing. Look, even if it costs you $10,000,000,000,000 to automate all knowledge work, that's cheap. That's very cheap. To automate all knowledge work, are you kidding me? We'll find the money, no problem. Markets will love to take that up. And I agree with you also that it is funny that the community seems like it's disagreeing about, hey, is the risk 10% or 90%, or is it even 1% or 90%? And what is the timeline? It's funny, because I remember, before being as strongly AGI-pilled as I am, I was having dinner with a friend, and I was like, I'm really worried about AGI. And he was like, oh my god, how worried are you? And I guess he was more up to date than I was about the timelines. He was like, what's your timeline? And I was like, I don't know, like 15, 20 years. He's like, dude, people are freaking out right now. They're saying 5 years. And now that I'm more up to date, I am also saying 5 years. But he just found my reaction so strange. He was like, man, 15 or 20 years is nothing. So I really think there's this weird anchoring effect or framing effect that's going on. It's like the shark tornado thing that I was mentioning, where, because people are saying 5 years and that sounds so outlandishly incredible to these people, they're totally dismissing the fact that they would agree it is maybe 15 or 20 years away, which is tomorrow. It's coming so quickly, right? We are all today, as a civilization, spending plenty of money on risks that are not supposed to materialize before 15 or 20 years; corporations on a routine basis plan on that kind of timeframe. So even if you just think it's 15 or 20 years from now, which to me is an upper bound, we need to be planning for this. And I see way too little planning and way too little conversation about this topic.

Nathan Labenz (41:36) Hey. We'll continue our interview in a moment after a word from our sponsors. So I do think if we took China off the game board, things might look significantly better. Maybe you'll talk me out of it. But my thinking there is, you look at how many live players there are today. Right? There's 3 to 5 maybe, 3 to 7. And, like, for the most part, aside from the Chinese dimension, they all know each other. A lot of them, like, live very close by to one another. A lot of them have, like, directly worked together in the past. Even, like, the Mistral team is, like, largely ex-DeepMind, I believe. So there is a collegiality there, and there's also the sense that, if there was no external force, I would think that the US government might be a lot more likely to create some governing environment that could slow things down. And that could look a lot of different ways, but my concern is that the government will move away from being a moderating force. In a scenario where there's no China, I think the government would at least have some chance of playing a moderating role. And if the government instead is obsessed, as it seems to be, with competing with China in every dimension, then it seems like the Leopold scenario: the national security state getting involved, this sort of likely militarization of the technology as, like, the top priority, and the race for decisive advantage, the idea that we're gonna get just far enough ahead that we'll be able to solve alignment in months and then offer China a deal, that seems like that probably is what happens if there's no deal with China and the specter of China just continues to be what it is. In the absence of that, though, it does seem like cooler heads could prevail, and that seems much more likely to me.

Flo Crivello (43:38) But I feel like China also has private companies. I think that the government in this whole thing is a coercing force. It's not the driving force. It's not going to be the one doing the AI research. Right? And so Chinese private companies are going to participate in the arms race just the same as the American companies are, minus the camaraderie that you just mentioned. And the role that the US government plays in this whole thing is, once we have AGI, at some point the DOD is like, oh fuck, okay, this thing is really powerful. And also Baidu's got it, or Tencent, or whoever it is maybe. We need to do something about this, and we need to, like, ask the nerds to please give us the keys ASAP before shit gets out of hand. That's how I conceptualize the role of the government here.

Nathan Labenz (44:23) So what do you think can and should be done today? You alluded to a policy like SB 1047. I've had a couple episodes on that; if anybody is not up to speed on it, they can go back and check those out. It's just been amended, and the amendments seem to have been considered positive updates pretty much across the board, although I haven't seen a lot of people actually going from not supporting to supporting. It seems like, among those that were not inclined to support it previously, the response has largely been: this is a positive update, but I still don't support it. I do find that to be weird, inasmuch as it really doesn't feel like there is a huge ask there. You know, it's like trying to make sure your models aren't gonna cause mass casualties or hundreds of millions of dollars in damage. And if you can't be reasonably confident about that, then you should only deploy them with safeguards, which would mean not open sourcing. But again, today you can be reasonably confident about that. We're talking about a future scenario where you no longer can be, which seems quite plausible if there are not, like, research breakthroughs. So we could put policies like that in place. We could double down on interpretability and hope that the sparse autoencoders will figure it all out for us and we can find our way to alignment somewhere that way. We could imagine... I recently saw a pretty interesting technology that I might do a whole episode on, called Sophon. Actually, Zvi tipped me off to this in my last conversation with him, where, notably out of China, there's a technique that they call fine-tuning suppression, where for a domain that they call restricted, and this doesn't yet work on language models, they did it on image generation models for starters, they created a restricted domain versus the open, normal domain. And within the restricted domain, they did a fine-tuning technique to basically create a denial-of-service type of response, where in the denoising step of an image generator it would just guess zero and basically not change the image if it's in the restricted domain. And then outside of the restricted domain, they had this kind of normal training reinforcement that would make sure that the standard behavior was still accessible. What this does is it creates a local minimum that's hard to get out of. And so they call it fine-tuning suppression, because even when you come in and fine tune it, it still doesn't really work. There's no gradient. They construct the loss function of the fine-tuning suppression in such a way that the gradient is basically zero. It converges. And so there is no gradient at that point. So when you try to come in and fine tune it and pursue your usual gradient descent technique, the gradient starts at zero. You can't really get out of it very easily. And their goal there was to create something where it is harder to fine tune it to get those capabilities than it is to just train from scratch to get those capabilities. So that was a really interesting and novel thing. But overall, it just feels ad hoc. We've got, like, a few things here and there, but I don't know. Maybe it's all of the above. What do you think we should be doing now? What should we be advocating for now? Should we be chaining ourselves to the fence at OpenAI at this point, or do we wait for GPT-5 before we do that?
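To make the fine-tuning suppression idea concrete, here is a minimal toy sketch in PyTorch of the general mechanism Nathan describes: train on the restricted domain toward a flat, converged solution (a "denoiser" that predicts zero change) while penalizing the gradient norm there, and preserve normal-domain behavior with a standard loss. This is an illustration of the zero-gradient intuition only, not the actual Sophon training procedure; the model, data, and loss weighting are all assumptions.

```python
# Toy illustration of "fine-tuning suppression": push the model into a flat
# basin on the restricted domain (near-zero gradients) while preserving
# normal-domain behavior. A sketch only, not the published algorithm.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))  # stand-in "denoiser"
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def suppression_step(restricted_x, normal_x, normal_y, lam=1.0):
    # Normal training reinforcement: keep the standard behavior accessible.
    normal_loss = nn.functional.mse_loss(model(normal_x), normal_y)

    # Restricted domain: train the "denoiser" to guess zero, i.e. change nothing.
    restricted_loss = model(restricted_x).pow(2).mean()

    # Penalize the gradient norm of the restricted loss so that a later
    # fine-tuner starting from these weights sees ~zero gradient signal.
    grads = torch.autograd.grad(restricted_loss, model.parameters(), create_graph=True)
    flatness_penalty = sum(g.pow(2).sum() for g in grads)

    loss = normal_loss + restricted_loss + lam * flatness_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batches just to show the call shape.
loss = suppression_step(torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16))
```

The gradient-norm term is exactly the property Nathan describes: an attacker who later fine-tunes on restricted-domain data starts at a converged point with essentially no gradient to follow.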

Flo Crivello (47:53) If there was a gate that I would chain myself to right now, it'd be Meta's. I think step 1 is we just stop open sourcing the models, period. It's insane to open source them, and there is no benefit to it, or very little. It's just not worth the cost, because we don't know what these models have in them. It takes a lot of work, actually, to figure that out. There is this paper that I always love mentioning, a Microsoft paper called MedPrompt, that finds that GPT-4 can actually perform better than models that are specifically fine-tuned on medical use cases. GPT-4 was not fine-tuned on these use cases, but GPT-4 can beat these specialized models just with a specially engineered prompt, and it took them a while to engineer this prompt. These models are just such huge objects that we have no idea what's in there. It's like a dense jungle, and it takes us years to hack our way through the jungle, and we still don't know what's inside GPT-4o, honestly. So we're open sourcing models and we have no idea what's in them, and we know that certain little tweaks can make them jump in capabilities tremendously. And so the cat may be out of the bag. You could reach a point where it's already too late. You've already killed yourself, basically, as a civilization, and you have released a model that is AGI capable out there in the wild, or, like, super-hacker capable out there in the wild, and you just don't know it yet. The weights are all out there, there's nothing you can do at this point, and you're just 6 months away from some guy figuring out how to prompt the model the right way. That's entirely possible. And there is no advantage in open sourcing these models. And by that, okay, so what are the typical advantages to open sourcing something? One is people can audit it. And editing the code, aka fine-tuning the model, you can offer that through an API. OpenAI offers fine-tuning through an API. That's perfectly fine. So why don't you do that? Auditing the code in order to increase its security: I will say not only is that not possible, the only thing that is possible is the exact opposite. You cannot audit the weights of a model. Unfortunately, there is a field of very active research here, but you can't look into the weights of a model and be like, look, there's a problem here. So you can't do that, so really there's no point in open sourcing. What you can do, however, is take whatever security measures were put in place during the training of the model and train them out of the model, unless the trainers did this thing you just mentioned, which I hadn't heard about. But it's trivial today, at least, to remove the safeguards from a model. And I forget what they call it. It's like an unshackled model, or, if you go to Hugging Face, there's like Wizard Uncensored, the uncensored families of models on Hugging Face. It's trivial to remove these things. I just don't think there is a good reason to open source these models. I think that's step 1. I think step 2 is we need a Manhattan Project for alignment. We need to spend dozens of billions of dollars. Now, I don't know how sensitive alignment is to spending.
I suspect not very sensitive, but at this point, throw the kitchen sink at the problem, and perhaps even mandate private companies: hey, for every dollar that you spend on training, for every flop that you spend on training, you must spend a dollar and you must spend a flop on alignment. I'd start there. I don't expect any of that to happen, because if the kind of California bill that you just mentioned cannot pass, then I don't expect these bills to pass. I foresee that what will need to happen is that there will need to be a catastrophe. There's gonna be, like, a 9/11 of AI. I just pray that it's not too bad, but I think after we've had that, then I think we'll get very serious about regulation.

Nathan Labenz (51:26) Yeah. On your first point, and you can maybe square it with the second one a little bit, but I'd be interested to hear how you would do that: the one common argument from a purely safety focused standpoint is that Llama 2, and Llama 3 so far, have been good for interpretability research. They've been good for a lot of technique development. We've got not just Anthropic and OpenAI doing sparse autoencoders; research groups are getting into it, and they can study circuits, and they need at least somewhat advanced models to be able to do that, because you can't study emergent properties before they emerge. Right? So you've gotta have at least somewhat advanced models. You couldn't do this stuff on GPT-2, it doesn't seem like, for many of the most advanced things that you'd wanna study. So if you had a Manhattan Project, I guess that would all just be done via structured access somehow. You'd imagine, like, the weights never leave secure servers, but somehow you create a scheme by which academic researchers can do the things that they wanna do, and maybe they just have to be vetted to some degree. There could be some sort of cryptographic, you know, way to do this sort of thing. But if you wanna both not let the weights out and allow people to do the development of all these inspection, interpretability, and editing techniques, that does seem like still a bit of a hard thing to square.

Flo Crivello (52:59) I think it's like the Manhattan Project. Right? It's like you had all these researchers who had access to a bunch of classified secrets doing very active work on them, and somehow it worked out.

Nathan Labenz (53:10) Sort of, inasmuch as they did manage to make the thing. Yeah. I do worry about the mission drift potential of something like that. The government gets involved and it's like, alright, we're racing: we've got to get ahead of China, we've got to stay ahead of China, we've got to create this decisive strategic advantage. Is that the right structure to get the sort of deliberative or deliberate science that we need to figure out how to make sure these things are actually going to do what we want them to do? Maybe. I could definitely see it veering off in another direction.

Flo Crivello (53:43) Yeah. And I will say, by the way, that I don't take these positions lightly. People who've known me for a long time know I'm anything but an authoritarian. I hate the state. I hate regulation. It is the most evil institution we have out there; it will destroy everything it touches. And by the way, my understanding is that MIRI and that whole crowd spent the longest time saying, hey, let's find a way to solve alignment that does not entail asking for regulation. They did that for 10 or 15 years and they gave up. We can't do it. There is no other way: we need regulation ASAP, right now. This is the last resort for a lot of people, including me. So I agree. I think the state is going to get involved, shit's going to get ugly, politics are going to do what politics do. And I think it's worth it, because the situation is that dire, I think.

Nathan Labenz (54:32) Yeah. I'm certainly with you in terms of being a very reluctant advocate for regulation or state intervention. It's bizarre to find myself even entertaining recommending it, because I certainly always resisted it in the past. Reeling this back in a little bit to the present day, I want to ask a couple of questions about the current state of play. What are you building right now? How are you thinking about the challenge that you only have access to the best models of today, but, as we've all heard from Sam Altman and others, you've got to be building for GPT-5? What does building for GPT-5 look like for you? How do you conceptualize that?

Flo Crivello (55:15) Yeah. I am but a humble founder building B2B SaaS. What are we building? You know, in that Leopold piece, he basically calls it unhobbling, and that's what we're doing. We're building the layers around the model; we're not building the model itself. And so we try to focus our work on things that are simultaneously commercially useful and valuable over the short term, and aligned with the march of AGI and these models. We try to build cognitive architectures and layers around these models that will maintain their utility regardless of the model. You can basically think of it as layers that make GPT-3 roughly as capable as GPT-4, and GPT-4 roughly as capable as GPT-5. And so regardless of the model you've got, these layers are going to give you a jump. And what are all these layers? These layers are things like long-term memory and RAG, retrieving information selectively depending on the situation the agent is in. There's continuous learning, which leverages the model's ability to learn in context in a way that is steered by the user. So for example, with Lindy, you can give it a task, and then as it performs these tasks, you can give it feedback, or it can ask you for feedback when it's uncertain, and then it will keep getting better and better. That's just going to be universally useful regardless of the model. Then there's tool use, obviously through APIs, but also computer use and UI use and all of that stuff. I think of what we're doing as basically three buckets. You can think of the agent: it's got an LLM at the center, but it's got these other layers which I just mentioned, like the memory, the planning, the critic layer or the verification layer, the recursive goal decomposition. You can do all of these things outside of the model that's still at the core of the agent. So we do that, number one. Then you get this black box that's really useful, and you have inputs to that black box, and then you have outputs. And so those are the three things we're working on: the black box itself minus the LLM, the inputs, and the outputs. The inputs are going to be, sure, text, that's the most straightforward one, but also images and audio. Lindy can join your meetings and listen in on your meetings; that's a very useful kind of input. And then the outputs are going to be, again, tool use. Today the garden variety is API tool use, that's the most simple one. Eventually, we're giving it the ability to not just sit in a Zoom meeting, but talk to you in a Zoom meeting, make phone calls, talk to you on the phone, use a computer. That's what we're working on, broadly speaking, as a company. Yep.
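To make the "layers around the model" idea concrete, here is a minimal sketch of that kind of agent loop in Python. It is not Lindy's actual architecture: the `Agent` class, the `llm()` helper, and the tool registry are all hypothetical stand-ins, and a real system would add error handling, richer planning, and persistent memory.

```python
# Hypothetical sketch of an agent with layers around an LLM core:
# planning (recursive goal decomposition), a critic/verification layer,
# tool use for outputs, and a simple long-term memory. Illustrative only.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


def llm(prompt: str) -> str:
    """Placeholder for a call to whatever frontier model is available."""
    raise NotImplementedError("wire this up to a real model API")


@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]           # outputs: API / tool use
    memory: List[str] = field(default_factory=list)  # long-term memory layer

    def plan(self, goal: str) -> List[str]:
        # Recursive goal decomposition: ask the model to break the goal into steps.
        steps = llm(f"Decompose this goal into numbered steps: {goal}")
        return [s.strip() for s in steps.splitlines() if s.strip()]

    def act(self, step: str) -> str:
        # Let the model pick a tool and an argument for this step.
        choice = llm(f"Available tools: {list(self.tools)}. Step: {step}. "
                     "Reply as 'tool|argument'.")
        tool, _, arg = choice.partition("|")
        return self.tools.get(tool.strip(), lambda a: a)(arg.strip())

    def verify(self, step: str, result: str) -> bool:
        # Critic / verification layer: a second model call judges the result.
        return "yes" in llm(f"Did '{result}' accomplish '{step}'? Answer yes or no.").lower()

    def run(self, goal: str) -> None:
        for step in self.plan(goal):
            result = self.act(step)
            if self.verify(step, result):
                self.memory.append(f"{step} -> {result}")
```

The point of the sketch is that planning, verification, memory, and tool routing all live outside the model call, so the same scaffolding keeps paying off as the underlying model is swapped out.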

Nathan Labenz (57:56) I find myself, in some ways, increasingly confused; the more I know, the more confused I am. I often go around calling myself an adoption accelerationist, hyperscaling pauser, and somebody recently challenged me on whether that is even a coherent position. I think it is. It certainly seems like there's a lot of value from fine tuning, I've definitely experienced that firsthand, and there definitely are significant unlocks that you get from this sort of scaffolding. There's somewhat of a different question there. On the one hand, I don't want to see us just rush into six more orders of magnitude of scaling; that seems dangerous. And I also believe that GPT-4 is enough in some ways, or GPT-5 certainly. I think we probably have one more half turn of the crank before we get to something that is generally smart enough to be the core of that drop-in knowledge worker in the not-too-distant future. But then I think about things like, I don't know if you saw the talk from somebody at OpenAI who has been working with Harvey on their custom model. They basically use GPT-4 as a base and do some continued pre-training. They didn't disclose what percentage of the flops the continued pre-training is, and then they do post-training after that to try to get it dialed in on exactly how they want it to behave, and they see a massive preference for the custom model as opposed to GPT-4. I infer from the general vibe that the incremental compute there is no more than 10%, and probably significantly less than that. So there I'm like, okay, that does seem like we can get a lot more utility without rushing through orders of magnitude. And I also like the sound of that custom model, because it sounds like it's really good at what it does but probably not very good at a lot of other things; it's probably worse than GPT-4 at a lot of other things, and that narrowness in some ways sounds really good. Right? If we had something that was just really good at handling legal briefs, that's a big thing; there's lots of value there. If we had a hundred of those things for a hundred different areas, we could really create a ton of consumer surplus, which I'd be super excited about. And yet I don't think those things would be posing the sort of risk that I think we're both worried about with more orders of magnitude. So that's one line of thought here, and I'd be interested in your reaction to it. I'm still also wrestling with which of these things will last. I listened to the Zuckerberg comments with Dwarkesh too, and he described this sort of tick-tock cycle where we're always building the scaffolding, and in doing that we're realizing what it is that we want to build into the next generation of model, and then we do that, and then a lot of that scaffolding is no longer needed. So how much of the scaffolding that you're building do you think actually goes away? How much of it stays relevant? It's fine to throw some of it away, but you've got to have enough that stays relevant, and that seems very tough to predict. So I gave you a run-on prompt there, but you can give me your answer in as many parts as you like.

Flo Crivello (1:01:11) Yeah. We think about that a lot, how much of what we're building is future proof. And I think probably 100% of it is, over this sort of horizon; maybe not literally 100%, but 80%. And I insist on a roughly five-year time horizon here because, look, suppose that you're building something right now that you could get for free 5 years from now, but that you otherwise can't have at all today. What you're building goes from 0 to 1 today; 5 years from now, you would have had it even if you hadn't built it, but the fact that you built it today gives you a 100x cost and speed improvement. I'll give you one very concrete example of that, which is our continuous learning system, the one I just mentioned where Lindy continuously learns from feedback. The way it works is RAG. You give feedback to Lindy, and then before she performs any step, she looks into her feedback database and asks: has this happened before? What kind of feedback did I receive? And she takes it into account before doing the step. It's unreasonably effective. It's pretty awesome. On paper, you could think that this is made totally obsolete by infinite context windows, because you just have one agent, you talk to it all the time, and it just keeps reusing and growing the same context window. So it doesn't need to do this whole RAG thing, because it just checks its context window: I received feedback a few days ago about this very thing, so I'm going to do it differently this time. But that's going to cost you a fortune. And so I think the death of RAG has been greatly exaggerated, because it's literally a hundred times more expensive to kill RAG via context windows. You might think this is the sort of cost difference you can ignore. I'm reminded of Android versus iOS. Android was built on top of Java, which is a notoriously inefficient language because it's garbage collected, so it's slow and very memory inefficient. And I assume the people who built it were like, oh, who cares, Moore's law, compute is infinite, we'll just build it and ship it. The reality is that for the longest time, until very recently, the performance difference was enough to make Android super sluggish and iOS super fast. When you scroll on iOS, it's always been buttery smooth, and on Android, it's not until very recently that it's buttery smooth as well. And so this literal 2 to 5x difference in performance, sure, it eventually became moot, because who cares about the difference between 1 millisecond and 5 milliseconds? But for the longest time, it wasn't 1 or 5, it was 100 or 500, and people do care about the difference between 100 and 500 milliseconds. I think it's going to be the same thing here. So we're working on a lot of things that give you this sort of improvement, 2x at least and 2,000x at most, for the kind of example system I just mentioned. We're comfortable with this being useful over the very long term. I didn't quite understand your first question, though. You're grappling with what, exactly? What was your question?
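For the continuous learning piece specifically, the pattern described here, checking a feedback store before each step rather than carrying all history in context, can be sketched as follows. This is an illustrative guess, not Lindy's implementation: the embedding callback and the brute-force cosine search stand in for whatever embedding model and vector store a real system would use.

```python
# Hypothetical sketch of "RAG over feedback": before each step, retrieve the
# few most relevant past feedback notes and prepend them to the prompt, instead
# of keeping the entire interaction history in an ever-growing context window.

import math
from typing import Callable, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FeedbackStore:
    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed                                # stand-in embedding model
        self.items: List[Tuple[List[float], str]] = []

    def add(self, situation: str, feedback: str) -> None:
        # Store the user's feedback keyed by an embedding of the situation.
        self.items.append((self.embed(situation), feedback))

    def lookup(self, situation: str, k: int = 3) -> List[str]:
        # Return the k most similar past feedback notes.
        query = self.embed(situation)
        ranked = sorted(self.items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [feedback for _, feedback in ranked[:k]]


def build_step_prompt(store: FeedbackStore, step: str) -> str:
    # Only a handful of relevant notes ride along with each step, which is the
    # cost argument for retrieval over simply growing the context window.
    notes = "\n".join(store.lookup(step))
    return f"Relevant past feedback:\n{notes}\n\nNow perform this step: {step}"
```

The design choice the sketch illustrates is the one argued above: retrieving three short notes per step costs a few hundred tokens, whereas replaying a months-long conversation in context costs orders of magnitude more for the same effect.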

Nathan Labenz (1:04:03) Just how much more mundane utility can we get? It's the adoption accelerationist, hyperscaling pauser question again: is it coherent? The challenge that I got back online, from no less than the folks at Epoch, was basically that more scale makes the models easier to use, which makes them more useful, so does that position really have any sort of natural center? I've been trying to find one. I think fine tuning both unlocks a lot of value and also has a lot of the same sort of utility that you're describing: if you have to 10-shot the thing to make it work, versus fine tuning it and then doing it on a one-shot or zero-shot basis, that also saves you a lot of time and money. But clearly there's a lot of utility that's going to come from continued scaling. I guess at some point I expect to begin to advocate for a pause. My GPT-4 red team final report was: I think you can and should deploy this, and it's not too risky to do so, because it's in the sweet spot where it's powerful enough to be really useful, but not so powerful as to be likely to get out of control and cause real problems. And then the corollary to that was: but I'm not sure how long that lasts, and it does not seem that you have any control systems in place, or in development that I've seen, that would suggest you have a solution to this problem on the horizon. That was before the Super Alignment team was even announced, and now we're in the post Super Alignment team era. I think we're still in that sweet spot, and I think we probably stay in that sweet spot for one more generation. And then after GPT-5, while I continue to love what AI can do for me, I think I maybe start to advocate for: let's dial this thing in for actual practical use cases. GPT-5 should be enough to be our AI doctor. It should be enough to be our AI lawyer. It should be enough to be our AI many things. Let's bring that to reality, and we probably don't need to continue to race through orders of magnitude to get those sorts of gains. So maybe we shouldn't. Maybe we should question why we would be doing that. A race against China would be one reason, and I'd love to find some solution to that. So I guess the question is: do you think that there is a coherent center to this adoption acceleration, hyperscaling pause concept, or does it slip away?

Flo Crivello (1:06:46) I think it's a game of brinksmanship, and I really like the image here: imagine if laundry detergent could produce gold, and the more laundry detergent you pile up, the more gold it produces and the faster it produces it, but at some point it blows up the entire Earth. That's how it is. You're very tempted to keep piling on laundry detergent, and you never know at what point it's going to explode. I share your bias. I think one more generation is fine. I'm comfortable with 2 or 3 more orders of magnitude; I wouldn't be comfortable with 6 or 10 more orders of magnitude. I would dispute, however, that there's no point in adding more after that. I think that's what makes it so dangerous: it is always tempting to add more. Because, hey, cool, now you've got your AI doctor, your AI lawyer. Hey, we could just add a little bit more and it cures cancer, and it solves global warming and solves biology for us. It's "just one more order of magnitude, bro. I promise, just one more order of magnitude and it'll be okay." We're never going to run out of temptation, and I think, if anything, the closer we get to that point, the stronger the temptation will be. Also, the gears of the regulatory state are slow to put themselves in motion, and the next generation is coming regardless; GPT-5 is coming regardless. So roughly now is the time to really have this conversation, so that we have a chance to be much more thoughtful about the generation after next.

Nathan Labenz (1:08:05) How do you feel about OpenAI's leadership today? Obviously, I would say there have been a lot of good things that they've done, although some of those have turned to sand already, such as the Super Alignment team. I've been on a winding road with them, going from "oh my god, it seems like they have no idea what they're doing," to "oh, actually, they have a much better plan," to now, again, "oh man, I don't know." The culture there, the exodus, the repeated exoduses, the apparent, well, something like a purge seems to be going on. That's a bit of an inference, but not a crazy one. All these bully tactics. And yet I do think they're still fairly enlightened. Sam Altman is engaged with all the ideas that I would want him to engage with; I don't think he's ignorant of anything that I'm concerned about. So, yeah, what's your take on OpenAI?

Flo Crivello (1:08:59) I'm reminded of my time at Uber, when we were hyperscaling and the chaos was impossible to describe. You can't imagine what it's like to be at a place that is hiring literally about 800 people per month. OpenAI isn't there, but it's hyperscaling in its own way. I think people underestimate the chaos that happens when you grow like they have: they turned themselves from a quasi-obscure research lab into a corporation generating billions of dollars of revenue in the span of 18 months. Of course Sam Altman is a ruthless businessman. Of course he's going to play hardball and have a bunch of ruthless tactics. I'd be surprised if he wasn't, to get into this position and to build the kind of company he's got. It's like when people said of Travis, oh my god, he's so aggressive. What did you expect? Of course; it's business, welcome to it. Some of that, I'm like, it's just business. Some of that, I'm like, it's the chaos of hyperscaling. When the whole Susan Fowler incident broke out at Uber, I was like, what did you expect? We were 5,000 people; of course there's going to be a bad apple. I'm sure Travis didn't even know any of these people's names, had never heard of it until then, and now all of a sudden he's the devil for all of that stuff. Even the Super Alignment stuff: I'm sure they meant it when they said they would invest 20% of compute, otherwise they wouldn't have announced it. I just think the chaos got in the way. It's just business, especially when you scale that fast. So I think that's what's going on, and I think it shows that you need a forcing function for these businesses. They're not going to self regulate; they can't self regulate; they're incapable of self regulating. So you need an external forcing function for them. And by the way, I think Sama has been welcoming, even inviting, that regulation, if anything. That's my read on the whole OpenAI situation. For sure, right now they're going through a negative part of their press cycle. I think that's natural, I think it will pass, every company goes through that. But, again, it just highlights the need for external regulation.

Nathan Labenz (1:11:04) I feel like I can definitely tell myself the story around the contractual terms. It didn't shock me that there would be a non-disparagement clause in a severance agreement, even though it was funny when he came out and said, this is one of the few times I've been genuinely embarrassed, and I was like, I don't really think you didn't know about it. It almost seems like an overstated apology, in some ways, to me, because I hear you: things are going fast, they're going hard, they've got a lot of intellectual property they're obviously trying to protect. They have an argument, which I don't think is a bad one, that leaks are potentially not in the long-term public interest, inasmuch as they tip other people off to what is possible. And yet I'm still like, okay, but the safety team all leaving, that does seem concerning. Regulation could maybe step in and solve that problem.

Flo Crivello (1:12:05) Yeah, I don't think the safety team was purged. Perhaps Ilya was, because of the coup, but I think the rest left. I think they left because OpenAI didn't make good on their promise, and I don't think that was even intentional. I obviously think OpenAI's leadership is worried about safety. I just think that when you're in the kind of game they're in, it's too hard. The organization is acting rationally in dedicating its flops to capability and not safety.

Nathan Labenz (1:12:36) That's tough. Quick question on the future of live players, or, to frame it another way, the fate of the second tier of foundation model developers. Who do you think is going to be in a position that really matters over the next couple of years? Do you think it's just going to be a handful of big tech companies? Do you think there's a place for the Rekas and Togethers and Cartesias and Adepts, and whatever Emad's up to next after Stability? Are those things going to matter, or do you think the ante to play in these scaling wars is just going to be too high for any of them to make it?

Flo Crivello (1:13:20) Yeah. I think at this point the field of incumbents is set. I think it's Anthropic, OpenAI, Google, Meta. Mistral, I think, came very close, but I don't put them in the same category as these players, and neither, I think, do they put themselves in that category. The one that surprised me recently was the Mamba folks; I could see them doing something, but I also think the other players I just mentioned can replicate that kind of research, and I am certain they have a bunch of models in training looking into this. Yeah, I think at this point the battlefield is set.

Nathan Labenz (1:13:54) That's my expectation as well. Do you think that over the next few years the big tech companies are going to metastasize and take over everything? I've been teasing this concept of the big tech singularity. I see things like what just came out from Google about optimizing shipping: they have some new shipping planning API that they say can double the profits of a container shipping company by delivering however many percent more containers with however many percent fewer boats. And I'm like, man, if I'm a shipping company, am I happy about this? Am I not happy about this? The big tech singularity idea is just that anything they turn their attention to, they're going to have the data, the compute, and the research advantage to figure out how to AI-ify it and dominate it if they want to. And then the counterargument is there's too much friction, yada yada yada. What's your intuition on that?

Flo Crivello (1:14:50) Yeah. Well, I think big tech is very deliberate about where they expand; they're very good at that part of their strategy. And it makes perfect sense for Google to spend as much as they are on AI. From the beginning, it's made a lot of sense, because search is basically an AI problem; it always has been. Android, for example: the reason Google did Android, it sounds insane, but they literally built an operating system just so that they could be the default search engine on it. There's more to it than that strategically, but that's basically the strategy of Android. There's no chance that Google turns itself into a logistics company; they don't have a bone of that in their body. I think they're going to offer that as an API, and I wouldn't be worried, if I were the logistics companies, about Google entering logistics. By the way, if they were ever going to enter that kind of game, Google Fiber was the time they would have done it, and it didn't work out, and they pulled out. So I don't expect that to happen. But I do view it more as an example of the speed at which AI is penetrating every little nook and cranny of the economy.

Nathan Labenz (1:15:51) I wonder if they can even just do it by API, though. Going back a few years to when I was doing a lot of Facebook advertising, and I am grateful that I'm now doing much more interesting things, there was this general phenomenon where, if you wanted to reach a new audience, Facebook was the way to do it. Everybody was doing it, and they were so good at pitting you against your competitors that they seemed to be taking everybody's margin. If you wanted to go advertise a bicycle on Facebook or whatever, they would somehow make your cost of acquisition just less than your profit margin. And everybody was like, I can sell a lot of shit on Facebook, but I can't make any margin doing it. And it does suggest that if Google can offer this API that doubles your profits

Flo Crivello (1:16:40) Yeah. Yeah.

Nathan Labenz (1:16:40) Yeah. Then it would stand to reason that they would be able to price that at roughly current profits. Do they then get half of the resulting profits out of the industry? And do they have everybody over a barrel, where if you cross Google, they can, in theory, cut you off from their profit-doubling API? Granted, they would take some political hits for that, or

Flo Crivello (1:17:03) they could be, you know,

Nathan Labenz (1:17:03) accused of being anticompetitive, but you can get a long way just on the implied threat of those sorts of things. Right?

Flo Crivello (1:17:11) I think if a player gains a monopoly over one of your required inputs, they can basically drain all of the margin out of you and turn you into their serf. But I don't think Google is going to gain a monopoly over AI. In the case of Facebook, the network effects are such that you can actually be in a very dominant position, but with AI you can't really. There are going to be a bunch of players, and, by the way, they hate each other, and that's good for us, the customers, because there's a price war raging, which is why these models are declining in price as rapidly as they are. So, yeah, I wouldn't worry about that. I don't worry about it myself anymore, but I used to. I was like, I have a required input, and OpenAI is really the only one giving me this required input; I might just become an OpenAI serf. And now I have Gemini, and honestly I have Llama, and I have Claude, and Mistral is in the running. So I'm comfortable with that.

Nathan Labenz (1:18:05) Yeah. If you're a shipper, you're hoping that Microsoft has a similar project launching real soon, because then, it's funny, that dynamic is really fascinating. If there's one shipping API that doubles profits, it takes half of the industry's profits. If there are two, then they probably race to the bottom and the shippers get more of the benefit.

Flo Crivello (1:18:27) It's a fascinating thought. I think Flexport is going to do it, obviously. Yeah.

Nathan Labenz (1:18:32) Okay, cool. Any other thoughts before we break? This has been great, and I appreciate the opportunity to catch up and work through some of this stuff with you. Anything else top of mind?

Flo Crivello (1:18:42) That's all. AGI is coming, and it is time to freak out. That's my message.

Nathan Labenz (1:18:47) Yeah. Let's hope against hope that we can somehow avoid the AGI or ASI arms race, but it does seem to be a natural attractor. Flo Crivello, CEO of Lindy, feeler of AGI: thank you again for being part of the Cognitive Revolution.

Flo Crivello (1:19:02) Thank you so much.

Nathan Labenz (1:19:04) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
