In this episode of The Cognitive Revolution, Nathan interviews Emad Mostaque, former Founder and CEO of Stability AI and Founder of The Intelligent Internet. We explore humanity's future with AI, from the stark 50-50 survival odds to Emad's optimistic vision for universal basic intelligence. Join us for a fascinating discussion about open-source AI infrastructure, the three-tier system of the Intelligent Internet, and how blockchain technology might help fund global public goods in AI development.
Check out Emad's publications on:
Emad's Twitter: https://x.com/emostaque
Emad's Blog: https://emad.posthaven.com/
Intelligent Internet Substack: https://intelligentinternet.su...
The Cognitive Revolution Ask Me Anything and Listener Survey: https://docs.google.com/forms/...
SPONSORS:
GiveWell: GiveWell has spent over 17 years researching global health and philanthropy to identify the highest-impact giving opportunities. Over 125,000 donors have contributed more than $2 billion, saving over 200,000 lives through evidence-backed recommendations. First-time donors can have their contributions matched up to $100 before year-end. Visit https://GiveWell.org, select podcast, and enter Cognitive Revolution at checkout to make a difference today.
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognit...
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance at 50% less cost for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
80,000 Hours: 80,000 Hours is dedicated to helping you find a fulfilling career that makes a difference. With nearly a decade of research, they offer in-depth material on AI risks, AI policy, and AI safety research. Explore their articles, career reviews, and a podcast featuring experts like Anthropic CEO Dario Amodei. Everything is free, including their Career Guide. Visit https://80000hours.org/cogniti... to start making a meaningful impact today.
CHAPTERS:
(00:00:00) Teaser
(00:00:36) About the Episode
(00:04:33) Intro
(00:09:15) AI Risk
(00:16:58) Sponsors: GiveWell | SelectQuote
(00:19:48) AI Goals
(00:23:52) AI Divergence
(00:27:50) AI & Agency
(00:32:18) Sponsors: Oracle Cloud Infrastructure (OCI) | 80,000 Hours
(00:34:57) Kids & AI
(00:39:50) Intelligent Internet
(00:48:37) Open vs. Closed AI
(00:53:30) AI Runaway
(01:01:46) Building the Future
(01:05:43) Energy & AI
(01:15:19) Hypernodes
(01:27:36) Proof of Beneficial Compute
(01:38:28) Distributed Compute
(01:45:10) Intelligent Internet Company
(01:48:37) Finding Talent
(01:55:33) Pause Letter
(01:59:50) Regulation
(02:04:04) Speed Limits
(02:06:42) Data Filtering
(02:10:54) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/na...
Youtube: https://www.youtube.com/@Cogni...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...
Full Transcript
Emad Mostaque: 0:00 We're at this point whereby if we don't have AI to help us, I think we're in a very bad scenario. With AI, if we don't build it right, we're also in a bad scenario. If AI is built right and is aligned and works with us, then we're in a very good scenario. Who owns that AI that is your best friend, that is your kid's best friend? And how is it constructed? And do we have visibility into that? We're gonna see economic disruption because the labor theory of productivity is about to be challenged. The world becomes a better place, ironically, through empathy from AI, and that's infrastructure that should be available to everyone. This is a concept we call universal basic AI.
Nathan Labenz: 0:36 Hello, and welcome back to The Cognitive Revolution. Today, my guest is Emad Mostaque, previously founder and CEO of Stability AI and now founder of the Intelligent Internet. I've been following Emad's work since before Stability released Stable Diffusion back in 2022. And looking back, I think it's fair to say that that release did more than any other aside from ChatGPT to kick-start the current AI moment. Certainly, when it comes to inspiring researchers and builders all over the world by demonstrating that open source, public good AI projects can advance the state of the art across multiple domains, Stability was for quite a while in a league of its own. Since then, Stability the company has had a famously bumpy road as it's tried to evolve from an open research lab to a sustainable for-profit business. But Emad's ability to articulate a positive vision for humanity's AI future while simultaneously grappling with the unprecedented risks remains to me uniquely compelling. We begin today on the risk side with Emad's stark assessment that humanity faces roughly 50-50 odds of survival as AI systems become more capable and ubiquitous, imagining a world in which intelligent robots are everywhere, creating acute risks from misuse, new kinds of hard-to-model systemic risks, and even the potential for surprising emergent behaviors from the AIs themselves. And also considering how the world's largest companies and most powerful nations are now racing to secure AI-relevant resources and establish strategic advantages, perhaps surprisingly, Emad doesn't have much hope for traditional regulation or even a pause in AI development. Nevertheless, rather than wallowing in p(doom), Emad remains in the arena. Today, he argues that we already have AI models powerful enough to dramatically improve most human lives if we can make them accessible, trustworthy, and aligned with local contexts and values. With that in mind, his new project, the Intelligent Internet, aims to create open, standardized AI infrastructure focused on critical regulated industries like health care and education. The vision involves 3 tiers working together: national or international hypernodes that maintain bodies of common knowledge and perform large-scale training runs, distributed nodes that perform domain-specific fine-tuning and provide related services, and finally, personal AI assistants that run locally on consumer devices. By making the entire stack, including the bulk of the training data, open source and transparent while preserving local control over the models that people use on a daily basis, Emad aims to create AIs that enhance rather than displace human agency. Of course, training large language models can be expensive, and in general, global public goods often require nontraditional funding mechanisms. So Emad is planning to launch a crypto project that will allow and hopefully incentivize people to support the project through a mechanism he's currently calling proof of beneficial compute. I've personally never been much into crypto, and I'm certainly not giving out any investment advice here. But I find the idea of universal basic intelligence, or UBAI as Emad calls it, so compelling that I do intend to buy a few tokens. Not out of hope for some future financial return, but just because I want to see this project succeed.
If you're finding value in the show and wanna see us succeed, we always appreciate it when listeners share the show online, write reviews on Apple Podcasts or Spotify, or leave comments on YouTube. And we always welcome your feedback. You can DM me on your favorite social network anytime or visit cognitiverevolution.ai, where you can still submit questions for our upcoming AMA episode for at least a little bit longer. Now, I hope you enjoy this conversation, which I've been looking forward to since before I started this show, with Emad Mostaque, AI visionary and founder of the Intelligent Internet. Emad Mostaque, previously founder and CEO of Stability AI, now founder of the Intelligent Internet. Welcome to the Cognitive Revolution.
Emad Mostaque: 4:41 Thanks for having me.
Nathan Labenz: 4:42 I'm excited for this. You were actually on my list of target guests from the very first episode of the podcast, which has been almost 2 years and 200 episodes now. So glad to finally be making this happen. We'll go in depth, and there's a lot to cover. I thought we might start with a recent tweet that you put out that certainly got some people talking, in which you said that your p(doom) is 50%. I would love to hear where you see that coming from, maybe a little bit about kind of how you see that unfolding. People often struggle to visualize the details or have any sort of concrete sense of how that might go down and what you see as kind of the drivers of that risk right now.
Emad Mostaque: 5:25 Yeah. I think the whole p(doom) question is an interesting 1 because it's very tough to think what can wipe out humanity as it were. You know? What is the probability of doom from an expected utility calculation? The way that I kind of viewed it was basically, I said it was over an indeterminate period of time. Some people say by 2030, by 2050; I was like, over an indefinite period of time. It becomes very interesting because it's clear now that we have AIs that outperform humans on narrow tasks. And a whole bunch of them probably outperform humans on general tasks. And now they're being embodied with this leap forward in robotics that we're seeing across the world as well. So when I combine all those together, what I basically see is a road in which you have increasingly complex systems that are outside of our control, that can destabilize all the things that we do, and maybe p(doom) without this technology is even higher, which we can get to in a second. And another road in which this is built in a very haphazard way, just like the Internet that's starting to break down now. And then there are some very clear scenarios where I see a very reasonable path to doom as it were.
Nathan Labenz: 6:35 Yeah. There's a lot of scenarios under which we don't fare so well. I guess how much of that do you think is attributable to, like, some emergent property of AIs versus some just total mistake where nobody was intending anything bad versus, like, misuse or abuse by, you know, a human malevolent actor versus maybe other categories if you have them?
Emad Mostaque: 7:02 Well, I think what we're seeing right now is modularized AI and, you know, componentized AI. So you're seeing AIs at all different levels. And, I mean, with the advances in test time compute, you're seeing, again, o1 at $200 versus GPT-4o, etcetera. You're starting to see that go. Like, what happens if you spend $200,000 a year on inference? Right? Is there any difference between a human bad actor and an AI bad actor that can then direct other humans and AIs? I don't really think there is. Right? It's just in the probability of millions of humans, millions of agents, how many bad actors are there, and are they in a position they can cause a cascade? That's the deliberate side of things. You know? And, again, it can be a whole variety of things. We've seen doom cultists. We've seen other things. In terms of emergent, the AI deciding, oh, you know, let's paperclip humanity or maybe a bit more aggressive than that. You know? Who knows? It's just, again, what we have is this combination of classical systems that are, like, slow, dumb AI and can be directed by orders from the top. These emergent systems now that are somewhere in between, AI enhanced, and then you have full on AI systems end to end. And somewhere in that, there's a lot of room for breakage and a lot of room for damage. Again, p(doom) describes actual doom of humanity, which is something that's quite difficult in many ways, unless you've got 10 billion robots. You know? And, again, what's the interconnective fiber that's likely to be? An upgraded version of the Internet, if that. If we make that secure, then we can eliminate that doom scenario. Then you have the whole bioweapons and other stuff, and those require a lot more deliberateness versus accident. But again, like, we're not doing so well ourselves as humans in keeping things together. You can see things fracturing apart. And so even though I say it might be 50, it's probably higher without robots, without AI than that. It's just, I think, that, again, if we break this down, we're at this point whereby if we don't have AI to help us, I think we're in a very bad scenario. With AI, if we don't build it right, we're also in a bad scenario. If AI is built right and is aligned and works with us, then we're in a very good scenario.
Nathan Labenz: 9:11 So I know you have a big vision for that future, which I appreciate, and I think that's where we're gonna spend most of our time today. But maybe just to motivate that a little bit more, certainly, I think the vision is, as we say in the AI space now, strikingly plausible that there would be a humanoid robot in every home. Certainly, no less than Elon Musk is talking about as many humanoid robots as we have humans running around. And, you know, if the progress there is anything like it has been in language modeling over the last couple years, or for that matter in image generation, video generation, then they will be, you know, quite capable quite soon. Is there anything within the current paradigm? Meaning, like, if we were not going to totally rethink AI development as you also are daring to do. But if we were just gonna say, are there specific, like, local very bad decisions that you see people making that you think are, like, leverage points that are causing a lot more risk than we could have if we were making better decisions in those key places?
Emad Mostaque: 10:23 Yeah. I think there are some key areas where we're seeing maybe suboptimal decision making. I think first off, we're having privatized closed source models or even open weight, closed data models emerging and being used widely. Like, the most downloaded models on Hugging Face now are the Qwen models, which are fantastic performance. No one's got any idea what's inside that. Llama is the next 1, and we've seen from Anthropic's sleeper agent paper and others, you can poison these models reasonably quickly and in an undetectable and untunable way. No one's quite figured out what the defense against those attacks are. So we're rushing these into decision making and augmenting situations where it's probably not quite ready. We need to have more robust elements there. Similarly, we don't have real standards for robotics as robots start to get rolled out. Again, it'll take a few years for these things to ramp up. But just from an economic theory perspective, there'll be no reason, just like there's no reason to hire call center workers anymore, honestly, not to have a robot. And if we don't have protocols and defenses there, that's gonna be very difficult. Away from the deepfakes and things like that, I'm not as concerned except for the key elements of audio and speech. Because I think the ability of speech to manipulate and control is massively underestimated. Like, if you've ever seen an amazing orator and some of the amazing speeches of history, you can see the impact of human voice. Like, if you've had a phishing call like I have from my mother, or my mother saying I'm in trouble with her exact voice pattern, you can see again the impact of that. So I think there probably needs to be some regulation around speech just due to its ability to persuade, control, and customize as well. As I said, those are probably the areas that I'm most concerned about versus very large scale model emergence, runaway behavior. But if there was something in runaway behavior, I think I'm probably less concerned about the labs versus maybe some of the decentralized stuff because that can actually go very quickly. Like, we've had 1 of the first agents from Hyperbolic that can provision its own compute now. I think we're moving to decentralized autonomous organisms effectively. Actually, there's probably 1 exception to that, which is that if you look at OpenAI, Anthropic, and others and the relationships with Anduril and the defense industrial complex, that's probably not gonna create aligned AI. Like, many listeners may remember the Slaughterbots mini documentary about drones, swarms, basically, killing the bad guys. Anyone the government said was a bad guy in the end. Just look at Anduril's latest commercial. And then if you think about entire teams at the top AGI labs now being collapsed into defense, you're probably gonna have models that are aligned to command, control, and more extreme things versus models that are aligned to being helpful, humane, and others. So I think that is, again, something that's somewhat inevitable, but potentially dangerous, particularly when they're used in defensive capability perspectives. Like, we're starting to see the first AI pen-testing attacks and other things like that utilizing generative AI, and they're getting very good very quickly.
Nathan Labenz: 13:33 I definitely recommend the Slaughterbots documentary for anyone who's having a hard time imagining how dystopian some of this stuff could get. And it's crazy to think about generative AI. I mean, obviously, we don't know a lot of the details about what these partnerships between the OpenAIs and Anthropics and the defense contractors look like or exactly what they're working on. But it doesn't take too much imagination, especially in light of recent results on o1 and other frontier models scheming against their human users, to imagine that that could get really weird really quick. I mean, I kind of quipped, but it's, like, definitely more than a quip, that if I was a soldier being asked to go into some sort of combat scenario with an AI system, at a minimum, I would wanna know that its scheming tendencies have been fully characterized and resolved before I'm out there as sort of the pawn in between. I mean, there's so many forces at that point acting on you. Right? You got the enemy, but with friends like a scheming, lethal, autonomous or semi-autonomous AI, like, who needs enemies? It gets pretty weird pretty fast.
Emad Mostaque: 14:40 But even if we take a step back and we think about the first principles of this, all wars are based on the lie that humans aren't humans, and any reasonable intelligence can determine that, especially an o1-level intelligence. So in order to get an intelligence to kill humans or to authorize that type of thing or to aid that, you basically have to teach it to lie. Because, again, just a step back would say that this is ridiculous. So you have more narrow AI, and you're lobotomizing it to not have higher order thoughts. So Elon talks about maximally truth-seeking AI. You're actually telling it not to seek truth and instead to optimize for casualty decrease on your side, max casualties on the other side, for example. The moment you start doing that at large scale and you have swarms of cyber attack agents and others, and you give them a level of autonomy, it gets very, very dangerous very, very quickly because the intent with which you build these is very important. It's like having military training camps. It's the curriculum that you teach the soldiers, just like the curriculum that you teach the AIs. And the emergence of these at an industrial scale becomes something very dangerous, and it's part of a bigger trend whereby nations have this concept of capital stock in economics, right? Where we've got our factories, we've got our roads, we have the means of productivity, and this labor theory of productivity whereby you've got this chart of GDP per capita versus energy use. Like, you've got energy, and then it's just how much infrastructure do you have. From a defense perspective, it was how many F-35s did you have or how many submarines did you have? Soon it will be how many GPUs do you have to engage in information warfare, and then how many drones do you have and other things. And again, if we're putting hundreds of billions into that as defense budgets will go, and we're training AI that is not truth seeking and that is offensive against humans, that just feels like it could end up in some very bad places. Right? Because we don't really understand them yet. And again, it all will stem from this lie. So we have to embed that from the start. It's like the opposite of Asimov's, like, laws of robotics. Right?
Nathan Labenz: 16:59 Hey. We'll continue our interview in a moment after a word from our sponsors. I would love to hear your thoughts on what the right goals are, and this is maybe something we can flesh out in multiple tiers of AI development. But you seem to be, in your comments there, reasonably sympathetic to the idea that truth-seeking AI is, like, a pretty good goal. If you talk to hardcore AI safetyists or doomers, you will hear arguments like, even that is not gonna save us, because if you tell the AI to go maximally seek the truth, you end up, perhaps with sufficient power on the AI side, which is obviously a big assumption, you end up with some still, like, paperclip-like scenario where it's like, the best way for me to seek the, you know, fundamental truth of the universe is to quiet all the noise on the planet and build the biggest particle accelerator or whatever. Right? But it's like even that, at some, like, extreme push, you know, potentially becomes incompatible with humans. Alternatives we have are, like, basically virtue ethics training from Anthropic. And then we have just sort of blind, like, reinforcement learning from feedback, which doesn't seem like it's gonna end super well for us either. If you had to, like, stack rank or maybe just give your favorite set of goals that you would want to embody into AI, what would you say those are, and how satisfied are you with your current answer there?
Emad Mostaque: 18:25 Yeah. I've been thinking about this a lot over my break. I think for me it's about agency and building AI that enhances human agency, community agency, and societal agency so that we can achieve what we want to achieve in a way that doesn't get in the way of others. Like, the golden rule comes as an element of that, which is common across all faiths: do unto others as you would have done unto yourself. If we wanna specify it more directly, we can actually say it's about children's agency. Because children don't have agency, and so a large amount of rights can be aligned to the rights of children. Climate, war, suffering, hunger. Like, children should be taken care of so they're allowed to enable their potential. And we should be maximizing the agency of the children, and that then knocks onto the adults. Because what you want again is for people, as long as it doesn't get in the way of the rights of others, to be enabled to do whatever they want to do and achieve whatever they want to achieve. And I wrote a piece on how to think about AI, and this is the basic differential between the agentic world or autonomous world, where we're replacing humans, and the agency based world, where you're trying to enhance humans. I think Ethan Mollick had a great book along these lines, one for the Christmas list, and others. Like, are we building these systems to augment humanity's potential and individual agency, or are we building it to replace humans in order for productivity increases, etcetera? This again relates back to, I think, religion is very interesting because of the stories that have survived over the years. We come together in these stories of nations and politics and groups and others to scale. And again, like I said, at the core of those was do unto others as you would have done to yourself. There isn't much of this in the current AI discourse. As you said, it's virtue ethics and kind of other things. When people talk about AI ethics, it's usually like, but whose ethics? The Chinese variant of ethics is very different from a California variant of ethics. It's very different from a Muslim variant of ethics or Jewish version of ethics. And I think this is what, you know, Eric Schmidt and Henry Kissinger and others in their latest book, Genesis, call doxa, or underlying human agreement on truth and other things like that, or principles. And I think you can have absolute truths. I don't think a maximally truth-seeking AI actually makes that much sense. I was using that as an example of it's better not to tell it to lie from the start, you know, to get it to kill humans. But I do think that a diverse range of AIs loosely bound with defined frameworks and organized around increasing individual, community, and collective agency is probably a good way to do it. And I think there is a route to doing that as well when we look at what this technology actually is. Another topic that you kind of
Nathan Labenz: 21:10 touched on a minute ago and and sort of allude to there again, or at least it comes to mind for me, is the fact that all this technology is being developed in the context of an increasingly tense great power rivalry between at least The US and China, and potentially more players should be considered there too. But even with 2, it's, like, complicated enough, to be pretty vexing. 1 question I have been wrestling with a lot is to what degree should we want the kind of fundamentals of the tech tree between, say, Western and and Chinese AI development to be the same versus how much should we want them to diverge and sort of represent the unique cultural perspectives of each side? My instinct is I think we wanna share a tech tree. Like, the more we sort of diverge in fundamentals, the more it seems like we'll understand what the other side has less, and that just sort of seems to exacerbate these sort of arms race dynamics. But I also do have a fairly, sympathetic response when people say, like, monoculture is bad, brittle, weak, and we need a more sort of diverse you know, almost like take inspiration from sort of biodiversity in the AI eco system or ecology that we wanna create. How do you think about that spectrum of monoculture versus diversity in AI?
Emad Mostaque: 22:41 I think diversity is better and safer in AI. But, I mean, to be honest, there's not much difference between the models today. Like, what is the difference between a Llama and a Qwen or even a GPT-4 or an o1? Right? There isn't much. They're actually pretty much the same architecture. It's just an underlying data difference. I think what's important here is objective functions. What is the objective function of OpenAI and all the models they build, or Anthropic and all the models they build, versus the Chinese and all the models they build? And who can, again, build those things? Falcon has just come out today, and it beats Qwen and it beats Llama and others, from The UAE, and they've got a talented team there. We're seeing all these surprises all over the place. But I think ultimately diversity of models is a good thing because different models are usable for different things. And the way that I do it is I think of these models as graduates. You know? Like, does it make sense to have a polymath doing everything? And someone coming up with recipes and being creative all the time? Now, if we're looking at human agency, what you wanna have is highly specialized models that understand context locally. And the more specialized you make it, the lower the energy cost, the more they can be available to everyone. And so I think you're moving to that engineering and implementation phase of AI now. This is separate from the AGI discussion where it gets very complicated. You know, I fell foul of this over the last few years as well. And I was like, my god, I'm building technology that will kill everyone. And I signed that pause letter and all sorts of other things. It's incredibly difficult to conceptualize AGI, be it as a large model with emergent properties or a swarm of models. And I think that's what a lot of this drive has been because, you know, you want to have a pivotal act to stop other people from having it if you have supernormal capability. And I think that's very difficult to sidestep in any way because we won't know until we see it, and we're seeing these models continue to get better. I'm not sure what the natural cap is on that. And both parties in a world that's about to get completely screwed up economically are gonna be fighting over who has preeminence from a soft power, hard power perspective. So I think that there is the useful stuff where diversity is good, and there's the AGI discussion, which is a completely different 1, where I think it would be best to keep that, but it'd be very difficult to.
Nathan Labenz: 25:02 Yeah. No. There are no easy answers. I only bring you the hard questions. So alright. When it comes to agency in practical terms, my instinct would be to say, like, the best we could probably do today would be to maybe tweak the Anthropic constitutional approach and maybe make it a little bit more agency promoting. I mean, I think they already have a decent amount of that in there. Certainly, when I talk to Claude, you know, it kind of wants me to make my own decisions. It doesn't seem like it is trying to, you know, tell me what to do. It does seem like it's trying to put me in position to be successful. Amanda Askell has talked about how she wants Claude to be, like, a good friend, and that's certainly what a lot of people would think a good friend does. Maybe we could kinda go a little heavier on that and sort of do a more agency-promoting constitutional approach. Are there any other, like, technical, or any sort of approaches, that you have in mind for promotion of human agency in AI objective functions?
Emad Mostaque: 26:01 Yeah. This is a question of access and then, again, alignment as to what are you optimizing your business for? How is this technology coming to you? Like, to give a practical example, Google and Facebook both have advertising as their core model, and advertising is effectively manipulation. And, obviously, these AIs can be far more convincing than anything. So the feeling is that over time, they will become more and more manipulative in the way these models are. Again, different for different cultures and things, but that's a reasonable assumption. And again, we've seen that with dark patterns on Facebook and all sorts of other things. This is very interesting because, like, the inherent nature of these models is that they're probably going to be our best buddies. Like, maybe your kid's first love will be an AI and other things like that, because they'll be there. First, they have infinite patience. Something like a Claude may enhance agency, but the key question is access, and who's running that intelligence? And who owns it? Because it could be co-opted at any time. Vitalik Buterin, the founder of Ethereum, has this post, the revenue-evil curve, where it's like infrastructure should be non rivalrous and nonexclusive, but when revenue comes in, you tend to start becoming a bit evil. You restrict access or you optimize, and then when you start optimizing, you act against your end users effectively. I feel AI is infrastructure, particularly when it comes to the stuff for living. AI for living is basically all the regulated industries. And we have to think, who is building those black boxes and what are their inherent goals and desires, and could those be co-opted? Also, who owns that AI that is your best friend, that is your kid's best friend, and how is it constructed? And do we have visibility into that? Because we have to remember, like, 1% of the people might know about AI, but, like, half the people in America still haven't used AI effectively. You know? Like, Claude is barely a blip versus ChatGPT. And across the world, how many people have used it? Maybe an eighth of humanity at most, which is crazy to think about. Most people won't understand this technology until it's there and it's out there manipulating, and again, this is where speech becomes important. We'll care about the ownership, we'll care about the alignment. So for me, it's about open source models, particularly for living, that you own yourself as an individual, as a community, with transparency so you know what's inside it. Working to allow you to achieve very functional objectives. I want my healthcare model to be looking out for me and my healthcare and optimizing that versus an HMO or a drug company or something else. My education model, I want to have a generalized curriculum, but I want to be able to adapt that to my own child without having to have any submission. For government, we want to have models that are transparent and can analyze any policy position and also represent us, because how many people feel represented by their governments? And then we can actually have democracy. I think things like this as infrastructure are very important on that, and the building blocks of that and the mechanisms for that are a slightly different type of RL and also a different type of community aggregation. And I think this, again, is 1 particular area: regulated industry AI for living, the important things of life. Then I think there's another block which is personal intelligence.
This is gonna be your Apple intelligence, Google intelligence, Tencent intelligence, wherever you are. And then there are these expert system intelligences, which are the Anthropics and Claudes and o1s of the world. Because the Apple intelligences will be free. Hopefully, we can make the living block free, and the other stuff will be expensive.
Nathan Labenz: 29:34 Hey. We'll continue our interview in a moment after a word from our sponsors. Almost a perfect segue to the Intelligent Internet, but there are a couple of things I have to follow up on quickly before I kind of invite you to share your big vision for the future. 1 I'm really wrestling with right now is the question of should I get an AI toy, buddy, tutor, whatever for my kid? My oldest is about to turn 6. I feel like there could be tremendous upside to it if, you know, we had the right product and it, again, had the right objective functions and it wasn't overly manipulative. And by the way, the manipulation and the sort of sense of being optimized against, I have felt very strongly personally in years past when I was running performance advertising on Facebook. It was crazy how often you would see, like, they're optimizing not just, of course, to, like, get users to click stuff, but also, like, optimizing to get us to pay the maximum amount that we can possibly pay until it drops our, you know, our margins down to roughly 0. And they're kind of doing that to everyone at the same time. It's, like, quite a system that they've created. So I'm very mindful that those big forces are out there, and I'm very wary about putting my kid in a position where they're being acted on by such forces. But then again, I do think, like, you know, individual ongoing tutoring for hours a day is obviously a proven method for learning and, like, AIs can do that, whether it's math or reading or whatever. So you could answer that in any way you want, but if you had even a recommended product, I would go buy it. But how do you think about, right now, like, introducing AI into the life of children?
Emad Mostaque: 31:19 Yeah. I mean, like, right now is probably gonna be fine. It's just, again, once you get into the optimization equation, you start to optimize very aggressively. Right? Like, I'm sure Matt is selling space in the larva related space right now. I would. You know, it makes sense. Because you have to make decisions, and models are becoming increasingly deterministic versus stochastic in the way that they are, and we're getting more and more understanding of how they work. I think that we're not at the point right now where, again, there will be the first love, but it's pretty close. Like, you've seen Character AI's explosive growth and unfortunately, like, the Daenerys situation, where the kid talking to Daenerys shot himself, very famous. Statistically, that will happen because of the large numbers of people and, again, the incidence of mental health issues. It's incredibly sad. Right? And, just like self driving, we'll hold AI to a higher account. But we have seen people falling in love with their AIs now for a while. Like, I think oh, what was it called? I can't remember the name. But in any
Nathan Labenz: 32:13 case, Replika is 1 of the early
Emad Mostaque: 32:15 ones for sure. Replika, exactly. Valentine's Day last year, they shut off the what was it? Adult messaging on Valentine's Day. And I think 66,000 people joined the Reddit because their Valentine's Day plans were ruined. That's just a snippet of where we're going, because now we have photorealistic voice, speech, all these other things. And so we have to think again, what is the intentionality of these agents, of these embodied intelligences, where are they coming from? Because, like, when we have a teacher, we wanna know what their CV is, or, here in The UK, we have DBS checks for criminality and other things. I think you'll want the same when you have intelligent agents coming into your house, particularly with your kids, particularly because they'll be there with them. And again, they will help them and that's how they build the bonds, but again, what are they building it towards? Is it to direct them towards advertising or others? But I don't think we're quite there yet on intelligence. Again, this is where we have an opportunity to set good standards, because it's better than, say, the YouTube algorithm, which will make them watch Baby Shark and that dark underbelly of YouTube. And if you've seen it, where you've got, like, Spider-Man and Elsa drilling teeth and all sorts of things like that.
Nathan Labenz: 33:19 I've caught glimpses for sure, and it gets real weird real quick. And I am like, what is it that you are even enjoying about this when I see my kids occasionally falling down that rabbit hole? But there's something highly optimized about it clearly.
Emad Mostaque: 33:33 That is exactly it. They create millions of these videos, and literally, I've seen Elsa and Spider-Man giving someone a root canal with very happy kids music, like, generated. Like, what on earth is this? Because it's selling ads. Right? Similarly, you know, the TikTok doom scroll and all sorts of other things. But I think the benefits, as you said, can't be outweighed, because the Bloom effect, the 2 sigma effect of one-to-one tuition, is huge. So I think there needs to exist some sort of open standardized infrastructure that we have for humanity that's a finite range thing where we're like, we trust this. Because just like crypto should have been trusted and turned out to be this massive raccoon infested crapness, I think "do you trust this AI?" is the question, and you want to be able to trust the AIs because they are more capable, and you want them to help you. And so I think that there exists a high level bar for that, and you do need to have standards, regulations, others. It's just no one's quite sure what they are yet. So let's build transparent open AI as it were.
Nathan Labenz: 34:32 Yeah. I think that is the perfect setup to the introduction of the intelligent Internet. And I wanna give you as much time as you want really to just kind of lay out the positive vision that you have for the future. I think I often say that the scarcest resource is a positive vision for the future. I'm always struck by how as many new launches as we get and all these advances, such a tiny fraction of the time is devoted to actually trying to describe, like, what life is supposed to be like in the future and why we're supposed to be excited about it. So I really appreciate that you are doing some of that work, and I would love to hear your kind of, you know, positive vision, what life is like in a future of the intelligent Internet.
Emad Mostaque: 35:21 Yeah. So thanks. So the intelligent Internet, we put out the Primer, is this conceptualization of an infrastructure for information transmission and value whereby everyone has their own AIs that represent them on an individual, community, country, and societal level. Because what I'm seeing now is AI models that are satisficing in capabilities. Like, does anyone really need more than o1? Like, sure, there's a few things like retrieval and other stuff, memory, self learning that it needs, but it's a level of performance that if we could give it to everyone in the world in the right way, I think the world would be better. And then when you start breaking it down, you're like, how could it be better? Where is the human capital shortfall? Because again, like, you still got 600,000,000 people without smartphones. You still got so many people living below the poverty line. The average global IQ is 90. That's the average. And it's not due to lack of intelligence. It's due to lack of infrastructure. You don't have enough good teachers for everyone. I mean, how many people listening to this believe that they've had all good teachers? How many people are happy with their medical system, their government, and others? So I was like, AI can really change that in a very interesting way. We can build a better system that's interconnected for that. This is important because, if we go back to the start with the p(doom) question, there are 2 types of decision making. When you make decisions in a stable environment, and this is where you should have a p(doom) or probability, you do an expected utility calculation. There's a probability of all these things, and then we do expected utility across that. When stuff is uncertain and we move from decision making under risk to uncertainty, you minimize the maximum regret. So if you're at the start of a desert and you don't know where the oasis is, you're not gonna take any steps, which is the downside of this. So you need to articulate positive visions of the future. And what that is, is that if you look at the energy equation of how much it costs to do a GPT-4 level query, it was 1,000 watts of electricity, and now it's 5. You can have an original GPT-4 level model that runs on an NPU now, which is 5 watts of electricity. Solar power is now less than a dollar per watt. So then you have a chip cost and an inference cost that's probably $50 to have a GPT-4 level AI offline anywhere in the world. And you think about that and you're like, wow. You could give a tutor to every single school in the world. What if you connect it to a Starlink? And it's off grid. You look at medicine, you're like, how can this change again? Then again, if you look at all the stuff you have for living, which I think is the most important thing, and we want that to be robust infrastructure, an AI first approach to those where we have gold standard transparent datasets that then create models and systems that we can deploy online, offline could be something that's world changing. I think you need different models from the individual to community to country to societal level. Because to take a practical example, half of everyone who listens to this call will get cancer, which is crazy if you think about it. Everyone who's listening in has had someone in their family or friends who's gone through a cancer experience. And it's awful. You feel that massive loss of agency when that occurs. Because we don't have cures yet, we have treatments, but still it's very traumatic.
Medical AI models now are outperforming human doctors on research, on diagnosis, but also empathy. And how many have had an empathetic experience through their cancer journey? What happens if we give every single person going through cancer, multiple sclerosis, Alzheimer's an AI that helps them through that journey? How to talk to family? What's the latest comprehensive, authoritative, up-to-date information? You know? It's always there. It's always available in every language. The world becomes a better place, ironically, through empathy from AI. And that's infrastructure that should be available to everyone. This is a concept we call universal basic AI, or UBAI as it were, which I think is better than income, and we can discuss that in a bit. But then you look at that and again, every condition. You look at education: everyone should have access to an AI that's looking out for them and how can I maximize that child's potential? But then shouldn't all of these feed up in a proper way to a collective knowledge? What's the collective knowledge on cancer? If a paper comes out, why isn't it instantly analyzed and checked and absorbed into our knowledge, be it probabilistic or certain? Why do we have a million different vaccine schedules around the world? I'm not saying vaccines are good or bad, but still, it doesn't make much sense. People are people. What is an optimal way to teach calculus or machine learning or anything like this? So I think if we can build intentionally, we can build better AI first systems that come together, and that's why we call it the intelligent Internet. Because we're at a very interesting point in time whereby your use of generative AI for health is probably a little bit higher than most people's, but it's basically nothing compared to what it'll be in 10, 20 years. You'll have an o1 level intelligence organizing your health if it's built. And your hospital will have the equivalent of a supercomputer. And your country will have dedicated hardware internally, intelligent capital stock as I call it, that's organizing the health system. And then we should have a central organization to have a comprehensive and qualitative, up-to-date knowledge base for cancer, for climate, for education. All the stuff we need for living. And it's a finite amount of energy that's required to build that. And then that's a good infrastructure that we have to support us. And so that's what the concept of the intelligent Internet was. It was like, let's build datasets, models, and systems, and then the right hardware so that we can deploy this at scale to everyone and lift the intelligence of humanity in a secure and robust way. And so we outline the end state, and in the upcoming pieces, we're gonna outline the ways to get to that, from identity to digital assets to physical infrastructure to the models themselves. Because I think it'll be difficult to do this in a decentralized, distributed, or market-competitive way. I think you've just got to go and do it. We know what to build, you know? Just build it. Organize all the cancer knowledge. Why hasn't anyone done that? I mean, think of how crazy it is. Like, again, if you got a diagnosis, God forbid, tomorrow, what would you do? You would go to Claude and you would check this and you would do that, but why isn't that all just in a box? You know, and again, for me, that's infrastructure. Clayton Christensen, who coined the term disruptive innovation, said infrastructure is the most efficient means by which society stores and distributes value.
And then in information theory, Claude Shannon's insight is that information is valuable insofar as it changes state. Deliberately building the datasets, models, and systems, making them freely available as cheaply as possible, as low energy as possible, is a new type of intelligence infrastructure for society that I think will do better, and then we can build other stuff off of it.
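To make the back-of-envelope numbers in Emad's answer above easier to check, here is a minimal sketch of the arithmetic, assuming the rough figures he cites (about 5 watts to run a GPT-4 level model on an NPU, solar capacity at roughly a dollar per watt, and roughly $50 of chip and inference cost). The school count is an assumed round number, and none of these values come from a published source.

```python
# Rough, illustrative arithmetic for an offline, solar-powered GPT-4-level tutor.
# All constants are assumptions restated from the conversation, not measured figures.

NPU_POWER_WATTS = 5          # approximate draw while running a GPT-4 level model locally
SOLAR_COST_PER_WATT = 1.0    # dollars per watt of solar capacity, as cited
CHIP_COST_DOLLARS = 50       # assumed combined chip / inference hardware cost

# Panel sized to cover the device's draw (ignoring batteries, nighttime, and losses).
solar_cost = NPU_POWER_WATTS * SOLAR_COST_PER_WATT       # ~$5 of panel capacity

hardware_cost = CHIP_COST_DOLLARS + solar_cost            # ~$55 one-off, fully off-grid
print(f"One-off hardware cost per offline tutor: ~${hardware_cost:.0f}")

# Thought experiment: one tutor per school worldwide (assumed round number).
SCHOOLS_WORLDWIDE = 5_000_000
total = hardware_cost * SCHOOLS_WORLDWIDE
print(f"Equipping every school: ~${total / 1e6:.0f} million")
```

The point is only that the orders of magnitude come out small relative to national health or education budgets, which is what makes the universal basic AI framing plausible.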
Nathan Labenz: 42:14 Just to take a second and steel man the case, or at least try to steel man the case, for the leading developers today: I feel like you would hear a lot of high level conceptual agreement on what you have laid out. And maybe not necessarily at the level of, like, exactly where the models reside, but certainly, they talk a lot about, like, intelligence too cheap to meter, and they have certainly dropped their prices dramatically. Possibly, they're, you know, all operating at a loss on the margin to try to compete for share. Not entirely clear. They also make a lot free. Right? I mean, OpenAI just yesterday released their search feature to all users globally for free. There's a lot that's free. Is there a critique of sort of what you see them doing today that is central, or is it more just that you extrapolate into the future and you see that this structurally, like, won't work for much longer?
Emad Mostaque: 43:20 I think it'll be very difficult for the proprietary model developers because Apple Intelligence, Google Intelligence will be free. So this is 1 of the things I had at Stability where I was like, do I do a giant funding round? We never did another 1 after the first 1. We did, like, uncapped notes, $100,000,000, whatever. Because, like, what is the sustainable model on the API or SaaS side when they're coming for you? Like, Gemini Flash 2, if it's the same price as the original Gemini Flash, it's 7¢ per million tokens. Yeah. That's crazy. That's like, what, 50 times cheaper than 4o, and it's a really good model. Right? I think that there is this thing where that type of intelligence as a consultant is available. I think there is this world of open weights. There's this personalized AI that you have, Apple intelligence, Google intelligence, others. There's this other area which I'm focused on, which I haven't seen anyone tackle, which is regulated industry AI. I don't think you can have black boxes there. I think we deserve better. I think we need open data, open weight models built deliberately. And what you have is generalized common knowledge, because the data for that should all be common knowledge for education, for healthcare, and others. Again, we have curriculum learning. Then you have your sectorial and localized version; you have a Mexican healthcare model. And the data to build that is finite, and you can get to a gold standard very quickly. That should be infrastructure that's non rivalrous and non exclusive in my opinion. I think everyone would benefit from that because everyone wants to have high quality localized datasets, specialist datasets, and more. Again, the data that you need for a medical model for Mexico is in the medical textbooks of Mexico. You know? And I haven't really seen anyone build that deliberately and really optimize against how do we get this to as many people as possible, which is why I said, let's just go and do that. Because that's the biggest delta. Because many of these organizations, you're still fighting for SOTA. Right? You're still going for AGI, but you're satisficing. Like, how much difference is there really between an Anthropic and an OpenAI and a Gemini and a Grok? They're all pretty good right now, and they're good enough to make a difference. So I don't think you need SOTA here, which is also another very interesting thing. In fact, do you want to have a state of the art unproven model for your health care? You want a really robust standard 1 that behaves very predictably, which also is why, with the advent of function calling and large context windows, now is the ideal time for this. For education, for that, again, anything in regulated industry. Do you want a black box checking every medical decision they eventually make? No, you don't. Or checking every government decision they maybe make? No, you don't. So I just think there's this particular category that's very interesting, but that I haven't seen anybody look at, but I think it's probably the most important 1 for flourishing.
But then it becomes very interesting, because if, for all the stuff you have for living, you know, Maslow's hierarchy of needs, you have these open weight, open data models that we're trying to optimize for as low energy as possible for maximum impact, to get to a level of performance of, like, almost equity of tokens, equity of MMLU, as it were, then I think that can guide the discussion, because suddenly you can parameterize and have datasets for cultural diversity across nations, and you can figure out things like doxa. You can have a feedback loop where, if you release all the data open from teaching kids and learning from kids, how to increase agency there. On the medical side, again, self learning systems. So you can make everything open with PII and other things, and you can have interesting things like, if you have a cancer AI to help you through your cancer journey, it can tell that you're a human because it's attached on the regulator side. And then you can use the same system for your education, for your government, more. That's the functional foundational identity. So there's a lot of benefits that can emerge here that could help with the AGI and state of the art. But again, I view those almost like expert systems, graduates, consultants from McKinsey, you know, whereas some are more like the graduates that work for you and your team. And the Apple and Google intelligences of the world and others are probably not gonna be doing your health care because they don't really want to. They wanna do your personal life and everything. They want to interact with health care, though; they want to interact with education. But I don't think those organizations are gonna go after that. So I think it's just nobody's really looked at that, and, again, I think it's inevitable. And the question is, who? Do we have a million different health care models, education models, and more? When we only really need 1 that's then customizable and adaptable and is MIT licensed.
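As a quick sanity check on the pricing comparison Emad quotes earlier in this answer (Gemini Flash at roughly 7¢ per million tokens versus GPT-4o), here is a minimal sketch of the arithmetic; the GPT-4o figure is an assumed ballpark for input tokens at the time, not a confirmed price.

```python
# Token-pricing comparison, restating the numbers from the conversation.
# FLASH price is the quoted "7 cents per million tokens"; the GPT-4o price is an
# assumed ballpark and may not match any actual published rate.

FLASH_PRICE_PER_M = 0.07   # dollars per million tokens (quoted)
GPT4O_PRICE_PER_M = 3.50   # dollars per million input tokens (assumption)

ratio = GPT4O_PRICE_PER_M / FLASH_PRICE_PER_M
print(f"Price ratio: ~{ratio:.0f}x")          # ~50x, matching the claim in conversation

tokens_per_dollar = 1_000_000 / FLASH_PRICE_PER_M
print(f"Tokens per dollar at the Flash rate: ~{tokens_per_dollar / 1e6:.0f} million")
```

At that rate, the dollar-a-day challenge Nathan mentions later in the conversation corresponds to on the order of ten-million-plus tokens of processing per day.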
Nathan Labenz: 48:00 So is this sort of model for the future that you have in mind 1 where... because you're hearing, of course, from Sam Altman and Dario that they sort of expect to run away from the competition in the next couple of years. They are not sharing the o1 chain of thought. They've said this is kind of GPT-2 level scaling of runtime compute, and they know how to get to GPT-4. They think they're gonna blow the doors off of everybody there. Dario and team have said in their, I think, credibly attributed but leaked fundraising document that they see a world in, like, the 2025 time frame, which is, by the way, 2 weeks away, that the leading developers could get so far ahead of the competition that nobody can catch up because they have these sort of, you know, enrichment loops that keep advancing the state of the art. And if they're not sharing that... you know, I also had a recent data point on this when Nathan Lambert from the Allen Institute did an episode on everything he's learned about post training. And 1 of the big takeaways for me was he said, in the absence of access to GPT-4 or similar to create the samples that we do the post training on, we can't do it. You know? There's just no way. Even at the Allen Institute, billionaire's-estate funded, we don't have the ability to go do a Scale AI contract or whatever to bring in the expert data that we would need. So do you think that happens or doesn't happen? If it does happen, is the model sort of they become, like, the go-to for the pharmaceutical industry and cybersecurity industry and all these sort of, like, things where you really do still need to push the frontier of the state of the art forward? And then at a more distributed level, we satisfice and, like, are really more focused on reliability and sort of explainability when it comes to things like making our own personal medical decisions?
Emad Mostaque: 49:55 I think that's the case. I think that we over-indexed on chefs versus cooks. You know, everyone's on that spectrum. Wait But Why had a great piece on going from being a cook that follows recipes to making recipes. How many recipes do you really need to make? Where AI is actually really good now, with function calling and large context windows, is following recipes. You give it a handbook and it basically replaces a SaaS or replaces an employee or something like that. We want to have more and more employees that follow the damn instructions. In fact, how often do you need to come up with something brand new for the vast majority of people? For that, there are the o1-type models. There are all these other kind of things. And again, this is where you have the AGI race. But if you're training, say, for instance, on a million H100s like Elon's about to, or a million TPUs like Demis is about to, or a million Trainium, like, God forbid, Dario's about to, poor guy. You know? Like, how big is that model actually gonna be on the inference side? You kind of went straight through the consumer side, and now you're like, well, am I gonna build a multi trillion parameter model that needs a Cerebras wafer or a Google TPU pod to run? You know? Because that's not really suitable for consumers on the side of things, but it might be more intelligent. But it might be 5% more intelligent, whereas it requires orders of magnitude less compute to have the lower ones. And this is where it becomes very interesting, because classically, most of computation has been sequential as opposed to, like, parallelized. And we're seeing this now again with test time compute and others. Like, who's gonna win if you need to have millions and millions of GPUs running in parallel, running o1-type things to check something and cross check it? Probably more distributed elements. But what is winning? If winning is AGI, however you describe it, then fine. Those are the expert systems, and that's where it is. But will 1 cluster win over a massive amount of post training with IQ 130 things checking? Will 1 cluster win over, let's take the China example, getting a 100,000,000 people in China to label stuff? That's not to say I really understand where it's going, because we haven't proven that we can break through PhD level yet. And even if we did, how useful is that in the context of all the actions of humanity and improving humanity's agency and capability to a brighter future? Like, this is the same question as what do you use a quantum computer for? Actually coming up with the questions that you wanna answer is really hard. This is why, like, I love o1 for the more advanced stuff, and I'm using it to help me improve my CUDA and kind of other things. But the average person isn't gonna get much mileage out of it because it's too damn smart. And do you need even smarter than that? You have to set it even better questions if it's smarter than that. Whereas again, like I said, this particular area of AI for living, I don't think needs that at all. I think the models right now are good enough, full stop, to make a transformative difference for the world if we organize the data correctly. The category of personal AI, your Apple intelligence, so Siri isn't stupid, and other things like that, it's nearly good enough. You know? The category of expert systems, yeah, it might be a runaway thing. But I don't think it's an interconnected runaway thing. And if it is, then Google will win.
Because Google's interconnect fabric is so much better than everyone's, and they're deploying it at a ridiculous pace. The only potential competitor I could see might be xAI. Anthropic, I don't see Trainium and the data funnel scaling that much, and OpenAI similarly. I don't see us getting that much.
Nathan Labenz: 53:37 Yeah. Reports of Google's demise, I think, were always greatly exaggerated.
Emad Mostaque: 53:41 I never got that. I mean, I remember when it first came out, people were like, Google is stupid. But they've got the people, they've got the talent; it just takes time to steer the ship. They've come up with all of these things. And again, if you look at the hardware: like, we have thousands of TPUs. TPUs are the best interconnected hardware and individually addressable, which is why Google's models are the only ones publicly available with 2-million-token context windows going up to 10 million. People don't appreciate how insane that is, with, like, 100% needle-in-a-haystack recall. You can do some crazy things that we're just exploring the surface of right now. I think NotebookLM was probably the first example of that. You upload a gigabyte of things to NotebookLM, and now you can dial in to talk to the podcast hosts.
Nathan Labenz: 54:25 Yeah. One wonders how much longer we'll need human podcasters, but I'll spare you my personal anxieties about that. I also have a challenge, which I still fail at every day, of spending a dollar a day on Gemini Flash. And it is mostly just a reminder for me of how cheap things have become and how much information I really could be processing for very, very little money. And I still just need to, like, figure out the right structures to do that for me.
Emad Mostaque: 54:55 Just get it to watch your screen and tell you when you're being unproductive.
Nathan Labenz: 54:58 Yeah. I saw that demo in the last couple of days. And, yeah, that always-on, observing, occasionally interjecting AI, I think, is gonna be a really interesting paradigm. I really like your chef-and-cook... yeah, go ahead.
Emad Mostaque: 55:12 No, I was just saying, I think, again, if you are boiling these models down to everything, it does come down to engineering. This is why Elon managed to catch up so quickly, right? Because he's an engineering maestro. Like, 100,000 GPUs, all good. But then, like I said, if you look at the different types of hardware and the interconnect and the addressable RAM and other things like that, and not paying the NVIDIA premium, it matters a lot. This is why, with the latest drama with OpenAI and their blog posts and things like that, like, hey, there's Elon, there's this... they were talking about buying Cerebras because of that massive addressable memory and things like that. So it'll be very interesting to see how that all pans out.
Nathan Labenz: 55:51 So I like your chef-and-cook notion as a simple heuristic, where maybe an o1, o2, o3, o4, whatever, goes off and figures out how to cure particular kinds of cancer, or maybe Claude 4 or 5 or Gemini 3 is doing that stuff. And then on a daily basis, we satisfice, and we try to make sure that we're applying the actual latest known stuff in an effective way, as opposed to trying to advance the state of the art, you know, each one of us as individuals. Let's talk about the structure for this. I think in the primer for the Intelligent Internet, you do a really nice job of walking through the different tiers of the critical resources. Right? I mean, obviously, compute infrastructure and data, as well as, of course, the algorithms and the human talent that makes it work, are the fundamental inputs. And you've kind of mapped out a structure where there is, like, the highest tier, biggest scale of those, and then a more distributed level, and then a more local level, and ultimately individual users get to take advantage of this kind of on their own terms. Take us through that vision for how this actually gets built out, because it is, like, a physical capital thing, and it's structured to achieve the goals that you've articulated.
Emad Mostaque: 57:17 Yeah. So, again, if I kind of look at the future, I see very rapidly that we're gonna see economic disruption, because the labor theory of productivity is about to be challenged. Again, anyone listening to this knows finding human capital is the hardest thing. Whereas AI will follow instructions, and it's very reliable, relatively speaking. I mean, sure, it was a goldfish earlier this year. You know, like, it's a savant PhD that's had its coffee and just responds to stuff. We're solving most of those things right now. Right? So I was like, there needs to be this capital stock build-out. And right now we're seeing it on an individual level, because you're getting M4 MacBooks with 16 gigabytes of memory as standard. You're seeing 8 gigabytes on 4070s on laptops, and similar amounts on integrated graphics and AMD chips. Those will soon all standardize, in my opinion, to 16 gigabytes for AI compute, be it with an NPU at 5 watts or a more advanced graphics option at 25 to 125 watts. This is a build-out that enables intelligence on the edge, and again, every manufacturer is doing that. But then you think about a hospital, and that's what we call the distributed nodes. You wanna have a standardized stack that can go into any hospital, transform the data, and then add an interface to that to interact with your personal intelligence and more. And then you want what we call hypernodes, which are national nodes, to organize all the healthcare knowledge of an entire nation or organize all the local cultural data of an entire nation and keep that up to date as well. So we had this stack ranking where there's different levels of computation: from the inference level at 16 gigabytes, to, let's say, a fine-tuning level at the hospital level, like 128 or 256 H100s, which can get any hospital or bank or anything like that pretty much fine-tuned. Then you've got the national level, which should probably be in the thousands of H100 equivalents to organize all of that. Maybe there's an order of magnitude each way. And when I looked at that, I was like, this is really interesting, because this is a defined capital build-out. You wanna standardize as much as possible, so you have boxes that just plug in and can look after stuff, because these models are amazing at the infrastructure level. And they offer different types of experience at different levels. But again, one healthcare model for cancer is very easy to turn multilingual for everyone in the world, right? And one radiology model and one education model, or variants of those. And then the next part, I think we mentioned it briefly in the primer: I was like, that looks a bit like Bitcoin, as it were, which is maybe a story for later or another day. But it's a dedicated infrastructure build-out to create an intelligent capital stock that then gives capability, just like building factories gave capability. We don't need factories anymore, right? You actually need pure hardware running models that are as standardized as possible, and those standardized models need to be based on standardized datasets, have regular releases, and be predictable in the way they operate.
Nathan Labenz: 1:00:22 Okay. I have a lot of little detailed follow-up questions. Mhmm. I guess for starters, so much is being made right now about the energy requirements of AI that I find this kind of confusing. On the one hand, obviously, we hear, you know, about $7 trillion build-outs and crazy energy grid expansions. To some extent, I sort of interpret those as, like, trying to expand the Overton window so we can get, like, a couple of new power plants online, versus, like, very little new construction, obviously, in the US in recent history. It is, I think, a very good grounding to talk about the idea that a 5-watt machine can power an AI. I wonder if you could flesh that out a little bit more. Like, it seems like an 8B, a Llama 3.3 8B... Yep. ...is kind of hitting at that GPT-4 level now, and that's quite runnable, certainly on a new Mac, right? Can you give us a little bit more there on, like, exactly how you envision that?
Emad Mostaque: 1:01:26 Yeah. So an 8B model is runnable on a Mac Neural Engine. A Mac Neural Engine will use about 5 watts of electricity, as will the Intel NPU. But if you want more, like, I've got my M2 Max and I'm waiting for my M4 Max, that will use up to 100 watts of electricity. Right? If you look at the H100, the H100 uses 1,000 watts of electricity. So you can use more energy and it'll go faster, but 10 tokens a second is human reading speed. You see these models go like that, but if it's faster than you can read, what's the point, in some ways? Right? And there's a question of how many tokens a human needs to flourish, and you could do some calculations around that. We're like, well, if you don't need everything synchronous and you can let it go, then 5 watts of electricity is probably all anyone needs. And again, you're seeing that because you can now run models on a smartphone, and smartphones tend to top out at 25 watts of electricity. This is where it becomes, again, very interesting, because let's rewind a bit. At Stability, we did Stable Diffusion. That cost maybe $5 million of compute in total, right, to do. That's a decent amount of megawatt-hours of electricity. You know, it's a million or so A100 hours, and an A100 is what? Gosh, I can't remember. 400 watts? So 400 megawatt-hours of electricity. Is that right? Something like that. And that's a good amount. But then running the model takes just a little bit on the edge. So you think about it: the pretraining and then the post-training, etcetera, comes out with an artifact, which is this compressed set of weights, and that means you don't have to retrain that thing. Where a lot of the exponential energy usage is coming from is a lack of optimization, and this is why Meta releases Llama, one of the reasons. They've got 350,000 H100s they're about to go pick up, with maybe 10% used for training. They've improved the efficiency of Llama by more than 10% by releasing it, so for the remaining 315,000, it more than pays for itself. You've seen this, for example, when Genmo released their new video model, Mochi. On day one it required, I think, 140 gigabytes of VRAM; within a few days it required 20, and now it requires 8. Like, again, there are massive optimizations around these models that can occur, because it depends on what you are optimizing for. But you look at Google, you look at Microsoft, you look at everyone else, and they're like, we can't afford not to do this build-out, because what if we're wrong? What if we can't optimize? Because no one knows what the satisficing level of performance will be for any of these models. Like, you look at Veo 2 right now that's just come out from Google on the video side, or Sora. It wouldn't surprise me if the VRAM and energy requirements of those dropped by an order of magnitude overnight because someone's figured something out. We don't know how much these will require. So we overbuild to a degree, but we're building more and more of this capacity out. But then it becomes interesting, because although we've had more and more energy usage, you think about the energy usage to make a movie, how many megawatt-hours is that versus what it will be? The energy usage to make this podcast, the energy usage to write a book, the energy usage to provide a medical input. And I think that AI is very beneficial here, because, again, there's this lovely chart that you can look at of GDP per capita versus energy use. There was always that relationship. This breaks that.
Because in a year, we could go to a kid in Sub-Saharan Africa and give them a solar panel, an NPU chip, and a version of Llama 4 that speaks their language, and then they have access to how much knowledge, right? Or a medical version of that or anything like that. And that's crazy, because otherwise you would have to build the infrastructure in place. You add Starlink to it, it gets even crazier, but you see what I mean. This is a fundamental change in the relationship between energy and output. So I think we have this energy run-up now, but then global energy consumption is not gonna keep going up, unless, of course, I'm wrong and we end up with, like, freaking Dyson spheres running AGI and other things like that. But I feel that probably won't be the case. I think there are diminishing returns on scaling.
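For reference, a minimal sketch of the arithmetic behind the ballpark figures quoted here; the 400-watt A100 draw, the 5-watt NPU draw, and the 10 tokens per second reading speed are the rough numbers mentioned above, not measured specs:

```python
# Rough energy arithmetic for the ballpark figures quoted above.

# Training: ~1 million A100-hours at an assumed ~400 W per GPU
train_energy_mwh = 1_000_000 * 400 / 1_000_000   # watt-hours -> megawatt-hours
print(f"Training: ~{train_energy_mwh:.0f} MWh")   # ~400 MWh (megawatt-hours, not megawatts)

# Edge inference: a ~5 W NPU running around the clock
edge_kwh_per_day = 5 * 24 / 1000
print(f"Edge inference: ~{edge_kwh_per_day:.2f} kWh/day")   # ~0.12 kWh/day

# At ~10 tokens/second (roughly human reading speed), a full day of output is
tokens_per_day = 10 * 60 * 60 * 24
print(f"~{tokens_per_day:,} tokens/day")          # 864,000 tokens
```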
Nathan Labenz: 1:05:43 That's an important idea. Just to put a little bit of my own, you know, back-of-the-envelope math on the energy requirements: where I live in Detroit, Michigan, the cost of a kilowatt-hour of electricity is roughly 20¢. I think it's a little lower than that still, but call it 20¢ as a nice round number. 5 watts is one two-hundredth of a kilowatt, obviously. And so if I'm running my AI on one of these 5-watt machines nonstop, I would be looking at a tenth of a cent an hour in energy cost, or, like, 2 and a half cents a day, which means, obviously, I'm looking at single-digit dollars over the course of a year. And I think that is a dramatically underappreciated point about just how accessible this already is, at least in terms of the marginal cost to actually operate the thing, setting aside the chip cost, for something that, you know, with a Llama 3.3 8B, is, like, getting really quite good.
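A quick sketch of that back-of-envelope calculation, using the round 20¢/kWh figure quoted above (actual rates vary):

```python
# Back-of-envelope electricity cost for an always-on 5 W device.
RATE_USD_PER_KWH = 0.20      # the round Detroit number quoted above
WATTS = 5

kwh_per_hour = WATTS / 1000                        # 0.005 kWh
cost_per_hour = kwh_per_hour * RATE_USD_PER_KWH    # $0.001, a tenth of a cent
cost_per_day = cost_per_hour * 24                  # ~$0.024, about two and a half cents
cost_per_year = cost_per_day * 365                 # ~$8.76, single-digit dollars
print(f"${cost_per_hour:.4f}/hr, ${cost_per_day:.3f}/day, ${cost_per_year:.2f}/yr")
```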
Emad Mostaque: 1:06:52 Yeah. I mean, Hugging Face just released their latest thing on their o1 replication. They've got Llama 1B outperforming Llama 8B with test-time compute. And again, you run it 24/7, you're interacting with the twin. How crazy is that? Where is the lower bound on these things? We're not sure. But it's got to that point where, as I said, people don't appreciate it. And the example I gave was solar power. If you hook up a solar panel to the grid, it costs up to $4 a watt. If it's off-grid, it's less than a dollar a watt for the solar panel. So then the cost is effectively nothing. And, again, the question is where do we satisfice? Like, how good do you need it to be? Like, you look at Veo 2 today on the video side. Do you need anything better than that to make a Hollywood-level movie? No, not really. Sure, there are some small things, but those can always be tidied up in post, right? And we've figured out consistency and other things. We have chatbot models now that outperform human doctors on empathy and diagnosis. Do we need better than that? No, we need to optimize the heck out of it and figure out where that lower bound is. And the amazing thing is all these models are still full of junk. Like Ilya Sutskever just said at NeurIPS, we've run out of data. But how much of that data is junk, still, even in these top models? We're still getting to grips with this. We saw the Sana team, previously PixArt, create a Stable Diffusion-level model with 25 million images, and we used 2 billion. What is the lower bound of data that you need for a language model to achieve X, to teach your kid? Does it need to have seen Reddit? No. Does it need to have seen this? If you wanna build AGI, yeah, you need all that data. Maybe. But do you need it all in one place? Like, what is the minimal data you need? And we've seen interesting things, like I think PlayaAI just did an AI that was trained only on knowledge up to 300 BC. I haven't seen what it's like, but those are the types of things that really interest me. Like, what does that look like as an AI? Right? It feels like we have an order of magnitude improvement still to come just from data, honestly. And we're seeing this with FineWeb and other things, and the synthetic datasets and augmented datasets that we've had.
Nathan Labenz: 1:09:18 Yeah. The efficiency curve is truly crazy. I mean, the Phi models from Microsoft are an interesting data point about how small your dataset can be, and, yeah, I think theirs are, as far as I know, entirely synthetic for that line of models.
Emad Mostaque: 1:09:34 Yeah. They did 13.8 epochs on, like, 400 billion tokens for the core, but then they had to add in, like, a map of the Internet, because they're so boring, and people don't wanna interact with boring models. Because, again, people want them generalized. So when you add back the Internet, they get less boring. But the Phi models have the worst sense of humor of any model.
Nathan Labenz: 1:09:54 Very yeah. Textbooks aren't all you need for a sense of humor, I guess. But, yeah, so many things like that. Okay. So that's the low level. It's incredibly accessible and affordable to satisfice. If you go up to the top end of your infrastructure for the Intelligent Internet, this is the, like, national or perhaps even international hypernode level. I have a lot of questions about that, I guess. But maybe for starters, is your hope for safety and transparency ultimately rooted in open data? It seems like with open weights, we can hope for a breakthrough in interpretability, but we don't have it yet. So it seems like right now, if you were to say, we wanna make something we can feel confident is safe and we feel confident is not gonna be too crazy, you would have to do that by sort of saying, we deeply understand the dataset, and that's why we believe this. Is that right?
Emad Mostaque: 1:10:52 I think it's necessary but not sufficient. Like, I think the existence of gold-standard datasets that reflect individual cultures and other things is gonna be vitally important when it comes to decision-making AI. Again, how do you understand what the people of Mississippi want, or the people of Malawi, or the people of Mumbai, if you haven't mapped out their individual culture, the decisions, and other things? And open data with open models probably makes them more interpretable, but no one's figured out interpretability. Again, there's fascinating SAE and other research, but we haven't figured this out yet. I think it will help, and I think, again, the existence of these gold-standard datasets will help the entire space, because when Eleuther released the Pile, it became a standard. When LAION released their datasets, they became a standard. That's why I originally called the company Schelling, because I was like, let's build standardized gold-standard datasets and just improve them rigorously, which is what we thought the hypernodes would do. The hypernodes coordinate and improve the datasets, which can then go down to the organizational, community, and individual level, and you have a self-learning system that, again, just improves and improves and keeps it up to date. Then anyone can take those, because we release everything MIT, permissionlessly, which is the other important thing. Like, what if you break someone's rules and they shut off your AI? You're a bit screwed then, right, for whatever reason. And then I think, again, that can move towards standards, because the highest-value AIs will be decision-supporting and decision-making AIs. And what do those look like, and how do we understand what they do? And do those have repeatable frameworks? The most widely used AIs will be the AIs that are available, especially if they're permissionless. I mean, we have 300 million downloads of our models at Stability AI, you know? And that's a lot of freaking downloads, because they're just available. But then people built around them because they didn't need permission. So I think that, again, it's necessary but not sufficient, because otherwise, you look at Anthropic and OpenAI's models: like, how will they incorporate conceptualizations of Chinese or Japanese ethics? Is anyone even taking that seriously? But then what if they're used to make decisions in the Japanese government, as they kind of are right now? Like, I think you need to be very deliberate in the way that you figure this out, on the underlying principles and the feedback loops there. And that was the final part of the Intelligent Internet, which was that I think people should have a say and be participatory, which is why we're looking at the digital asset space. We're looking at distributed governance. We're looking at all these other things to say, is there a way that can all be brought together? Which is not easy. But, you know, hopefully, we can figure that out.
Nathan Labenz: 1:13:34 Yeah. Well, I'm interested to hear more about it; it sounds like thoughts there are still in development. But whereas the user level seems like it's almost gonna take care of itself, inasmuch as everybody's continuing to buy phones already, I don't really know what to expect at the hypernode level. Like, I don't know how many hypernodes you think we need. Are they billion-dollar investments? Are they $10 billion investments? And who's gonna make them?
Emad Mostaque: 1:14:04 Well, so I think a country's competitiveness used to be about its physical infrastructure: how many things can you build? Then it was about its graduates: how strong is our smart graduate network, right? Now it will become about your intelligent capital stock: how much compute, coordinated into models, do we have access to? And, again, if you look at China versus the US, beyond the jingoism, it is a question of let's build compute in America, let's build compute in China, because from a test-time compute perspective, that is what your competitiveness becomes. You're seeing this across the world: like, Malaysia, in Johor, just built a $9.6 billion AI data center. You're seeing billions of dollars across Asia. I don't think you need to have that much beyond the thousands range, because although there's like $240 billion or something like that of AI building going on, when you start looking at how much individual clusters and things like that are, the fastest, largest biomedical healthcare cluster in the world is probably Chan Zuckerberg's with 1,000 H100s. That's like $20 million a year, which is not nothing, but it's not ridiculous, you know? And so you're like, how much would it take? Probably the thousands range, I think, will be sufficient to organize national datasets and create them, and even provide almost free AI to the vast majority of people in a reasonably sized nation.
Nathan Labenz: 1:15:28 You're talking about, like, thousands of H100s being enough at the national level?
Emad Mostaque: 1:15:33 Yes. And this is not just in the short term; this is long term, in H100 equivalents. Like, once you build a gold-standard dataset for education for Bangladesh, how much do you really need to update that? Not that much. And if the outcome is a model that's updated every so often, that's a few gigabytes, and then goes onto everyone's phones provided by TCL or Huawei or whatever, then again, that isn't much. And what's the cost of inference? Pretty much nothing. It's a shift, again, that needs to be developed deliberately. This is separate to the state of the art. The state of the art costs a lot, and it costs a lot to inference, and it costs a lot to do everything else. But for me, like I said, this is the most interesting part. Give every child their own AI tutor that's aligned with them. Make sure no one's ever lonely in their healthcare journey anymore. Make governments more representative, finance non-corruptible, and things like that.
Nathan Labenz: 1:16:26 I'm surprised by how small you're projecting that to be. Is there any assumption there about, like, having a general... if I'm Malaysia, for example. Well, Malaysia, you said, built a $10 billion one. But let's say I am...
Emad Mostaque: 1:16:38 Yeah. Just now.
Nathan Labenz: 1:16:40 ...a smaller country, and I've got my Chan Zuckerberg-scale thing, and it's thousands of H100 equivalents. I can't train a foundation model from scratch on that, right? So I can do a lot of fine-tuning, I can do a lot of data processing, but it would seem hard to do foundation models from scratch.
Emad Mostaque: 1:17:02 You just need one entity to do it. Like, Meta doing Llama is good enough for almost all things, right? And they did 20-odd million H100 hours for the largest version of Llama. That's what, let's say $40 million spent for Llama 3 70B. Then everyone can just take that and fine-tune it, and to tune it requires 128 to 256 H100s, and the order-of-magnitude cost is $100,000 to $200,000 to create an Arabic version of that. So, again, we're blinded by the massive compute and the edge stuff, when actually it doesn't require that much to do the tuning. The thousands enable you to actually give your people access to the technology, especially if, again, you don't require rate limits or just have a satisficing level. And it also allows you to do the tuning. So you probably start out with a few hundred and then you build up from there. Again, when you look at the disparity of compute and energy and other things, it's very interesting. Like, I was in South Africa; I think the fastest cluster there was like 50 or 100 H100s, but then the clouds are building even larger ones. This is why it's very achievable if we're coordinated. But then, do you wanna be giving your people this level of healthcare AI or that level of healthcare AI? Do you want your companies in your country to be able to call on these experts or even more experts? So this is why I think there'll be a very interesting capital stock question. And when we look at things like 5G, the amount of AI spending has not caught up to the amount of 5G spending around the world, and this is far more important than 5G when it comes to a country's productivity levels. You know, again, just think about it. Having an expert in your pocket, that's your Huawei smartphone. Having your hospital suddenly have all the latest stuff, available to all the people in your community, that's a $50,000, $100,000 type of thing, you know? It scales. And the more you spend, the more competitive you'll be, the more productive you'll be, and the more access you'll have. And again, I was just shocked by how few people were actually looking at this particular area and what was required for it. So I think, if we go back to the start, one entity will need to pretrain base models. That's our common knowledge. Then you have sectoral datasets, where you specialize by sector, and localized datasets. So you can have the generalized model customized for Mexican law, Indonesian accounting, Japanese medical, and everything is completely transparent across that. And the amount of compute is primarily that pretraining at the top, but then you require hundreds of H100 equivalents to tune.
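To make those orders of magnitude concrete, here is a rough sketch using the figures quoted in this exchange; the $2-per-H100-hour rate and the two-week tuning run are illustrative assumptions, not quoted prices:

```python
# Back-of-envelope compute costs using the figures discussed above.
H100_HOURLY_USD = 2.0            # assumed blended rental cost per H100-hour
HOURS_PER_YEAR = 24 * 365

# Pretraining (Emad's figure): ~20 million H100-hours for a Llama-scale base model
pretrain_cost = 20_000_000 * H100_HOURLY_USD
print(f"Pretrain: ~${pretrain_cost / 1e6:.0f}M")              # ~$40M

# Localized fine-tune: 256 H100s for an assumed two weeks
tune_cost = 256 * 14 * 24 * H100_HOURLY_USD
print(f"Fine-tune: ~${tune_cost / 1e3:.0f}k")                 # ~$170k, in the $100-200k range quoted

# A national-scale cluster of ~1,000 H100s, run year-round
cluster_cost = 1_000 * HOURS_PER_YEAR * H100_HOURLY_USD
print(f"1,000-GPU cluster: ~${cluster_cost / 1e6:.0f}M/year") # ~$18M/year, near the ~$20M figure mentioned earlier
```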
Nathan Labenz: 1:19:52 So it seems like, with maybe one caveat, that the Llama models are not open data, it seems like you feel they already have enough raw power to serve as the base for a lot of this fine-tuning around the world. Do you worry, like, what happens if Llama 4 doesn't get open-sourced? Or, alternatively, the other end of that would be: do you feel like it's just not viable because it's not open data, and there needs to be a true, fully open-data alternative?
Emad Mostaque: 1:20:26 Llama 4 will be amazing for a vast variety of things. I don't think it's suitable for a regulated industry, and, again, regulated industry is typically the stuff of living. So someone just needs to go out and build those standardized systems and release them MIT, and then that makes the world a better place and enables access to those. So I think Llama is fantastic. I think Qwen is fantastic. I think where those end up in a few years, nobody knows. They're run by Alibaba; they're run by Meta. And again, those are corporations. I don't think they want to build infrastructure to teach all the kids, honestly. Like, eventually, you think about the infrastructure that will teach your child: an AI that's fully open and owned by you, acting as a regulator, that then calls into a Llama-based service, that can then call into an Anthropic or Google or OpenAI service. It'll be a mixture of models, right? And you'll have the protective model there. And then for your kid, it might be that you have a base level. This is why we have this concept of universal basic AI, which I think is important. That's your base level, but then you might want more, and then you pay more for more. I think that's absolutely fine. So, again, I don't think that what we're doing replaces Llama. I think there need to exist open-weight models, and I think the three best are DeepSeek, Llama by Meta, and Qwen by Alibaba. And I think those serve their purpose, but they will never release the dataset that underlies them, because they're not trying to build infrastructure. They have a different objective function. Just like I don't think Apple will release all the data inside Apple Intelligence, just like I think Google won't release all the data inside Gemma. And even if they did, like I said, it doesn't really align with the reasoning of the organization.
Nathan Labenz: 1:22:19 Yeah. So how do you... this maybe gets a little bit to the crypto question, or at least, you know, the incentive engineering, if you will. You've got this concept in the Intelligent Internet primer of proof of beneficial compute. And I'm curious to hear how you think this can best be organized or catalyzed so that, I guess, mostly it would be countries around the world, and maybe a mix of other private institutions too if they want to, all come to see it as being in their interest to contribute to this project. That seems like it'll be an amazing trick if you can pull it off.
Emad Mostaque: 1:23:01 Yeah. It would be nice, wouldn't it? I mean, I've been looking at all the business models again. The last podcast we had with yourself and others was, like, talking about business models in AI. And, you know, the Stability experience was interesting. We took it up to tens of millions of revenue run rate in just a few years. We were still burning, but it was getting towards profitability. It got a bit tight at the end; I think the latest CEO said there was less than 90 days of cash, which is better than Apple was back in the day. But that's why I didn't do another giant round. So I was like, where's the sustainable thing here? Well, I kind of took a step back and I looked at what I really want to do. Like, I worked on a United Nations-backed AI initiative on COVID-19, a lot of things. I got into AI when my son was diagnosed with autism; I built an AI system, etcetera. I was like, how do you build infrastructure that's available to everyone? And what does that look like? And thinking about the future, one of the things that I've alluded to in this podcast is that the labor theory of productivity, which is at the center of economics, is broken. And then the nature of money is probably gonna change, because what does an Optimus or an o1 or other things need? Some sort of GPU-inference-based currency, as it were. Some people have said that Bitcoin might be the future for these agents; again, you've seen AI agents now go across crypto channels and other things. And I was like, if I've got a massive amount of compute coming online, and I know that, again, the amount of compute for your city's hospital is about to go from this to that, and I can build a standardized stack for that, why not use that to secure a brand new type of currency similar to Bitcoin, but with the properties of Ethereum? And we sell the currency and get supercomputers for health and autism and cancer and other things like that. And the more we help people, the more moneyness it has. The more we get these distributed nodes running it, the more secure it is, because the amount of flops securing the network increases. This comes at a time of deregulation in the US and a massive amount of digital assets, of which almost all are rubbish and full of raccoons. But you still have Bitcoin ETFs and even nations thinking about Bitcoin sovereign reserves. But for me, the sovereign reserve thing never made sense, because that was always about balance of payments or commodities: I don't have enough oil, I don't have enough corn in my granaries, I don't have enough intelligence. So I was like, if I could build that as an institutional asset, and again, we'll finalize the details of that release in the new year, that could be something interesting, whereby people could buy it, they could hold it, it's secured by increasing amounts of compute, and then you can allocate it, so you can take your coins and allocate them to cancer research or Alzheimer's or autism or Mexican education. And then we give that universal basic AI or the research grants or others. That plays on this transition period. It plays on the structural demand growth for digital assets that are decent, of which there aren't really any. And it's a finite amount of capital that's required, not cheap, so hundreds of millions or billions. That means that no one will ever be alone again on their cancer journey. That means that every child will have an AI that allows them to achieve their potential. And I was like, that's a good deal.
And so we should give this a go, because I can't think of any other way to get people participating. I couldn't think of another way to get this technology out there without turning evil, as it were. Because, again, you wanna have an aligned model whereby, by helping people, you gain value. And it seems there's big demand for it, but like I said, we're putting on the final touches. That's why the primer was like, here's the goal, and now it's about: let's release the details of how we get there and get people involved in that.
Nathan Labenz: 1:26:37 In terms of the incentive you wanna create, if the mechanism is as yet not fully defined, is the idea supposed to be kind of similar to Bitcoin, where I'm, like, buying in now in anticipation of some sort of, like, compute appreciation in the future? Or, like, what is my individual or even national incentive to get involved?
Emad Mostaque: 1:27:04 It's Bitcoin, except... so, like, we're actually just copying Bitcoin straight, but with some adjustments. Rather than mining Bitcoin, you want to provision compute for open source, effectively, and then you want to use it for positive stuff, be it inference to provide universal basic AI or building these datasets and models. This is why I'm on the advisory board of Render and others, where there are millions of GPUs. Rather than selling it and it going to a treasury or going to Lamborghinis, you sell coins, and then we will build the fastest healthcare supercluster in the world and make it available to everyone working in healthcare to build open models and release them MIT. The more you build, the more you help, the more status you get. And like I said, there are no high-status digital assets right now. Then what will happen, we believe, is that as you build the full stack, a nation can buy the reserves, and then you put the money back into the nation to put a cluster in that nation to provide this. And the economics works well. People can donate, and they get the tax write-off, but then they control the direction of voting of the coins. So we're working through this, but we're like, this could be a way to allow everyone to participate. They can direct the network's compute, for something that has a defined supply; let's just, at the moment, follow the Bitcoin mechanism, so 21 million coins and everything. And that would be very positive, because what we need is a lot of compute right now to build the models and datasets. And then we have an inevitability of compute for healthcare, education, all these other things, plus verified accounts. That's a very interesting setup. And again, it's this particular area that we don't see anyone else in AI doing. But it's like, we should have, as soon as possible, nobody being alone in their cancer journey anymore. Or when my son was diagnosed with autism, the feeling of loss I had, that should not be there for anyone. And that's the type of thing people can get behind. And I think this is where crypto has succeeded and failed: people forming communities and being able to participate, which, especially with the US deregulation, will be a big boon. Having some sort of traceability over funds going to clusters that are very well defined, that community organizations and academics can use, is gonna be great. And then should there be any cost to everyone having that cancer AI? No, it should be free. Education should be free. And if we can externalize it with the desire of people to hold digital assets for appreciation or for status or for anything else, even better. So like I said, that's taken quite a few months to design. We're nearly there. The primer was the first bit, and we'll be releasing more details of all of this soon. I think it's complementary to what other people in the space are doing as well. I think everyone will benefit from the datasets. Everyone will benefit from these things. And the final bit: like, I'm very worried about what money looks like over the next 5, 10, 20 years and how the economics of everything works when the labor theory of productivity is just broken.
Nathan Labenz: 1:30:05 Yeah. That's a dramatically under-theorized problem, I would say, for sure.
Emad Mostaque: 1:30:10 I'm using o1 to help me write the paper on it right now, actually, for the new year. It is crazy, though, because you think about the Philippines: why would anyone hire a call center worker again? You think about physical embodiment, you think about laborers: like, in 20 years, will a robot be a better plumber than any plumber? Yeah. And how much do you pay for your plumber today? It's expensive when the pipes burst, right? That robot will be $2 an hour, but your own home robot will have all the knowledge of an expert plumber. Probably 10 years away, to be honest.
Nathan Labenz: 1:30:48 So I would participate in this network for multiple reasons. One, and it's funny because I still don't really understand Bitcoin, I have to say. It continues to kind of confound me that it just runs up and up and up. You know, it doesn't seem like there's anything really there still yet other than future appreciation, and that's, like, a hard trick to replicate. So... Yep. The incentives that you outlined are, one, contributing to the public and common good; two, some sort of voting control over how pooled resources are allocated. Yep. And third is appreciation. Although I have to confess, I'm not quite sure: like, is there a redemption mechanism that...
Emad Mostaque: 1:31:34 No. At the moment, with the way we've designed it, it is like Bitcoin. So it appreciates for similar reasons to Bitcoin. Bitcoin's a hedge against the world going to crap. This is a hedge against the intelligence age disrupting economics, and the amount of flops per coin is likely to outpace the amount of flops per dollar, because it will be provisioning more and more and more compute, you know? So that's kind of the theory. We'll see what it's like in practice, and we've still got a lot of improvement to go here. But it does seem like, again, we're at this transition phase whereby something like this is unequivocally good, especially when you put it in those real terms. Like, again, does any podcast listener right now not want to have all the climate knowledge in the world organized and available to everyone, or that cancer AI available to everyone? And if they can contribute to that and have participation in that, with the upside, and with the US administration changing to something more positive, even better. A hundred billion dollars has gone into Bitcoin ETFs this year. Again, the order of magnitude to build these things is maybe a billion or something like that, which sounds crazy just to toss around, but you've seen what AI funding rounds are right now.
Nathan Labenz: 1:32:39 Yeah. A billion is very... I mean, at the level of countries, it's obviously extremely affordable. So...
Emad Mostaque: 1:32:48 Or at the level of VCs, you know. And again, the wonderful thing about this is that once a model is trained, once we get to that level of performance with that cancer model to help everyone in all these languages, you don't need to train it again. And I don't think we've ever seen anything like that, right? It's like having infinitely replicable graduates or chefs, you know... or cooks. Sorry.
Nathan Labenz: 1:33:11 What technology, aside from continuing to define the protocol itself, what technology is needed for this to work? One space I've been watching really closely is distributed training, and it seems like we've just seen, like, several different papers and proof points that show that that is going to work, at least with, like, multiple large-ish nodes. It's not necessarily clear if it works at the level of, like, everybody contributing their spare laptop compute long term. But I'd love to hear your thoughts about sort of distributed compute, and maybe any other technologies that you see as necessary that don't yet exist.
Emad Mostaque: 1:33:55 I think decentralized training doesn't really make that much sense versus decentralized tuning, having armies of people improve the datasets, but DiLoCo, DisTrO, and others have shown really interesting kinds of advances. And, like I said, I'm on the advisory board of some of these, and the crypto-x-AI space is now like $40 billion. If you look at something like Bittensor, that's a $12 billion network that's trading hundreds of millions of dollars a day for optimization. It's like the most on-steroids Kaggle you can see. You have millions of GPUs in these other networks available. I think, again, the data side will be more interesting. And again, this sequential stuff, as we get o1-type models that can run on my M2 Max, soon to be M4 Max. Looking forward to it. Nano-texture display. Probably the one area that is most important to properly decentralizing this, and ownership and controlling governance, is verifiable inference. And again, we're making some big advances there. Hyperbolic and Allora and others are doing a lot of work around that, so that anyone can contribute their compute. But the data equation of this, building gold-standard datasets, is more important than the pretraining. For pretraining, just have a cluster and just train on that cluster. Like, it's not rocket science; you know how to do it. Like, will it be a Mamba-type model versus a transformer model versus something hybrid, Jamba-style? Who knows, right? But just make your pick. It's pretty straightforward to pretrain. Post-training and optimization, the distributed stuff becomes very interesting there. Verifiable inference allows anyone to contribute their compute. And then it's about having secure computation for running these things in regulated industries, and that's some of the TEE work that we've seen. But most of it's coming together now, especially, like I said, against the context of two other things, which are large context windows and function calling, making it more deterministic on the outputs. And the final thing we just need is maybe continuous training for the individualized stuff. But, again, we've seen big advances in that, so it continues learning.
Nathan Labenz: 1:35:56 So on the verifiable inference, and, again, I'm weak on crypto, I did one episode on this with Professor Daniel Kang a little more than a year ago now, I think. And he sort of framed the problem as, like, if you are paying for an API, you might want the provider, whether it's OpenAI or whoever, to have some way of guaranteeing to you that they are actually running the model that they said they're running, as opposed to subbing in some cheaper knockoff or whatever and changing things underneath you. I get that. I'm a little less clear on what you have in mind when you imagine anybody contributing compute through verifiable inference. Is the idea that, like, I run inference locally and then I send up my, like, activations, and the gradients are sort of computed centrally, or something like that?
Emad Mostaque: 1:36:51 Yeah. Like I said, I think distributed training on millions of GPUs doesn't make sense, but sending a packet to you that contains Indonesian classical architectural law and having you check that for consistency and rewrite some of the data there does make sense, putting it through a Llama 70B or 8B model. So I think distributed data augmentation makes sense. Distributed fine-tuning makes sense. And then comparing those things. And then the mining operations as well. But I don't think distributed training, or pretraining, shall we say, makes sense. So post-training and data augmentation. And for those, you really want it to be verifiable, so you know what the models are; otherwise you might get junkiness, as it were. So I think that unlocks something. It's not required; it just helps. Like, right now what we're looking at is literal H100 clusters in various countries and how we can bootstrap those, because they're inevitable, so you might as well put them there and use them for healthcare later, right? And then that gives you enough to then tune models in every country and spin up teams and national champions, and then they can decide what they want as part of this network, right? And H100s are a lot easier to track than 4090s or M4 Maxes, etcetera. But if you can access that, then, like I said, you'll be able to build better models, because you'll build better datasets, even better and faster. And I think, again, this is not synthetic data. It's augmented, filtered, clean data, utilizing the LLMs.
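As a purely hypothetical illustration of that flow, here is a minimal sketch of a contributor node that receives a small data packet, checks and rewrites each record with a local open-weight model, and returns the result with a hash so a coordinator could compare contributions; the function names and the model call are assumptions for illustration, not an actual Intelligent Internet API:

```python
# Hypothetical sketch of distributed data augmentation with comparable outputs.
# A coordinator could send the same packet to several contributors and compare digests.
import hashlib
import json

def clean_packet(packet: list[dict], local_model) -> dict:
    """Check each record for consistency and rewrite it using a local model."""
    cleaned = []
    for record in packet:
        prompt = (
            "Check this record for internal consistency and rewrite it cleanly, "
            "preserving its meaning:\n" + json.dumps(record)
        )
        cleaned.append({**record, "text": local_model(prompt)})
    # Hash of the cleaned packet lets the coordinator compare results across contributors
    digest = hashlib.sha256(json.dumps(cleaned, sort_keys=True).encode()).hexdigest()
    return {"records": cleaned, "digest": digest}

# Example with a stand-in "model" (in practice this would be a Llama 70B/8B call)
fake_model = lambda prompt: prompt.splitlines()[-1].strip()
packet = [{"id": 1, "text": "Indonesian classical architectural law, article 3 ..."}]
result = clean_packet(packet, fake_model)
print(result["digest"][:16], len(result["records"]))
```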
Nathan Labenz: 1:38:22 Yeah. That's quite interesting. And that's basically the answer to the sleeper agent question too, I assume, which I was also wanting to ask. Like, how do we make sure that people are not poisoning the dataset from any number of different perspectives?
Emad Mostaque: 1:38:37 You put a massive amount of compute into running a fine-tune over all of it, checking it for consistency, adapting it as appropriate. Like, again, you can afford to go over the top, because then you have a gold-standard base. It's, again, like having a good curriculum for your kid: you feel comfortable about that, or comfortable about a mechanism and a methodology. So I think, again, this is a different type of AI challenge to what other people are looking at, but I think this is the important one, because it's what affects our lives in a much more direct way, with a particular category of model. And this is why, as well, when I was looking at it, I was like, you can build tokens which are security tokens, and you can build things that have utility and redemption mechanisms. Ultimately, Bitcoin is the most successful currency. It's $2 trillion. It has moneyness. It's a hedge. It's these things. Why don't we replicate that for the AI age? Why don't we think about the currency for the AI age, but with this additional thing that I think is important: I want people to be able to have some participation, because many of the listeners here are like, how can I even get involved in this? You don't know how. But if you can say, I've got this and I've helped direct the compute towards cancer, and it's helping people go through their journey with cancer by giving them the chatbot, great. Or a breakthrough gets done on the cancer cluster, great.
Nathan Labenz: 1:39:53 So with the Intelligent Internet, the company, how do you understand your role in this whole thing? Are you, like, raising capital to create that one central cluster that will do the pretraining and then distribute that out to the national hypernodes? Or what is the plan?
Emad Mostaque: 1:40:13 The plan is basically for the company to bootstrap this, get it going, and then we stack GPUs and we use them ourselves and give them away to build fully open-source things: datasets, models, and systems. And that will be a big enabler. We saw this at Stability, where we gave away 20 million A100 hours, and it led to all sorts of amazing things, from OpenFold to RWKV to Stable Diffusion and more. People sometimes just need a bit of compute. So I think it's getting it going and then trying to make it distributed and decentralized within a few years and giving people control of their own AI. So for every nation: who should decide the education dataset for the people of India or Bangladesh or Pakistan? The people of India or Bangladesh or Pakistan. Likewise, who should build the generalized datasets for everyone? It should eventually be some sort of multilateral foundation or something like that, but it does require a bit of bootstrapping. So we're a team of, like, 26 people. We haven't done a funding round properly yet. There's a lot of demand for whatever coin we do. But I think this is the right way. But, again, we're finalizing all the details of that.
Nathan Labenz: 1:41:21 So you think you can get over these capital requirement thresholds with a coin-type approach, without needing to go to a traditional equity market?
Emad Mostaque: 1:41:35 Yeah. Again, crypto-x-AI is now $40 billion, trading hundreds of millions, over $1 billion a day now, actually. We could just go that route, or we might do a combined one where we do equity, then coin, then that. And I think that in doing that, you'd probably be the highest-revenue-generating AI company in the world, because demand for high-quality digital assets is only going up, while pricing power in APIs and SaaS is going down. And nobody's really looked at this. I mean, even for Elon and Sam Altman and others, like, how difficult is it for them to raise billions of dollars? It's not, because they've got the pedigree and things. We've built hundreds of state-of-the-art models of different types as a team, and that's why many of them joined me from Stability. Having a liquid market for a coin where you can sell coins and literally say, this is a cluster that will just look at Alzheimer's, I think there'll be no shortage of people who want that. And with the sovereign reserve aspect, build up this reserve and we'll put all the money into AI for your nation and get the teams in, I don't think there'll be any shortage of that either. Again, time will tell, but initial things have been very positive. And it was something I was thinking about when I left in March. I was like, this feels closer to what I originally wanted to do. It's needed. No one else is doing it, and it's the right time with where regulation and other things are. Because ultimately, crypto is a $3 trillion industry that needs high-quality digital assets that people can allocate to. And we need to have some sort of intelligent money and these other things. But it's not easy, you know? Like, we're trying something new, but it keeps it exciting.
Nathan Labenz: 1:43:16 There's never a dull moment in the AI space, that's for sure. Yeah. You mentioned a number of projects from Stability, and previously you made a comment along the lines of finding talent being, like, the hardest thing, or often a limiting factor. Obviously, Stability was in the news a lot for lots of different stories over the year or so before you left. When people would, you know, send those links to me, I always basically said back to them, I just watch their research output, and Stability has continued to matter, certainly as of the time of those conversations, with just, like, a bunch of really interesting, good projects. Some of which were, you know, extremely catalytic, like Stable Diffusion, in terms of changing awareness and what people understood to be possible. Some were just, like, quiet contributions to the commons, like the LAION dataset and specifically the aesthetics dataset, which is one where there was nothing else really like it, and it made a huge difference actually for one of the little projects I was doing in my company, Waymark. I also love the MindEye and MindEye 2 papers, even though it's not hard to imagine a dystopian version of that in the future as well. But, you know, the fact that mind reading works in today's world is pretty crazy. What would you say are your tips, tricks, heuristics, methods for identifying great researchers? Like, how did you assemble this team and create that capability, which I would say not that many companies around the world have done, basically from scratch, in a short period of time, without, certainly, Microsoft-level resources? What do you look for? What do you see that tells you somebody is gonna make a difference?
Emad Mostaque: 1:45:07 I think it was kind of treating the researchers like creatives. And so we first created the communities and, again, you know, did a bit of hype, maybe over-hype sometimes. Yeah, you live and you learn, but it was a crazy time. And so we have about half a million people in the various communities that we kind of built, incubated, etcetera. We made it easy to operate, and we gave very fast grants to anyone that was doing promising research. So I basically kind of led and coordinated the research, and I've got a good eye for that type of thing. But I really looked for people where it didn't matter what the background was: are they passionate, and are they willing to take a risk and try different things? So many of the research grants we gave, and again, people can give feedback on this, were like, try and do different things; it doesn't matter if it fails. Whereas research is viewed as a cost center, generally, so you can't really take a risk, you know? When it worked, we put on the afterburners and we scaled it. And then you wanted to empower the teams. So I think last year we had something like 20,000 applicants for research roles; we made about 100 offers, and 83% were accepted. And in my entire time at Stability, not a single researcher left for a competitor, and they were offered a lot of money, because they enjoyed working there, because they had the creative freedom, the adaptability. Obviously, they were concerned about all the noise, and there are very interesting things behind that. But they just got on with things, because we enabled them. And we said, like, whether the thing you're creating runs or fails doesn't matter so much: what have you learned? Where can you go from there? And we ended up achieving state of the art in image, video, 3D, audio, protein folding, small language models, a whole bunch of different things. By having that mentality, you don't need that many people, and you don't even need that many PhDs. We had 14 PhDs out of the 80. I think things like fast.ai and others are also great training grounds, and again, the willingness to try these things is where communities like Eleuther, which we incubated and spun out into a 501(c)(3), and LAION were great, because you had all these diverse talents kind of coming in. I think it depends on what you wanna do, though. Like, we were building state of the art because we were like, we need to catch up. But I was like, I can't scale LLMs right now because there's no business model for that. I thought video and other things were the way, but then I realized that, like, right now, video is about to satisfice, so it'll be like it doesn't matter which one you use. It was very difficult to figure out the sustainability. So instead, we were just like, let's do the research and let's enable them. And I think the teams were happy. The rest of the organization maybe didn't do quite so well, which wasn't great to see. But at the very least, the researchers were happy, which, again, I think the proof was in the pudding: when it came towards the end of it, we spun out the Black Forest Labs team and some of the other teams, and I think, again, they were happy. And I think having researcher happiness at the top was very important.
Nathan Labenz: 1:47:55 That's certainly something you see coming through in the Sam and Elon emails as well: just the incredibly intense focus on top talent.
Emad Mostaque: 1:48:06 Yeah. And I think, you know, what type of top talent? What are you trying to do? So we found the ones from the larger organizations didn't tend to fit in very well, versus, again, the mavericks, because we were trying to push the state of the art and try different things. There are so many projects that didn't emerge, not because we made mistakes; they just didn't work. So Stable Diffusion was 250,000-odd A100 hours to train the original; it was maybe $5 million in total with tests and failed experiments, etcetera. So you never see all those; you only see the other side of things. And, again, what are you trying to do? Like, we had a bit of a crisis debate, because I was like, are we trying to build AGI? Do we need to compete against all that? That was something I was thinking a lot about last year: like, are we competing against DeepMind? Because, again, one route is what you've seen with David and Midjourney. You know, we gave the grant to help Midjourney get going, and then he did everything else himself. I don't think he's taken money from anyone else. But he's like, I'm building a research and innovation lab and making it sustainable. And we could have just focused on media, which is what Stability is doing now, and given our role, that would have been far leaner and cheaper to do. We could have done that. Instead, we were like, we have to compete against DeepMind and OpenAI and others. Maybe we would have succeeded or not, but then it became very confusing to do that. And I think the one thing that is important, if you're trying to build a great team in AI, is to give a very defined function, and then build teams that are passionate about that function, and then give them the resources they need to do that. And that will work very well. So that's probably the final missing piece, which is clarity of thought, which is why it took a bit more time on the Intelligent Internet. Now it's very straightforward. If you want to have the biggest impact on autism as a researcher, you will come and join me. I will have a team dedicated to autism, building open infrastructure for autism. Same for government, for cancer, and others. And that's how we're setting up this time. And then the monetization mechanism is very straightforward. And then, hopefully, lots of GPUs will be coming their way to make them even happier. But you don't need that many, to be honest, compared to what many people say, to have an impact.
Nathan Labenz: 1:50:14 A couple of maybe less big-picture questions. I really appreciate how generous you've been with your time, and it's been great to get a deeper look at your thinking as you spin up the Intelligent Internet. You mentioned that you had signed the pause letter in, like, mid-'23 when that came out, maybe right after GPT-4, the original one. I sometimes call myself an adoption accelerationist, hyperscaling pauser. I don't spend too much time advocating for a pause because I do still think there's, like, some room at least to advance the state of the art before it gets, like, too crazy. And, you know, I do want those cancer cures, of course. So where are you now? If what people really need to live better lives is either already here or coming soon, and that's decoupling from what's going on to advance the state of the art, where does that leave you in terms of whether governments should enforce a pause? Obviously, there's a collective action problem there too if they wanted to, but what do you think people should do, or what should the goal be for that sort of thing?
Emad Mostaque: 1:51:25 So when I signed that letter last year, I think I was the only AI CEO to sign it, actually. Elon signed it, but he wasn't an AI CEO yet. I did it because my base assumption was that if we got to AGI, under my thinking then, it would be like Scarlett Johansson in that movie Her. You know, goodbye and thanks for all the GPUs, because humans are kind of boring. That's the kind of narrow AGI thinking. The reason I signed it was I was like, there needs to be more discussion about this. Just like right now, the amount of discussion about labor displacement and the massive economic turmoil that will be caused by this technology is like nothing. Like, I talk to people and they're not even thinking about it. It's insane. Again, as an example, the Filipino call center sector is dead. Why would you hire another call center worker? And you look at swathes of industry, and Devin and these other things, like, that's gonna be bad. Now where I am is that it's too late for that. Which is, again, why I said back then, let's get this out, and maybe we moved too far towards AI doomerism, killer AI, and things like that. But we do have to, again, be very measured in the way we think about things. An example of this is copyright law and IP. So, you know, I got sued for $1.6 trillion at Stability by Getty, even though we didn't use any of their images in the thing and we filtered out all the watermarks and stuff. And in fact, the original Stable Diffusion was trained by Robin and his team before they joined Stability. We made sure to keep it at arm's length; we didn't even look at it. That was kind of deliberate. Today, the UK government is advocating for an exemption where AI companies can train on any copyrighted material you can see. You know? And that's similar to Singapore: if you can legally access it, you can train on it, which kinda makes sense, because when you have a Tesla Optimus robot, is it gonna close its eyes and its ears when it sees a copyrighted thing? Like, oh no, I'm hearing Taylor Swift, let's turn off my ears. Probably not gonna happen. Right? But these are complex and nuanced topics, just like pausing and others. So I was like, that was the last chance we had to do that. Now the biggest companies in the world are AI companies. Microsoft, Google, Amazon are AI-first companies. Jeff Bezos is back at Amazon. You know, Sergey Brin is back at Google. And Bill Gates is back at Microsoft, I think, as well. Governments are trying to build their intelligent capital stock. They realize this is the biggest thing. You can't stop it now. You can't take time to think before the next wave of disruption occurs. And I don't think that governments really can afford to regulate; Europe is falling into its trap, right? But again, people will just build models and have extra competitiveness where they can. Again, Singapore is a great example that the UK may follow, because you can't afford to be left behind. You can't afford not to build the AI chips. You can't afford not to support it. Again, I would use Europe as the one exception to that. And the final part of that is America under the new administration. Crypto and AI are gonna go full pelt under this administration. There's no way they won't. And you have to keep up with America; how else are you gonna manage it? And so it's gonna be very interesting to see how that plays out.
Nathan Labenz: 1:54:33 So if it's not a pause anymore, is there any other regulatory action that you would recommend? I mean, people are groping or grasping for anything they can get their hands on. Liability rules come up a lot. That was, by the end, maybe not so much the heart of the SB 1047 debate, but at one point it was more central: the idea was maybe we'll hold the foundation model developers accountable. But then, jeez, how does that impact open source, and how does that impact the relationship with local fine-tuners and developers? Any thoughts on sort of who should bear responsibility for incidents of AI gone wrong?
Emad Mostaque: 1:55:20 Yeah. It's very interesting to see all these things evolve. Like, I've had some similar discussions to what Marc Andreessen said about how many AI companies can succeed, which is another reason to go decentralized after that experience. Gosh, there were a lot of very strange ones. Liability and these things won't work because, again, you're in a competitive environment, which from a game-theoretic perspective means that you can't afford to lose competitiveness over a defined period of time, especially as the Overton window has shifted in politics now to be more techno-friendly, after literally the US government was trying to strangle the sector or make it so only a few people can do it. I think the simplest thing with the biggest impact is this: if AI is used in a decision-making process in any regulated industry, it has to have transparent data logged, because you are what you eat. These models are ultimately the data. Like, if you look at all these different architectures, again, we funded RWKV getting off the ground, or Mamba, or any of these: the performance is the same for the same dataset. It comes down to data. You are what you eat. And decision-making AI is the kind that will kill us or make us successful ultimately; the rest is kind of helpful. And so introducing a level of data transparency over that is important, because then people will spot things, just like when we released Stable Diffusion and there was LAION. They were like, oh, there's weird shit in here. Like, yeah, we don't want weird shit. If you don't want it to know about nuclear or bioweapons, don't have nuclear or bioweapons in the dataset. I think the other thing is building good standardized datasets for knowledge and other things. Again, you can't easily go and get what an Allen AI or a Scale AI and others have built; it's kind of embedded in these models, in GPT-4 and others. Building a gold standard of datasets for common knowledge and cultural knowledge just makes everything safer, because why wouldn't you use those as your base? And again, think about models where the curriculum is known: going through high school, university, and then being specialized. Like, what are we really training when we've got trillions of tokens? What's in there? Have you looked in these datasets? They're crap. Build better base datasets that you can iterate on, and then things will be safer. Have simple legislation that any decision-making AI in any regulated industry needs to have transparent data, and things will be safer. Like, it doesn't necessarily need to be interpretable, but we know that the AI functionality is a derivation of the base data. Beyond that, I can't see what other regulations will stick, honestly. But one that I would highly recommend is regulation around speech data. Like, we built a state-of-the-art speech model that we did not release. It is dangerous. I don't think people appreciate how dangerous it is, but let me give a practical example to people listening here. Imagine someone that you cared about more than anyone, that you respected more than anyone. Replicate their voice and create a version that you can call and have discussions with on Zoom, which is all possible now. How much would you listen to that AI versus a normal AI? And if that AI can then access Obama's speech patterns and Churchill and other things, that wave of manipulative speech AI is coming. We don't have the protections against that.
And I think those should be legislated heavily and regulated heavily, especially when you think about our kids. Our kids literally have no defenses against that. So those are probably the areas that I'd look at.
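To make the "transparent data logged" idea for decision-making AI in regulated industries concrete, here is a minimal Python sketch of what such an audit record could look like. The field names, file format, and example values are assumptions for illustration only, not anything Stability or the Intelligent Internet has specified.

# Illustrative sketch only: one way a regulated deployer might log the data
# provenance behind each AI-assisted decision. All names and fields here are
# assumptions for illustration, not an existing standard or API.
import hashlib
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    model_id: str            # which model version produced the decision
    dataset_manifest: str    # hash of the training-data manifest ("you are what you eat")
    prompt: str              # the input the model saw
    output: str              # the decision or recommendation it produced
    timestamp: float

def manifest_hash(dataset_files: list[str]) -> str:
    """Hash the list of training-data sources so auditors can verify what the model ate."""
    return hashlib.sha256("\n".join(sorted(dataset_files)).encode()).hexdigest()

def log_decision(record: DecisionRecord, path: str = "decision_audit.jsonl") -> None:
    """Append the decision and its data provenance to an append-only audit log."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Hypothetical usage: a lending model refers an applicant to a human reviewer.
record = DecisionRecord(
    model_id="loan-screener-v3",
    dataset_manifest=manifest_hash(["common_knowledge_v1.txt", "lending_history_2020_2024.csv"]),
    prompt="Applicant 4412: income 52k, requested 300k mortgage",
    output="refer to human underwriter",
    timestamp=time.time(),
)
log_decision(record)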
Nathan Labenz: 1:58:47 One of the random ideas I've had, and I haven't really developed it too much, is speed limits. And this is basically just the idea that, like, maybe individuals or organizations or whatever should only be able to do so much inference per unit time, or only use so many FLOPs per unit time. And it's kind of just to keep things, you know, operating at a sort of human speed. Maybe we need something like that. You're probably gonna have the same game-theoretic objection though, I suppose.
Emad Mostaque: 1:59:18 You can't do that. It's like saying you can only hire so many people. Right? Like, again, you will lose competitiveness in this increasingly competitive environment, and other organizations will come out ahead. And what is the role of government, right? I mean, someone said it's the history of the monopoly on political violence. Ultimately it's about protection of people and representation of people. And if this is the big leap up, where people with AI outperform people without AI, you can't really have these things except where there are unions and others involved. So you look at rate limits, and you look at the longshoremen strike in the US: that is a rate limit. They will only allow a certain amount of robots. Is that optimal for society, then? No. It just so happens they have a physical barrier. And so we'll see those things maybe emerge in certain areas where you're regulated on the amount of AI that you can have. But if you're keeping a free market, which, again, unions are counter to in some ways, then you can't have that, because who knows what goes on inside an organization, right, and other things. And can you coordinate this globally? This is the bigger thing. Absolutely not. Like, again, you think about small nations: they can suddenly outcompete. Look at the role of capital here, like the Saudis and others of the world. Once this stack is built, like, that's why the UAE has gone heavily into Cerebras. Like, yeah, build the silicon wafers. So I think that ultimately, you can just set good defaults, and you can use market forces and others for that. And again, push for data transparency. In some of these edge cases like speech, you do need some protection, because I haven't figured out, like, unless we wear AirPods all the time, how do you protect against that? You know?
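For readers who want to picture the "speed limit" mechanism Nathan floats above, here is a minimal Python sketch of a token-bucket cap on FLOPs per unit time. The rates, names, and interface are invented for illustration, and the sketch says nothing about whether such a cap could survive the competitive pressures Emad describes.

# Illustrative sketch only: what a per-organization "speed limit" on inference
# might look like mechanically, using a simple token bucket over FLOPs.
import time

class ComputeSpeedLimit:
    def __init__(self, flops_per_second: float, burst: float):
        self.rate = flops_per_second   # sustained FLOPs allowed per second
        self.capacity = burst          # maximum burst of FLOPs
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, flops_requested: float) -> bool:
        """Refill the bucket with elapsed time, then spend tokens if enough remain."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if flops_requested <= self.tokens:
            self.tokens -= flops_requested
            return True
        return False

# Hypothetical example: cap an org at 1e15 FLOPs/s sustained, 1e16 FLOPs burst.
limiter = ComputeSpeedLimit(flops_per_second=1e15, burst=1e16)
print(limiter.allow(5e15))   # True: within the burst allowance
print(limiter.allow(9e15))   # False: bucket nearly drained, must wait for refill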
Nathan Labenz: 2:01:10 Yeah. I mean, it seems like there are probably gonna be more of those things popping up that we don't even necessarily have any precedent for. It's hard to say, of course, what they are, but I certainly expect a lot more weird stuff to start to happen.
Emad Mostaque: 2:01:25 We live in interesting times, yeah.
Nathan Labenz: 2:01:28 No doubt. When you talk about dataset filtering, if you don't want the models to know about bio or nuclear weapons, as you put it, you know, don't have that in the training data. Is that sort of akin to your earlier comment that you think scaling is gonna kind of level off? There seems to be a relation there where, like, if they generalize out of domain to a certain degree, then they'll fill in those gaps, which is kind of what a lot of people worry about. But it sounds like you don't expect that as much.
Emad Mostaque: 2:01:59 Well, we imbue these base models with a bit too much stuff, even with the SFT. Right? Ultimately, they're just a bunch of, like, freaking weights. They're a bunch of ones and zeros. Right? It's not like a logical system. So you look at the composition of that, how it's built up, and you're like, I want a base level of reasoning model. Like, it doesn't make much sense that we have all the data in these models and all the knowledge in these models either. That seems a bit wasteful. Like, how much do these models know that they don't need to know? And you think about, again, a model that's a medical doctor: should it need any more data than a medical professional gets from kindergarten to medical school? No. I think Yann LeCun has talked about this in some of the JEPA kind of things. Can it infer and reason? Yes, because it'll have that capability. But again, the question is, what are you releasing onto the world as infrastructure or services? Because you can control things at a server, infrastructure level; it's a bit different. Like, what base model are you releasing, Stable Diffusion style? People use it for good things, they use it for bad things. But it was inevitable that you would have reasoning models like that. Right? And again, this is especially important when it comes to long-context-window models. Like, again, 2,000,000 or 10,000,000 tokens is a freaking lot, and a model that can reasonably reason over that, when the o1 equivalent comes in Gemini 2 or the equivalent Llama version of that, yeah, it'll be able to figure out just about anything. It's inevitable. But day to day, how many people are gonna load it with all the bio and nuclear stuff in the context window? And that's something that is relatively controllable. So I think that we just need to tidy up the stuff in the base models, and that's not even a question of filtering. It's a question of how do we feed them. Like, what is the curriculum? And then once you have a gold-standard version of that, I think things become safer, because there's less likelihood of them initially going rogue, because they'll be a function of the data that goes in. Now once you make them agentic and able to traverse the Internet, full of rubbish as it is, then maybe it's a different story. But, again, this is an incremental thing. There are no easy answers. Otherwise, someone would have answered already.
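To make the "curriculum, not filtering" idea concrete, here is a minimal Python sketch of assembling a base corpus from a defined, auditable list of vetted sources rather than a web scrape. The stage names and file names are hypothetical placeholders, not an actual Intelligent Internet dataset.

# Illustrative sketch only: assembling a base corpus from a defined "curriculum"
# of vetted sources, stage by stage, instead of filtering an open Internet
# scrape after the fact. File names and stages are made-up placeholders.
from pathlib import Path

# What a hypothetical medical-doctor model "eats", from general schooling to specialization.
CURRICULUM = [
    ("general_education", ["textbooks_k12.txt", "reference_encyclopedia.txt"]),
    ("university", ["undergrad_science.txt"]),
    ("medical_school", ["clinical_guidelines.txt", "peer_reviewed_medicine.txt"]),
]

def build_corpus(root: str) -> list[tuple[str, str]]:
    """Read each vetted source in curriculum order; the result is a known, auditable list."""
    corpus = []
    for stage, sources in CURRICULUM:
        for source in sources:
            path = Path(root) / source
            if path.exists():  # skip missing placeholder files gracefully
                corpus.append((stage, path.read_text()))
    return corpus

# The training data is then just this ordered, fully enumerated set of documents.
documents = build_corpus("curated_data")
print(f"{len(documents)} documents loaded across {len(CURRICULUM)} curriculum stages")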
Nathan Labenz: 2:04:05 It sounds like your basic vision on the safety side is structured access to the state of the art, combined with open data, open weights, open access, satisficing day-to-day usage models?
Emad Mostaque: 2:04:24 Yes. From a functional perspective, universal basic AI for everyone that gets smarter and smarter and that we can all agree on. That can be localized and specialized, with the power going back to the people. Then personal AI will be driven by market forces, with your Apple and Google and other intelligences. And then you will have specialized, highly intelligent models. That's where regulation can come in, access controls can come in, for the most capable models out there. But I think it'll be very difficult. So instead, I think what we should do is focus on this: if they're in decision-making positions, at least have the data transparent.
Nathan Labenz: 2:04:59 That could be a great note to end on. Is there anything we didn't touch on or anything else you wanna share about the Intelligent Internet before we break for today?
Emad Mostaque: 2:05:08 No. We haven't really pushed it out there. Have a read. I think it's a nice document, along with how I think about AI. We'll have a whole bunch of stuff in the new year. And, again, happy to get feedback; it's an iterative process. But, you know, it's been a lot of fun.
Nathan Labenz: 2:05:20 Alrighty. Well, we'll certainly keep watching, and maybe I'll be lucky enough to have a chance for another conversation once you've finalized all those details and put that out there. For now, Emad Mostaque, founder of the Intelligent Internet, thank you for being part of the Cognitive Revolution.