AI Scouting Report Part 3: Impact, Fallout, and Outlook

Nathan Labenz delves into AI's economic impact, safety, and future predictions in Part 3 of The AI Scouting Report.



Video Description

In Part 3 of The AI Scouting Report, Nathan Labenz covers the economic impact, investment moats, AI safety concerns, and predictions for AI. Nathan's aim is to impart the equivalent of a high school AP course understanding to listeners in 90 minutes. If you're looking for an ERP platform, check out our sponsor, NetSuite: http://netsuite.com/COGNITIVE

ICYMI:
Part 2: https://www.youtube.com/watch?v=ovm4MbQ4G9E
Part 1: https://www.youtube.com/watch?v=0hvtiVQ_LqQ&t=3026s

Get your questions answered in the podcast by emailing TCR@turpentine.co
The Cognitive Revolution is a part of the Turpentine podcast network. Learn more: Turpentine.co

TIMESTAMPS:
(00:00) Introduction to Part 3
(00:53) Writing code at human level
(07:43) Image generation goes beyond art, with applications in science and medicine
(11:45) Text to 3D / Motion / Music / Voice Cloning
(13:12) Our theme music revealed (Google MusicLM)
(14:34) Sponsor: NetSuite
(17:00) Dentistry interlude & red teaming framework
(20:42) AI can read minds: fMRI data
(24:00) Robotics & self-driving cars https://www.youtube.com/watch?v=5tlQhgz-xuY
(34:00) Big players in AI: where are the moats, where to invest? https://www.youtube.com/watch?v=yVezX3cxwgk&t=353s
(40:50) Open source taking on a life of its own
(42:40) AI Leaderboard: LMSYS.Org
(45:00) Custom models & infrastructure training
(48:42) Retail prices dropping
(54:00) Whose jobs will AI take?
(1:04:00) Lawsuits and regulations
(1:09:00) What happens next?
(1:14:30) AI Safety and who will shape the outcome
(1:21:36) Global chip supply chain
(1:23:30) Alignment efforts
(1:32:00) Criteria for a weak AGI system and predictions

TWITTER:
@labenz (Nathan)
@eriktorenberg (Erik)

SPONSORS:
NetSuite provides financial software for all your business needs. More than 36,000 companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform, take advantage of a special financing offer and defer payments of a FULL NetSuite implementation for six months. ✅ NetSuite: http://netsuite.com/COGNITIVE

Thank you Omneky (www.omneky.com) for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

MUSIC:
Google MusicLM


Full Transcript

Erik Torenberg: (0:00) We're back for part 3.

Nathan Labenz: (0:02) Alrighty. This is probably the part where, if you are someone who tunes into the Cognitive Revolution, you are aware of a lot of the transformative potential of transformers and a lot of the applications that we've seen. So if I were giving this to a fully general audience, this section might be the whole thing—just try to get people to wake up to just how powerful these systems are and all the things that they can do. But here's a run-through of some of the things that I think are super interesting. These are probably 1% at this point of what we could bring forward, but it's at least a good grab bag. So now that we've got all this AI, what's it good for, basically?

One thing is it's getting very good at coding. Basically, we are hitting human-level coding on bite-sized tasks. Here, we're using this dataset called LeetCode, where they give these programming problems. Some of them are easy, some of them are medium, some of them are hard. And basically, what we see is that if you just give the AI one shot, on the easy ones, it falls just short of the humans: 68% versus 72% success rate. But you can also give it five opportunities to pass, because with a coding problem, you can automatically determine whether it has passed. So this isn't as problematic as it might be in other areas, where you generate five paragraphs, have to decide which one's the best, and now you've got another evaluation problem on your hands. Here in code, you can determine whether the code passed the tests. And if you give it five shots, now it's passing at a significantly higher rate than humans. That's also true, actually, even at the one-shot level for the medium problems, and it's actually doubling the rate of human success if you give it the five shots on the hard problems.

So there are limits to what AI can do in software development, but those limits are not so much about whether it can do certain things in code. They tend to be more about the zoomed-out part: designing the whole system, figuring out what the right architecture should be. That is still very much the domain of humans today. But once you have that architecture, AI is getting really good at filling in the little bits of code, even if the little bits are actually quite challenging. I don't know who exactly competes on LeetCode, but presumably, if you're competing on LeetCode, you really like coding and enjoy solving coding problems. So to see AI solve twice as many of the hard problems as the human LeetCode users, that's pretty good.

And I think we are going to see some transformation of the software development industry, possibly as one of the earliest and most rapidly disrupted spaces, for multiple reasons. One of the big ones is that it's so easy to validate whether the AI actually did a good job. Emad Mostaque from Stability has talked about the end of programming. Matt Welsh from Fixie, another earlier guest, has also written essays about the end of programming as we know it. And of course, Amjad and the crew at Replit are actually leading this change, and that's what we're using at Athena: Replit with their tools, also with the help of GPT-4, to take away all the annoying bullshit parts of coding and really try to allow people to focus as much as possible on what they want the code to do. If you can get good at articulating what you want, the AIs are getting pretty good at coding what you want. So watch this space for, I think, a lot of economic impact over the next couple years.
I don't know if jobs go up or down, by the way. It may be that for a while jobs go up because there's just so much more coding to be done, and it's so much more accessible. But who knows what that trend ultimately will be long term.
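The evaluation loop described above, where the model gets several tries and each attempt is graded automatically against unit tests, is commonly called pass@k. A minimal sketch in Python; `generate_solution` is a hypothetical stand-in for whatever code-generating model is being tested:

```python
import random
from typing import Callable, List

def passes_tests(code: str, tests: List[Callable[[dict], bool]]) -> bool:
    """Execute generated code, then grade it against automatic unit tests."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # NOTE: sandbox this in any real harness
        return all(test(namespace) for test in tests)
    except Exception:
        return False  # code that crashes counts as a failure

def solved_with_k_tries(problem: str,
                        tests: List[Callable[[dict], bool]],
                        generate_solution: Callable[[str], str],
                        k: int = 5) -> bool:
    """A problem counts as solved if ANY of k independent samples passes.
    This is why five shots beat one: code, unlike free-form prose, can be
    graded automatically, so extra samples add no new evaluation problem."""
    return any(passes_tests(generate_solution(problem), tests) for _ in range(k))

# Toy usage with a fake "model" that is only sometimes correct:
fake_model = lambda p: random.choice(["def add(a, b): return a + b",
                                      "def add(a, b): return a - b"])
tests = [lambda ns: ns["add"](2, 3) == 5]
print(solved_with_k_tries("write add(a, b)", tests, fake_model, k=5))
```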

Erik Torenberg: (3:59) That would be like the—what do they say? The example is always the ATMs, or there were more bank tellers post-ATMs or something?

Nathan Labenz: (4:06) Yeah, that's the classic. I think that could be the way this goes as well. It kind of depends on where you think we are on how much software has already eaten the world, I guess I would say. The ATM story is basically: ATMs came out, people thought, my god, that's going to be the end of bank tellers. Then you fast forward 30 years, and in fact, there are more people working at retail bank branches than before. And that is because there are more retail bank branches than before, and that's because the efficiency of the ATM made it much easier to justify opening a new branch, and they opened up so many more new branches that, on net, jobs went up.

A similar dynamic could perhaps happen if you think we're still early in software. We haven't written nearly as much software as we ultimately would want to write, and we're just bottlenecked by how expensive it is, because developers can only work so fast, and they're expensive. So lots of great software projects that would be nice to get done just don't get done, because they're not economical at today's prices. But if GPT-4 can make all the programmers 10 times more efficient, then all of a sudden these software projects cost 10% as much, and far more of them pass the ROI hurdle to be worth it. So lots more software gets written, and there are, in fact, maybe more software developers. They're working in a different way with AI, but maybe there are more of them. That seems like a reasonable theory of the near term to me.

I don't know that it holds up in the long term. What the people at the bank branches do these days is sell mortgages. So, okay, what happens when the AI can sell mortgages? I don't know that bank branch employment is trending up and up forever, because if AI can sell mortgages as well as a human, then it'll probably cost a lot less. And the same thing could eventually be true of software as well. But we don't have an AI actually capable of selling a mortgage in today's world, and we don't yet have an AI that's capable of architecting advanced software systems in the way that humans do. So it's not an immediate threat to take everything, but no less than Emad and Matt and Amjad certainly have their sights on it.
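A toy numerical version of that argument, with entirely made-up numbers: whether total developer demand rises or falls after a 10x cost drop depends on how deep the pool of not-yet-economical projects is, which is exactly the "early versus late in the cycle" question. A minimal sketch, assuming two illustrative project-value distributions:

```python
import math

def total_dev_spend(cost_per_project: float, values: list) -> float:
    """Each project gets built only if its value clears the cost hurdle;
    total spend is a rough proxy for developer demand (i.e., jobs)."""
    return cost_per_project * sum(1 for v in values if v >= cost_per_project)

# Scenario A, "early in software": a deep pool of modest-value projects remains.
early = [1e7 / math.sqrt(i) for i in range(1, 100_001)]
# Scenario B, "late in software": the valuable projects are mostly built already.
late = [1e7 / i**2 for i in range(1, 100_001)]

for name, values in [("early", early), ("late", late)]:
    before = total_dev_spend(500_000, values)  # human-only development cost
    after = total_dev_spend(50_000, values)    # 10x cheaper with AI assistance
    print(f"{name}: spend before={before:,.0f}, after={after:,.0f}")
# "early": total spend rises 10x (the ATM outcome); "late": spend falls sharply.
```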

Erik Torenberg: (6:32) So if you're saying software is late in the cycle of—or if it's eaten a lot of the world, then we're close to job loss. But if it's early, then we're far away from it.

Nathan Labenz: (6:41) Yeah. Directionally, anyway, I think that's right. I don't have a super specific or high-confidence idea of where we are in that space, so I don't have a timeline to job loss by any means. But yeah, I think that's directionally the analysis.

If you want to go deeper on this, and on all the amazing things that GPT-4 can do, see the Sparks of AGI paper out of Microsoft. These folks had early access, same time that I was doing the red teaming. And they came up with some very clever experiments well beyond this—I mean, this is fairly standard. They came up with some very clever experiments that demonstrated that there sure seems to be some interesting grokking going on. In that paper, they also asked GPT-4 to draw—the version they were using didn't have vision capabilities—I think it was a unicorn, as a vector graphic, and it was able to actually draw through code despite not having really seen anything. So there's a lot of things like that where it's like, man, there does seem to be some sort of general understanding developing. Some sort of grokking seems to be happening, and they just kind of poke at that. They're not reverse engineering—they don't have that kind of access—but they do, I think, a very nice job of getting really creative and coming up with a bunch of different demonstrations that, to me, strongly suggest that the grokking continues.

Here's another one. We've obviously talked so much about image generation and image manipulation and image editing. We had Suhail from Playground on as our very first episode. They've continued to launch thing after thing, and we should get him back for an update at some point. But a lot of people still look at that and say, okay, that's all kind of cool. It's fun. People are really into it. It feels kind of like a toy still, or maybe it can be used for creating marketing materials, or maybe even creating cartoons or whatever. But it all kind of feels like, is any of that super important? Is this really going to change much?

And this is where I think that question gets answered, and people start to be like, oh, wow. So this is from our episode with Tanishq, who created this model that transforms a raw image of tissue, taken using a new microscope that some folks at Georgia Tech have developed, and converts it to a stained version of that tissue, which is normally a physical process. Normally, you have a lab technician who takes a lump of tissue that got cut out of a person, goes and slices that on a little meat slicer type deal, has to fix it into a certain medium, then has to put chemicals on it to dye it, and then has to look at it under a microscope. And that is reportedly an 8-hour or so process for somebody to do all that work. Among other problems, it's expensive, but it's also too slow to be done during surgery. So if you are getting a cancer biopsy—everybody's had this experience of, did they get all the cancer? Well, one of the reasons they don't really know is because they don't want to take out more tissue than they have to, but they don't really have the feedback cycle of being able to take some out and look to see, is there cancer in this tissue that we just took out? So they're in a hard judgment place: I don't want to cut out too much, but if I don't cut out enough, I'm not going to get all the cancer. That's a challenge.
Well, here, with this new microscope, they can actually just put it into you while they are doing surgery, take the image, run the AI to transform the image, and look at the image as if it had been stained, and this whole thing takes just a few seconds. So now you can get a lot more precise within the context of surgery, because you're able to do this manipulation of the images with AI. And this is a pretty clever technique. I definitely recommend that episode—one of the episodes, again, that shows how much potential there is still to be pulled out of the foundation models.

Because one of the big challenges with something like this is you don't have good data. There's just not a lot of data like this where you have the before and after. First of all, nobody generally images the before, so there are just not a lot of befores sitting around. Also, in the process of doing the slicing and the fixing or whatever, stuff gets damaged and looks different, and there are all kinds of problems. So how do you even get this data? There's just not a ton out there. So they did this with a small dataset. But the way that they did it was very clever, and it took advantage of some of the capabilities that are in existing bigger models and allowed them to get a result this powerful with relatively small additional data and minimal additional compute. The data thing is really the big thing. Tanishq, I think, is just a real pioneer and a super clever thinker when it comes to figuring out how to make small datasets work.

Other things—I mean, everybody has seen kind of text-to-everything at this point. We could go on here and show text to this, text to that. This is text to 3D representations of things. Increasingly, you can do text to 3D-printable objects as well. Here, we're doing text to human movement. So, for example, "a person is standing and steps backward"—that's the input, and now the AI creates that three-dimensional motion for a person. This could be used in any number of things, right? Making movies, making video games, this kind of realistic human motion. Try coding this with explicit code—it's just not really possible. You have to have an AI to do something anywhere close to this fluent, this lifelike. That's, I think, pretty awesome. A lot of the human motion that we're going to see in virtual environments is going to be generated by models like this.

And here's our new theme music, actually. We used Google's MusicLM for this, and I gave it this prompt of futuristic, classic Marley-era reggae about AI. Let's take a listen.

Nathan Labenz: (13:23) Actually, I owe a slide on this one too, because we even improved that. Well, you can be the judge of whether we improved it, but we replaced that with another MusicLM generation from just a couple weeks ago, where I went with country trap rap or whatever, with a classical Viennese combo. And that one also, I think, ended up sounding pretty cool. So text-to-everything is the new normal, and it's proliferating fast. You could make tons of slides with this.

Text to voice is also another big one, and we did an episode with Mahmoud Felfel from PlayHT. They've got a really nice, super easy flow: upload a couple minutes of your own voice, generate your own voice. If you want to hear my cloned voice with his technology, go back and listen to that episode. The intro to that episode, we created with their voice cloning technology. So it sounds like me, but it's just a paragraph that we entered into the system and had it read.

Erik Torenberg: (14:31) Hey. We'll continue our interview in a moment after a word from our sponsors. Hey, everybody. If you're a business owner or founder like me, you'll want to know more about our sponsor, NetSuite. NetSuite provides financial software for all your business needs. Whether you're looking for an ERP tool or accounting software, NetSuite gives you the visibility and control you need to make better decisions faster. And for the first time in NetSuite's 25 years as the number one cloud financial system, you can defer payments of a full NetSuite implementation for 6 months. That's no payment and no interest for 6 months. You can take advantage of the special financing offered today. NetSuite is number one because they give your business everything you need in real time, all in one place to reduce manual processes, boost efficiency, build forecasts, and increase productivity across every department. More than 36,000 companies have already upgraded to NetSuite, gaining visibility and control over their financials, inventory, HR, ecommerce, and more. If you've been checking out NetSuite already, then you know this deal is unprecedented. No interest, no payments. So take advantage of the special financing offer with our promo code at netsuite.com/cognitive. Netsuite.com/cognitive. To get the visibility and control your business needs to weather any storm. That is netsuite.com/cognitive. Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.

Nathan Labenz: (15:55) Education is also going to be, I think, a huge thing. This is a big point, but here's an example from Khan Academy. I definitely recommend Sal Khan's recent TED Talk on this. This is basically just fine-tuning done a little bit differently. Instead of fine-tuning the model to give you the answer and be maximally helpful, here they've defined a different goal: we don't want to give you the answer; we want to lead you to the answer. So you're seeing this behavior from the AI where it's encouraging, giving hints, and telling the student what to look at doing next. But it's not just giving you the answer. Khan Academy's model isn't just going to fix your code like ChatGPT will, but it will guide you and educate you on what you really need to know. I think this is going to be just dramatic. If you believe the educational research as I understand it, one-to-one tutoring is one of the very few things that's really proven to work as an educational intervention, and scaling that has obviously been nigh impossible for anyone who doesn't have real means. But this is going to cost not that much, and it should be available to tons and tons of people.

Dentistry—that's not one that people would typically think of, but this was from my GPT-4 red teaming experiments, and it just really stands out to me as one of the most hair-raising examples that I've ever seen. And again, it kind of suggests some grokking here, because I'm pretty sure that I'm the only one who's put in a query quite like this. I had this weird dentist who was kind of a nutcase, I think, in retrospect. I was a teen when this happened. I had this one tooth that was not in line with my other teeth, and I didn't really want to get braces, and he didn't think it was worth it either. So he was like, I know what to do. I'm going to put a big glob of stuff on the back of this one tooth, and then your tongue, over time, will push it forward. So I was like, okay. At the time, I'm like, I guess you're the dentist. I'll trust whatever you say. So he does that, and then I had this big glob of shit on my tooth, and it was there for a long time. And it did kind of work, I guess. Anyway, who cares? All that is to set up that eventually I was like, I've got to get this thing off of here. So I did go to a human dentist and had this conversation.

But what struck me so much about this was, when I say my old dentist put a glob of something white on the tooth and ask it to explain, it starts to chide my dentist and says there are a few possible scenarios, but none of them sound very conventional or evidence-based—which indeed, it was not. And I didn't even really give it much of a hint that I thought it was weird. I just said it was an unusual situation. That's really the only hint here that this is not part of the dental standard of care. But it sure seems to me like it's got some sort of grokking of what proper dentistry is, and it kind of recognizes that my dude was coloring outside the lines. It also gave me a really helpful hint around the fact that this kind of material can be cured with a light. I hadn't thought about that, and I hadn't remembered to include it. But when it said that, I knew that that was it. And then I remembered: oh, yeah, he did have this blue light thing that he put in there to harden it up. And so this conversation goes on. I asked what to do about it, and it basically gave me the same response that my human dentist did.
And this is a really good point too, and the one that I hammer on all the time with my education for the EAs at Athena: test these systems with things that you know well. If I were to make up some random situation about a heart condition that I've never had and don't really know anything about, before you know it, I'm off in Neverland, and I really have no idea whether it's telling me the right thing or the wrong thing, or how to evaluate it. By using something where I knew what had happened, what the outcome was, what my dentist had told me, and what my dentist had done, I'm in a position to evaluate this even though I'm not a dental expert. So evaluate it in areas where you have expertise. And if you want to get kind of creative and look at far-out stuff, think about far-out stuff that is super uncommon but that you have some firm grasp on. This was part of my red teaming back—increasingly almost a year ago now; it was September, October 2022. And this was just where I was like, okay, yeah, this stuff is definitely going to be transformative. If it can answer that kind of way-off-the-beaten-path question, it's going to be a world-changing technology.

And it can also read minds. So this is another Tanishq episode. This is insane, right? Here we have fMRI data, which is basically a scan of your brain activity—really just the blood flow. I was struck to learn this: they're not looking at individual neurons here. They're looking at a region of the brain, segmenting that region into a bunch of small regions, and just looking at the blood flow to each of those regions as the person looks at different images. So the person, in this case, looked at this image, and what they trained the AI to do is take in that brain scan data from the fMRI and predict what the image was that the person saw. And this is the kind of fidelity that they're now starting to get: you saw this bear, and now you see this bear. You saw this pizza, and now it predicts you saw this pizza. So obviously, they're not exactly right. But, man, they are getting extremely good.

And I would say, in my view, while this is not a practical wearable device—because it is an fMRI, you have to lie on your back inside it, and it's a giant magnet, definitely not something you can take on the go—to me, this is still legitimate mind reading. It is just looking at the activity of the brain, and it is understanding what you are perceiving. There are similar projects for language as well. To be able to decode brain activity with this level of fidelity is just a mind-blowing advance, and it's another great example of just how many things are happening all at once. If this had happened even 5 years ago, it would have been major headline news, blowing people totally out of the water. Now this thing kind of comes and goes, gets a bunch of likes on Twitter or whatever, and just blends in with all the other stuff that's going on in AI. Again, I mentioned Tanishq—he's a genius with small-data science. The models they create were trained on only about 1,000 images per person, and they did very well.
So, again, he has a really clever technique for starting with these large pretrained models that are off the shelf, figuring out how to augment and extend and combine them with his own problem set, and working them into a system such that he only needs a small amount of data for this to work. I think it's literally just about 1,000 images. And it is trained on a per-individual basis—per individual patient—because our anatomy is sufficiently different that you can't just take my brain activity and use it to predict what you saw. It doesn't work on that level. They may be able to come up with some ways to bridge from person to person; I would bet that that could happen. But as of now, this is 1,000 images that one person saw, and now we can understand what that person is seeing, just based on their brain activity. This one continues to blow my mind, honestly.

Alright. Here's another one. This is kind of the embodied agent, PaLM-E. PaLM is Google's big language model, of course, and the E in PaLM-E stands for embodied. We had someone from Google Robotics on and talked about this and some of the papers that led up to this capability. Think back to that agent diagram. What's happening here is basically that diagram: the AI was told, go pick up a snack from the drawer, and it just runs in that loop. It says, okay, let me think about it. Let me look at my visual input and see what I see. Okay, I see what I see. I know what I'm supposed to do. What's the next step that I should take? Let me issue a command to my robot body, which is basically the tools that I have available, and I'll take that next step. And that might mean move my wheels to move my whole body, or it might mean move my arm around to start to go grab something, whatever.

What's really interesting about this is you can see the robustness to the adversarial disturbance here. Because it's just running in that loop, if its plan goes awry—like right here, freeze it: it just had the thing, and then the guy comes and takes it and puts it back in the drawer—it's not ultimately derailed by that, because it's still just running the same loop. I think it's 3 times a second, she said, that it takes an image, and now it's like, okay, the bag is there. I'm supposed to be getting the bag. I see that it's there. What's my next thing? Move my hand toward it. It doesn't get confused or too messed up by the fact that somebody came in and messed with it. It just looks at the situation again and takes the logical next action, and that's enough for it to overcome these interruptions that the humans are imposing on it.

One thing to watch for in these robotics demos: a lot of times, the robots move super slow, and the video is sped up. This one is 4x actual speed. But, nevertheless, it's a robot that can take verbal commands, look around its environment, and figure out what the next logical step is. And just by doing that over and over and over again—pretty similar to next-word prediction, except it's next-small-action prediction—it can accomplish these kind of complex, multistep tasks.

Alright. And then I think this one is, to me, maybe the most striking disconnect between media and general popular discourse, and reality as I have experienced it. And that is the general sense that self-driving cars don't work, that they're really far from working, that it's going to be a decade, that maybe it'll never happen. That's kind of most of what I hear.
And then you can go on YouTube, and you can look at some of the demos that enthusiasts post who have the Tesla FSD, and you see things like this.

S2 (26:51) This is interesting here, because a guy is walking in front, in the middle of, basically, a highway. So, okay. Anyway, it did very well. I mean, it was cautious. It was accelerating.

Nathan Labenz: (27:02) So the car handled that pretty much just as the guy wanted it to. I definitely recommend going and watching a few of these. There's plenty of people who do this, actually. It's kind of boring, but then these interesting moments come up. We kind of need more of a compilation, I think, than all these long-form things.

S2 (27:19) The Charleston area. Okay. Cool. That was good. The lane changes are just, like, ugh.

Nathan Labenz: (27:25) So, obviously, this guy has been using this for a while and noticing all these little differences. I recently borrowed a neighbor's Tesla with FSD enabled and took my grandmother on a trip from our place in Detroit to her place in Ohio—4 hours one way. So I logged 8 hours with FSD on a single day, for the most part letting it drive itself, through a variety of conditions: city streets, major interstates. Obviously, the interstates are what it's been good at for the longest. And I think there are definitely some things, in terms of the user experience, where it's not everything people imagine it to be. It's still a little bit rough around the edges. Some of the times it got off the highway and did some odd stuff, and it stopped in a couple places where it shouldn't have. But it never seemed confused. It never seemed like it was at risk of hitting anything. And it definitely seemed like if something jumped out in front of us, it would have a faster reaction time than I would.

So, having spent 8 hours in it, it is pretty easy for me to believe the stats that they talk about—Elon Musk says it's safer than a human, and that seems very plausible to me. I think there's a little bit of a challenge sometimes in comparing the data that they report, which, as I understand it, is airbag-deployed crashes, against what you can get for other carmakers, where the cars are not all collecting data and it tends to be police reports. They have far fewer airbag-deployed crashes than there are police reports for other kinds of cars, but again, those are not quite apples to apples. It still seems to me very plausible that you would be safer having an FSD Tesla drive you where you want to go today versus just doing it yourself. So that's pretty remarkable.

And it feels to me like if we were in a sane world—which is not always a safe assumption—we would be enthusiastically trying to refine this and figure out how to get it to mass deployment. Because not only is it plausibly safer, but it could very easily be a much more pleasant experience to not have to worry about driving, which many people would prefer. I certainly would. It's certainly so close to ready for prime time, if not fully ready for prime time, that if we were a little bit more eager to embrace technology, we would be celebrating this and trying to figure out ways to get it deployed at scale. But instead, we're kind of in this weird zone where everybody's in denial and says it will never happen, or it's a decade away, when you can see this stuff from tons of users on the internet. You can even go get in one, or go get in a Cruise, for that matter, in San Francisco yourself. Have you done a Cruise yourself in San Francisco? Have you done an FSD ride?

Erik Torenberg: I've not done one, but you're inspiring me to. I should.

Nathan Labenz: And they're actually getting a lot of—yeah. That's now. It's one of these things. A friend of mine actually works for a self-driving startup and has not been in the Tesla FSD—or at least, last I talked to him, had not been in the Tesla FSD either. And I was like, dude, you need to get out there, man. It can handle city streets. It is really very good. Driving at night on somewhat hilly two-lane roads in Ohio, I mean, I was a little nervous. I was more nervous with it than I would have been driving myself, just because I'm unfamiliar with it. But it handled that stuff extremely well. And the little things that it did do that I felt were a bit wrong were more about passenger comfort and confidence than safety.

One of its best features, I think, is when it pulls up to a stop and is going to need to make a turn onto a road. On my street, for example, you can put it into self-driving mode on a residential 25-mile-an-hour street and get down to the end, where we're going to make a left turn onto a bigger road. It will stop with plenty of room between you and the road, and then it will show you a message: "creeping for visibility." And it will just inch a little bit forward so it can get a better angle on the road that it's about to turn into. And then when it goes, obviously it goes. I think they still have more work to do on that kind of packaging to really reassure the user and make it very clear at all times what is happening. But that "creeping for visibility" is an excellent example of where they have solved some major problems that span both knowing what to do and reassuring the human. Reassurance is probably the bigger part of the remaining work now, I would honestly guess: just knowing that, yeah, the car's not about to turn into traffic, we're just creeping for visibility right now, so you can chill, and then it makes its turn.

It did do a few things wrong for me, and this might even be an episode I recorded. I buckled my laptop into the back seat of the car and recorded 6 hours of video of me supervising the thing as we drove. There were a few things that definitely weren't perfect. But again, I do think it's plausible that it is a safer driver than me. It certainly has better reaction time if something jumps out in front of us. And I was definitely very impressed by the overall experience.

That brings us to the end of this section of just a bunch of different transformative things that modern AIs are doing. And now we can get into some of the fun stuff that is a little bit forward-looking. As I promised at the beginning, we're not going to do any big, long-term predictions, but we can at least talk about the near term, the trends that I see continuing into the near term, at least.

So, market dynamics and economic impact. We did a whole episode on this, so we can do this one pretty quickly. But there's a debate as to who's going to win in this space: where does value accrue, where can people invest, what's defensible and what's not. There's a ton of confusion around this, and I don't have all the answers by any means. A lot of stuff is still to be sorted out. But I'm on record, and there's a whole episode about it, where I argue that today's leaders do, in fact, have moats.
Moats, meaning aspects of their business that are going to allow them to earn a lot of profits and are going to make it hard for new challengers to disrupt them. For OpenAI, I run down nine moats, and I basically find that Google—especially as you consider them with DeepMind and also the Anthropic partnership—has all the same moats. Maybe not quite to the same level, but certainly they have them.

So ChatGPT, I said at the time I made this slide, is the best value in the AI game. Actually, Claude Instant has recently, in some ways, arguably surpassed the ChatGPT free tier. So those two are competing, but none of the open-source ones are on their level, despite claims to the contrary. We'll see that with a leaderboard in just a second.

All the work that they do for safety is, I think, also a super important moat, because if you're buying this for your company, especially if you're going to expose it to your users, or even just to your employees, you don't want something that is going to be toxic or racist or otherwise problematic. So you actually do value that work, and you don't want to go use an open-source version just for kicks—you have no idea what that's going to do—whereas you have a pretty good idea of what these guys are already putting in place in a disciplined way for safety and reliability for users.

The product feedback loop is also huge. Obviously, nobody's getting more user data than OpenAI is with their ChatGPT product, and with their API product as well. Google, as they deploy things across their entire ecosystem of products, is going to have a similar product feedback loop. They're still kind of working theirs up, but they certainly have that discipline.

Pricing power is, I think, a huge one for both of these companies ultimately. Emad made a really interesting comment on his appearance on our show: he views OpenAI and Google as noneconomic actors, meaning he doesn't even think they're in it to make money. And certainly, some of the things that we know about OpenAI's governance structure—Sam Altman having no equity, and all these excess-profit clauses that they have in their charter—certainly seem to support that. Sam Altman has said that they're going to try to drive the price of intelligence as low as possible. So it's going to be hard for people to figure out new ways to train $100 million-plus language models just to compete with them, when they already have so much in the market and it's already so cheap. It's going to be very hard to undercut them by much at all.

Privileged access to compute is another big one. Of course, they've got their partnership with Microsoft. Even OpenAI doesn't have all of the GPUs that they would like at this point in time. They have not rolled out their 32k version of GPT-4 widely, and they have not rolled out their computer vision-enabled version of GPT-4 widely at all. And as we understand it, that is because they just don't have enough compute to support those at the scale that their customer base would expect. So for now, they've just had to back-burner it. It's not because the technology is not ready; it's because the cloud infrastructure is just not quite there to support everything that they want to do. And so you can imagine, then: okay, you're going to try to compete with that. Well, who's your cloud compute provider? You could be an AWS customer. You could be an Azure customer.
You're not going to be an Azure customer like OpenAI is an Azure customer, certainly not right out of the gate. So that's a huge advantage that they have.

The models themselves become a big advantage: using GPT-4 to help train the next generation, or, for Claude, using Claude v1.3—they explicitly talk about this with their constitutional AI and their RLAIF approach. Using these models, now that they're good enough to be basically human-level for many tasks, to refine the dataset for the next generation of the model, that's a huge advantage as well.

Obviously, these folks have some of the most talented teams in the game. OpenAI is totally top-notch. I would even say Google arguably could still have the edge there, just because of scale. They maybe have more bureaucratic overhead, but they probably have several times as many PhD ML researchers on their payroll as even OpenAI does.

Distribution and partnerships, obviously big as well. OpenAI's got Microsoft. They've got Bain, and I believe they have other consulting firms like BCG. They've got a number of customers, probably increasingly dozens, already in the Fortune 100. So they've already got pretty good hooks into the broader market for this, and that's going to be very hard to dislodge once people have gone through an enterprise process and decided what to buy. Google, similarly: all of the big companies know who they are, they have a high level of trust, and they have a massive enterprise sales team that's already selling cloud solutions. So they also have huge advantages that challengers would have a very hard time reproducing.

And then finally, network effects, probably the weakest one on here. AI in general has weaker network effects than earlier technology waves, and it is pretty easy to switch. If you want to switch from OpenAI to Claude, you can do that pretty easily. So it's not like there's lock-in, or that just because everybody's using OpenAI, you have to too. You have a lot of independence to flip from provider to provider if you want to. Still, there are some ways in which OpenAI, and to a lesser degree the other leaders, are defining the standards that everybody else now feels they have to follow. A good example of that would be their plugin infrastructure: they define what the plugin infrastructure is, and everybody else kind of has to support that as well, because if they don't, then they're just clearly not as good. So you've got to catch up first, and meanwhile, while you're catching up to what they've already done, they're doing the next thing.

So I think these moats are very real. These companies are likely to continue to lead this market for the foreseeable future. That doesn't mean there won't be any new entrants that make a serious impact on the future, but I don't see these guys going anywhere anytime soon. Again, there's a whole episode on that if you want to go a lot deeper.

None of that is to take away from the incredible progress and proliferation that is happening in the open-source community. There are all sorts of hack projects and different training sets that people are putting together, and training techniques. And that's why it's gotten down to, as I mentioned earlier, about $100 to do a basic fine-tuning—that's possible because the open-source folks have cracked all these different strategies to make it super efficient. So this is very cool. It's becoming radically accessible.
You also might worry about proliferation in a negative sense, because if you're worried about things like misinformation or spear-phishing attacks, those things are going to be very hard to control if people have their own open-source models that are small and efficient enough to run on their own computers. We really are pretty much already at the point where this open-source thing is taking on a life of its own and nobody really can control it. The limitations of that phenomenon right now are really just the limitations of the models themselves: if the models are not that smart, they can only do so much, and the consequences of all this are limited by that. But aside from that, it's just kind of everything everywhere all at once.

I think Emad is extremely reasonable when he says that the reality is that open source will always lag closed source. That seems very true to me, and very honest and forthcoming from him, especially as a champion of the open-source community. So take it from him, if you don't take it from me, that the moats are real for the likes of OpenAI and Google. But, again, that does not mean there's not a tremendous amount of interesting stuff happening in the open-source world.

And here's the leaderboard that I mentioned. This is up to date as far as the leaderboard goes, although I'm eagerly awaiting the next update, when Claude 2 begins to show up. Basically, what we see here is a summary of everything that I've just been saying. GPT-4 is at the top. Claude is neck and neck. Claude Instant just got ahead of GPT-3.5 Turbo on this head-to-head measure, I should say. These are basically chess-style ratings, Elo ratings, where they do head-to-head comparisons, and that's what their Chatbot Arena side-by-side is all about. You give the same prompt, they give you responses back from two language models, you don't know which is which, and you just say which one you prefer. They keep track of all the scores, see which one beats which other ones how often, and derive these ratings. That's how GPT-4 pops to the top, and Claude is beating 3.5 Turbo in the sense that it is more preferred.

On the flip side, we do see different results on some of these more demanding benchmarks. This MMLU is a huge dataset of basically college-level exam questions across a wide variety of fields, made by a guy named Dan Hendrycks and team. I think his work is some of the very best, particularly in terms of creating benchmarks that are actually up to the challenge: hard enough that these latest language models still can't solve all the questions. Here, you do see that GPT-3.5 Turbo is still ahead of Claude Instant. So they have different strengths and weaknesses. But GPT-4 is number one across the board, and all the proprietary ones here at the top are ahead across the board.

These next ones are all Facebook LLaMA derivatives. Facebook made a strong open-source, pretrained-only version, and then people built on top of LLaMA—that's why you get names like Guanaco and Vicuna, because these are llama-like animals. So they're derived from LLaMA with fine-tuning, but these are all officially noncommercial: they were open-sourced, but the license does not allow for commercial use. So fly-by-night projects are using those, but serious companies aren't going to want to take that risk right now.
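For concreteness, here is roughly how those chess-style Elo ratings work: each blind preference vote transfers rating points from the loser to the winner, scaled by how surprising the result was given the current ratings. A minimal sketch; the model names, starting ratings, and K-factor are illustrative, not LMSYS's actual configuration:

```python
K = 32  # standard Elo update size; larger K means faster-moving ratings

def expected_win(r_a: float, r_b: float) -> float:
    """Probability that model A is preferred, implied by current ratings."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str) -> None:
    surprise = 1 - expected_win(ratings[winner], ratings[loser])
    ratings[winner] += K * surprise  # upsets move ratings more than expected wins
    ratings[loser] -= K * surprise

ratings = {"gpt-4": 1000.0, "claude-v1": 1000.0, "vicuna-13b": 1000.0}
votes = [("gpt-4", "vicuna-13b"), ("claude-v1", "vicuna-13b"), ("gpt-4", "claude-v1")]
for winner, loser in votes:  # each vote: a human blindly preferred `winner`'s answer
    record_vote(ratings, winner, loser)
print(ratings)  # consistently preferred models drift to the top of the leaderboard
```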
And so you look at all of these best models. Again, here's Google's entry with PaLM, and more Vicunas. I don't know if Koala is a LLaMA derivative or not, but you get all these things. You have to get all the way down here to MPT-7B-Chat to get something that is actually open source. And it's only getting 32% of these MMLU questions right, versus 86% for GPT-4. So it is a huge difference when it comes to the ability to reason through genuinely challenging undergrad STEM exam question-type things. Here, the gap is large. Even between the best open-source model and the worst of the top tier—which, on this metric, would be Claude Instant—it's still a 2-to-1 advantage in terms of how many of those problems it can get right. So lots of cool stuff, and you could use MPT Chat. And by the way, people don't just use MPT Chat raw most of the time. What they're actually doing is going and fine-tuning it for their purposes.

So here—our guests seem to go on to great things with a high frequency, but there's no more striking example of that than the MosaicML guys, Jonathan and Avi, whom we had on, who were 2 of 60 employees at MosaicML and turned around and got themselves acquired by Databricks for over $1 billion just a couple weeks later. What they have done is build really nice infrastructure for training language models, including training them from scratch. So if you're a corporate customer and you have all your own data, and you don't want anything from anybody else's data, you—

Erik Torenberg: (46:42) Just want to know that you own it.

Nathan Labenz: (46:43) And it's all copacetic from the beginning, then you can do that and they'll help you do pre-training with entirely your own data. Increasingly though, they're thinking, well, why do that every time if we can make a really good foundation model, whether it's a 7B or they now also have a 30B, and just let our customers fine tune that? That's a much easier path to a workable production model for many use cases. So they have that. Now you've got these nice open source models, easy fine tuning with their software tools that are top notch, and you can get your custom model that you own that you can also host on their service and use them for inference, or you can take your trained model and go run and do whatever you want to do with it. So $21 million per employee was the outcome for MosaicML. That's the 1 point whatever billion divided by the 60 employees. Pretty good.

Erik Torenberg: (47:35) Good things happen if you come on Cognitive Revolution.

Nathan Labenz: (47:38) Yeah. I think that's one of the clearest trends. I've learned a few things in doing this show so far. You said, call me old-fashioned, but I use ChatGPT. That's honestly been one of the biggest learnings. Longtime listeners might remember that I used to always ask, what are the AI tools that you use and that you would recommend to the audience? And I stopped asking that, because so many people just said ChatGPT and not much else, and I learned there aren't that many other tools that really add that much on top of ChatGPT. There are definitely some, but most people didn't really have a great answer for that. And if they did, it was fairly niche and not the kind of thing everyone in the audience would want to go out and try anyway. So I stopped asking that question. You're in good company. That's one of my big lessons. The other big one is, yeah, it seems to do wonders for the stock price.

Alright. It doesn't do wonders for the retail price, though. We've covered this a little bit at a couple points, but in general, prices are low and getting lower. OpenAI has dropped their prices effectively by 98% in just the last year. In other words, what cost you 6 cents per thousand tokens as of a year ago is now available, from an even slightly better model, for 0.15 cents per thousand tokens, which is basically that 98% price drop. And this is happening across the board. Anthropic is keeping up with them. Again, you can't really be more expensive: Anthropic might be arguably as good as OpenAI, but if you want people to switch, you certainly can't be more expensive, and it helps to be cheaper. So prices are just racing to the bottom, and all of the efficiency trends that we talked about earlier are really driving this: distillation, quantization, even stuff that's happening on the edge, the mixture-of-experts techniques.

Now, a GPT-3-quality model with MosaicML is under $500,000, and a Stable Diffusion can be trained from scratch for under $50,000. These are huge decreases relative to what they originally cost. Stable Diffusion was $400,000, I believe, in the first go, so that's down seven-eighths. And even when Emad shared that original number, it blew people's minds, because they thought it must have been way more. It was already a lot cheaper than people expected, and it's come down an order of magnitude since. Same thing with GPT-3: that cost millions originally, and now it's under $500,000. So it's all just becoming far, far more accessible. And there's really no end in sight there. Some of these techniques are going to start to get a little bit played out—you can only quantize so much—but you combine them with the new, better custom hardware that's coming online, which is more cost-efficient as well, and it does not seem like we've hit the end of the cost deflation just yet.
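A quick sanity check on those numbers, using the figures quoted above (6 cents per thousand tokens then, 0.15 cents now):

```python
# Verifying the "basically 98%" price drop from the per-token figures above.
old_price = 0.06    # dollars per 1K tokens, roughly a year prior
new_price = 0.0015  # dollars per 1K tokens now
print(f"price drop: {1 - new_price / old_price:.1%}")           # 97.5%
print(f"tokens per dollar: {old_price / new_price:.0f}x more")  # 40x more
```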
I already mentioned this as well, but I think everybody knows that NVIDIA stock has boomed to a trillion-dollar valuation. That's because they have the hotness with the H100s—and obviously, they had the previous generation of hotness with the A100s—and demand is just totally through the roof. There was a story about a Chinese company that just placed a billion-dollar order with NVIDIA, and people were wondering, what's that for? And I was honestly thinking, if they can get that order filled, it's a no-brainer for them to do it. I think it was ByteDance, and it was about as much money as they were probably sitting on in their bank account. I would place that order just to get in the queue, because the queue is long, and it's getting longer. Particularly for China, you've got the export controls. So just try to get that order through. And everybody's in that space right now: OpenAI can't do all the things that they want to do, and really, nobody has all the GPUs that they would like right now.

So how does this all shape up in terms of who's going to own this market? I get this question a lot. This is the GLG question: who wins? And there, I think the best inspiration, at least as a jumping-off point, is to look at the dynamics of the cloud market itself. Basically, my thinking here is that, ultimately, anywhere that you can run AIs, people will run AIs. That means if you have a giant cloud infrastructure platform, then people will run a lot of AI inference on it, and training too. But especially, they're going to run their models on your compute as part of their applications. So you look at the leaderboard—AWS, Azure, and Google—and there's maybe a shuffling to come there, because Azure has the OpenAI partnership, Google has DeepMind and also the Anthropic partnership, and AWS is a little bit behind the others right now. But there's just no way that all that AWS compute doesn't end up getting used for AI. And you've got these other clouds, and they're all going to be rushing to have their own offerings. Then you also have people's laptops and people's phones, which will eventually be able to run stuff. Apple has been putting AI components into their system-on-a-chip for the last couple of years. They haven't really even used them that much, from what I understand, but of course they're going to start to turn all of this stuff on. So I think everywhere that you can run AI, people will run AI, and you probably end up with a big tech oligopoly that looks very similar to the one that we have today. But you also have stuff running on your own local devices, and some corporate customers doing stuff on-prem. Basically, everywhere that compute exists, AI will also exist.

So what about the economic impact? This is the South Park "they took our jobs" moment. It's interesting that we're starting to see this from companies that you might expect it from and companies you might not—I don't even know which one you would consider to be which here. Wendy's with an AI order-taker at the drive-thru just makes a ton of sense. They call that FreshAI. And I bet it's just plain better. For one thing, you don't have to wait in line single file. You could just do it on your phone; it can take all the orders at once, and then you could just get the order from the window as it's ready. That seems like a strictly better consumer experience. And what stands in the way of this rolling out at scale? Probably not much at this point.

From IBM, they've said that they are no longer hiring for jobs that AI can do. I would say IBM has a long and storied history of hyping AI stuff maybe beyond the point of reason, and they might be doing that again here. But it is striking to see a CEO say something like this, even put a number on it like 7,800 jobs, and not be laughed out of the room—it's reported credulously by Bloomberg and The Wall Street Journal and all these folks. And it's like, yeah, he might really mean it this time, even if they have exaggerated some stuff in the past. Whose jobs they will take, I think, is really interesting. And ultimately, everybody's going to be impacted.
But this was a paper that came out of OpenAI in collaboration with some researchers, I think, from the University of Pennsylvania. They broke jobs down into five bands that go from the least qualifications required to the most qualifications required. And what they find is that, basically, the farther right you are on this chart, the more exposed you are. The leftmost band is the lowest rate of exposure, and it's also the lowest level of credentials. So you can think of this as yard work or whatever. We're not that close right now: certainly, a language model in isolation is not going to be able to go weed my backyard. It needs robotics, and there's just a ton of noise, and it's bumpy terrain.

Many people have made this point at this point, but things are playing out a little bit the reverse of what many people expected. They were thinking it would be the lowest-skilled jobs that get automated first, and doctors maybe never. In contrast to that, it actually seems like the most manual jobs are at the least risk of impact from language models in the near term, and the highest-qualification jobs are at the most exposure, the most risk of impact, in the near term. Interestingly, the fifth, highest-qualification zone is the red one. So it's the fourth band that is the most exposed per this analysis, and the highest-qualification band is just slightly less exposed than that.

I do think that lines up with my general story: we have human-level AI for many tasks; it's closing in on expert, but it's short of that breakthrough, next-level insight. You squint at this and you could tell a similar story, where the fourth tier, second from the top in terms of qualifications required, is most exposed, and then as the job itself gets a little bit more demanding and requires higher qualifications, it pulls back a little, but honestly, not that much.

So that's a good paper. I definitely recommend looking at that as well. These things are obviously highly debatable and highly criticized, but I do think their paradigm is sensible. They break things down into tasks. A job, ultimately, you can decompose into a bunch of tasks. Which of those tasks can language models dramatically accelerate? And the result is this graph.

Erik Torenberg: (57:25) And you said 50% of tasks for 50% of jobs?

Nathan Labenz: (57:28) It's basically: how many jobs have at least a given minimum percentage of tasks which an LLM could dramatically accelerate? So it's only 5 to 10% of jobs that have 80% of the tasks that an LLM could dramatically accelerate. An example of this might be something like a paralegal, and that might be in the fourth of five bands. What are the tasks that a paralegal performs? And of those tasks, how many of them could an LLM dramatically accelerate? 80-plus percent. There are only a few percent of jobs like that. Half of the jobs have half of their tasks that could be dramatically accelerated. And then essentially all jobs have at least some tasks that could be dramatically accelerated. So it's a threshold kind of thing. Right? Here you're saying 95% of jobs in this red band have at least 10% of their tasks that could be dramatically accelerated: 10 or more.

So carrying on, the social fallout is really just starting from all this. The writers and now the actors are going on strike, and AI was one of the writers' key issues. It was striking to see that they didn't want AI to be able to be the creator of a show; there are very specific contractual terms in play there around what roles they want AI to be able to play and not to play. I think we did a nice episode on that with three members of the Writers Guild. They're not trying to ban ChatGPT or anything, but they do want to make sure that there's an actual human that gets the writer's credit and gets paid, and they don't want to have everything automated, for obvious reasons. And I actually came away from studying that more sympathetic than I went in. Initially, I was thinking, does it really matter who writes the sitcoms? I think that's a pretty reasonable intuition that probably a decent number of people share. But the more I thought about it, the more I'm thinking: culture in general is ever evolving, and I do think it would be wise for us to put some limitations on how much we want AI-generated content to shape the future of society. I don't think it's obvious exactly what the answer should be, and in their context, they're working on a specific contractual foundation, which may or may not be the right one long term. But I do think there is a good case for saying: let's continue to have primary human authorship in the future of human culture and not just rush to have everything generated by AI because it's cheaper and roughly as good. I think that would be a mistake, and it might be the kind of thing that could be hard to come back from. And people worry about what happens when GPT-5 is trained and so much of the data is AI data. Does that create weird dynamics? It might. There's definitely some interesting research starting to come out showing that you can only self-train for so many generations before things go off the rails. Maybe solutions will be found to that as well, but at least with current technology, it does seem like there is a tendency for things to get weird when you just have AI reinforcing itself over and over again, generation after generation. So, something I initially thought, who cares? And now I'm thinking, actually, a little caution there probably is pretty prudent.

Everybody's also now putting up walls around their data. The open Internet as we've known it for 20 years has maybe just come to an end. Everybody used to be able to scrape everything. Everybody made the trade with Google that, yeah, we'll give you access to our data.
You put us in search results and send us traffic. That contract, informal as it was, has been shaken up now, because OpenAI has gone out and gotten data wherever it could, Reddit data and Stack Overflow data to take two really relevant examples, and those companies are not getting traffic. Instead, that data is just being used to help answer questions for people, and most of the time, they don't have to go to Stack Overflow. I saw a report that Stack Overflow traffic was down. I don't know if that's been updated; at one time, it was down, and it was possibly seasonal. Who knows? But certainly, I can tell you I go to Stack Overflow way less often than I used to, because now I can often get GPT-4 to just write the code in the first place and do what I want, without even needing to get to that stage of trying to figure out a certain bug or whatever. So now these folks are thinking, okay, we can't just give away all this data. For something like Reddit, the AI isn't so much a threat to their community, but it is a threat to traffic, and they want to get paid. With Stack Overflow, it's potentially existential: if nobody's going there, the vitality just might drain out of that community. So people are just beginning to react to this. But data, and who got what data where, and did they pay for it, is it licensed? I think, a little bit, they maybe didn't expect things to get this far, this fast. OpenAI does have some programs where they are starting to do deals. They just did a deal with the AP. They've got a big deal with Shutterstock for their images, which powers the DALL-E product. So they are doing data licensing, but they also just grabbed up a bunch of stuff and used it. And now it's like, oh my God, this stuff got so good. Now all the data owners are coming knocking and saying, wait a second, shouldn't we be getting compensated for this somehow? And who knows how that's going to play out.

The lawsuits are just getting started. OpenAI has been sued repeatedly over the Copilot model, with GitHub and Microsoft all named there. It's an interesting one, because the code that they used was open source. It was published open source, but the creators never expected this, so they're going to try and sue anyway. Same thing with Stable Diffusion, ish. I mean, Stable Diffusion is on a little bit more shaky territory, probably, because people didn't publish their images on the Internet open source per se. They just published them and expected that they would retain rights in many cases. But all those images got sucked up into a 5-billion-image dataset, models got trained, and now people are getting sued. How that is going to shake out is going to be extremely tricky to predict. It's probably going to take years, and it's probably going to ultimately end up in the Supreme Court, and probably different supreme courts across countries as well. Stable Diffusion does now have an opt-out, where you can go to their system and say, these are my images and you can't use them. But obviously, that's a little bit late in the game for some of the artists who've seen their style copied millions of times at this point. Greg Rutkowski famously was found to be a great name to influence the style and was used in just prompt after prompt after prompt. He never had a say in that, and it's too late for him to opt out; there are just so many things out there in his style at this point that the cat is well out of the bag.
And as far as I know, Stability AI is still the only one doing that. So governments are scrambling broadly to figure out what to do, and we're seeing some serious divergence in approach. In Europe, as you would probably expect, they're taking a very "we need to regulate this, we need to make sure everything is consistent with our values" posture. From what I've heard, honestly, most of that discussion seems pretty reasonable. But then every so often, some language gets inserted, and maybe it'll get removed later, but you have these moments of: wait a second, if that happens, then we wouldn't be able to operate here. So at one point, Sam Altman said, look, if they pass something like this, we might have to leave Europe. And then he had to walk that back: we don't want to leave Europe; we just want to make sure that the regulation isn't too heavy-handed. But Europe is currently leading on regulation.

China is probably next, actually. They have put some guidance out around what a standard would look like for language models in China, and it's a pretty significant hurdle that you would have to get over to put your language model into the public. They want to be sure that the data it's trained on is of high quality. They want to make sure that it is not violating people's copyrights. And they obviously want to make sure that it's good for social cohesion, as they see it. At the scale of data that you're talking about, remember, 1 to 10 trillion tokens to train a frontier model, how are you going to filter all that data? Really hard to do, and certainly not something you could do manually. It is something you could do, by the way, with GPT-4 or with Claude. So there may be a dynamic here where, as these regulatory requirements come online, the companies that have the leading models are actually the most able to fulfill those requirements, because they have the model that allows them to scalably process all that training data, which other people just don't have. So it's unclear right now where things stand in China. There are pretty good models out of the leading companies there. Are those good enough to filter the data for the next generation of models that would actually pass muster with the Communist Party and be allowed to go online? My guess is yes. I think they probably have enough built, even though it's not super broadly deployed there for all these social-concern reasons, that they can start to scalably run these cycles and get to the point where they could satisfy the Chinese regulators. As I understand it, those rules are not final; they're drafts. But it seems like that's the kind of thing that the Chinese Communist Party is going to pay attention to and have a point of view on.

In contrast, in the US, as of now, we basically have no rules. You can just do whatever. There's no licensing, there's no regulatory regime, and liability is not clear. Things are in the courts to some degree, but the government has not really done much. We have the Blueprint for an AI Bill of Rights from the White House last year, which didn't even really touch large language models; that obviously needs an update. Congress is getting increasingly interested and starting to get briefings on it. And Sam Altman and Dario and Demis went to the White House. But basically, we're in laissez-faire territory, though in an undefined way.
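An aside to make the scalable-filtering idea from the China discussion concrete: below is a minimal sketch of what an LLM-based quality/compliance pass over pretraining data could look like. The `call_llm` function is a hypothetical stand-in for whatever frontier-model API is available, and the prompt and criteria are illustrative, not anyone's actual pipeline.

```python
# Hypothetical sketch: using a strong LLM to filter pretraining documents
# for quality/compliance. `call_llm` is a stand-in, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for a frontier-model API call; returns the model's reply."""
    return "YES"  # stubbed so the sketch runs end to end

def passes_filter(document: str) -> bool:
    # Truncate so the document fits in a typical context window.
    prompt = (
        "You are reviewing pretraining data. Reply YES only if the text "
        "below is high quality and contains no disallowed content; "
        "otherwise reply NO.\n\n" + document[:4000]
    )
    return call_llm(prompt).strip().upper().startswith("YES")

docs = ["An example document about photosynthesis."]
kept = [d for d in docs if passes_filter(d)]
print(f"kept {len(kept)} of {len(docs)} documents")
```

At 1 to 10 trillion tokens, a loop like this has to be massively parallelized across a large inference fleet, which is exactly why labs that already have a strong model, and lots of GPUs, would be best positioned to satisfy such requirements.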
And then you can look at Japan, and they're starting to say, hey, we're going to take all this concern off the table by just saying: here in Japan, you can do whatever you want to do. I don't know if that will hold or not, but you have the poles: Europe on the one hand, Japan on the other, with China seemingly, to me, a little bit closer to Europe, and the US right now basically in the same place as Japan, although not because we've taken an affirmative stance, but just because we haven't really done anything yet. So we'll see what happens here.

So what happens next? This is the big question, I think. We are at this point where leading AIs are more capable than an average human on most tasks. They're closing in on expert performance on anything where there is a well-defined standard of what to do, whether that's the standard of care in medicine or how to solve common programming problems or whatever. They're getting very, very good. But they're still not at that human-genius level. They're not capable of those breakthrough insights. They're not capable of identifying the non-obvious hypothesis that is worth testing because it actually has a good chance of being true. That's still the domain of the human, not just the expert but the human genius, you might say, if you want to say genius is having those insights. Does the curve level out before AI gets to Einstein, or does it just bust through the Einstein level, and we get superhuman intelligence in the near term? Right now, I would say both options are very live. Most people, even the people that are most in the know, have been surprised by how far the current large language model paradigm has gone. People did not expect it to go as far as it has, so it might just keep going further. There's an argument that says, well, no, it's not going to get to human-genius level, because we just don't have that much training data; there's just no way that it could get there, because how are you going to get smarter than the smartest of the training data? That's a reasonable theory as to why it might level off, I think, and I certainly wouldn't rule it out. On the other hand, grokking is weird, and language models are maybe better thought of as alien intelligence than anything else. So maybe they will be able to grok certain things in ways that are more interdisciplinary than humans can, and maybe that can be enough to reach some sort of breakthrough-insight capability. If so, I think all bets are really off for what happens next.

So over the next, let's say, 2 to 3 years tops, I think we're going to see this question answered. OpenAI has said they're not training GPT-5 yet. They want to develop and deploy and refine GPT-4, and at some point, they're going to start to think about the next generation. Anthropic has said that the companies that train the leading models in 2025 or 2026 might get so far ahead that nobody else can catch up. That's crazy. But I think both of these extensions of the curve are possible, and it's really hard to predict which way it's going to go right now. The other thing that could happen is we could see a new breakthrough at any time. Right? The argument that it levels off is also an argument that there are no additional conceptual breakthroughs that could unlock something totally different and more powerful, and I wouldn't bet on that either. We've seen enough.
Lili Yu's Megabyte paper from just a couple weeks ago is a great example of another architecture that is just another riff on what we currently have in some ways, but a riff that just might prove to be super powerful for unlocking all sorts of new capabilities.

Just in case things don't level out, what are people thinking about how to make sure that this stuff stays under control? Personally, I think there's a lot of weird debate around the idea that it's always going to be fine. That, to me, is not credible. If AIs become smarter than the smartest humans, that seems to me undeniably somewhat dangerous. It might be fine, but it seems very possible that it would not be fine. And I would note, by way of analogy if not firm analysis: humans have dominated the Earth because of our intelligence, and that has not gone well for a lot of other species and a lot of habitats and a lot of nature, because we just do what we do, and we don't really think ahead as much as we maybe should, or we just don't care about some of these other things as much as some think we should. A lot of nature has had a really hard time under what we call the Anthropocene, the period of Earth history in which humans dominate and control everything. If the AIs get to the place where they are genuinely smarter than the smartest humans, something like that, it seems to me, very well could happen to us.

So what plans do we have to try to make sure that if indeed we do get to a superhuman intelligence, it will be safe and we will be able to control it? Honestly, I don't think the plans are as reassuring as people might wish they were, and I'll give a couple. But here are, right now, the players that I see as the ones that are going to shape the outcome, and this is something I definitely intend to update pretty often. OpenAI, Google DeepMind, Anthropic: we've covered those ad nauseam. Stability is also still an organization that can make major change by open-sourcing things that previously people just didn't have access to. That's what they did with Stable Diffusion. They've got language models out there now; they're not alone in creating open source language models, but they are doing it. They've got all this multimodal stuff as well. So there's definitely a lot of tremendous talent in that organization that could bring game-changing technology online anytime. Character, we've talked about; they have some of the best models. I should put Inflection on here too, based on their $1.3 billion raise; literally from day to day, I need to update these slides. So Inflection should be an addition. They're building one of the world's leading supercomputers. Meta is just crushing it right now on the research front, releasing great paper after great paper. Again, the Megabyte paper and Lili Yu's more recent one, which is the one that has the multimodal output: just those two papers from the last couple of months would constitute serious output for any academic lab. And she's got coauthors, of course, but that's a small part of what Meta AI has going on. Microsoft, obviously, is doing a lot of stuff as well. Among the big tech companies, they're the ones that are most aggressively productizing and bringing AI to everything that you already use, and their research is also really ramping up.
Tesla, of course, with their self-driving, with their Tesla Bot, with their embodiment: if there's somebody that's going to build a humanoid robot at scale, they're probably the best bet to do that right now. And then, of course, Elon and his team also, just this week, put some meat on the bones of their x.ai project, and they've got, as you would expect, a world-class team. So I think another formidable organization has entered the race.

Salesforce does a lot of great research and publishes it pretty much all open source. And interestingly, CEO Marc Benioff owns Time Magazine. All the editorials that are coming out of there definitely suggest to me that he has a point of view that he is trying to popularize through the media, as well as supporting an ongoing research agenda. I don't really know how to square those two; maybe they just exist in parallel. But it is interesting to see that he owns that magazine, and they seem to be the ones publishing the most iconic editorials, whether it's Eliezer calling for a willingness to make air strikes on rogue data centers or any number of other editorials they've published recently. They are definitely using their brand capital, built over decades and now owned by a single big tech CEO, to shape the debate.

Replit is another one. We will have an episode coming up with the new VP of AI at Replit. And Replit right now, I think, is living up to Amjad's claim that it's the perfect substrate for AI. They have some of the best AI features. They have the best way for AI to become accessible and diffuse through the global population. And they have one of the most ambitious stated goals: creating artificial developer intelligence. I am going to ask how they see that as being different from artificial general intelligence, because to me, it seems more similar than different, frankly. If you can do everything that a developer can do, you can do a lot of things. So I don't know how different that really is, but we'll get into that when we have that conversation. As for their goal and what they ship: they've said that they plan to have an artificial developer online by the end of this year. How good that developer is, we'll see, but their tools are already pretty good. So I would not be surprised if the Replit AI developer is a pretty formidable force that can really contribute a lot of quality code. And we may see situations in the not-too-distant future where the majority of lines of code are written by AI.

Finally, moving outside the usual set of developers, I definitely think the Chinese Communist Party has room to shape the future. They have a stated national strategy of being leaders in AI. A few years ago, there was even an op-ed in The Washington Post where a top legal scholar out of China said AI is basically communist, echoing, or maybe even saying it before, Thiel's comments along those lines. He basically said: we have all the data, and with this data and the rise of AI, central planning and central state control are going to become much more viable, whereas in the West, everything's a mess, you're all making your individual decisions, you have no coherence, and that's why we're going to win. They published that in The Washington Post 3 or 4 years ago. So they're definitely all in on AI. It's a little less clear that they're all in on language models and highly general systems in the same way. I would be confident that they are continuing to invest in surveillance.
They're surely continuing to invest in autonomous drones. But those are relatively narrow systems. Yeah, that might be an arms race, that might be a disaster in its own right, but at least those systems are relatively narrow and not so unwieldy as the large language models. From their perspective, the language models create a different kind of risk to social stability and whatnot. So it's interesting that they're pushing hard on the narrow systems but are a little more reluctant to embrace the more general-purpose, open-ended systems.

And then finally, there's the global chip supply chain. We've done an episode on this, and we're going to have another one coming up as well. This is a very brittle and easily disrupted supply chain. There are not that many companies that make the best chips in the world. NVIDIA is head and shoulders above the rest in terms of designing the best chips. There's a company out of the Netherlands, ASML, that is the best at making the tools that are used to make the chips, and they really don't have any rivals. And then TSMC in Taiwan is the best company when it comes to actually using those tools to make the chips per the designs from folks like NVIDIA, and they also don't have any major rivals. You might say Samsung, but basically, TSMC is the leader. With capacity being so constrained, any disruption to this supply chain is going to cause significant waves in the broader economy. And there's the fact that all these fabs, as they call them, are in Taiwan, especially now that we have cut off, or attempted to cut off, China from buying those products. I don't really like that regulation, to be honest about it. I think we have made it a lot more tempting, and likely even, that China would try some sort of blockade or attack or who knows what on Taiwan, because the spice is flowing from Taiwan, and they're not getting a lot of it right now. So that, to me, has increased the potential for conflict. Yeah, it may have slowed their AI efforts, but at the same time, I really don't think it's worth it, given what I see as the increased risk of global conflict.

Okay, so now the specific alignment and safety plans. OpenAI has this high-level plan; these points are pulled directly from two blog posts. I find this to be just one of these mind-blowing things where it's like, really? That's your plan? And it's not that I have a better one, but it is yikes. So this is from a while back. They said their three-part plan is: first, train AI systems using human feedback. That's RLHF. They've done that; they continue to do that, but it's largely done. Second, train AI systems to assist human evaluation. And that, I would say, they're definitely in the midst of. Right? GPT-4 and Claude can evaluate their own output at basically a human level, maybe even better than your average human, almost certainly better than your average human. So they're in that process as well. And then there's this mythical third part: oh, and then we'll train the AI systems to do alignment research. And that's the part where it's, okay, what does your progress look like? And do you have a good theory of how exactly that is going to go? Unfortunately, I think the answer is: not really. They're very, very much still figuring that out, and they don't really even have great control of their current systems, including GPT-4.
I've said it a few times on the podcast at various points: some of the red-team harmful examples that I reported back last year in September and October still work with the exact same prompt today with GPT-4. Even with the most recent June version of GPT-4, the exact same prompts still work. I've reported these multiple times. They always thank me for reporting them, and they're working on it, but they don't have easy switches that they can flip to turn off a certain behavior. So they've got a lot of work left to do, and I don't really see that we're that close to this AI system that can do the alignment research. They are doing some stuff. They have used the GPT-4 system to do some inspection of GPT-2 and look at what all the individual neurons appear to be doing in there and whatnot. And there's some interesting stuff out of that, but you've got a long, long way to go before you're going to be able to say you have any robust control over a GPT-4, let alone a GPT-5. But most recently, just this last week, they said the goal is still to build a roughly human-level alignment researcher. At a minimum, they've said they're going to dedicate 20% of their compute to it, and they aim to solve the problem of superintelligent alignment over the next 4 years. So they've basically put themselves on the clock, and I think that reflects the fact that they think AGI is not particularly far off. I think they view that as the countdown they have to get this thing done on.

Meanwhile, Anthropic has published a couple of interesting things about this as well. We've covered their constitutional process, where they basically define a constitution and then have the AI critique its own outputs against it and improve, over and over, and keep using that cycle in training. It does seem to work quite nicely. I think OpenAI is now borrowing that same strategy; they publish it, and others, of course, adopt it. The gap between research and practice is not big, as we've talked about. They've also published a general overview of how they think the AI safety problem is likely to play out, and basically, what they have there is total uncertainty. They're pretty confident it's not a trivial problem. I can say I'm also very confident of that, just based on my red-teaming experience and the fact that GPT-4 still does some of those things. But is it hard like engineering is hard, or is it hard like a moonshot is hard, or is it hard like unanswered, centuries-old theoretical questions are hard, or is it just impossible? They basically express radical uncertainty over that and have essentially no idea. So that's alarming.

Here's another one that I think is pretty interesting. I mentioned Gato earlier as the most general-purpose agent that we've seen, this system that can do all these different things, from text and image captioning to operating a robot arm to playing Atari games, all with a single system with just one set of weights, and that's over a year old now. So listen to this clip from Demis on Lex Fridman, recorded just after they released this, where he talks about Gato and what might have been Gato 2.

Demis Hassabis: (1:28:16) Obviously, language models predict the next word. Gato predicts potentially any action or any token, and it's just the beginning, really. It's our most general agent, one could call it, so far. But that itself can be scaled up massively, more than we've done so far, and obviously, we're in the middle of doing that.

Nathan Labenz: (1:28:32) "And obviously, we're in the middle of doing that." So that was last year, more than a year ago now, just shortly after they published the Gato paper: we're in the middle of doing that. And yet Gato 2 has not come out. So I really wonder what they saw in that process. Did they see behavior that kind of scared them? Did they think better of it? Did they just deprioritize that research for now? Or maybe they're still working on it and trying to iron out problems. But it is notable that Demis went from that in the middle of last year to the Time Magazine article saying it's time for caution, and we've still not seen Gato 2. So I think there are big questions there around exactly what is going on.

Meanwhile, though, all of these folks have signed on to the extinction-risk statement, and this builds on a survey. This is from a 2022 survey of AI researchers. And people criticize it and say, oh, it was cherry-picked; the ones that weren't worried about it didn't answer the survey. Yeah, it could be. That happens, of course. However, it's still pretty striking: 48% of respondents said that there's a 10% or higher chance that the long-run effect of advanced AI will be extremely bad, e.g., human extinction. To me, that's enough to make this the most pressing issue on Earth right now. And I don't really care if it's 5% or 10% or 50%. Whatever that chance is, it's not so much about identifying the true probability, in my mind, as recognizing that there is real risk and then trying to figure out what to do to minimize that risk. So we now have a big tent for the general proposition: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." We have no less than the CEO of OpenAI, the CEO of Anthropic, the CEO of DeepMind, two Turing Award winners (that's the Nobel Prize of computer science), Bill Gates, congresspeople, the chief scientist from OpenAI, and the chief AGI scientist from DeepMind. You basically could not have a more robust who's who of the most important thinkers, decision-makers, developers, and leaders in the AI space signing on to this statement at this point. So are they just doing it to hype their products? I don't really think that is credible, to be honest. Business is booming. They don't need more hype; they need more GPUs. They're doing this, I think it's safe to say, because they are genuinely, sincerely concerned about what could happen.

And then you can also look at forecasting sites to see what the broader community thinks might happen. This is from a site called Metaculus, where anybody can come and register, but obviously only people who are pretty into this stuff and into making these kinds of fine-grained predictions would do it. They have two different questions, the weak AGI and the strong AGI. The weak AGI question basically says you have to get a certain percentage on the SAT and be able to solve these pronoun-disambiguation questions. These are things that AIs used to not be able to do at all, and now they're pretty much trivial. And then it also has to be able to play an Atari game.
And basically, GPT-4 can satisfy 3 of these 4 criteria; the only thing it can't do is play an Atari game. Gato 2, honestly, would probably satisfy these criteria and count as a weak AGI, if we had seen a Gato 2. As of the time I took the screenshot (this bounces up and down as people modify and add new forecasts), 2,800 people have made predictions, and the consensus is under 3 years from today to a weak AGI system being publicly demonstrated. The strong version basically just ratchets up all of those criteria so that it's even more impressive, even more demanding, and the consensus there is now 9 years out, in 2032.

So from all angles, it seems to me that we are headed for a potentially pretty bumpy ride and certainly a lot of disruption and creative destruction. I think that's basically unavoidable at this point, and a lot of it will be good. I think the good of GPT-4 dramatically outweighs the bad. As a member of the red team, somebody who saw all the crazy stuff that GPT-4 would do, and who even had shivers down my spine when it suggested targeted assassinations to me at one point, I still came away from that feeling like the upside of this level of capability is dramatically more than the bad. If we can get high-quality medical question-answering and frontline diagnosis globally, that is a huge, huge, huge win. It's hard to overstate how important that is. And it's definitely bigger than spear phishing getting way harder to detect and ultimately a bunch of people getting scammed. So I'm a big believer, generally pro-technology, and a big believer that the current level of technology is dramatically more good than bad.

But I really do wonder what happens if we continue to scale up, if we go another 10x, 100x, or 1,000x. The hardware, with the new H100s, is there or is coming online right now to power that next leap of 2 to 3 more orders of magnitude. We don't know what that system would look like. We do not know what it would grok. We do not know what it would be capable of. We don't know if it might have those breakthrough insights that current systems are just not able to create.

So my point of view on all of this is that we should embrace our AI servants, and we should rush to understand them. We should rush to adopt them and take advantage of what they can do for us. We should accept the fact that there is going to be economic fallout from that. But we need to figure out how to use these things as quickly as we can and get as much value from them as we can. At the same time, I think we should be extremely careful about increasing their scale and power beyond where they are, because we don't have that much room left before we do get to a superhuman intelligence. Right? We're already above average. We're closing in on expert. We're not yet to human genius or human leading scientist, but how much more room is there? There's not that much more room to go. I think we should be very, very cautious about further scaling of the general pre-training process, because we just don't know what comes out of that. So, with a little help from Drake: yes to AI servants, for everything, ideally; but no, at least not yet, to AI scientists. I think there's a lot more that we want to understand about how they work and how to control them before we start introducing those kinds of systems.
And that brings us to the end of this first edition of the AI Scouting Report. I appreciate the time that you, Erik, and everybody else have spent with me going through all this. I definitely want to hear your feedback. I think next time, we should do questions from you and from the audience. And I will definitely do more versions of this in the future to update with new information, and hopefully also be able to communicate everything more clearly and ultimately more concisely as we go. I think we've got an iteration or two of this before it'll be ready to go totally mainstream, but I appreciate the early audience that has stuck with us through this overview of everything that's going on in AI today.

Erik Torenberg: (1:37:08) Let me ask one question that leads up to perhaps the next version, which is: whether it's a quarter from now or 2 quarters from now, what are the biggest questions on your mind in terms of how things are going to play out that are going to determine what goes in that next update? What are you most paying attention to in terms of what needs to shake out or develop?

Nathan Labenz: (1:37:33) Good question. There are a few things that come to mind for sure. One is, I think it's really important to keep in mind that we're still in a period where everything could change quickly. If somebody has the next conceptual unlock on the transformer, or makes a whole different thing that's just better than the transformer, then all bets are kind of off. Who knows what happens in that context? If you go around asking people on this list who signed the extinction-risk statement how many more conceptual breakthroughs they think are needed before we get to AGI, and everybody has their own definition of AGI, but let's just take the OpenAI one, which is an AI that is better than humans on functionally all economically valuable tasks, the answers you'll get will basically be 0 to 3 or 4. 0 would be saying there aren't any more; we just need to keep scaling, and we'll get there. 4 would be a high answer, from what I've heard. I've heard 1 to 2, and that's kind of where I am; I don't know. It could be 0, it could be 1, it could be 2. It doesn't feel like a lot, though, in terms of meaningful conceptual unlocks remaining to be found. So when will they come? Who knows? The most unpredictable thing is probably whether there's just another conceptual unlock.

In the absence of that, how far does the current paradigm go? Another way to frame that: what if the answer is 0, no more conceptual unlocks are needed, and it's just a matter of getting enough H100s and enough data, and having it run enough API calls and figuring out which ones work and which ones don't, so that it can finally grok everything that it needs to grok? That's certainly a plausible scenario as well. If that is the scenario, I guess we may or may not find out, because another big question is just: what will governments allow? OpenAI is out there saying, and I think to their credit, that we don't want to regulate small things. Small projects are not dangerous. We know that pretty well; we know that, at least with the current technology, the current paradigm, nobody has been able to create something super capable with minimal compute. So if you're experimenting in an academic lab or on your laptop or whatever, you can do whatever you want. That's OpenAI's position. They just want to see some sort of permitting or licensing process, with some safety standards established, if you're going to go basically beyond where they have currently gone. People draw lines at 10^24 FLOPs, 10^25 FLOPs, whatever; we'll see. But those rules are not yet in place. Will they be put in place, and will they prove effective? If they are in place, will people run to other countries and try to build clusters there, or will China have a similar rule? We don't really know how any of that is going to play out. But those are probably the biggest things that I'm watching for, and some of them could come any day.
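For a sense of scale on those compute thresholds, here is a quick back-of-envelope using the common rule of thumb that training compute is roughly 6 × parameters × tokens. The model and token counts are illustrative round numbers, not claims about any particular lab's plans:

```python
# Back-of-envelope training compute vs. proposed regulatory thresholds,
# using the common approximation: FLOPs ~= 6 * parameters * tokens.
# Sizes below are illustrative round numbers.

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

THRESHOLD = 1e25  # one of the lines people have proposed

examples = {
    "175B params x 300B tokens (GPT-3 scale)": (175e9, 300e9),
    "70B params x 2T tokens (Llama-2-70B scale)": (70e9, 2e12),
    "hypothetical 1T params x 10T tokens": (1e12, 10e12),
}

for name, (p, t) in examples.items():
    flops = training_flops(p, t)
    side = "over" if flops > THRESHOLD else "under"
    print(f"{name}: ~{flops:.2e} FLOPs ({side} 1e25)")
```

By this rough math, GPT-3-scale training comes in around 3e23 FLOPs, well under a 1e25 line, while a hypothetical trillion-parameter, 10-trillion-token run would land around 6e25, over it; that gap is exactly why where the line gets drawn matters so much.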

Erik Torenberg: (1:40:52) Well, we will be covering them on Cognitive Revolution as they happen, and look forward to the next big scouting report. It's been great. Thank you.

Nathan Labenz: (1:41:02) Thank you, Erik. Appreciate it.

Erik Torenberg: (1:41:04) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
