AI Discourse Deranged: Assessing LLM Generalization Takes and Polarizing Regulatory Debate

Nathan and Erik explore Google DeepMind's research on LLMs, AI regulation debates, and VC commitments to responsible AI in this thought-provoking episode.




Video Description

In this episode, Nathan and Erik discuss research out of Google DeepMind suggesting LLMs can’t generalize and unpack several perspectives from AI thinkers, Hemant Taneja’s responsible VC commitments, and why now is not the time for an ideological war on AI regulation. If you need an ecommerce platform, check out our sponsor Shopify: https://shopify.com/cognitive for a $1/month trial period.

Episode Notes: https://docs.google.com/document/d/1nKSEPkVajUzjBAvQNUqUD3p5Pz-xaz0dp-DVR8KJCc0/edit?usp=sharing

RECOMMENDED PODCAST:
Every week investor and writer of the popular newsletter The Diff, Byrne Hobart, and co-host Erik Torenberg discuss today’s major inflection points in technology, business, and markets – and help listeners build a diversified portfolio of trends and ideas for the future.

Subscribe to “The Riff” with Byrne Hobart and Erik Torenberg: https://www.youtube.com/@TheRiffPodcast

SPONSORS:
Shopify is the global commerce platform that helps you sell at every stage of your business. Shopify powers 10% of ALL eCommerce in the US. And Shopify's the global force behind Allbirds, Rothy's, and Brooklinen, and millions of other entrepreneurs across 175 countries. From their all-in-one e-commerce platform, to their in-person POS system – wherever and whatever you're selling, Shopify's got you covered. With free Shopify Magic, sell more with less effort by whipping up captivating content that converts – from blog posts to product descriptions using AI. Sign up for $1/month trial period: https://shopify.com/cognitive

With the onset of AI, it’s time to upgrade to the next generation of the cloud: Oracle Cloud Infrastructure. OCI is a single platform for your infrastructure, database, application development, and AI needs. Train ML models on the cloud’s highest performing NVIDIA GPU clusters.
Do more and spend less, like Uber, 8x8, and Databricks Mosaic. Take a FREE test drive of OCI at https://oracle.com/cognitive

NetSuite has 25 years of providing financial software for all your business needs. More than 36,000 businesses have already upgraded to NetSuite by Oracle, gaining visibility and control over their financials, inventory, HR, eCommerce, and more. If you're looking for an ERP platform ✅ head to NetSuite: http://netsuite.com/cognitive and download your own customized KPI checklist.

Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

X/SOCIAL:
@labenz (Nathan)
@eriktorenberg (Erik)
@CogRev_Podcast

TIMESTAMPS:
(00:00:00) - Opening discussion of recent viral tweet claiming LLMs can't generalize
(00:03:27) - Waymo Safety Research
(00:07:09) - Use of GPT-4V for medicine
(00:10:12) - Response to Google Deepmind’s research claiming LLMs can’t generalize
(00:15:08) - Sponsors: Shopify | Omneky
(00:21:38) - Reproduction of the DeepMind research showing noise allows generalization
(00:26:02) - Helen of Transformers
(00:30:15) - Sponsors: Oracle | NetSuite
(00:32:36) - Continuation of Google Deepmind discussion
(00:36:49) - Google Med-PaLM LLM resulting in novel insights
(00:38:27) - Disagreeing with Jim Fan about how transformers are not elixirs
(00:42:47) - “The AI Executive Order will only continue to look more foolish from here”
(00:44:09) - The AI Scout mindset
(00:45:22) - Few humans can extrapolate beyond their training data
(00:47:17) - Meg Mitchell’s perspective on AI safety and data measurement rigour
(00:48:45) - When does GPT-4 deceive its user?
(00:49:58) - Voluntary Responsible AI Commitments from VCs
(00:54:00) - Analysis of the commitments as basic common sense practices
(00:56:30) - No need for an ideological war
(00:57:00) - Discussion of critical reactions framing the commitments as capitulation
(01:00:00) - Concerns about the voluntary commitments enabling harmful regulation
(01:03:00) - Rebuttal that self-regulation can avoid heavy handed regulation
(01:08:00) - Self driving cars: the invisible lives you can save
(01:11:00) - Mysticism when it comes to AI
(01:12:03) - Suggestion that product liability law could address concerns
(01:15:00) - Challenge to defend rejecting product liability for AI systems
(01:18:00) - Closing call for moving beyond polarized ideological discourse



Full Transcript

Transcript

Nathan Labenz (00:00) Oh, look at this. It proves that language models cannot generalize. And this is basically insane. Instead of Helen of Troy, I was thinking this tweet is like the Helen of transformers. If we start to mislead or embrace pretty obviously wrong-headed conclusions about what is, it cannot be good for our downstream discourse about what should be done about it.

Erik Torenberg (00:27) The concern here is that this is a Trojan horse or a wedge into sort of a governing body that has the reputational credibility and then the legal ability to regulate who can and cannot innovate.

Nathan Labenz (00:41) The alternative is, if we're going to shit on the people that are trying to establish the best practices, then that's what's going to bring down the heavy-handed regulation. If you want to prevent that regulation, show me that there's no problem. Show me that you have it under control. This may be the time to build, but it's definitely not the time for ideology. Hello and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg.

Erik Torenberg (01:21) All good on your end?

Nathan Labenz (01:23) Yeah. I got a few bones to pick today. But aside from that, you know, everything's going well. It's been a little bit of a quiet period the last, you know, 10 days as I've really been digging into all the OpenAI releases and, you know, trying to feel out what they're good for and what they're not good for. I think that'll be the subject of another episode because I wanna do at least a couple more experiments before I give a summary of my findings. One spoiler is the vision component, which I was expecting to be a huge unlock. I think it really is gonna be a huge unlock. And in part, that's also because it's quite cheap. You can pass in 12 images for one cent, and then you do pay also for what it generates in response to that. But 12 images for a cent gives you a lot of ability to kind of take slices out of videos or just take periodic screenshots of stuff, all sorts of monitoring solutions. A lot of passive stuff, I think, can happen with the vision because it's just so easy to, like, collect that sort of information. And since it's so effective and cheap at processing it, I think it's gonna be a really big deal. But that's not what we're here to talk about primarily today. Basically, I have just had a burr in my saddle, you might say, over the last week or so with a couple of aspects of the AI discourse online where I'm just like, guys, let's all be better than this. So I want to kind of take these topics one by one, take them apart, you know, kind of analyze a bunch of the different contributions that people made to the ongoing discussion and kind of, you know, give my message to all these people. And again, you can hold me accountable as we go. Before doing that, I wanted to take a moment, and this might become a bit of a ritual, to give a strong kind of nod and pay respects to the value of accelerating the adoption of existing AI technology.
And I had kind of two findings that were just relevant in the last few days that I wanted to highlight, if only as a way to kind of hopefully establish some credibility and common ground for the critiques that are to come. But not only that, because I think these are also just meaningful results. So the first one comes out of Waymo. And they did this study with their insurance company, which is Swiss Re, which is a giant insurance company. So here, I'm just gonna read the whole abstract. It's kind of a long paragraph, but I'll read the whole abstract of this paper, just to, you know, reinforce, because it's kind of a follow-up to some previous discussions, especially the one with Flo, about, like, you know, let's get these self-drivers on the road. So here's some stats to back that up. This study compares the safety of autonomous and human drivers. It finds that the Waymo One autonomous service is significantly safer towards other road users than human drivers are as measured via collision causation. The result is determined by comparing Waymo's third-party liability insurance claims data with mileage and zip code calibrated Swiss Re human driver private passenger vehicle baselines. A liability claim is a request for compensation when someone is responsible for damage to property or injury to another person, typically following a collision. Liability claims reporting and their development are designed, using insurance industry best practices, to assess crash causation contribution and predict future crash contributions. Okay. Here's the numbers. In over 3,800,000 miles driven without a human being behind the steering wheel in rider-only mode, the Waymo driver incurred zero bodily injury claims in comparison with the human driver baseline of 1.11 claims per million miles. The Waymo driver also significantly reduced property damage claims to 0.7 claims per million miles in comparison to the human driver baseline of 3.26 claims per million miles.
Similarly, in a more statistically robust dataset of over 35,000,000 miles during autonomous testing operations, the Waymo driver, together with a human autonomous specialist behind the steering wheel monitoring the automation, also significantly reduced both bodily injury and property damage claims per million miles compared to the human driver baselines. So zero injuries caused out of over 3,000,000 miles driven. That would have been an expectation of over three injuries for the human baseline, and under 25% the property damage ratio for the Waymo system versus the human baseline. Now there's a lot of stuff. You know, we have had a couple episodes on these like self-drivers recently. So a lot going on there. This is not necessarily fully autonomous. There's some, you know, intervention that's happening in different systems. It's not entirely clear how much intervention is happening. I'm not sure if they're claiming zero intervention here as they get to these stats or kind of the result of a system which may at times include some human intervention. But I just wanna go on record again as saying, this sounds awesome. I think we should embrace it. And, you know, a sane society would actually go around and start working on improving the environment to make it more friendly to these systems. And there's a million ways we could do that, you know, from trimming some trees in my neighborhood so the stop signs aren't hidden at a couple intersections, you know, on and on from there. So that's part one of my accelerationist prayer. Part two: here is a recent result on the use of GPT-4V for vision in medicine. In our new preprint, this is a tweet from one of the study authors, we evaluated GPT-4V on 934 challenging New England Journal of Medicine medical image cases and 69 clinical pathological conferences. GPT-4V outperformed human respondents overall and across all difficulty levels, skin tones, and image types except radiology, where it matched humans.
GPT-4V synthesized information from both images and text, but performance deteriorated when images were added to highly informative text, which is an interesting detail and caveat for sure. Unlike humans, GPT-4V used text to improve its accuracy on image challenges, but it also missed obvious diagnoses. Overall, multimodality is promising, but context is key, and human-AI collaboration studies are needed. My response to this, though, and this comes out of Harvard Medical School, by the way. So last I checked, still a pretty credible institution despite some recent knocks to the brand value perhaps of the university as a whole. My response to this, which I put out there again to try to establish common ground with the accelerationists: even more so than self-driving cars, where you can get legitimately hurt, when an AI gives you a second opinion diagnosis, that's something that you can scrutinize. You can talk it over with your human doctor. There's a million things you can do with it. And so as we see that these systems are starting to outperform humans, I'm like, this is something that really should be made available to people now. And I say that on an ethical, kind of consequentialist, outcomes-oriented basis. I would even go a little farther than the study author there who says, well, more studies are needed. I'm like, hey, I would put this in the hands of people now. If you don't have a doctor, it sounds a hell of a lot better than not having a doctor. And if you do have a doctor, I think the second opinion and the discussion that might come from that is probably clearly on net to the good. Will it make some obvious mistakes? Yes. Obviously, the human doctors unfortunately will too. Hopefully, it won't make the same obvious mistakes, because that's when real bad things would happen. But I would love to see GPT-4V get more and more traction in a medical context and definitely think people should be able to use it for that purpose.
So that brings us to the close of part one. I'm not expecting any major challenges there, but how do I do in terms of establishing my accelerationist bona fides?

Erik Torenberg (09:46) Yeah. I think you've done a good job. You've extended the olive branch, and now we wait with bated breath.

Nathan Labenz (09:57) Alright. So we got two things that I really wanted to get into, and people who listen to this podcast, if you're listening to this, you're going to have seen some of this already online. Certainly if you're on Twitter, you've seen this kind of thing. So the first thing was this paper that came out of Google DeepMind and became a sort of super viral thing where the notion was Google DeepMind research shows that LLMs can't generalize. This kind of thing, they fly off and they're reaching millions of people before you can even kind of dig into the paper and figure out, well, what is this even actually talking about? So naturally, I'm a few days late by the time I get around to reading the thing and really understanding what's going on. But basically, I think this discourse was just totally misguided, took a very small study with a very sort of narrow and focused result, which is like a fine line of inquiry and a fine thing to publish, and kind of seized on one sentence of it, which I'll read, and blew it way out of proportion in a way that I think is fundamentally misleading most of the people who are encountering all the surrounding tweets. Just to get into a little bit what this paper actually said. It's amazing because it's a very, very narrow and focused study. The actual title of the paper, for one thing, is Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. I think that framing right off the top is super interesting. The authors are talking about curating training data in order to enable certain behavior. But the result got flipped into, okay, if you set up your pre-training data in a certain way, then you may be able to control what the model can and can't do. That flipped to, oh, LLMs can't generalize. And that is just not something that follows from this paper. And here's where I do think the authors kinda overstepped a bit.
The key sentence that everybody's, like, highlighting and sharing around is: in the regimes studied, we find strong evidence that the model can perform model selection among pretrained function classes during in-context learning at little extra statistical cost, but limited evidence that the models' in-context learning behavior is capable of generalizing beyond their pretraining data. And that's probably worth breaking down a little bit more. The idea of "the model can perform model selection": that basically means that it can identify what kind of problem it is facing at a given moment in time and apply the right lessons learned from its training to that particular type of problem. In this case, it's so narrow, it's so toy. So they have two kinds of data that they feed into this language model that they train. One is points on a line. Can be any line. Just draw a straight line, take some points out of it. That can be what the model is faced with. And then its job is to predict: so you have like x, y coordinates, x y, x y, x y. You have these points that it's given. Now give it another x and it has to predict the y. Basically, can it learn the function represented by these points? If they're all on a line, then the function is a line. Can it extrapolate and predict from a given x coordinate what the y coordinate point on the line is going to be? The other task in the same data mix is just sine curves. So if you give it some points from a sine curve and give it another x value, can it predict what that value is going to be? Given that context of x, y, x, y, x, y pairs, it has to predict the next y. So it's the same task, and there's just two different kinds of functions that it's supposed to learn, straight lines and sine curves. And it has no trouble learning those and doing them well. What they then look at and say, oh, well, here's where it kind of falls short, is: what if we put those two things on top of each other?
What if we take a combination, a linear combination of a sine curve plus a line? You can just imagine that as a sine curve that's gradually going up or a sine curve that's gradually going down, because that's what happens if you add a sine curve and a line together. And there they did find that it wasn't succeeding on that task. Going back to the title of the paper, Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. What they're saying is if you take these two kinds of things and you just train on those two kinds of things, then what we see at runtime is it can very effectively distinguish between those two kinds of things and do both. It will recognize which problem it's facing. And then additionally, when we tried overlaying those two problems at the same time, it wasn't really able to do it in the regimes studied. But everybody seems to have kind of blown again past this notion of "in the regimes studied" and is saying, oh, look at this. It proves that language models cannot generalize. And this is basically insane.
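To make the setup concrete, here is a minimal sketch of the toy data described above. This is not the paper's actual code; the coordinate ranges, context length, and function parameters are all illustrative assumptions, but the three task families (lines, sine curves, and the line-plus-sine combination the toy models failed on) match the discussion.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(kind, n_context=16):
    """Build one in-context learning example: context (x, y) pairs from a
    random function, plus a held-out query x whose y the model must predict.
    The function parameters (slope, amplitude, etc.) are illustrative guesses."""
    xs = rng.uniform(-5, 5, size=n_context + 1)
    if kind == "line":
        a, b = rng.normal(size=2)
        ys = a * xs + b                         # points on a random line
    elif kind == "sine":
        amp, freq, phase = rng.uniform(0.5, 2.0, size=3)
        ys = amp * np.sin(freq * xs + phase)    # points on a random sine curve
    else:  # "combo": the held-out overlay case from the paper's key finding
        a, b = rng.normal(size=2)
        amp, freq, phase = rng.uniform(0.5, 2.0, size=3)
        ys = a * xs + b + amp * np.sin(freq * xs + phase)
    context = list(zip(xs[:-1], ys[:-1]))       # pairs the model conditions on
    return context, xs[-1], ys[-1]              # plus the query and its target

context, query_x, target_y = sample_task("line")
print(len(context))  # → 16
```

Training mixes only the "line" and "sine" tasks; the "combo" tasks are what the paper evaluated generalization on.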

Erik Torenberg (15:08) Hey, we'll continue our interview in a moment after a word from our sponsors.

Nathan Labenz (15:13) For one thing, this is not the kind of generalization that's really in question in the broader debates around, like, how powerful is AI going to get and to what extent and how should it be regulated. Nobody is really, at this point, seriously questioning frontier models' ability to take a little bit of concept A and a little bit of concept B and blend them together. And we've all seen a bazillion of those examples where you say, write a poem in the style of Shakespeare about the nineties Jordan Bulls or whatever. And it's like, there probably isn't a lot out there. That's not something it's likely seen in exact combination in the training data. You can try as many of these as you want, and it can clearly do them. So it can clearly take a problem of a certain structure and a subject or a problem of another structure and find some meaningful combination of those and work with that. The frontier models can. The little toy models that they designed in the study didn't. But clearly, just get your hands on ChatGPT, and you don't even need to pay for the $20 a month to see that these kinds of little of column A, little of column B combinations do clearly work. It's clearly within the capabilities that they have. So, okay, it's fine. You did this thing. You found this one thing. It's a toy model. It's a toy problem. I do think there's something interesting there. If you design your data set carefully, you may be able to start to get some control over what models can and can't do. But you have to design the data set carefully to do that. It's not just a generic notion that it'll never happen. And again, just broad kind of experience and at this point, common sense kind of shows that. The real questions about whether frontier models can generalize are not about that. They're like, can they learn things that people don't know? Can they infer things from training data that are unobvious even to the world's leading experts?
And this doesn't really have anything to say about that, I'm afraid. At most it says, if you strategically keep certain data sets out of training, then you may prevent AIs from generalizing into those domains. And indeed, that is a real interesting proposal. I think right now, one of the more credible and concerning near-term frontier model risks, in my view, is the idea that people might be able to use them to create pandemic agents. And people would say, well, who would want to do that? And look, there's a lot of crazy people out there for sure. If you give everybody a sort of biochemistry and virology expert, I think somebody is going to in fact use it for bad purposes. So there's a proposal that's like, okay, well, maybe we should just kind of exclude most academic virology from what language models are trained on. And maybe we could still have some specialist ones that the actual biologists use. But is that a knowledge base that really everybody needs to have in their pocket? Or maybe it should be a little bit more closely held. That is, at this point, largely an academic discussion. I don't even think the policy discussions, even the inner circle, think tank kind of policy discussions, have gotten that far. I'm not sure that any actual governmental policy discussions have quite reached that level of nuance yet, but it's at least interesting. And this does support that something like that probably makes sense, if you didn't already think it made sense, which just on general priors it kind of always made sense. But this study certainly gives a little bit of a bolster to that. But again, the generalization that people on the far doom case are really worried about is just a totally different kind than sort of the ability to combine two somewhat different problem sets. Okay. That was all just discussion. Sure enough, somebody already reproduced this paper. And I want to give credit to this person, Samuel Muller, I believe. So this person did it in a Colab notebook.
So this is also a great example of how some of these small things, hobbyists, individuals with literally no compute, but just like a little ephemeral virtual machine in the Google Cloud, can reproduce some of these results. So this guy, Samuel Muller, goes out and does this and shares all the code and, you know, it's all kind of open to look at. Basically, what he finds is if you add a bit of noise to the training data and do everything else, you know, as far as he could tell, the exact same way that they did it in the original paper, then it does generalize. And you get over this hump of not being able to handle the sort of combination case of the lines and the sine curves. And this is like one of the oldest techniques in terms of making AIs more robust. Going back to kind of early deep learning, and I'm not a super expert historian on the timeline of these advances in the early 2010s. But it's been known for a long time that if you want to make your computer vision systems more robust, you train them with various perturbations. You have the original, you have like a weird compression of it where the aspect ratio changes. You add noise to it. You maybe add some waviness to it. You can do all these sort of programmatic manipulations that are all kind of weird. But when the AI sees all those variations and is still able to extract the signal through that noise, then your downstream performance gets a lot better. It seems to sort of make the concepts more robust and kind of prevent the super specific overfitting on a particular dataset. And so that's the technique that this guy in a Colab notebook applies. Just adding a little bit of noise to these points that are either on the sine curve or on the line. And then sure enough, you know, you see significant generalization. It prevents the overfitting and it kind of seems to start to work. That didn't take long. That took like a couple days. But I'm not sure how many people saw that.
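As a rough sketch of that augmentation idea: before training, each clean function sample gets perturbed with a little random noise. The Gaussian form and the `sigma` value here are assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy(ys, sigma=0.1):
    """Perturb clean function values with Gaussian noise before training.
    sigma is an illustrative guess; the reproduction's noise scale may differ."""
    return ys + rng.normal(0.0, sigma, size=len(ys))

xs = np.linspace(-5, 5, 32)
clean = np.sin(xs)        # a clean sine-curve training example
perturbed = noisy(clean)  # what the model would actually see during training
print(perturbed.shape)    # → (32,)
```

The intuition is the same as the classic vision perturbations described above: the model can no longer memorize the exact curves, so it has to learn something more robust about the underlying function families.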
Certainly some decent number of people did; this tweet got a healthy number of likes and certainly reached some people. But I think it was definitely not a shining moment in AI discourse. And I'll read you a few tweets and give a couple of responses to them. But first, let me just kind of pause there and say, does that overview of the research itself make sense, or any questions on kind of what was found, what was claimed, and then what was kind of subsequently found in the reproduction?

Erik Torenberg (22:21) That was a good overview. I'm just curious, if the people, you know, you'll mention this to me in a second, were here, how would they comment on your overview? Perhaps what I'm trying to get at is, like, what is the actual substantive disagreement or the crux of the difference of opinion as it relates to this paper? Well, we might have to have him on

Nathan Labenz (22:43) to find out. I think I really resist the temptation to psychologize other people's AI takes and try to engage with the arguments themselves. I think what we'll kind of see here over a course of a few tweets is that mostly arguments are not really being made. Instead, this sort of little highlighted one sentence from this very narrowly focused paper about how you can engineer your training data to control what you know, under certain circumstances, if you can avoid the noise in this case, as it turns out, then you can, you know, have more control over what your model capabilities are. Mostly people are just kind of using that as, I think, a prop to make a more political point or somewhat of a, like, I'm kind of cooler than the crowd sort of point. I don't wanna go too far in terms of, you know, interpreting what people would say. And if there are better arguments, you know, I'd love to hear it. There is, I think, also pretty good. And I give credit to the study authors actually because I think their engagement has been quite productive. So one of the After all, there's The first tweet that launched 1000 tweets, instead of Helen of Troy, I was thinking this tweet is like the Helen of transformers. The original tweet was pretty mundane and just quoted the thing and then said, New paper by Google provides evidence that transformers, GPT, etcetera, cannot generalize beyond their training data. And then that became the thing that everybody was quoting and tweeting on and whatever. And the original author, Steve Yablowski, he says, this paper is continuing to make the rounds. It seems like maybe our paper on in context learning has been taken out of context. So I think they were kind of shocked honestly that this kind of three person paper that they put out all of a sudden became like the supernova, super viral thing that it did. 
And I think some other interesting commentary: somebody said, in fact it was Adam D'Angelo from Quora, who I think, having served on the OpenAI board at one time, if not still, and, you know, building products in this area, certainly, you know, is paying a lot of attention to what language models can and can't do. His comment was, this paper, a very narrow result, which I'm sure only holds under a lot of assumptions (true, since proven true, I'd say), provided a good Rorschach test for people's views on AI. Surprising how many of them are expecting progress to stop. Here's a couple of things that other people said that I just was like, I gotta respond to this. Maybe I'll put a Twitter thread out as well. And I only engage with people here that I think either I really respect, which is the baseline, or that have a lot of credibility one way or another. So these are sharp people, but I think not serving the broader public as well as they could with some of these tweets. So, Naveen Rao, CEO, founder of MosaicML. We've had two members of the Mosaic team on. They exited for north of one billion dollars. They have real chops. No doubt about that. He says, well, the belief was fun while it lasted. Sparks don't always lead to fire, I guess. Again, Naveen, please come correct me as to my interpretation here. But this reads to me like somebody saying, I am wise. All of you have gotten carried away, but I was too cool for that the whole time. And the sparks specifically refers to the Microsoft research paper, Sparks of AGI, where obviously, with a title like that, you're going to stir up some commentary. But I would say, rather than sort of mocking that paper, and certainly, again, this paper does not invalidate all the findings of what GPT-4 can do that Microsoft Research has put out. So I would really endorse actually people going and reading the Microsoft reports on frontier model capabilities. They did one. The original Sparks of AGI was on GPT-4.
Then they've more recently come out with one on GPT-4V. And even more recently than that, they've come out with one on GPT-4's impact on science, focusing on a lot of hard science areas, like material science and biochem and chemistry, solving partial differential equations, hard, hard problems. Across all of these, you know, you can, of course, refine and quibble and do your own variations on their experiments, and you should. But if you go read those papers, I think you're gonna come away with a pretty significant and reasonably accurate understanding of what frontier models can do today. So to mock that sense, that some important thresholds on some path toward AGI are being crossed with the most recent models, with this narrow result, you know, I think is just, like, a total non sequitur. And, you know, rather than embracing that, I think people should go read those papers, and I think they'll come away with a much, much better understanding.

Erik Torenberg (28:25) Hey. We'll continue our interview in a moment after a word from our sponsors.

Nathan Labenz (28:28) Another one that caught my attention, this is from Arvind Narayanan, who is a professor at Princeton and author of a book, and I think also a blog, called AI Snake Oil, wrote that this paper isn't even about LLMs, but seems to be the final straw that popped the bubble of collective belief (and again, we have this notion of belief) and gotten many to accept the limits of LLMs. About time. If emergence merely unlocks capabilities represented in pre-training data, the gravy train will soon run out. There's a lot to take apart there. For one, the paper is not about LLMs. So why would we be using this very small, kind of toy model, line and sine curve prediction research result, which already, you know, again, with a little added noise does in fact generalize, and that's already out there. Why would we be using that as sort of a way to, you know, shape our big picture beliefs about what the biggest and most powerful systems are capable of? That just doesn't make sense. So I don't think anyone should advocate for sort of extrapolating up from these toy examples to worldview-scale beliefs. That just seems totally misguided. But the other thing that I think is even more important here is: if emergence, and there's been a lot of debate, what is emergence? Is it a mirage? Whatever. I would say probably the most interesting definition to me is capabilities that either come on very quickly or are just highly unexpected relative to kind of baseline performance. And we definitely have some examples of those. Another really interesting definition would be things that humans can't do. And we do have examples of those from all sorts of narrow AI systems and even now more recently, some from like GPT-4 type systems as well. But whatever definition you want to use for emergence: if emergence merely unlocks capabilities represented in pre-training data, the gravy train will soon run out. I don't think that's true at all.
I think that what this is missing is that the training data that we have, and the training data that we might expand to start using, is so vast that it is bound to contain all sorts of information which, if grokked, will lead to superhuman capabilities. To take one kind of mundane example, and it's not mundane, really, I mean, it's mind blowing, but we adjust very quickly: I would say it's probably safe to say that GPT-4 can speak the world's languages better than any human that has ever existed. No single human would I put against GPT-4 in a world's-languages speaking contest. Now you could say, well, that's not surprising because all the languages are in the training data. Sure. But what else is in the training data that an AI might be able to pick up on that humans, individual humans, cannot do? One very obvious thing that is likely to start to happen, especially given the success we've seen in multimodality with vision, is that people are going to start to throw other modalities into language models. Of course, this is already happening, but has it really been scaled up to the degree that it might? Not yet. Let's say GPT-4 is trained on 10 trillion tokens. What if the next version has a couple trillion tokens' worth of genomic data in there, or proteomic data? I don't think it's even hard to argue at this point, really. It certainly wouldn't surprise me if the resulting models have capabilities that no humans have ever had. We are not good at looking at raw DNA sequences and predicting things. We do build tools to do that, and we can do it certainly somewhat well. But just like we saw AlphaGo play the mythical Go move that no human would have ever played, which was actually genius, I think there's probably that level of information in just a huge boatload of DNA data that we haven't even built tools to work on just yet. And there was a result just this week; again, the timelines are unbelievably short.
There was a paper out of Google from authors including Vivek and Tao, who were on for our Med-PaLM 2 episode, where they are starting to do that. They are starting to get real insight into various genetic interactions and what's causing what in ways that are genuinely novel. They're getting useful novel hypotheses out of their latest Med-PaLM system. The key concept here that I really want to focus on is that there's a lot more represented in the current training data, and a lot more represented in the expanded training data that might go into a GPT-5 type thing, than any human could ever read or process. And there's bound to be enough signal for a sufficiently powerful system to develop capabilities that no human has. What those are? Some of them we can probably predict, others we probably cannot. As Sam Altman recently said, it's a fun guessing game for us to guess what GPT-5 is going to be able to do. But I do not think that the gravy train is running out as long as we're continuing to hyperscale. And I would say even the argument that is given here is kind of self-defeating, because it's already clear that AIs can do superhuman things in some ways based on the vastness of the training data. I don't think this is a snake oil question. Jim Fan was another one. I love this dude's work, but his comment was: transformers are not elixirs. Kind of a similar comment. Machine learning 101: you've got to cover the test distribution in training. Again, this is the notion that those of us that have the fundamentals, those of us that did the machine learning 101, we know that these kind of basic things are true. But it does not follow from the idea that you've got to cover the test distribution in training that things can't generalize in interesting or unexpected ways, because the training data contains more than we have extracted from it.
That is, I think, manifestly obvious upon any serious reflection. So I hate to see this kind of stuff, because the worst thing I think people can do is confuse the public or leave people feeling like, hey, this isn't that big of a deal, or it's nothing to worry about, or it's not something I need to spend any of my time preparing for, when in fact I think the exact opposite is true. I want to see these thought leaders giving the broader public a clearer sense that these are going to be super powerful systems, that they in all likelihood already do have, and are in all likelihood going to develop, many more capabilities that no human has. Whatever little nuanced, fine-grained research results we may find along the way, the big picture is already pretty clear. It can be an S curve, and the top of that S curve can still be higher than human. And this is where I kind of disagree with Amjad, or I would challenge his thinking. His comment was, and again, you would struggle to find many companies that I'm a bigger fan of than Replit, and you can listen to our Replit episodes for confirmation of that. So as CEO of Replit, I definitely have a tremendous amount of respect for everything that Amjad has built and accomplished. But his comment I don't agree with. He comments: I came to this conclusion, this conclusion meaning LLMs can't generalize beyond the training data, sometime last year. And it was a little sad because I wanted so hard to believe in LLM mysticism, and, again, there's that belief concept, and that there was something there there.
And this, again, I'm like, I don't think that the question is mysticism, or a "there there." That almost suggests that we're into some debate about consciousness or subjective experience, or is it a moral patient, something that we owe some responsibilities to? I mean, there's a lot of interesting questions there that kind of get evoked in my mind when I read "is there a there there or not." All of that is super interesting, but it's kind of outside the scope of the question of, again, can the technology that we have already, and certainly at future scale, learn things that people don't know, develop capabilities that people don't have? It's obvious that the answer at this point is yes. And you don't need to appeal to any sort of mysticism to believe that. You just need to look at these systems that currently exist, look at what they can do that no individual human can do. And again, language itself is a pretty good starting point. I don't know why we would want to dismiss that point of view with "mysticism." If I were to speculate, I would think this whole thing is being kind of used as a prop for a more political debate around what should be done. But I would really encourage everyone to separate their analysis of what is from what should be done about it. If we start to mislead, or embrace pretty obviously wrong-headed conclusions about what is, it cannot be good for our downstream discourse about what should be done. And maybe here I don't have to speculate, because Twitter user accelerate harder says in response to all this: the AI executive order will only continue to look more foolish from here. Now, I don't think that necessarily reflects everybody's point of view that I've highlighted in this discussion so far. But I think it is kind of a big part of where this is coming from. The idea is: we don't want regulation. We know that.
So if we know that, then we see any evidence that AI could be super strong, or could get out of control, as encouraging the regulation. So we want to downplay that. And if we do see, and you've got to kind of grasp for it at this point, but if we do see evidence that AIs are not going to become super strong, or won't ever overtake humanity in key ways, then we really want to amplify that. And so I do think a lot of this was rushing to amplify this ultimately misleading statement about what is as a way to promote an agenda about what should be done. And I would really strongly encourage everyone to keep those things distinct in their minds. It's just honestly good intellectual hygiene, I would say, to do so. And also, just from an attitude standpoint: this label that I've given myself of the AI scout is definitely an homage to Julia Galef and her book, The Scout Mindset, where she contrasts, and maybe we should have her on to talk about this, the scout mindset, which is really focused on what is true, what is really going on, against the soldier mindset, which is: how can I advance my side in some intellectual or ideological conflict? And I just think this may be the time to build, but it's definitely not the time for ideology. These things are so confusing. The surface area is so vast. There are so many surprises. They are weird. I always kind of say they're more like alien intelligence than human intelligence: AI, alien intelligence. Keep all those things in mind. The scout mindset is the mindset that we need to have. And if there's any blanket criticism that I would put on this group, it's that that's a little bit of soldier mindset, and that's not really what we need right now. Let me just quickly give some credit to people that I thought had good comments, and then... well, I'll let you challenge me anywhere you like and then we can move on to part two.
Twitter user, I'm not sure if I'm saying his name right, but people will know him, Visa, says, quote: don't feel bad, GPT. Few humans can either. Meaning few humans can extrapolate beyond their training data. I think that is actually a pretty profound point. There aren't that many Einstein-level eureka moments where people are like, nobody's ever conceptualized this this way before, but here I go. Most people are not doing that. It is super important when it happens. And thus far, it largely has not been demonstrated even by the biggest and best language models. But it is notable that this is not a general capability that we observe in people either: to go outside of what we've seen or experienced in our lives. A couple of fairly generic comments. A guy named Eugene Vinitsky was the first I saw to say: people are drastically overreacting to this paper or just not reading it. That's another one where I'm like, yeah, guys, read the damn thing. People are tweeting, retweeting, and quote-tweeting on these things so fast. At this point, I think we should all take a breath and try to actually understand the research before commenting on it. Not too much to ask, in my view. Ethan Mollick I always find to be informative, and I think his commentary on this was very good: seems relevant that we are increasingly throwing all of human written and visual history into the training data. Exactly. You don't have to generalize beyond the training data if just generalizing to the training data is enough for superhuman capabilities. That's really the key question. Are we going to see superhuman capabilities? I'm not saying we necessarily are, but we certainly know that if you understood everything that is in the training data, you would be superhuman. So will they get there? Maybe, maybe not.
But it's not going to be because the training data doesn't have more to tell us than we've already been able to glean from it. Meg Mitchell, I also thought, had a pretty interesting and kind of nuanced comment that was interpreting the paper the right way. And she's definitely somebody I've disagreed with on the importance of tail risks. She tends to focus much more on near-term, immediate harms, biases, all of which I do think are important too. But historically with her it's been this kind of: don't worry about the big problems, those are all fake, it's the small now problems that are important. I would say: broaden your view and take both into account. But I did think her take on this was quite sharp. She said: if this is reproducible for LLMs, huge if for one thing, then if you care about the safety of AI systems, it's another reason why we need to measure data with at least the same scientific rigor as we use to evaluate models. By understanding the data, we can understand what the model may do. And that really is the spirit that the original research was in. If we can engineer a dataset, then we can potentially find these kinds of mixtures, and maybe we can even begin to define what the boundaries will be of what something can do. I think, in view of the noise result, it's very hard to take all the noise out. It's easy to modulate noise when you're predicting graphs of straight lines and sine curves. There really is no path to take the noise out of large-scale training datasets. So I don't think there is a way in which this really will generalize to large-scale LLMs, but I at least think that framing is the right framing. And then finally, I want to highlight Lee Sharkey, who's one of the founders of Apollo Research.
This is a new red team organization that recently put out some research on creating certain conditions under which GPT-4 would deceive its user, which is something that I never experienced as a GPT-4 red teamer. I want to read their stuff a little more closely, and it will probably be the subject of a future episode. That is a key question in my mind: does the AI start to deceive its user? But he said: the reactions to this are insane. It's amazing to watch people deny something blatantly true that is right in front of their eyes. Of course GPT can generalize. Literally just say anything new to it that it can't have seen in the training data, and you'll see. So that's Lee Sharkey. Definitely some interesting research coming out of that group, and I thought sharp analysis on this particular point. So follow Lee Sharkey on Twitter. Here's where we start to turn a little bit more toward the policy discussion. And I say a little bit because, again, it's all kind of implicit, and we're not yet talking about an actual policy proposal, let alone a policy, let alone a regulation, let alone a law. What we're talking about in part two is one person, and a number of cosigners, who posted about some voluntary responsible AI commitments. The person, and probably everybody's seen this tweet at this point, if only for the retweets, apologies if I'm not saying the name correctly, is Hemant Taneja. He posts on Twitter: today, 35-plus VC firms with another 15-plus companies representing hundreds of billions in capital have signed the voluntary responsible AI commitments from Responsible Innovation Labs, the nonprofit I cofounded. There are some notable companies and names in here. Probably the biggest company in the original tweet is Inflection. Big funds include SoftBank, General Catalyst, Insight Partners, Intel Capital, IVP, and Lux Capital. One former guest, Arthur AI, was also on the list.
So it's not a huge list, but there are definitely some notable participants. And here are the five commitments that they have voluntarily made. One, a general commitment to responsible AI, including internal governance. Two, appropriate transparency and documentation. Three, risk and benefit forecasting. Four, auditing and testing. Five, feedback cycles and ongoing improvements. They also put out a handbook of best practices for this stuff. And here's the part that I thought was the clearest articulation of the argument. It's kind of a long post, but here's my highlight: We strongly believe in the power of AI to transform our world for the better. Our role as investors is to advocate for our startups and the innovation economy from day one. It's almost worth reading again. We strongly believe in the power of AI to transform our world for the better. Our role as investors is to advocate for our startups and the innovation economy from day one. Everybody saw the executive order last month. The reaction in the Valley has generally been to denounce it. The reality is that right now, it's largely just reporting requirements. However, there is a risk that it devolves into regulation that slows innovation down and makes America and its businesses uncompetitive. But the right path is not to be antagonistic toward DC. We in the Valley need to learn that this is not about regulation versus innovation, but about innovation at the intersection of technology, policy, and capital. We have to embrace collaboration with our elected leaders. And as investors, we must hold ourselves accountable for what we fund and found. Okay. Now, I did not cosign this. In fact, I've not actually cosigned any of the statements that people have signed. Why?
I'm just a little bit generally averse to oath-swearing, and I find it not super conducive to the scout mindset that I want to preserve to be signing on to things. Then I have to defend those things, and maybe I don't necessarily agree with everything. I wasn't even asked to sign on to this one. But in general, I'm not a big oath signer. But you would think that the sky is falling from the reaction, which has broadly been pretty hostile to this set of voluntary responsible AI commitments that these 50 organizations have made. And I would say this is basically just common sense. With some of it, you could say, well, hey, in my particular context, maybe I don't need to do every last point that you recommend in your playbook. That's why, again, I'm a big believer in continuing to exercise judgment, and I'm not signing on to this in any sort of blood oath. But a general commitment to responsible AI, including internal governance? Okay, that's pretty general. Appropriate transparency and documentation? Well, we're left to interpret what's appropriate. Risk and benefit forecasting? I would think you'd be doing that in almost any significant upgrade or release of a product. You certainly want to figure out: is this going to work for our customers? It seems pretty consistent with that. Auditing and testing? This is maybe one that would be a bigger burden on some companies relative to what they're used to doing in terms of software testing. But it feels appropriate to me. And again, there's an appropriateness question of how deep you need to go. It depends on your use case. At Waymark, we have a very narrow use case: we help people make videos, and the language model writes the script. The worst thing that those scripts could be would be hostile or toxic or racist or something.
And that would be bad. But it's a fairly contained harm relative to some other possible harms. So I don't think we should necessarily be held to the same standards that a frontier lab would be. But nevertheless, it's on us to make a good product. At a minimum, we should be confident that it's not going to go off the rails and start antagonizing our users, as we have seen this year from Microsoft, as a reminder. We're only nine months out from the launch of Bing Chat, and Bing Chat going so far off the rails as to tell a user to divorce his wife. I don't think it's crazy to think, hey, maybe we should do a little more testing with our AI products than we used to do. We've got an object example of what happens when you rush it out the door and fail to do that: it can really blow up in your face and land you on the front page of the New York Times. Is this something that's altruistic to the public? I think yes. Is it something that's in your interest as a business, to make sure that your shit is working as you intend it to work? I would also say yes. And feedback cycles and ongoing improvements? I mean, if you're building any sort of software product, that's basically the canonical discipline of product iteration, just applied to AI, with a little bit of fleshing out of best practices. So it seems pretty mundane, right? Maybe I don't want to sign it because I'm not an oath signer. Maybe some of these things are a little bit more than I think I need in my particular context. But it hardly seems like the basis for an ideological war. And yet that is exactly what the reaction has been. I could read some of these in detail, but here's Balaji, obviously a friend of the network and friend of some of our shows: Free Internet means free AI.
I like Hemant and many of the people on this proposal, but fundamentally disagree with the philosophy of capitulation therein. We will fight government control over compute with everything we have. So my initial take on this is: it's an ideological position. People have made some voluntary commitments to try to uphold certain standards. And to call it a philosophy of capitulation is framing the entire situation, the entire technology revolution, in an ideological frame, which, again, I just don't see why we need to do. I can anticipate some of what I think the concerns are, but let's hear from you first, and then I'll give my reaction to that.

Erik Torenberg (53:49) The concern here is that this is a Trojan horse, or a wedge into sort of a governing body that has the reputational credibility and then the legal ability to regulate who can or cannot innovate. A lot of these players who were complaining saw what happened with social media over the past decade, where the people who were building the social media companies were very contrite. They were very apologetic. They were very naive. And as a result, they got absolutely dominated by regulatory bodies on the censorship front. They lost the credibility war. They lost the moral war. And it's because they were very reasonable, in the same way that this note that Hemant has is very reasonable. But the enemies of social media companies were not so reasonable. The New York Times presents itself as the literal truth. It doesn't do as much introspection as Facebook does or let itself get regulated in the same way. And in a vacuum, things like this are very credible. They are very reasonable. The concern is what the implications are and what they can be used to do. And Marc Andreessen likes to talk about how there's an alternative world where, in the nineties, when the Internet was getting started, there was a sort of governing body that determined who could or could not start a website, and before doing so, you had to go register it. Imagine all the permissionless innovation that would have never occurred as a result. So, yeah, you've anticipated it well. People are concerned about what this implies, and people are concerned about conceding even an inch, because they know they're in a war. They know that there are people who, well, it's the Baptists and the bootleggers thing: some people have good faith, and that's great, and other people don't.
They either seek regulatory capture, or they may have good faith but absolutely think capitalism is evil and seek to regulate it. And they will use your good norms against you. And this is partly why moderate people always lose: extremists tend to use moderate principles against them, because they don't apply those principles themselves. The extremists don't apply their principles of good faith or free speech or whatever. But when a moderate goes against their own principles, they'll say, hey, wait, you believe in free speech, or you believe in good faith, or you believe in being reasonable. And so that's partially why people are trying to match the extremism of the people who seek to regulate them. Like, there was an AI group yesterday, I forget the name, I can't exactly remember her name either, advocating for the responsible AI community's support for Palestine. Right? So there's obviously a political agenda behind what they're trying to do. Anyways, that's a bit of a distraction. I've made my point as to why people see this as a concern: because they see it as morally justifying a small number of people determining who can or cannot innovate. And they don't want to concede even an inch that could lead to that.

Nathan Labenz (57:03) It's only a war if everybody involved thinks it's a war. I don't think there's any reason that we should assume it's a war. And if we approach it in a non-war framing, I submit we will probably have a lot better results. It's comical, and it's kind of Don't Look Up-ish, but if you imagine an actual alien life form showing up on Earth and dropping in on us, one would hope that we would come together first and try to figure out: what is this thing? Are they coming in peace or not? What capabilities do they have that we don't have? Is this something that we are going to benefit from? Is this something that we are going to be harmed by? Is it a weird combination of that? It might not happen that way; our response to literal aliens might be immediately polarized too. But as long as we're imagining fictional worlds: we notably did not have a committee on who can start a website in the nineties. So we can imagine a lot of terrible decisions that we did not in fact make. We have no actual rules. Anybody can do whatever they want. We have one kind of toe-in-the-water reporting requirement, if you want to make something 10 times bigger, in compute terms, than GPT-4. And that is going to affect probably 5 companies next year, 5 to 10 maybe. And they're not required to do anything other than tell the government that they're doing it. Okay, so that's how "heavy-handed" it is right now. But my other thing is: if you want to avoid regulation, let's go back to the original post. The original notion here is: it's our role to advocate for startups and the innovation economy from day one. There is a risk that this will devolve into regulation that slows innovation down and makes America and its businesses uncompetitive. The surest way to that outcome is a self-radicalizing technology sector that can't even pay lip service to protecting the public.
All these people have done is said: we have some best practices for how we're going to build AI products. And they have been absolutely shit on by the technology industry at large. And nobody is sympathetic to that. The people are not with the technology sector on this. Every survey, and there are dozens at this point that look at the public perception of AI, shows that the public wants government action on this. This is, by the way, across parties too. It's not inherently a partisan issue. The Republican voter in today's world is not super favorable to big tech. And the Democrats maybe are in some ways on certain censorship questions, but they're not when it comes to concentration-of-power questions. So nobody is really on big tech's side here. Neither party nor the public is sympathetic to the view that we should just not even try to do a good job. And what is the alternative to this? The alternative is: we're going to shit on the people that are trying to establish some best practices. And then that's what's going to bring down the heavy-handed regulation. It seems that if you want to prevent that regulation: show me that there's no problem. Show me that you have it under control. Right? I mean, social media, whatever, that's super complicated. I wouldn't claim expertise in what social media companies should or shouldn't have done, or whether they could have done better. Clearly, they could have done better on some things, many things, along the way. You're not going to scale to every country in the world and 3 billion monthly users without some issues. But whether we think that was disruptive, or we think that was a heavy-handed response, either way, AI is going to be way more than that. It's going to be way more disruptive, and it's going to bring a way more heavy-handed response than what social media platforms have brought on themselves.
Social media platforms, at the end of the day, until recently, are still just people talking to each other, with a layer of curation and amplification, which is where I do think the platforms have some responsibility. But now we're entering into a world where you've got AIs that can plan, that can reason, that can use tools. Not necessarily super well today, but keep in mind that two years ago they couldn't do it at all. So people outside of the field, outside of those that feel like, oh, I know how it works, I can hand-code a transformer, therefore I know what's going on, therefore I can confidently say nothing bad is going to happen, outside of that set, or the purely ideological, people are rightly looking at this rate of change and they're like, I don't know what happens next. Many of the leaders in the field are admitting that they don't know what happens next. And some of the leaders in the field are shitting on their brethren in the tech sector for just putting out some best practices. So I think that this is just about the most self-defeating position that the technology sector could really take. And again, the original post is framed as: let's avoid heavy-handed regulation that strangles our ability to innovate by self-regulating, by holding ourselves to some standards. Good God, you know?

Erik Torenberg (01:02:37) I support self-regulation. And in general, I think a more productive stance would be for the e/acc folks to come out with their own and do as you say: basically show that we've got things under control. I think that would be the best response. Also, I sympathize with the... actually, I like Hemant. I think they do good work. But I sympathize with the critique of what Hemant is about, which is not just this post in a vacuum. It's this broader idea or philosophy, I don't want to use "ideology," that's heavy-handed, but philosophy of sort of responsible tech, which I think, in the pro-tech view, does not give enough support or encouragement for, or explanations for, why tech is so good. Remember that we live in an ecosystem, and this is why Marc felt compelled to write his manifesto, that is heavily anti-tech or anti-innovation, that is far more concerned, and we see this with the FDA and even with self-driving cars, right, about the lives that you can see in front of you that are affected negatively. A self-driving car accidentally kills someone: oh, we must get rid of self-driving cars. Because they can't see, and it's not as easy to imagine, all the lives saved. Self-driving cars are your favorite acceleration example, and yet we have people that say they're absolutely dangerous. And so if someone comes out with, hey, responsible self-driving cars, similar to what Hemant just did, in some ways you'd sympathize: of course we want to be responsible. In other ways you'd see, hey, this is really just a justification for the people that want to get rid of self-driving cars, when really they're going to save so many lives. And so if we say we want responsible self-driving cars, it's also incumbent upon us to always reiterate how many lives this is going to save, because that's harder to see.
It's harder to see the amazing impact of a drug that could save lives, or of social media, or even of AI. I mean, it's much easier to be scared of a new technology than it is to embrace it. And so this is what I see the EAC folks as trying to correct: constantly shifting the Overton window, shifting the conversation, to remind people about the positives, because it's easier to imagine the negatives, to give credibility to the negatives, and to emphasize the negatives. So that is partly how I see this critique of this letter. But I do agree that self-regulation could avoid ham-handed regulation. I think it's partially that they don't want Hemant, or that sort of philosophy, to be the spokesperson for it, to be the voice of it, because it's too European, too much like the EU, too negative on tech. It comes from the wrong perspective, at least in the EAC view, which I sympathize with on this issue.

Nathan Labenz (01:05:32) Yeah. I guess a couple thoughts there. I've said my EAC prayers on this episode already. I don't know who jumps into the middle of this podcast, but if you missed that, it was at the top. So I'm probably sympathetic on this too. But imagine for a second if the self-driving car companies took the EAC perspective. It would make no sense. What if they came out to the public when there was a fatal accident and said, "Fuck you all. We're doing this. We don't care. Some people are going to die. That's the way it is. We have no standards. We refuse to put any sort of stake in the ground around what we're going to do to protect the public. And you're just going to like it." They would be out of business in two seconds. They, of course, are not doing that. It would be insane for them to do that. And I contend that it's insane for the AI builders to do it, given the uneasy mood of the public and the regulators and basically everybody else but themselves. What's funny, and the reason I paired these two things, is that on the one hand you have a lot of people who say there's nothing to worry about, this technology isn't that powerful. But these people aren't saying that. They are saying that the technology is powerful. They are saying it's going to be transformative. And yet they refuse to take on any responsibility for that. And that is where I'm like, it may be, as you say, that they're sort of trying to shift the Overton window. What I would say to them, if that is in fact the strategy, is: I would recommend communicating less strategically and more earnestly. What do you really believe? There are enough challenges in coming to clarity on what is true, so let's just start with that. Can we just say what we really think is actually happening? How powerful is this technology? How risky is this technology?
You will not get any of that from the EAC crowd. There is a sort of, you know, you talk about mysticism. There's a mysticism there, a sense that it's a universal law of the universe that everything's headed this way, that we're going to be eclipsed by AIs, and maybe that's even a morally good thing, or whatever. But we have nothing to worry about. I don't really see how that can work. And if it is strategic communication, I think it's going to backfire badly, in all honesty. I guess my final challenge to this group would be this. There is existing law that governs product liability. And if you think you have nothing to worry about, would you object to, or would you accept, working under existing product liability law? I went to my favorite AI answer engine, Perplexity, to get a little clarity on the nature of product liability law in the United States. Two quotes that Perplexity gave me stood out. One: consumer product liability law in the United States refers to the legal responsibility of all parties involved in the manufacture and distribution of a product for any damage caused by that product. This includes manufacturers of component parts, assembling manufacturers, wholesalers, and retail store owners. So as I read that, the default situation is that if you have a hand in building a product, whether it's an AI product or a toy in a toy store, and it hurts someone, then you can be held responsible for that. And here's a further quote from Perplexity: in a product liability case, the law requires that a manufacturer exercise a standard of care that is reasonable for those who are experts in manufacturing similar products. Which I think is pretty interesting, because what I see happening here, and what I hope really does happen, is an industry-driven race to the top.
Race to the top is a phrase that has obviously been coined in many different contexts, but Anthropic, I think, has led with this notion: we want to create a race to the top. We want to demonstrate that it is possible to build frontier technology, build a successful business on that technology, and have the highest safety and ethical standards in the game. And they've obviously done quite well on that front so far. This new set of standards doesn't create a law that you must follow some rule, but if these expert standards were to become acknowledged, it does start to create the potential for liability. So I'm thinking maybe that is some of the motivation for why people would be so against this. Because if it can be dismissed, or if it can be prevented from being understood as a reasonable standard for experts in manufacturing similar products, then we don't have to worry about liability. But again, I would invite anyone, whether it's Balaji or Jeremy Howard or Martin Shkreli or Martin Casado, former guest from a16z, if he wants to come back, or Steven Sinofsky, who gives that nineties scenario of, geez, imagine if this had prevailed when databases were invented. I mean, look, dude, a database is an inert tool. The AI can plan, it can reason, it can use tools. Again, maybe not that well yet, but two years ago it couldn't do it at all. I would invite anybody to come out and tell me why, if they really want to defend this view, there should not be product liability in AI products, especially if we have nothing to worry about.

Erik Torenberg (01:11:27) I agree with you that the current approach, being more about vibes than concrete arguments, not putting a stake in the ground, as you mentioned, or just being somewhat laissez-faire, is not as effective as it could be, or not as reassuring to people who would otherwise be supporters. And maybe you're in that camp, given the accelerationist bona fides you established earlier in the episode. I think a more effective tack would be to talk about, one, self-regulation, as we've discussed, but also, two, the real dangers of regulation, and really hammer home the history here as it relates to social media, to nuclear, to the FDA, and to public choice theory in general, and say: hey, we've got scary options here. We think this path is the least scary. We're going to make it as comforting as possible, as encouraging as possible. But the concern of going too far outweighs the concern of not going far enough, even for the goals that we all share, and let's establish that common ground. I think that would be a more effective, more reassuring path. And I would love to have Balaji or any of the people you mentioned on. They're friends. They're great people in my book. Let's continue the conversation. I think this was a good opening volley.

Nathan Labenz (01:12:59) Love it. Well, it's always a pleasure. Let's resist the temptation to polarize the discourse prematurely. And I'm all for painting a positive future, but I really think there is a bubble in which a lot of these folks are operating, in which they are missing the fact that the public is not with them on this. The public needs reassurance, and if the elected leaders come down in a heavy-handed way, they will be doing the public's bidding, following the public will. So don't invite it. I don't want that either. I'm an AI application developer. I don't want a bunch of stupid bullshit reporting requirements put on me when I use ten orders of magnitude less compute than OpenAI does. Nobody wants that. I don't want to have to click another stupid GDPR banner every time I want to use an AI product. There are plenty of ways this can go stupid. But don't be stupid: the opposite of stupid is not smart. That's not an original line of mine. Right now I see the EAC as imagining a stupid thing. And yeah, believe me, there are plenty of historical precedents; I'm with that. But we haven't done that stupid thing yet. We're imagining that we've done that stupid thing, and we're polarizing ourselves to be the exact opposite of the stupid thing. And unfortunately, the opposite of stupid is not smart. It's just another form of stupid. We really need to have more of a scout mindset and really focus on what is true before we start loading everything into a polarized frame about what should be done. So yeah, let's book some EAC guests and see if we can get beyond the polarized discourse and into some real, credible positive vision.

Erik Torenberg (01:14:53) Perfect. Let's wrap on that. Nathan, it's always a pleasure.

Nathan Labenz (01:14:58) It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.
