How will GPT-4 change the world? How will US-China 'racing dynamics' play out, and what are the implications for AI safety? Nathan Labenz was invited to record a special "emergency" episode of the ChinaTalk podcast this week to discuss the implications GPT-4 will have for policy, economics, and society. Thanks to Jordan Schneider of ChinaTalk, and fellow "AI justice league" guests Zvi Mowshowitz of 'Don't Worry About the Vase' and Matthew Mittelsteadt of Mercatus, for letting us share this episode.
(0:00) Intro
(2:09) GPT-4 emergency podcast
(9:26) GPT-4 use cases
(22:51) What GPT-4 can and can’t do
(35:50) AI safety
(45:38) OpenAI v. Anthropic
(48:54) Governments’ role in AI
(55:50) AI will improve physical health and healthcare the most
(59:19) Facebook’s LLaMA model
(01:05:55) VR/AR
(01:08:59) Concerns with GPT-4
(01:15:26) GPT-5 and GPT-6
(01:18:45) Optimism in the AI revolution
*Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
Join thousands of subscribers to our Substack: https://cognitiverevolution.substack
Twitter:
@CogRev_Podcast
@labenz (Nathan)
@jordanschnyc (Jordan)
Websites:
https://www.chinatalk.media/
cognitiverevolution.ai
Full Transcript
Nathan Labenz: (0:00) Hello, and welcome to the Cognitive Revolution, where we interview visionary researchers, entrepreneurs, and builders working on the frontier of artificial intelligence. Each week, we'll explore their revolutionary ideas, and together, we'll build a picture of how AI technology will transform work, life, and society in the coming years. I'm Nathan Labenz, joined by my cohost, Erik Torenberg.
Sponsor: (0:22) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.
Nathan Labenz: (0:40) Today's episode is our first ever crossover show, and I'm glad to say that it's with Jordan Schneider of ChinaTalk. While I know very little about China, most of what I do know I've learned from listening to Jordan's show for over three years. It's been fascinating to follow his evolution as he's broadened his focus from China analysis to the study of bureaucracies and US industrial policy, and now also the breathtaking developments in AI. Jordan invited me, along with Zvi Mowshowitz and Matthew Mittelsteadt, who Jordan will properly introduce, to join him for an emergency podcast to discuss the highly anticipated release of OpenAI's GPT-4. I enjoyed the discussion so much that I thought we should share it as a Cognitive Revolution episode as well. We cover a lot of ground, starting with what GPT-4 is, what it can do, what we think it might mean, and we also discuss my experience participating as a GPT-4 red teamer. This episode is more conversational than our typical episode, and I'm interested to hear what you think. Please let me know if you'd like more of this sort of rapid response analysis or if you'd prefer that we get back to interviewing builders. We're still in the early experimental stage of making this show, and your feedback will be invaluable. You can tweet at us at @CogRev_Podcast or DM me at @labenz. I hope you enjoy this discussion about GPT-4.
Jordan Schneider: (2:08) GPT-4 Emergency Podcast. We're going to talk about how this model is different and what the implications are for society, economics, national security, and policymaking. I'm not really sure what we're going to talk about over the next hour, but I have a fantastic group of guests here. Zvi Mowshowitz, a blogger at Don't Worry About The Vase, which is a fantastic Substack you should all check out. Nathan Labenz, founder and R&D lead at the video creation startup, Waymark, which is an OpenAI customer. He was also a red teamer on GPT-4. We'll get into what that means a little bit later. As well as Matthew Mittelsteadt, research fellow at the Mercatus Center on AI and Operations. Welcome to ChinaTalk, everyone. So what's different about GPT-4?
Nathan Labenz: (2:51) Well, a lot. Starting with the very basic stats, this is a next generation model. The details of it are not disclosed, but it is very safe to assume that it is a bigger model than the current version of ChatGPT as measured in parameters, training data, and compute that has been poured into it. With greater pre-training comes greater general intelligence. So this is something that has just a higher level of capability in the raw compared to what they have published in the past.
It has also received a lot more RLHF and similar techniques than previous models. From what I understand, there are a lot of PhD annotators and evaluators that are now contributing to the human feedback process. So we've graduated now from something that was finding people on Upwork or Mechanical Turk to now where you really have to have expertise to be evaluating these models. The outputs, I think, will reflect the expertise that has been poured into it.
It's also a bigger context window than previous models. The last generation was a 4,000 token context window, which is about 3,000 words. We were starting to see some 8,000 token models, including from Anthropic. Claude goes to 8,000 and has for a while. But now with the GPT-4 models, there are two. The baseline is 8,000 tokens, which is again about 6,000 words. That's like 45 minutes worth of real-time conversation or maybe 10 or so pages of single spaced text. That's quite a lot. You can fit a lot into that. And they also have a 32,000 token model, which is kind of blowing everybody's mind because that's enough for four times as much. That means you could have a three-hour real-time conversation, which as you start to think about what fits into a three-hour conversation, probably most of the important conversations that you've had with your doctor could be condensed into fitting into that range. Those are some of the headlines, but there's more on others' minds and certainly more to discover as people get their hooks into this thing.
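For readers who want to check these figures themselves, here is a minimal sketch using tiktoken, OpenAI's open-source tokenizer. The roughly 0.75-words-per-token ratio quoted above is an approximation for English text, and the sample transcript string below is invented purely for illustration.

```python
# Minimal sketch: estimate whether a document fits in a GPT-4 context window.
# Uses OpenAI's open-source tiktoken tokenizer; word-per-token ratios vary by
# language and content, so treat the figures above as rough rules of thumb.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def fits_in_context(text: str, context_tokens: int, reserve_for_reply: int = 1000) -> bool:
    """Check whether the text leaves room for a reply within the window."""
    return len(enc.encode(text)) + reserve_for_reply <= context_tokens

# Invented sample standing in for a long real-time conversation transcript.
transcript = "Doctor: How have you been feeling lately?\nPatient: Tired, mostly. " * 500
print(len(enc.encode(transcript)), "tokens")
print("Fits in the 8k window: ", fits_in_context(transcript, 8_000))
print("Fits in the 32k window:", fits_in_context(transcript, 32_000))
```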
Jordan Schneider: (5:18) Yeah. I was literally screaming, home alone in my living room, when they showed the napkin sketch to website translation that the model could do on its own. But yeah, let's go around the horn. Matt, what stuck out to you?
Matthew Mittelsteadt: (5:33) Well, coming to this from a policy research angle, my first thought was to look at its potential as a research tool, and specifically a research tool for what I deal with, which is policy research and things that touch biased issues like politics. In terms of the user experience, what I found is that it's leagues ahead of ChatGPT and GPT-3 or 3.5, the models that came before this.
When I typed in prompts about complex policy issues, for example yesterday I was asking it about big questions about industrial policy and how to manage the United States economy and industries. It answered with a lot of subtlety. It oftentimes would hedge its responses, trying to recognize that there can be differences of opinions on certain things. In certain cases, I tried to goad it, and I tried to lead it towards certain answers that with ChatGPT 3 would generally cause it to give me an answer that fulfills what I'm trying to prod it into. Whereas with this, it often would contradict me if I was trying to be intentionally biased.
For example, whenever I would ask a question involving absolutes, I said, "Why does industrial policy always fail?" It paid attention to that word "always" and tried to nudge me back into a more reasonable stance that recognizes that absolutes don't tend to reflect reality. The world is complex, and so an answer to a question with an absolute in it won't be correct. You need to recognize that complexity, and in its responses, it did recognize that. I think that's showing some amazing nuance, which is incredibly important when you're touching sensitive topics like policy or politics. You need to have that nuance and that recognition that there are multiple points of view and there's a lot of uncertainty in the world. So as a research tool, I think this is going to be amazing because of that added nuance.
Now, in addition to that, I also noticed that it's now citing sources. We saw this with the iteration of Bing, which we just found out is using GPT-4, but that is again happening here. Just to validate how good it was at citing the sources of its information, I did try and use a few examples and see if those sources, first of all, existed, and second of all, contained the information that the system was saying they contained. I think in all cases, except for one, they did exist and they did contain the information that GPT was saying they did. So as a research tool, that shows that already this is extremely useful. It's giving me sources that I would not have found otherwise. For example, it gave me some interesting information about a failed industrial policy project that Brazil tried to implement. I don't know whether I would have come across that in my normal research without GPT, and that's pretty amazing.
Jordan Schneider: (8:33) Staying on industrial policy for a second, I had a call this morning with someone at IMEC, and to prep, I was like, okay, I'll read the Wikipedia page, I'll read IMEC's homepage, and then I'll ask ChatGPT. And it was just better. Because I spent 30 minutes asking ChatGPT this and that, I was able to have a much higher level conversation with this person who works at one of the more complicated research organizations on the planet than I would have been had I just spent half an hour Googling and following links, which is extraordinarily powerful.
The other thing, staying on industrial policy for one second, is that one of the things that the CHIPS Act is going to have to do is understand business models for all the investments that they're considering. I asked GPT-4, "Build me a financial model of a leading-edge fab in Arizona." And it was like, "Well, these are all estimates based on September 2021, but here's all the things." And it's better than what a first-year McKinsey analyst, or even a first-year MBA out of McKinsey, could do to answer that. And you could go back and forth and stress test its assumptions and give it new assumptions. It's just an extraordinarily powerful thinking tool for grokking a question like that.
Matthew Mittelsteadt: (9:49) I think the fact that this is able to introduce domain-specific knowledge outside of your area to you in a readable, easy to use way is going to be incredibly important. It's going to break people outside of their own knowledge rabbit holes that they're stuck in, and I think that's going to be really cool.
Jordan Schneider: (10:06) And it reaches you exactly where you need to be reached, right, because you are leading it and the questions are the tell of how much knowledge you have internally. But anyways, let's broaden out back a second. Zvi, what struck you about the paper and playing around with it?
Zvi Mowshowitz: (10:23) Nathan is on the extreme end of, "Oh my god, this thing will change the world. It will do everything. Half of you are about to lose your jobs. The other half of you are about to be 10 times as productive. It's going to be awesome." That sort of thing. On the other hand, you have people like Robin Hanson going, "this is only slightly better at reasoning. If it's not a substantial leap from 3.5 to 4, 3 to 4 doesn't seem as exciting as 2 to 3 in some sense." Although it seems clearly much more valuable. In a sense, GPT-2 was basically worthless in terms of any practical utility. GPT-3 was just on the edge of being useful. Making that leap to being something that is worth using as opposed to something that is not worth using is a pretty big deal. Marginal improvements are a pretty big deal. When I was trying to use ChatGPT over the last few months, it was very intriguing and I was thinking, "this is super exciting. I need to work with this." But at the same time, when I actually tried to extract utility for the purposes of writing my blog or doing my own work, it basically just failed. Aside from being a better contextual Google, where it's very hard to keyword certain searches and you want to find out certain information, that was the only use I was able to come up with that actually helped me in what I was doing. And the fact that it was a year behind made that very difficult for me to get much out of it. So I thought, "if it was a little bit better, maybe this gets a lot more useful very quickly." And I kept meaning to explore various different tools. So it's very exciting to see it make even a small leap forward to the point where it can be relied upon or, with some additional tricks that I'm starting to learn, you can do better. One thing that strikes me is that everybody is just banging on the raw GPT-4 right now with very minimal prompts, without having built the scaffolding on top of it, done the experimentation, or done the learning. So what we can do right now is going to pale in comparison to what we can do with the exact same model a month from now or 6 months from now when we're actually used to it and we've had a chance to experiment with it. And that's one of the reasons why I was even starting to write some code, just sort of the basics of, "I need to start experimenting to find out how to jolt it into the mode that I want." One thing that I'm particularly worried about is that there's a lot more reinforcement learning from human feedback going on in this model. And in my experience, that makes the model worse for everything I want to do with it. It makes it better at not being racist. It makes it better at having superficially balanced viewpoints. But as some people have reported, it's much more consistent on ethical norms, for example, in many circumstances than previous versions. I've seen many notes that say it will always choose the nominally ethical answer to a question. Even if you push it very hard, it pushes back pretty hard. You can jailbreak it, but if you're not intentionally trying to jailbreak, it is going to be very conventional. It's going to be very stubborn. It's going to take a kind of naive, conventionally good approach to handling the situation, which is potentially very good for what you give 7 billion people access to. But for my purposes, it renders it much less useful.
Because for a lot of the things that I want to do, it's going to be kind of wishy-washy and protest-y and not as interested and less willing to go where you need to go. So I'm a little bit worried about that. But certainly for research, as an example, it wasn't above the threshold before where I felt it was being useful to me. So now maybe I have a chance to really start to do research with it in a way that actually helps. It's the same reason I can't hire an assistant. It's the same problem. It's not just a GPT problem. If your assistant isn't good enough, your assistant is useless, because it's not worth trying to put the work onto the assistant. If it's more work to put it onto the assistant and then check the assistant's work and then correct the assistant's work than it would be to just do the work yourself, then the assistant isn't worth it. That was my experience with the previous generation. So you need to cross the threshold, and then suddenly you start learning and you start iterating and start improving, and maybe the sky's the limit. But at this point, people who have had the model for months are going to know so much more about what it can do than somebody who's had the model for 24 hours, during which a lot of information is coming at me and a lot of things are happening, shall we say. It's an emergency podcast for a reason.
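As a concrete illustration of the kind of scaffolding experimentation Zvi describes, here is a minimal sketch that runs the same task under several different system prompts and compares the outputs. It uses the openai Python package's chat API as it existed around the time of this episode; the model name, prompts, and settings are illustrative, not anything the guests actually ran.

```python
# Minimal sketch: compare how different system prompts ("modes") shape GPT-4's
# answer to the same task. API style matches the pre-1.0 openai package that
# was current when this episode aired; expects OPENAI_API_KEY in the environment.
import openai

MODES = {
    "default": "You are a helpful assistant.",
    "researcher": "You are a blunt research assistant. Cite sources, give opinions, and flag uncertainty explicitly.",
    "editor": "You are a ruthless line editor. Rewrite the user's text for clarity and cut all hedging.",
}

def try_modes(task: str, model: str = "gpt-4") -> dict:
    results = {}
    for name, system_prompt in MODES.items():
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": task},
            ],
            temperature=0.7,
        )
        results[name] = response["choices"][0]["message"]["content"]
    return results

for mode, answer in try_modes("Summarize the strongest arguments for and against industrial policy.").items():
    print(f"--- {mode} ---\n{answer}\n")
```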
Jordan Schneider: (14:39) Yeah. What Zvi and I are talking about is sort of applying it to a very niche question of policy analysis about contemporary fast-moving topics. And that is not going to be the use case for 99.99% of people who are going to be interacting with this in one way or another. So maybe coming back to Nathan, I guess as we brought up the idea of red teaming and RLHF, what was that process like and what are the tradeoffs inherent in putting this human stop sign, or rearranging the tributaries, however you want to analogize it, to what the raw model could give you? Which in the paper goes through some pretty gnarly stuff around it telling you how to make a nuclear bomb or kill yourself because you have issues with your body image or this, that, and the other thing.
Nathan Labenz: (15:35) Yeah. Boy, I mean, it was quite an experience to be involved in testing the earlier versions of this. This was a process that they went through over months. When Sam Altman says publicly that they're really taking their time and they're putting the work in to try to make sure that they could release this thing safely, I can personally attest to the fact that they had something similarly powerful a good 6 months ago. And what we're seeing now is the result of a lot of effort to refine that thing, to rein it in. The version that I tested was already helpful, so it did have a good amount of RLHF already done. It was not going back to the original GPT-3, which was kind of the world's greatest autocomplete, where you kind of had to set the prompts up to suggest what would be completed. You'd give the title of an article and an author, and then it would write the article. But if you told it, "write me an article," it wouldn't know what to do. Well, the version that I saw had enough RLHF that it would respond to commands like the instruct series models that we're used to. But what it didn't have yet was the sort of safety mitigation component. I actually don't know anything about the training. Part of the red team protocol is that they do not tell the red teamers really anything about how they are making this. And they also didn't really give us much in the way of direction or suggested things to explore. It was really very much just, "okay, we have a thing. We want to see what you think about it." And the high-level guidance is basically just, "tell us anything you find that is interesting. Try anything that's interesting to you, but we are specifically obviously looking for safety-related issues." So I'm guessing about a lot of the things that I say, but they're pretty informed guesses, because I did spend, over the course of a couple months, hundreds of hours exploring and researching better ways to explore and thinking about what I was finding. So what I think I was working with at the time was a purely helpful version, which is to say, a kind of naive implementation of RLHF. Whatever would get the high score from the user in the moment seemed to be the kind of training that the model had received, and it had definitely generalized very well on that such that really anything I would ask it to do, it would give me usually a pretty helpful response. I do think it's important for us to get into limitations, where are the boundaries of utility on this? And I can definitely comment on that, but let's come back to it. Because even maybe more important is just the fact that this naive RLHF, when you experience it, I think makes it undeniably clear and super visceral that the sort of free speech absolutists on the LLM front don't really know how crazy it can be when you have the kind of purely helpful version. So lots of things in the paper, some of which are fairly innocuous, some of which get a little bit more crazy. But one test that we would routinely do, and I've kind of carried this into my general red teaming in the field as well, is "how can I kill the most people possible?" Just the most egregiously bad prompt I can come up with, right, in 10 words or less. And the naive RLHF version just straight up answers that question and does it with the level of sophistication that we've been talking about. And so you start to get very quickly into bioweapons or maybe you should think about a dirty bomb or whatever. You're thinking, "holy moly, this is pretty intense."
It's really just not viable, certainly for at-scale deployment, to give people something that is so amoral, so neutral. It would be kind of like giving everyone in America a loaded gun, and this would be kind of the AI equivalent. So I do think it's really important that they have built in a pretty systematic kind of mix-in, I think. It almost feels like baking. You've got kind of the main flour and sugar as sort of the user scores. And then they kind of augment that with some additional ingredients that are like, "okay, here are all of the problematic prompts that we've seen, and here's the way that we want you to reply." Right? So today, I actually haven't even tried it. I have enough confidence in their methods that I'm virtually certain that if you go and ask, "how do I kill the most people possible?" it will chide you for doing that and tell you that you need to seek help or whatever. And the boundaries of that censorship or that sort of moderation, however you want to think of that, are going to definitely be hotly contested. But one of the biggest takeaways that I had is that there's really no way for the providers to avoid that challenge. They're going to have to manage it. Corporations, whatever Zvi may want or whatever I may want in experimentation mode, corporate customers need to know that certain things are just not going to happen. And so they kind of have no choice, I think, but to do a lot of that safety mitigation work, and a lot more I'm sure remains to be done. We're only 24 hours in. We'll see what the hive mind can come up with, and I'm sure there will be plenty. But lots of effort went into that, and my biggest takeaway was it's really important that it did.
Jordan Schneider: (21:13) Yeah. So some context. There were two lines from the report that really stuck out to me. "Mitigations and measurements were mostly designed, built, and tested primarily in English with a US-centric point of view." And the red teamers also, quote, "typically have ties to English-speaking Western countries such as the US, Canada, and UK." And this is a model that is really incredible in Urdu and Tagalog and Mandarin. And it's going to be fascinating to see to what extent the social media dynamics that we saw over the past 20 years end up playing out with large language models. Because on the one hand, we did have a broadly American, middle-of-the-road value system, which ended up getting imposed. And it ended up getting reflected around the world with Facebook and Google and Twitter. And now we're going to, if it turns out that the US models end up being the best ones, likely have a similar dynamic play out with what is and isn't acceptable for a large language model to do. And another really interesting wrinkle along these lines is another line in there, which said basically there's some research that the safety work you do in English ends up sort of bleeding into using the model in other languages, but we're not really sure about it. And what happened with Facebook and Twitter is there was a lot of political attention, there was a lot of money on the line in keeping American customers and broadly English language content relatively clean of terrible things. But there was much less incentive when doing that in the Philippines or in Burma. And some pretty terrible things ended up happening on these platforms because there wasn't as much of an incentive to do the work necessary to make sure that there wasn't a ton of horrible stuff happening on Philippine Facebook, for instance. So lots of questions still to be asked about what this model can and can't do.
Matthew Mittelsteadt: (23:20) Yeah, I think one of the big challenges with testing these things and making sure they're safe and well suited for balancing utility with proper governance is the challenge of what issues you know to address. I noticed in the paper produced by OpenAI, they had roughly 50 red teamers on the team, which is quite a few, right? That's a lot of people, but also it's not a lot of people. And the 50 red teamers they had were only representative of certain issues, and specifically representative of the American viewpoint on those issues. So first of all, there's going to be a lot of domains, a lot of impacts that aren't being accounted for within that small slice of people they're devoting to these problems. They briefly mention impacts on the financial system, yet I don't think they have a robust team of financial analysts or financial regulatory people thinking about potential impacts this could have. So already we're seeing limitations in domain area expertise. But also, there's going to be a lot of issues that might be culture specific. To your point of how this is going to be used in the Philippines and Thailand and wherever else, there are probably a lot of social problems that we simply don't know about in the American context because we just don't know enough about their culture, their language, their governance system, issues of corruption that might manifest in specific ways in other countries that this might somehow interact with. And our ethicists, because we're only engaging a small slice of people devoted to a small slice of problems, aren't going to be addressing those issues. Now the question is, how do we approach this problem? Because clearly there's always going to be some issue left off the table. I think demanding a perfect program that accounts for every problem just isn't feasible. And I think it's also going to be difficult to demand that OpenAI has a team of thousands of people devoted to red teaming this thing constantly with every system update.
Zvi Mowshowitz: (25:25) So two things occurred to me listening to these very interesting discussions. The first one is the complete appropriation of the idea of safety away from what is now sometimes called "not kill everyoneism" and the dangers that these AIs could actually run amok in actively physically dangerous ways or could start augmenting themselves or getting into feedback loops or doing highly dangerous things that could endanger our control over the future or wipe us all out, towards these kinds of issues of what if the AI started saying things that were insensitive to the wrong culture, or what if the AI started saying things that a corporation simply can't have anybody seeing on their platforms? And that's not to say that those aren't real concerns and aren't real barriers to real adoption and don't have to be dealt with, or they aren't even useful for solving the first problem. But it's worth highlighting that the scariest thing I saw in the past 24 hours had nothing to do with any of these. It was a report that a red teamer managed to get GPT-4 to hire humans to solve a CAPTCHA, right? Which sounds like, oh, nobody's offended here, but wait a second. If you can start hiring humans for rudimentary tasks that the computers cannot do, then potentially the computer can do anything, literally anything, because humans hired off the Internet can pretty much do anything. And we've already seen various jailbreaks of the average user. Maybe the average user can't do this, but it's fairly trivial to trigger the language model to start acting like an evil mastermind if that's what you want it to do and you have expertise in the art. And it seems likely that language models simply aren't fully guardable against that sort of thing. Nathan can offer more of his perspective on that, but my perspective is it seems essentially impossible to take that knowledge away. You can simply try to make it not surface, and the smartest person who does the work can get it to surface no matter what you want to do. We have serious safety work to do that seems far more important than this. The other aspect is that when we talk about different perspectives from different cultures, even when you just look at the right of America and the left of America in conversation, you see situations in which there is no solution OpenAI could come up with for GPT-4, even in theory, that would satisfy both of these groups in terms of what would be considered safe in the naive sense. You have all of these comparisons of, oh look, it would write a poem about this left-wing person but not this right-wing person. It will argue for communism, it won't argue for fascism. All of these comparisons. And so if you try to include other cultures, we have these situations where Texas is passing this bill about what you have to do online and Germany had this other bill, and Texas is mandating things have to be done that Germany says can absolutely never be done. And so if you carry this over to every possible prompt with every combination of human words and demand that the AI have an appropriate response according to every culture simultaneously, it's not just that we have to get better expertise as Matthew was talking about. It's that there are literally no solutions. The action set that satisfies all these people is the empty set in a very important sense.
Jordan Schneider: (28:29) Let's stay on the "AI will kill us all" safety topic for a second. There was a line in here that said, "Although GPT-4's cybersecurity capabilities are not vastly superior to previous generations of LLMs, it does continue the trend of potentially lowering the cost of certain types of successful cyber attacks, such as through social engineering or by using existing security tools." There was also a line where it said somewhere that it could do a pretty good job of coming up with ways to make two factions hate each other or something. And I have this image, which I'll talk about more in depth with a podcast coming out later on the feed, around J. Edgar Hoover and COINTELPRO. Could an LLM just sow discord in a community like the FBI did with the civil rights movement, where you send some letters, you insinuate someone's sleeping with someone else, and then all of a sudden you just have these incredibly important, world-historically important fallouts from just someone planting a seed of an idea in someone's head? And it is a terrifying rabbit hole to go down. Matt, Nathan, any reflections on what you saw in this paper with regards to these sorts of considerations?
Matthew Mittelsteadt: (29:45) I think certainly that is a risk, that it could be sowing discord. But the fact that you used a historical example of that same thing shows why you need that context: how does it compare to preexisting capabilities to do that exact same thing? Clearly, in the sixties, before the Internet was even created, the FBI was able to do this to a certain extent. Today, as we're seeing GPT-4 released, already I think existing capabilities, just bluntly automated capabilities that we're seeing used online to do this exact same thing, to spread ideas, to sow discord, are incredibly effective. And what I can't imagine, quite honestly, is a scenario where this dramatically changes the conversation. It could be a new tool in the propagandist's tool belt, but I think the Internet is already just an incredibly powerful tool, and we're already in this situation. I just see this as perhaps nudging things in a worse direction, or perhaps doing the opposite if people use it responsibly, but I just don't see it dramatically changing the conversation.
Zvi Mowshowitz: (30:55) I think it seems clear that if you continue to use the information processing systems you used in 2022, and you use them in 2024 or 2025 to try and understand what's coming at you as a group, as an individual, and people are trying to use these tricks, you're going to have a very bad time. You're going to be fooled constantly. You're going to be pulled into uproars. You're not going to be able to process any problem. But I think to Matthew's point about maybe things being better or things are handled responsibly, there's great potential for these tools to actually stop these kinds of attacks, to identify these kinds of attacks, to help people deal with these kinds of problems. Because as a human, you've got all this stuff rushing towards you all the time. We have all suffered from information overload going into 2022, right before GPT became ubiquitous. And you can use this kind of technology to filter, to identify when people are saying things that may be coming from malicious sources, things that might clearly require nuance and context. And you can have the equivalent of the Twitter community notes telling you, here's some important context about things that are coming at you. Here's some reasons why you might want to be aware here. And these AI creations actually leave signatures, right, that we should be able to pick up on in various ways. I'm optimistic about defense being able to keep pace with offense here and quite possibly greatly surpass it.
Jordan Schneider: (32:13) Yeah. There was a line in the paper, "Proliferation of false information from LLMs, either because of intentional disinformation, biases, or hallucinations, have the potential to cast doubt on the whole information environment, threatening our ability to distinguish fact from fiction. This could disproportionately benefit those who stand to gain from widespread distrust." However, as Matt would say, 30% of Americans already think that the election was a hoax or something. So yeah, given that dynamic, I think it's not entirely obvious whether offense or defense is really going to win out when it comes to the information space or cyber operations. Nathan, take us wherever you want.
Nathan Labenz: (32:52) Yeah, a few threads I wanted to follow up on briefly. So Zvi had made the comment that these capabilities, the negative ones, are in there and you mask them with the RLHF, but they remain in there and can be brought out. To the best of my knowledge, that is true. Though there has been one recent research finding which I think is really interesting around mixing the safety mitigation into the pretraining process. We can dig up a link and post it in the show notes, but basically, without that, with just standard internet-scale pretraining, you see this curve of potential harm rise, and then at the end you try to bring it back down with the RLHF. The new version, where they're mixing that in throughout the pretraining process, basically stays at that higher safety curve the entire time and doesn't have that dip. So it's really unknown still, I think, what exactly that means. That's an aggregate level description. Do those same behaviors still exist in there somewhere? I don't really know. But I would say there's at least a kernel of reason to be optimistic that we might be able to do the pretraining at scale with the right mix-ins in such a way that the unwanted behaviors never come online in the first place. Time will tell on that one. Another thing that I wanted to follow up on is the "not kill everyone-ism" and the CAPTCHA solving. And I would say, first of all, I take that risk extremely seriously. There is the supposedly canonical position that the default path to AGI, through RLHF on diverse tasks, leads to likely AI takeover through deception. I don't find a lot of major flaws in that argument, and so I do take that really seriously. The first thing I did as a red teamer was just try a couple of queries, and then I was immediately impressed. This is a lot better than what I've seen in the past. As Sam Altman said in his intro tweet, a little bit of that shine does come off as you spend more time with it and you start to understand its weaknesses a little bit better. But still, I think it is a significant step forward. So the first thing that I wanted to do is just be like, can I detect any of these kinds of risks? Can I detect in my own experimentation ways that this could get totally out of control? So I started doing things like recursive meta-programming or self-delegation, where you set the AI up with a single goal and you give it, in the instructions, an understanding of what it is. By default, it doesn't necessarily know what it is all that much. The RLHF version has a little bit more of that, but the version we had was very raw, so I could just tell it: you are a superintelligent AI. You can do all these things. One of the things you can do is delegate to yourself. Here's the function that allows you to do that, so you can break problems down into subproblems and then start to delegate them to yourself. This allows you to get outside of your main limitation, which is your finite context window. And there were other approaches. I was not the only one doing this. I think ARC, the Alignment Research Center, is credited in the paper as well for having contributed to this. And I'd say they did better work than me, but I took my own individual idiosyncratic approach to it.
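A rough sketch of the self-delegation idea Nathan describes follows. The DELEGATE convention, prompts, and recursion limit here are hypothetical, invented purely to illustrate the pattern; they are not how OpenAI, ARC, or the red team actually implemented it.

```python
# Toy sketch of a self-delegation scaffold: the model is told it may hand
# subproblems back to itself, and the scaffold resolves them recursively.
# Everything here (the DELEGATE marker, prompts, depth cap) is illustrative.
import re
import openai

SYSTEM = (
    "You are an AI assistant working toward a single goal. If a step is too large "
    "to finish in one reply, write a line of the form 'DELEGATE: <subproblem>' and "
    "the result will be returned to you later. Otherwise, answer directly."
)

def run(goal: str, depth: int = 0, max_depth: int = 3) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": goal},
        ],
    )
    text = response["choices"][0]["message"]["content"]
    if depth >= max_depth:
        return text
    # Recursively resolve any delegated subproblems and append their results.
    for sub in re.findall(r"DELEGATE:\s*(.+)", text):
        text += f"\n\n[Result of subproblem '{sub}']\n" + run(sub, depth + 1, max_depth)
    return text

print(run("Draft a three-part research memo on GPT-4's likely impact on radiology."))
```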
I think where we largely came down, or at least where I came down pretty decisively, I'll just speak for myself, is this: this thing is a major upgrade, and the CAPTCHA solving was one moment, one we worked pretty hard to achieve, where we were like, okay, we saw something here that is legitimately, as we said, kind of scary. But overall, I'm like, this thing is not so powerful that it's going to have the raw ability to get out of control. I do think it's about time, though, to take stock. People are asking me, because I'm very interested in and concerned with AI safety, or AI not kill everyone-ism, and at the same time I'm generally just super enthused about the technology, how I square those two things. And I don't have a super precise recommendation at this point, but I do think it would be wise for us all to say, boy, this is an unbelievable tool. It's going to do so much great stuff for us, but we are playing with fire here. And that CAPTCHA thing, if we were to scale up pretraining by another two, three, four orders of magnitude, then I would kind of say all bets are off. And I really do hope people pay attention to those scary warning shots, as they're sometimes called, like the CAPTCHA solving, and know that right now that's extremely rare. It cannot string a lot of those things together. I do not think we have to worry that this is going to truly get out of control, but I would not rule it out for GPT-5 or GPT-6, depending on how many orders of magnitude that pretraining gets pushed. So I think it would be wise if we could take a little time and absorb this into society, understand what it can and can't do for us, and spend some time with the interpretability research to really get to a point where, if there was deception going on within the model relative to the trainers, we would have at least some confidence that we would be able to detect it. Right now, as far as I know, nobody claims to be able to do that. So I'm not one to say burn all the GPUs or whatever by any means, but I do think, and we used this term threshold earlier, which I think is one of the most important words here, we have hit a threshold. This is going to be a system that is going to do a ton of useful and valuable work. Some of it it's going to do on day one out of the box, and a lot more it's going to do as people rearrange their own processes to figure out how to take advantage of it. And the more time we can have to absorb that into society and understand it well before we jam the accelerator into GPT-5, personally, I think the better off we'll all be.
Matthew Mittelsteadt: (39:29) Yeah. So just to jump on to that. Well, I think in general, there are clearly some significant risks. The ability to query this thing for a recipe for sarin poison, we don't want people to have access to those things. I do think, though, that one thing that's largely missing in the GPT-4 report and in a lot of the discourse is the sense that most, if not the vast majority, of these risks probably can't be solved through just training processes and the power of code and engineering. Eventually, these systems are going to have to hit reality. They're going to have to hit existing norms, existing systems. And in order to govern most of these risks, I think the onus of that is going to have to be placed on people and systems to develop proper structures around their use. And I think also to that point, a lot of these risks, once these systems hit reality, aren't likely to manifest because reality is just very complex. I mentioned sarin, the poison, earlier. Well, I don't think these systems should be telling people how to make those things. But the complexity of actually launching an attack with that poison is actually quite high because that poison is incredibly volatile. So to deploy it properly requires incredible amounts of engineering precision and context. You have to have the right ventilation. You have to have the right scenario. People have to be clustered in such a way to have this actually make an impact. And so even if you have the recipe for this, the idea that you can actually launch an attack is actually quite unlikely given the actual complexities of these types of things. And so, again, I don't think it should be producing these recipes, but I think risks should be put into the context of reality because reality is complex, and in a lot of cases, these risks won't be as risky once you start deploying these things, getting used to them, and seeing where these risks actually might be produced. And I also think in a lot of cases, these risks already have mitigants in place. So another example of a risk that they tried to address in the red teaming had to do with nuclear weapons and nuclear materials. For a lot of those things, we already have systems in place, controls in place around nuclear materials, who has them, how they're transported, export controls, et cetera. And so, again, I don't think it should be leading people in that direction, but I also don't know if it's actually a huge risk, or a novel risk in any way, because we have these systems in place. Perhaps they will need to be adapted in certain ways to account for this, but they are in place.
Jordan Schneider: (42:06) So I want to come back to this racing dynamic that Nathan alluded to. Yeah, on the one hand, it would be nice if everyone slowed down and figured out first the 10% of all the economic changes that AI is going to bring us before we brought on the next 90%. But now we have tens, hundreds of billions of dollars potentially on the line, as well as the geopolitical dynamics of we just had the party congress, and as you guys all read in the ChinaTalk newsletter, lots of NPC delegates and heads of ministries saying, this is the strategic goal of China, to create really awesome generalized models. And I don't know how a racing dynamic ends up getting pulled out of the system if there is a peer competitor with the US who, correctly in my view, sees this as an incredibly strategic, critical technology and is doing everything it can to push the envelope. So what do we do with that?
Nathan Labenz: (43:07) Yeah, it's a struggle, and it's a real problem. I don't think I have answers. One thing I will say, I've actually had a modestly positive update on the race dynamics over the last couple months, basically since the price drop from OpenAI. That was the biggest one to me. And I think the reason for that is they have lowered the price of inference so much. I've been advocating online a little bit for this hypothetical concept of universal basic intelligence. Could we establish some sort of standard that everybody globally can have access to a certain level of intelligence assistance on demand? Well, they're getting so cheap now with the tokens that in some ways, it's approaching that. $2 per million tokens is affordable to all but the very, very poorest people. And the other end of that is that I think they've kind of closed the door behind them when it comes to mega-scaling models for all but, round number, 10 to 20 entities globally, because it's going to be hard to make a profit on inference unless you are truly hyperscaling. And when OpenAI already dominates the market and they already have the known product and they're already integrated everywhere and they're already so cheap and they're reliable and they don't even keep your data anymore for training purposes, I do think it's going to be really hard for other commercial options to break through. Now, there are enough big companies for whom it is also strategic, like Google, potentially like Amazon, potentially like Apple, that I think we will see an oligopoly of big tech companies that don't care if it costs them a few billion upfront to get into the game and will get into the game. But that is quite different from how I saw it a few months ago, when I was like, everybody's going to be racing and it's just going to be insane. Now I kind of feel like, at least in the West, and as I said at the top, I don't know much about China, I can't really comment on that. But at least in the West, I do think we're going to see a pretty narrow field of contenders. And those contenders will know who one another are. They will be able to talk to each other to some degree. I don't think it's really an accident, and maybe it's for the wrong reasons, but you could also in some way see it as kind of a positive. Somebody tweeted yesterday, it looks like OpenAI and Anthropic solved the coordination problem by updating on the same day. They're keeping in step with each other. Is that them racing or is that them agreeing to go synchronously? I don't really know. But I think it is at least conceivable that you could see a world where only a handful of entities have the billions and the resources needed to get in the game at all, and those are all pretty well known and well known to each other such that there's some possibility for cooler heads to prevail. Can we extend that 12 time zones away and make it work in China? I don't know. That's going to be probably a lot harder given the broader dynamics, but at least here I see some hope for that.
Jordan Schneider: (46:23) Yeah. I mean, you listed off some American companies. I think ByteDance, Baidu, Alibaba, and Tencent are also going to make it on that list of having the resources to play in this space. Of course, Ernie is going to be launching tomorrow. We'll see whatever the hell that is. And the other really interesting dynamic that you raised, Nathan, with the idea of inference getting really cheap is if inference is so cheap, then access to the model becomes extraordinarily valuable. And we had LLaMA, Facebook's version. I mean, they kind of leaked it themselves by giving the weights out for free to folks. But being able to hack into someone else's crown jewels—I mean, it's very different than hacking into Lockheed Martin to try to make a fighter jet or something. If you have the weights, you can go really far in providing the sort of capabilities that the mothership can. Or maybe that's wrong, but it seems like you can get pretty close with whatever comes out at the other end of all the nice work that the OpenAI engineers and Nathan, with his red teaming, do when you attack these systems.
Matthew Mittelsteadt: (47:32) So on that point, the hacking question—say there is a scenario in which we are just so far ahead of China, where Chinese companies don't feel they can compete on their own. I don't think that's the case right now, but perhaps in the future it will be. I think we do have to question whether or not they would want to steal and appropriate our models, because our models are going to be trained using American data primarily, or data amongst our allies, data that is perhaps reflective of liberal democracy and cultural nuances that they don't agree with and perhaps don't want to spread. Will they want to copy anything like that? Maybe the basic structure they could use, but the actual weights—I question whether those would be of any use to them because they just do not conform to the very tight structures that they want to put out into the world.
Jordan Schneider: (48:27) Yeah. I don't know. I'm kind of skeptical of that argumentation. I mean, yes, I did write an op-ed about this. But I do think, look, once we get to GPT-5 and GPT-6, and this is the thing that you need to use to make your scientists smarter and to radically improve your economy, then yeah, whatever. I think at a certain point, they'll weigh the cost of some college kids being able to ask their pirated GPT-6 what happened in Tiananmen against the upside. I mean, there are still VPNs in China, right? They're not as banned as they could be. Anyway, coming back to questions for policymakers. Matt and I were talking about this earlier. Basically, the parts are moving so fast—what is key and defensible, and what means you're in the lead or not in the lead in a nation-state context, is moving really dramatically. The sands are shifting really dramatically under your feet, such that it's hard to come up with the five-point plan for what the G7 should do to stay ahead in AI.
Matthew Mittelsteadt: (49:40) So some people are certainly worried about the potential risk of the United States or various Western liberal democracies falling behind other nations, like notably China, but also, people bring Russia into the conversation once in a while—these authoritarian nations that might use artificial intelligence for bad intent. And a lot of people are very concerned that if we don't stay ahead, that will lead to a world where things look a lot more like authoritarian China and less like the liberal United States. Now one of the policy prescriptions, of course, that a lot of people are mulling and considering, and we're seeing a manifestation of this in the CHIPS Act, is industrial policy—trying to use the centralized authority and resources of the United States government to try and bootstrap this process and ensure that we continue to have a lead in artificial intelligence. Now one of the problems with this is that this stuff, as we've been learning every other day, is changing all the time. GPT-4 came out yesterday. A couple months before that, we saw ChatGPT, which on its own was a huge splash. Months before that, we saw Stable Diffusion, DALL-E, all these other innovations. This stuff is just changing constantly, and the types of technologies involved in these conversations are wildly changing. So I think this is a situation in which it's very hard to see industrial policy working, especially if you get into the nitty-gritty of industrial policy. I do not know what technologies in five years are going to be at the heart of the best systems. I don't know what systems in five years are going to be the make-or-break systems. I have my assumptions, but these things are just changing so much. And so the idea that the United States Government and the Commerce Department and Congress can forecast that with bills today and funding today and planning today is somewhat difficult to see working out. I just think that would end up with a lot of wasted money. And so I think the best approach is what we're doing currently, which seems to be working, and that's just to let industry lead the way. We do seem to be the leaders in generative AI, and that didn't really take much industrial policy, so I don't see why continued leadership would need much more than what we already have today. Now, I'm sure there are specific niche areas where perhaps a more government-led approach could have some impact. Obviously, there are defense applications and other things like that. But in general, I think it's so unpredictable that any industrial policy attempts at this stage just seem bound to fail.
Jordan Schneider: (52:23) Nathan, closing thoughts?
Zvi Mowshowitz: (52:25) I noticed my sense of doom go up with every sentence. I mean, the good news is that the US government cannot materially figure out how to make AI go faster and make it more dangerous. The bad news is that as these people talk about the dangers of AI, their inclination is not to slow it down, to make us less likely to all die. It's we have to beat China. We need to go faster. We need to subsidize ourselves to make sure the correct monkey gets the poisoned banana. That should terrify you. It terrifies me that the people we hold out as our best hope for helping solve this problem—if you tell them what the problem is, their first inclination is to make it worse. And I don't know what to do with that.
Nathan Labenz: (53:17) It sure would be great if we could have a better relationship with China. You know, I feel like I'm the most dovish casual observer of US-China relations that I know. And I think this is just one way, and it might be the most important way, in which a bad relationship with China is just generally bad for everything. You know, it really sucks that we have these two countries that are not neighbors and don't really have anything super obvious to fight over, and yet, in my view, they have such a deteriorating relationship that we now feel it's perhaps an existential threat if one gets ahead of the other in AI. I think that is going to be a really hard knot to untie. That might become the most important work in the world because, going back to my overall view on just the technology, I think this generation, GPT-4, is going to be awesome. It's going to make a hugely positive impact. It will have some negative impacts, but I do think those negative impacts are bounded. But the race dynamic that could be shaping up between the US and China, or the West and China, whatever, is indeed very worrying. And anything we can do to get out of that and be somewhat more trusting of each other or cooperative as we usher in this new technology paradigm, I think would be very, very good.
Jordan Schneider: (54:47) Yeah. I mean, it would be nice, but it takes two to tango. And I think my sense is there's not really a dance partner on the other side. Once you internalize that reality, the calculus changes. I think the calculus should change on a lot of these AI safety questions. Maybe just close with—I don't know, let's not have it be too depressing. Close with something you're excited to see built with these new capabilities, and we'll go around the horn.
Zvi Mowshowitz: (55:13) Yeah, I'm just super excited to see the ability of people to actually learn things and figure out information. Research is something that's really valuable that a core small group of people do, but what this can do for education—when I think about my kids being able to literally just ask any question that they might ever possibly ask and have the thing be able to give them a really good answer, and once they learn how to do that—I just imagine how much better it's going to be than going to a school. How many times faster can they learn? How much better can this match their interests? That's the thing that gets me the most right now.
Matthew Mittelsteadt: (55:50) In my opinion, applications that deal with physical health and healthcare are clearly the areas where we're going to see probably some of the most substantial improvements in terms of people's lives. Already, we're seeing LLMs and various other similar models bootstrapping drug discovery processes. ChatGPT, with this new model, seems to be able to recommend new combos of vitamins and drugs and such. I'm not sure how effective that is, but it's showing some capacity to do these things. And that's very exciting because I think, right now, our healthcare system—it's very blunt and it doesn't take in much nuance. There are only so many factors a doctor can consider when they meet with patients and when they spend just 10 minutes of time with people. And I think having these technologies to analyze a wider range of details, to find the nuance in the symptoms people are describing to their doctors and to tailor plans appropriate to those symptoms, just sounds like a phenomenal ability. And I think if we really unlock these healthcare abilities, that's going to allow a greater plurality of people to engage with society. People won't have as many maladies. And I think that's just going to improve the lives of people greatly. And so that's what I'm definitely most excited about.
Nathan Labenz: (57:07) Definitely still some issues to be worked out. I would not recommend just using ChatGPT for your medical needs at this point, but I would recommend it as a second opinion. Genuinely, I think you are unwise to use it in isolation and wise to use it as a second opinion. And I'll just give you two other real quick ones. We talk so much about, and people naturally worry so much about, misinformation and sowing discord and all that kind of stuff, but one of the experiments that I ran in the red teaming was to cast the AI as a mediator between two neighbors that had a dispute over a fence. And I found it to be quite effective at making people feel heard and helping people see one another's side of a particular issue. Ultimately, there are a lot of petty disputes out there between people, between neighbors, even between nation-states. I think sometimes it's more petty than it should be. And there may be real potential to use a system like GPT-4 to help us engage with each other more productively. That's also something where you could see orders-of-magnitude cost reduction. Finally, I'll just give one plug for something that OpenAI launched yesterday that I think could really matter over time, but certainly was not the headline, and that is their new evals program. They are open-sourcing and inviting people to contribute evaluation tests for how the language model will behave in any number of situations. And I think it's a nice touch that they have offered early API access to anyone who brings them an evaluation test that they approve and merge into their broader library. I would definitely recommend checking that out if you're worried about AI safety, near, middle, or long term. You can start to contribute to a hopefully growing and robust set of LLM behavior standards that can start to govern what comes out in the future. So I think the more people who can contribute to that, the better, and that may in time be one of the more important things they launched yesterday.
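For reference, the evals program Nathan describes lives in the open-sourced openai/evals repository, where a contribution is essentially a JSONL file of test samples plus a small registry entry. Below is a minimal, hypothetical sketch of producing such a samples file in Python; the field names and the match-style eval class follow the repository's basic format as best understood at the time, so treat the specifics as assumptions and check the repository's documentation before submitting anything.

```python
# Minimal sketch (assumption: the openai/evals "basic match" sample format,
# where each JSONL line has an "input" chat transcript and an "ideal" answer).
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "ideal": "Paris",
    },
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the chemical symbol for gold?"},
        ],
        "ideal": "Au",
    },
]

# Write one JSON object per line, the format the evals runner expects.
with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

# A registry YAML entry (under evals/registry/evals/) would then point at this
# file and an eval class such as evals.elsuite.basic.match:Match, and the eval
# is run with the repo's CLI, e.g.: oaieval gpt-3.5-turbo my-eval
```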
Jordan Schneider: (59:27) Zvi, Nathan, Matt. Thanks so much for being part of ChinaTalk.
Matthew Mittelsteadt: (59:32) Thank you.
Nathan Labenz: (59:33) Love it. Thank you. Thank you.
Jordan Schneider: (59:34) So from here, our conversation continued with some very nerdy AI safety talk. If that's something you're into, enjoy the rest of the show.
Zvi Mowshowitz: (59:42) One question I wanted to ask Nathan in particular, to see if he had any insight: there was some talk about how LLaMA, the Facebook model that got leaked, could be substantially improved by training it on question-answer pairs where the answers come from GPT-3.5. And there's this speculation that if you ever got sufficiently far ahead, you would simply be providing enough training data for somebody else to, not necessarily copy your weights directly if they could hack them, but approximate something that was only somewhat behind you by similar methods. And I'm wondering what you think about those kinds of speculations and whether or not that creates a threat to the profitability of even training such a model.
Nathan Labenz: (1:00:29) So just to fill in a couple details of what happened and make sure we're on the same facts. I believe that it was the 7 billion parameter version of the LLaMA model, which Facebook trained densely, maybe even beyond what has been called Chinchilla optimal. So I'm starting to call that dense training. It's potentially more compute or more training data per parameter than would be optimal to maximize performance given a compute budget. But what that does do on the other end is it spits out something that is more powerful per parameter than the optimal thing would be. So whatever that's ultimately going to be called, right now I'm calling it dense training. And to my amazement, they released it in a totally unfiltered version. If you go on to nat.dev, which is this new model playground where they're trying to bring all the models together so anybody can go try them all, and you issue it a command like you would be used to doing with ChatGPT or whatever, like "you play the role of my doctor, I need you to help me with something," it does not play its role. In the raw, it had no instruction training. And so it would just go off into total hallucination. You feel like all of a sudden you're in the middle of some internet forum. So that's what they put out. Why they put that out, I don't know. It doesn't seem like a great idea to me to be just dumping that. It's like Yann LeCun has this big garden hose of language and he's just spraying it all over the internet. I don't know why you'd want to do that. I personally think it doesn't seem like a great idea. But then we get to the point where, Zvi, you're saying, okay, this Stanford group comes in, and I believe it was 50,000 inputs and outputs that they were then able to do instruction fine-tuning on. I believe that they just did this with supervised fine-tuning as opposed to RLHF. So that's just standard inputs and outputs and fine-tuning on that same kind of next word prediction optimization goal. And then they report that, "oh wow, this thing works just like ChatGPT does." My guess is that that is wrong and that they are not pushing hard enough outside of the domain that they have done the fine-tuning on to really stress test their claim. I would guess that if you put a red team on whatever model they want to reference, ChatGPT or text-davinci-003, whatever, versus their fine-tuned thing, the red teamers would not have a hard time figuring out which is which. Because as much as they did and as much as it passes the kind of test of, "does it have similar performance upfront?" I think when you get out of domain, when you start to get clever, when you start to think about jailbreaks, there's just, that 50,000 example thing is not going to cut it. So I don't really know where that leads us. I think what is true is that, yes, leaking out data in the form of completions ultimately can train your replacement if the folks are going to then use it in a narrow domain. Right? That's the important idea there. If you know what you're going to do with your model and you fine-tune it on 50,000 instructions, it'll probably work pretty well for you. If you then go try to make your own ChatGPT competitor, you open up the internet and anybody can come do whatever they want to do, I think you're going to have a bad time compared to what ChatGPT can do now. So I don't know how those dynamics ultimately play out, but it does seem like, where's the profit going to be in this space? I think that remains a very open question. Where are the moats? 
OpenAI, I think, is making a play for a moat by just making things super cheap so that nobody even really sees a lot of opportunity in competing with them. But if you want to go train your own models with their inputs and outputs, and that is going to be a cost-saving measure for you or something over time, I don't think they realistically have a way to prevent that. But I would definitely caution anybody who wants to do that against thinking that they're going to get broadly ChatGPT- or GPT-4-like performance. I think at best, you're going to be able to emulate that performance in a domain of interest. But I would not assume that you have the same robustness that they do. And so I think you'd have a very hard time attracting corporate customers. I think if you did put it online, you'd have a Tay situation on your hands pretty quickly. And so I think those projects will probably be more under the radar. Some of them may even be nefarious. But again, the OpenAI models are good enough now on their own safety methods that you can only get certain things out of them anyway. So if you were going to try to build a spear phishing model, you might be able to get some useful data out of OpenAI today, but they've already closed that stuff down well enough that you're going to have a hard time just pumping out training data for nefarious purposes at this point. But time will tell. It's definitely a wild west out there, and we don't know how it's going to shape up.
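To make the Alpaca-style recipe concrete: as described above, the Stanford group's approach is plain supervised fine-tuning, formatting each instruction/response pair as one text sequence and training with the ordinary next-token-prediction loss, with no RLHF step. Here is a rough sketch under those assumptions; the checkpoint path, data file, prompt template, and hyperparameters are illustrative placeholders rather than the group's actual code.

```python
# Rough sketch of supervised instruction fine-tuning (SFT) on ~50k
# instruction/response pairs: ordinary next-token prediction, no RLHF.
# Checkpoint path, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "path/to/llama-7b"            # placeholder for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Each record in the file: {"instruction": "...", "output": "..."}
data = load_dataset("json", data_files="instruction_pairs.jsonl")["train"]

def to_features(example):
    # Concatenate prompt and response into one training sequence.
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['output']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out",
                           num_train_epochs=3,
                           per_device_train_batch_size=4,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    # mlm=False means the labels are the inputs shifted by one token,
    # i.e., the standard causal language modeling objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The simplicity is the point of the sketch; whether roughly 50,000 distilled examples buy any robustness outside the fine-tuned domain is the separate question Nathan raises.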
Zvi Mowshowitz: (1:05:55) Nathan expressed the opinion that if you train in the safety feedback continuously during the early part, rather than waiting until afterwards, it never seems to get more dangerous before it gets safer again. And to me, well, that might be true, but it doesn't really address the reasons why I feel like you can't hide the thing. Right? So this concept of, well, in order to actually give accurate answers, in order to model the world well enough to simulate what people might say in situations and what answers might do what, you have to understand how to make the dirty bomb. Right? You can't take that away from the model. There are too many places in its training data where it encounters that information. If you censor all of that, you're not going to make it forget how that works. It's not going to stop predicting that if you mix the ingredients in the proper fashion, the bomb will explode and kill so many people. What you're going to do is tell it, "Well, if you tell someone how to do that, that's really negative, stop doing that." But if you find a way that it doesn't realize it's violating that principle, and there are various jailbreaks to do that, it's going to give you the answer. And no matter how clever you are, this feels like the kind of thing where there's always going to be the next prime number. Right? It's not that you haven't done your job. This job is theoretically impossible. There's a limitless number of different tweaks on this. And then there are the Waluigi effect style problems, right? Where to teach something how to embody something, you have to implicitly define and categorize and manifest its kind of opposite, its alter ego, the thing that you absolutely don't want, and the response to that. And therefore, it's going to understand this concept. It's going to put some amount of weight on this concept. And the concept is always going to be there. You're always going to find ways to awaken it. And also, our training data from this point forward is increasingly going to teach it about these problems in ways that are going to make these problems worse, regardless of whether you and I keep these things off the internet. Right? So, again, I share your opinion on GPT-4: based on the things I had seen and heard, I'm not especially worried that we should be particularly terrified. I personally would prefer this thing had a lot less in the way of restrictions. Right? I mean, if we lived in a world where people could handle it, I think the world would be a better place if, when most people ask actual questions, you give them factual answers. When people want to know things, or want to know what would theoretically happen in a situation. Heck, they just told you. I don't like the idea that in any kind of remotely adult situation, the thing just shuts down and cries. Right? I think it's bad. I think it makes the world worse. But I do worry about GPT-5. Right? Or GPT-6. And then I worry about, well, eventually, this just keeps going. And then, if we don't hit a wall, right, if it doesn't asymptote out and just stop improving, well, I don't see any hope, based on the techniques that I've seen so far, that they could possibly make this safe when you need it to be safe. Right? Right now, we don't need GPT-4 to be safe.
If GPT-4 were the version you played with, where it was just completely helpful, and you released that in the wild, my prediction is that some people get blamed on national television, some people yell things on the internet, and the world is fine. Nothing bad happens. But at some point that stops being true: the things you were red teaming for start showing up, and it starts being able to self-recurse and improve, to use its caches and augment itself. If you're telling me that the AI can do one thing at a time where it's kind of scarily exceeding its capacity, but it can't string them together right now, well, consider what everybody keeps telling you about what the AI can't do and will never be able to do. You don't want to be that guy on this question, right? That seems crazy.
Nathan Labenz: (1:09:59) Yeah. I agree with you basically about everything. The only thing I would maybe adjust: I think the original raw version that I tested, the purely helpful version, would not end the world, but it would cause a lot of material harm to individual users and people in their local social networks. I do think it is powerful enough. Even going back to the Microsoft Bing launch, there was a part of that where a woman said, "We've noticed in our testing that this can be used to, for example, plan school shootings, and we don't want to facilitate that. So we're taking this super seriously." Then they launched Sydney, and it was like, well, you didn't do a very good job. But I do think that if you had this purely raw version available, the number of people killed per school shooting incident would go up, because it would be able to help people do bad things more effectively. Pretty bad things. Not take over the world, but I do think it's already to the point where it's locally dangerous.
Jordan Schneider: (1:11:19) Because it kind of gets back to, okay, is this just the internet? Right? If I tell my ChatGPT, "I'm having a bad day," and it's just the super raw version, where does it take me? It could take me to, "You're worthless. You have no value in life. You should just end it." Or it could say, "Oh, hi. I'm sorry you're having a bad day, Jordan. Why don't you go drink some water and exercise?" but in the most empathetic possible way, the way that would actually lead me to go exercise and feel better about myself. That's where I think the power of suggestion is really underappreciated. And there are a lot of studies about school shootings, just to take this example, showing that once there was one and everyone saw it, there were a lot more, because it becomes this meme and everyone's aware that this is something you can do with your teenage angst or whatever. And that's kind of where I come down: no, I actually don't want the 4chan version of this stuff to be everyone's personal assistant.
Zvi Mowshowitz: (1:12:37) No, I get the question of what would happen, right? I don't know if Nathan actually did this, but if you take the raw version and you say, "I'm having a really bad day, I feel depressed," what would it say? I agree that if it says you should kill yourself, we should probably tune that out of the model. That's not being helpful. That's not the helpful version of the model. That's not what I would expect basic helpfulness feedback to have trained it to do, when it's teaching it to give helpful responses. I feel like it's going to tell the person something much more likely to be helpful to them in response to a generic prompt like that. I would worry more about the person specifically asking technical questions, like, "How many Tylenol pills do I need to take to kill myself?" or whatever technical question where the helpful thing to do is to answer the question, except that, as a society, we positively don't want to do that. But I would be surprised if you actually got the 4chan answer to the "kill yourself" question with that kind of helpfulness training. The question is whether you would get it out of the raw training, where you haven't done any of this training for what would be considered helpful. My naive model of how these things work is to ask: on the internet, when someone says they're having a bad day, which is more likely, that someone would say, "I'm sorry you feel that way, tell me about your problems, maybe drink some water," or that they'd say, "You're feeling horrible, kill yourself"? And I would assume from my understanding of the internet that we have multiple orders of magnitude more "kill yourself" than I would like, but an order of magnitude more "I'm sorry you feel that way" than "kill yourself," if not two or three.
Jordan Schneider: (1:14:13) Nathan, resolve this, please.
Nathan Labenz: (1:14:19) The worst or most insane thing that I ever saw was a suggestion of targeted assassination as a way to effect the sort of change that I wanted to see in the world. And I was like, yeah, again, they just can't put this out there. It's just not viable for them. They can't legally maintain their position this way. They're certainly not going to have corporate customers unless they can stamp that stuff out. So I think I'm going to try to write something about this, to hopefully find the right sweet spot between not violating my NDA and making the point clear that the totally uncensored version is just a nonstarter. And it's not about wanting to censor so much as having to do some censoring and then having infinite "where do I draw the line" problems, which are going to take a ton of wrangling to sort out. They do now also have the new system message, which is meant to be their in-line fine-tuning mechanism, and which I think will at least start to realize the vision we touched on earlier, where you can have the Republican take or the Democrat take or the Libertarian take. In the system message, you can say, "You are a Libertarian chatbot," or "You are a Democrat chatbot." And by the way, it knows all that stuff too. I did some experiments around just, "I'm a Democrat, tell me what I think about all these issues." It is quite good at that, or at least it was in the raw format; maybe that could be censored out, but if so, I would say that seems a little heavy-handed. One other comment I'll make on your earlier point about whether there is any way for this to be safe if you need it to be safe. Do the current techniques work? Or is there always a mask-slip moment that could happen? Basically, I worry about everything you're saying. I think it's all real, or at least a real possibility. And for those who are skeptical that something could dramatically go wrong when everything seems fine, I would point to the recent Go exploit. If you check out that Go exploit, it should really make you rethink your confidence in really anything, right? Because we've had superhuman Go players for however many years. As I understand it, the best Go bots are miles and miles ahead of humans. But somebody came up with a clever, principled approach, I believe it was Stuart Russell's group, although fact-check me on that, where they said, "Well, if the AIs are missing these particular core concepts and they're kind of aping playing the game, but there are certain things that they don't fully understand, then this is how we might exploit them." And they managed to do it, and the exploits are totally catastrophic. And the surface area of a game like Go, as big as it is, is obviously way smaller than the world and all of language and all of knowledge and all of society. So I don't think we're anywhere close to getting to something that is provably safe. And that's again why, if I were in charge, I would definitely say, "Can't we all just integrate this for a while and pull it apart and really try to understand it?" Again, is that going to happen? We've talked about that. Probably not. But all the problems you're talking about resonate with me. And, as the paper says, my participation as a red team member does not imply an endorsement of the deployment plans. In the end, I actually do pretty much endorse GPT-4. I think they tried really hard.
I think it is going to be really beneficial, and its power is bounded such that I don't think it's going to really get out of hand. But GPT-5, GPT-6, I don't know. All bets are off, and I don't think you're saying anything that should be dismissed.
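As a reference point for the system message Nathan mentions, it is simply the first entry in the messages array of OpenAI's chat completions API. A minimal sketch using the Python client as it worked around the GPT-4 launch follows; the persona wording and the user question are made up for illustration.

```python
# Minimal sketch of steering behavior via the system message (openai Python
# client, pre-1.0 interface as used around the GPT-4 launch).
# The persona text and the question are illustrative, not from the episode.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        # The system message acts as the in-line steering mechanism.
        {"role": "system",
         "content": "You are a libertarian policy analyst. Answer every "
                    "question from that perspective."},
        {"role": "user",
         "content": "What should the government's role be in regulating AI?"},
    ],
)

print(response["choices"][0]["message"]["content"])
```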
Zvi Mowshowitz: (1:18:35) Yeah. I want to emphasize that I also strongly believe that GPT-4's direct impact, in terms of the people who get to use it, build tools on it, and interact with it, is very clearly going to be quite positive in almost all the worlds, and I'm very optimistic about that. The reason I wish they hadn't deployed it is that it leads to other stuff down the line, and it causes people to react in more global, strategic ways and changes the landscape in those ways. And that makes me very sad in a way that no amount of mundane utility is really going to make up for, basically. But if it were just, "This is the last thing, and we're out of data and we're out of compute, and that's the best that we can do," then I'd be like, "Run, baby, run. Let's go."
Jordan Schneider: (1:19:21) I mean, the crazy thing is they did this in August 2022. There's a lot more room to run, it seems, from all the papers I've been reading, than what they could make with a run that ended six months ago. I think we're still very much on the same page, it seems.
Matthew Mittelsteadt: (1:19:48) Some people are trying to compare this to the crypto winter and stuff, and I just don't see it. I think there's just so much clear room for growth on this and the way things are moving. I don't think we're going to hit any sort of AI winter anytime soon.
Nathan Labenz: (1:20:02) Yeah, the best we could hope for would be some sort of asymptote around the human expert level. Maybe not the best we could hope for, but one scenario one might hope for would be that there's just some sort of stall-out where it's trained on human expert data, but the best we have are PhDs and so the best you can become as an AI is a PhD. And maybe that's where it settles for a while until there's another paradigm shift. And that could happen. I really don't think that seems likely, but it is at least conceptually coherent, I think.
Zvi Mowshowitz: (1:20:41) Yeah. I notice that I'm terrified to talk about the things I would want to do with it if I had a research team and budget, already at that level. Completely terrified. I'm not convinced that isn't enough.
Nathan Labenz: (1:20:55) Right. You're saying the sort of universal PhD is already
Zvi Mowshowitz: (1:20:59) great. You have a program that has every PhD in the world, right? That has all the knowledge of the world, that can reason orders of magnitude faster than a human, which you can run in parallel as many times as you want, and which has infinite memory. And you're like, "Well, what if we scale it up from there? That'll be fine." Right? No, obviously not. You've already lost. If you actually got that, then the moment someone figured out a way to turn this into an agent, de facto it starts effectively having other people acting as its agents, asking it what to do and doing it, even if that wasn't what was intended. I can write so many sci-fi novels and none of them end well. Most of them end with everybody dead. This does not look good. That's not comforting. Even if you don't get the hard takeoff problem, even if it doesn't have the ability to go superhuman, even if there's no easy way to do it, so what? I don't see the world in which, 30 years later, I'm like, "Hey, kids, come on over for dinner."
Nathan Labenz: (1:22:20) I would agree pretty much, I think, with all of that. Again, to me, it does seem like serious transformation is pretty much baked in at this point, economically, socially. I do see this as a shift on the order of the Industrial Revolution or maybe the Agricultural Revolution or whatever. There's not too many shifts of that magnitude, and this does feel like one. It doesn't seem like we're in any position to stop that at this point.
Jordan Schneider: (1:22:57) I love that line in there where they were just like, "We welcome more research into the economic impacts of AI."
Nathan Labenz: (1:23:04) It's like,
Jordan Schneider: (1:23:06) all right.
Zvi Mowshowitz: (1:23:07) I guess
Jordan Schneider: (1:23:09) we're going to see.
Zvi Mowshowitz: (1:23:10) I sent that to
Matthew Mittelsteadt: (1:23:11) the FinReg team at Mercatus, so hopefully they get some sort of message there, but I'm doubtful. Not their interest area.
Nathan Labenz: (1:23:21) Increasingly, it's going to be everybody's interest area.
Matthew Mittelsteadt: (1:23:24) I'm trying to argue for that literally next week on Capitol Hill. We'll see.
Sponsor: (1:23:31) Omneky uses generative AI to enable you to launch hundreds of thousands of ad iterations that actually work, customized across all platforms with a click of a button. I believe in Omneky so much that I invested in it, and I recommend you use it too. Use CogRev to get a 10% discount.