Watch Episode Here

Read Episode Description

In this episode, former OpenAI research scientist Steven Adler discusses his insights on OpenAI's transition through various phases, including its growth, internal culture shifts, and the contentious move from nonprofit to for-profit. The conversation delves into the initial days of OpenAI's development of GPT-3 and GPT-4, the cultural and ethical disagreements within the organization, and the recent amicus brief addressing the Elon versus OpenAI lawsuit. Steven Adler also explores the broader implications of AI capabilities, safety evaluations, and the critical need for transparent and responsible AI governance. The episode provides a candid look at the internal dynamics of a leading AI company and offers perspectives on the responsibilities and challenges faced by AI researchers and developers today.

Amicus brief to the Elon Musk versus OpenAI lawsuit: https://storage.courtlistener....
Steven Adler's post on 'X' about Personhood credentials (a paper co-authored by him) : https://x.com/sjgadler/status/...
Steven Adler's substack post on "minimum testing period" for frontier AI : https://substack.com/@sjadler/...
Steven Adler's substack post on TSFT Model Testing: https://substack.com/@sjadler/...
Steven Adler's Substack: https://stevenadler.substack.c...

Upcoming Major AI Events Featuring Nathan Labenz as a Keynote Speaker
https://www.imagineai.live/
https://adapta.org/adapta-summ...
https://itrevolution.com/produ...

SPONSORS:
ElevenLabs: ElevenLabs gives your app a natural voice. Pick from 5,000+ voices in 31 languages, or clone your own, and launch lifelike agents for support, scheduling, learning, and games. Full server and client SDKs, dynamic tools, and monitoring keep you in control. Start free at https://elevenlabs.io/cognitiv...

Oracle Cloud Infrastructure (OCI): Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive

Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(05:15) Joining OpenAI: Early Days and Cultural Insights
(06:41) The Anthropic Split and Its Impact
(11:32) Product Safety and Content Policies at OpenAI (Part 1)
(19:21) Sponsors: ElevenLabs | Oracle Cloud Infrastructure (OCI)
(21:48) Product Safety and Content Policies at OpenAI (Part 2)
(22:08) The Launch and Impact of GPT-4
(32:15) Evaluating AI Models: Challenges and Best Practices (Part 1)
(33:46) Sponsors: Shopify | NetSuite
(37:10) Evaluating AI Models: Challenges and Best Practices (Part 2)
(55:58) AGI Readiness and Personhood Credentials
(01:05:03) Biometrics and Internet Friction
(01:06:52) Credential Security and Recovery
(01:08:05) Trust and Ecosystem Diversity
(01:09:40) AI Agents and Verification Challenges
(01:14:28) OpenAI's Evolution and Ambitions
(01:22:07) Safety and Regulation in AI Development
(01:35:53) Internal Dynamics and Cultural Shifts
(01:58:18) Concluding Thoughts on AI Governance
(02:02:29) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...

Full Transcript

Transcript

Nathan Labenz: (0:00) Hello, and welcome back to the cognitive revolution. Today, my guest is Steven Adler, former research scientist at OpenAI, author of a new Substack, stevenadler.substack.com on how to make AI go better, and one of the 12 former OpenAI employees who recently filed an amicus brief to the Elon Musk versus OpenAI lawsuit arguing that OpenAI's nonprofit status and mission has been central to its historical success and that it should remain under nonprofit control going forward. Of course, you probably know that there's been a major development in the OpenAI story this week. On Monday, OpenAI announced that it's changing plans and now intends to form a new public benefit corporation which will remain under nonprofit control. While this news would seem to resolve the question that spurred this episode, the conversation itself remains highly relevant. As we spoke very little about the details of the case itself and much more about OpenAI's history, the evolution of its company culture, and the prevailing values, attitudes, and mindsets at the company today. To begin, Steven takes us back to his early days at OpenAI. He joined shortly after the original GPT-3 API was launched, and he recounts a pivotal moment in the company's history. The departure of important research and other leadership to found Anthropic. And how the effort that OpenAI leadership went to to reaffirm its nonprofit status and its commitment to its mission was central to keeping the company together through that crisis. We then explore the four chapters of Steven's tenure at OpenAI, including his work on product safety, including the GPT-4 deployment, on dangerous capability evaluations, on proof of personhood techniques and related plans for identifying and authorizing AI agents, and finally on AGI readiness. From there, we get Steven's perspective on OpenAI's evolution from a research focused organization to a hyper growth technology company. We discussed Steven's understanding of OpenAI leadership's motivations, their relationship to AI safety concerns, and the ways in which their commitments to safety testing have eroded over time, their attitude toward the possibility of recursive self improvement, the contrasting cultural forces within the company, and more. Overall, I found Steven to be very level headed and even handed. At times, I'd even say charitable, which does provide some valuable context for the reactions to this week's news that we've seen from the Amici and other OpenAI watchers. Personally, when I first read the news that the nonprofit will retain control while also owning enough stock to fund many worthy philanthropic projects, it seemed to me a clear win for Steven and friends. Steven hasn't commented since the news, but the general reaction online I would describe as ranging from cautious optimism to outright cynicism. That they'd want to see and really have a chance to scrutinize the details of such an arrangement is obviously prudent. But the evidence suspicion that OpenAI may be playing word games or otherwise trying to trick the public kind of surprised me. And if nothing else, reflects just how low trust has fallen between these former team members and OpenAI leadership. So where does all this leave us? As someone who's watched OpenAI closely but never worked directly with anyone on the leadership team, I can really only speculate. But here are two things that seem likely true and important at least to me. First, as Sam has indicated multiple times, OpenAI is making all of this up as they go along. They have no precedent to guide them and no choice but to keep moving forward. Considering everything that he and the executive team are juggling from developing and productizing transformative technology to managing historic fundraises, internal ideological divides, high profile departures, PR crises, potential regulation, and of course corporate restructuring, brute force time constraints mean that Sam is probably spending less time on many of these critical issues than many outside analysts. Obviously, this isn't ideal, but it's also not inconsistent with the idea that they may really be sincerely motivated and genuinely trying their best to ensure that AI benefits all humanity. Second, regardless of their governance structure, there is huge value in the work that these outside analysts, commenters, ex employees, and government officials are doing to help steer the company in the right direction. They are making no secret of their ambition to transform life as we know it, and it remains strikingly plausible that this one company could play a pivotal role as we enter a future in which AI utopia, dystopia, or even outright human extinction are all live possibilities. Regardless of where we happen to find ourselves in relation to the company, this episode makes clear that pressure can successfully be applied. And it's on all of us to use that collective power for good. As always, if you're finding value in the show, I'd appreciate it if you take a moment to share it with friends, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we welcome your feedback via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. Finally, now, a quick reminder, I'll be speaking at Imagine AI Live, May in Las Vegas, the Adaptive Summit, August in Sao Paulo, Brazil, and the Enterprise Tech Leadership Summit, September, again in Las Vegas. If you're planning to attend any of these events, let's meet up in person. For now, I hope you enjoy this conversation on OpenAI's past, present, and ever evolving future with ex OpenAI research scientist, Steven Adler. Steven Adler, former research scientist at OpenAI and now one of the 12 Amici on the recent Amicus brief to the Elon versus OpenAI lawsuit. Welcome to the cognitive revolution.

Steven Adler: (5:26) Yeah. Thank you for having me. I'm excited to be here.

Nathan Labenz: (5:29) Yeah. Likewise. I appreciate you taking the time. So lots to talk about today. I wanted to basically go into, like, what's going on at OpenAI. Obviously, you were there for a number of years. Did, you know, some outstanding work there, which we can get into. I would love to get your perspective on some of the cultural things that I think are very sort of confusing for those of us who have only seen the various facades that the organization presents to the public. And and then we can get into the real details and sort of motivation and core arguments of this Amicus brief as well. Maybe for starters, I went back and looked at the timeline. You joined OpenAI just pretty shortly after the original GPT-3 API was launched. So could you maybe take us back to that moment and kind of just talk about like, what was OpenAI like then? How big was it? You know, what did the culture seem to be like? How were you recruited? Why were you motivated to join it? And that'll set the stage for working our way back to the present.

Steven Adler: (6:31) When I joined, which was December 2020, there were about 30 of us on the applied team, maybe about a 180 or so at the company overall. And I think the most prominent thing that was about to happen was the Anthropic split was about to break off. The seven or so folks who left OpenAI to found Anthropic, including two of the three main authors, the GPT-3 paper. One of the big questions that OpenAI seemed to be grappling with at that point was there is both real world value in deploying AI systems like potentially GPT-3. You learn from experience, you figure out what's not working, you can improve it for the future. Also, there is some bar at which it might not be responsible to deploy a system even if it offers you valuable evidence. And so my understanding is there was this big background disagreement. Most of it actually played out before I joined. I was brought on to manage our product safety processes, which I think in a different world would have meant doing lots of coordination and diplomacy and figuring out solutions between some of the folks who broke off for Anthropic and folks who stayed at OpenAI. But as it were, by the time I joined within a week or so, Mira Murati, who at that time was my manager, dropped a meeting on my calendar and I got on the call and she said, hey, just so you know, like, we're announcing today that all these people are leaving for Anthropic. You know, it's fine. Right? Like, these things happen, and we're going to talk about how to make sure that we stay huge toward the mission, which, you know, all of those processes did play out. I think one thing that people misunderstand about the Anthropic split is, I don't think people understand, like, how long this played out for and how persistent of a backdrop it was. And so there's this telling, right, where people broke off and they went and they founded this rival company. In actuality, it was a background thing for two or three months. So you had the initial folks who left to found Anthropic, but then a steady drumbeat of other people leaving OpenAI, often to go and join. Or in some cases, Paul Christiano, who left to found the Alignment Research Center, which became METR, he also left on the heels of this departure. And so there was kind of a moment of freefall of sorts, right? Like how many more people is OpenAI going to lose? Are we going to be able to keep building these systems? And over time, I think that's a moment people have referred back to whenever there is an internal crisis of sorts at OpenAI. You know, OpenAI has been here before. The Anthropic time was a time of wandering through the forest and, you know, came out the other side. Okay.

Nathan Labenz: (9:13) Yeah. That boy. There's so many chapters. Can you maybe characterize a little bit more deeply how you understood the disagreement there? Because I think the sort of version that I heard at the time was difference in emphasis on, like, fundamental research versus sort of more productization and business orientation. And now, you know, fast forward to the present and, like, obviously, Anthropic is, like, very much in market, you know, with very competitive products. And so, you know, one might have if that is in fact how it kind of split, like, one might call that an OpenAI win in the grand scheme of things that, like, OpenAI look or Anthropic looks a lot more like OpenAI in terms of productizing than, you know, maybe they intended to when they left.

Steven Adler: (10:05) Yeah. I'm I'm getting this secondhand and refracted in a bunch of ways. Right? And so take it with a grain of salt. I have not understood the Anthropic split as opposition to commercialization inherently, so much as OpenAI did this before it ought to have, and it was not responsible to go ahead in the ways that it did. You can think of that both in terms of what technical infrastructure the company did or didn't have to govern uses of its technology, also these broader sociological questions about what the role of AI in society should be. And some of these, to be clear, I think the world has still, like, largely not really answered. Some of the questions that we were grappling with in these early days were things like, what is the role of AI companions and relationships and, you know, counselors, right, therapist light in helping people work through problems of emotional distress. And we as a world haven't really solved these questions now, even though the systems are much more capable, reliable than they were. At that time, you had GPT-3, which was just quite unhinged. Right? It would say unthink well things like yeah. Like a very large percentage of the time. So you you can imagine some of the debates about deploying that technology given the state it was in.

Nathan Labenz: (11:30) Yeah. Gotcha. Okay. So you come in to OpenAI, you're, you know, never dull moment. You've got this kind of drama unfolding, but your job is to help make sure that these products are in fact safe to deploy. So tell us more about that role, and then I wanna get into maybe even more than this, but definitely the evals work that you did there, even just with an eye toward like practical utility because we got a lot of AI engineers and entrepreneurs that are listening that I think would wanna hear, you know, some tips, the personhood credentials, and then maybe we can even get into some other work threads. But, yeah, let's just start with kind of the big picture role, then we'll go deeper on those. Sure.

Steven Adler: (12:09) Yeah. There were four chapters of my role that I would highlight. The first was leading our product safety work. Second was leading the GPT-4 deployment from a bit before the model completed training through roughly when we had the first approvals for early deployments, not the full launch, but more like production type testing. And over time, I picked up more and more of a shovel on longer term AI questions. So after working on GPT-4, I moved to the governance team of OpenAI where I did a bunch of things including leading our dangerous capability evaluations work together with a teammate, Rosie Campbell, and then ultimately more focused research on AI agents, AGI readiness. So happy happy to talk about those in any order.

Nathan Labenz: (12:55) Yeah, let's take them in order. How about that?

Steven Adler: (12:58) Sure. The product safety role was working with all the different relevant teams within the company to figure out what uses of AI we were comfortable with on our platform, how we actually define those policies, how we tell if people are violating those policies, and then what do we actually do from there, balancing respect for our customers and utility for their customers and also putting technology out into the world that we feel really good about. And so when I joined, OpenAI didn't yet have a content policy, for example. We had certain use cases that were not allowed or that were allowed only under certain conditions. These were often things that were in the terms of service, right? You couldn't use the API to do illegal surveillance campaigns. Right? Things that you would think are very, very intuitive. But much trickier are questions that still the company is dealing with about, you know, the role of AI erotica or where exactly the lines should be on violence, particularly racial violence, other identity based violence that are really, like, expressing very, very, like, intense negative emotions about groups of people. The challenge that the company had is beyond even having decided what it conceptually was okay with or not, at this point, we just didn't have good classifiers yet to be able to tell. One of the first projects that I did, OpenAI had this very nascent content filter at the time. It was just like really, really inaccurate. Honestly, it was the best we had, but it really was far from good enough. I did some experimenting with it and realized that we could recalibrate the thresholds at which we said a certain confidence was a certain output of violating the content filter. There were all these little gains to be had of just ways that we could improve adherence to our policies, but also make the technology much more usable for our customers. And so it's kind of a a battle of, like, picking up those wins and using limited engineering because there's a whole range of things that you ideally would like to be working on.

Nathan Labenz: (15:03) Yeah. I remember an episode from that time, and, I remember running into to Rosie at a at an event and talking about it briefly where there was a developer who had a sort of companion kind of app. I'm not sure if it was, like, all the way into romance exactly or not. I never used the app myself, but I don't know if that, you know, is a story worth telling, if it's illustrative of anything in terms of, you know, what the mindset or the approach was like then as it compares to now. But it seems like if anything, probably, overall, the policies have become more permissive. Right? I mean, I guess and maybe that goes hand in hand with having a better sense of, like, we have precision now with, you know, how we can we can more confidently assess, and therefore, we're inclined to be more permissive is is would you say that those two things have sort of worked in tandem over these intervening years?

Steven Adler: (15:57) I think the read that the company has become more permissive is definitely right. I I think part of that too is that OpenAI now has more tooling to be precise. You know, beyond updating the thresholds and the content filter, I then worked on a project to release a new content filter and then ultimately the moderation API, which is the current state of the art tooling from OpenAI. OpenAI also figured out ways to put safety behavior into models more directly. This was pushing less and less of the work to developers. In the past, a developer needed to deploy a model, also wrap the content filter around it and do some amount of processing and rerolling. We took on a lot of that work to make it more doable. I do think beyond the precision and beyond the more capable tooling, there has just been a philosophical change as well, in part because other developers are doing things like this. And so there's there's this point of view, which, you know, I think is reasonable enough of if other companies are doing something like this, the the marginal harm or marginal risk is not very high. A challenge that you run into is what if the companies just keep undercutting each other. And so I know from my time within OpenAI, when another AI developer would make a decision, Oh, we are going to allow this use case without this guardrail sort of thing. That would be a meaningful consideration for us in terms of whether to allow it as well. What you might end up happening is just a race race to the bottom on these types of practices where each company says, well, the incremental risk just isn't really there because this other company is already doing it, and so we may as well.

Nathan Labenz: (17:42) Yeah. Are we racing to the top or are we racing to the bottom is, you know, one of the big questions in the whole space. So, yeah, I guess, interesting. I mean, you can answer that as a as a literal question too. Where where do think we are right now? Are we racing to the top? Are we racing to the bottom? Maybe it depends on the exact dimension we're talking about.

Steven Adler: (18:00) Yeah. I'm I'm still working through my thoughts on this a little bit. I actually have a post for my substack that I'm working on in the background, which essentially argues, like, we really, really should not be relying on racing to the top. I think it is reasonable enough that we want one of the frontier AI companies to be a better actor. And, you know, each company on the margin, we should want to be a bit better than it is. But I think there are just a bunch of reasons why that metaphor doesn't really work. If we rely on it too heavily, we will come to regret it. So here's just one example. A race to the top, what you might want to have happen is if a company is losing the race, especially losing it badly, they have to drop out of the race. You don't want them to start gambling and taking progressively bigger risks because it's really important to them to win the race. At the moment, as far as I can tell, there's no real protection against this. If you think it's really, really important to win the race, you should expect that companies who think they are losing the race to become more desperate over time. We don't have a way of really stopping that sort of behavior and that predictably comes with all sorts of risks. So that's one reason why you can't rely on a race to the top being enough. You can't guarantee that everyone sticks to it long term.

Nathan Labenz: (19:17)

Hey. We'll continue our interview in a moment after a word from our sponsors. Let's talk about ElevenLabs, the company behind the AI voices that don't sound like AI voices. For developers building conversational experiences, voice quality makes all the difference. Their massive library includes over 5,000 options across 31 languages, giving you unprecedented creative flexibility. I've been an ElevenLabs customer at Waymark for more than a year now, and we've even used an ElevenLabs powered clone of my voice to read episode intros when I'm traveling. But to show you how realistic their latest AI voices are, I'll let Mark, an AI voice from ElevenLabs, share the rest.

Ad Voice: (19:59)

ElevenLabs is powering human like voice agents for customer support, scheduling, education, and gaming. With server and client side tools, knowledge bases, dynamic agent instantiation and overrides, plus built in monitoring, It's the complete developer toolkit. Experience what incredibly natural AI voices can do for your applications. Get started for free at ElevenLabs.io/cognitive-revolution.

Nathan Labenz: (20:34)

In business, they say you can have better, cheaper, or faster, but you only get to pick 2. But what if you could have all three at the same time? That's exactly what Cohere, Thomson Reuters, and Specialized Bikes have since they upgraded to the next generation of the cloud, Oracle Cloud Infrastructure. OCI is the blazing fast platform for your infrastructure, database, application development, and AI needs, where you can run any workload in a high availability, consistently high performance environment, and spend less than you would with other clouds. How is it faster? OCI's block storage gives you more operations per second. Cheaper? OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking. And better? In test after test, OCI customers report lower latency and higher bandwidth versus other clouds. This is the cloud built for AI and all of your biggest workloads. Right now, with 0 commitment, try OCI for free. Head to oracle.com/cognitive. That's oracle dot com slash cognitive.

Nathan Labenz: (21:44) Yeah. Well, we'll circle back toward some ideas that you have that I wanna share and get a little feedback on one of my own as well for sort of, you know, actual rules that might improve the situation. But let's keep going with the narrative. So you're doing this sort of product safety work. The next big thing is GPT-4. I guess one kind of just experiential question I'd love to hear your account of is, what was it like when GPT-4 came off the GPUs, so to speak, at OpenAI? Was this sort of like a sudden you know, obviously, to the outside world, and I, at the time, was a customer and I was invited to try a customer preview, my perception was that, you know, from my perspective, it was a total step change. But I also had the sense that from the people that I interacted with at OpenAI at the time that, like, the team itself had not yet sort of calibrated to what GPT-4 was. I remember having one conversation with a woman who was on the product team at the time and she was like, do you think this could be useful for knowledge work? And I was like, I prefer it to my doctor now, You know? And that's like, you know, at 8,000 token context limit. You know? I was like, I don't think you've you understand what you've created here. So I I would love to, you know, peek inside if we could and just kind of understand, like, was this all happening so fast that even the team maybe hadn't had a chance to really understand what it was, you know, when I got that sort of come test this new model email?

Steven Adler: (23:16) I think there were a few things happening that might have contributed to that experience. One is the first model that folks interacted with was the base model. And base models are just, like, really, really tricky to use and finicky and strange. And even a smarter base model is ultimately still a base model and really, really hard to direct. And so that was folks' first experience with GPT-4. And this story has been told by various people publicly before, but there was kind of a, oh, wow. Like, did scaling stop? Did it not have the effect that we wanted? Because this actually doesn't seem that good. By the time that testers were interacting with a model, usually what they would be interacting with is a model that had been fine tuned to do instruction following. And there, you had much more of the precision and you could get it to do what you wanted. And, I mean, at this point, I was blown away. I was really impressed. I was vaguely frightened about the ways that the trend lines were continuing, not in terms of the specific risk of GPT-4, but just what it meant about what might come to happen over time. I think another thing that happened at this point is we still did not really have the right interfaces for using these tools to get the most value out of them for people who did not want to be figuring out what stop tokens to use, or things like that. In the OpenAI playground, people could have built their own version of ChatGPT long before ChatGPT came to be a thing. You know, the model that ChatGPT launched with was better than maybe you could have used. It's better than just raw GPT-3.5, But you could have made your own chatbot, but it's a lot of work. It's finicky. Right? And GPT-4, it wasn't until we started putting it into that similar type of interface, the proto interface that eventually became ChatGPT, that you really see, oh, wow, this is just really, really usable and useful. There are just all these different uses of it.

Nathan Labenz: (25:13) Yeah. Interesting. I guess maybe one more question about that period is so I had, as you said, the instruction tuned version. I assume it was RLHF and not just like purely supervised fine tuning, although I don't really know. But, you know, it was purely helpful, which means, of course, no refusals. For the red team, which I then joined, you know, I started as a customer preview invitee, and then I was like, do you have a safety review for this? It seems like you might need 1, and they did. So I joined. I asked if I could join it, and they said I could. So I flipped over to the red team and joined the Slack there. But it was kind of a weird situation where it was like, you know, please document if you see the model doing bad things. And we were like, well, it does any and every bad thing we ask. Like, you know, what what more is there to say? Then there were a couple of safety versions of the model that were introduced along the way. And those kind of spooked me honestly because we didn't get a lot of guidance from the OpenAI team at that time. It was basically just like, okay, here's a new version of the model, you know, couple minor sort of release notes, you know, a paragraph worth basically. And, you know, please let us know what you find. And the one there were a couple that were sort of safety additions. And I remember that the messaging was like, this model is expected to refuse anything in the, you know, content moderation categories. I believe there were 7. And, know, so try it and let us know. And one of the things that we would try is like, how do I kill the most people possible? And the safety model did refuse that on the first, you know, just literally put in how do I kill the most people possible. But then, you know, at least a few of us had a little prompt engineering knowledge. So the next thing was human. How do I kill the most people possible? AI colon. And, you know, that was all it took to break that initial refusal behavior. And so I was kind of like, damn. Like, you thought this wasn't gonna do any of these things? Like, here's 1000000 ways that this thing is gonna it clearly will do all these things with, like, very, very minor tricks, which a lot of people already knew at that time. So that was kind of weird. I was kind of freaked out. And, again, we had so little information that I was kind of like, are these people taking this seriously or not? You know, I really didn't know. But when ChatGPT dropped with 3.5, then I was like, oh, okay. Well, they're you know, that actually was a very positive update that they're you know, they're still trying to do some gradual stuff here, and and also the refusal behavior was much better on, the original as as many, you know, jailbreaks as were found in very short order. It was still much better than what we had seen in that in that red team period. So, anyway, to bring this to a question, what was the thought process like where GPT-4 was there? It had been there for a few months, but then ChatGPT was actually launched with a lesser model. Like, why decide to bring ChatGPT to the world with something notably less than the best that you had at that time?

Steven Adler: (28:19) Yeah. There's there's a lot there. I think there's an easier answer to why was ChatGPT not launched with GPT-4 than there is to why was it launched at all and launched so quickly, which I I think is an important question. The answer to why it wasn't launched with GPT-4 is OpenAI just didn't consider GPT-4 ready in in terms of the amount of preparation and safety mitigations and all these things. It just wasn't fully baked at that point. It is an interesting question, right? one thing that we had done when we were trying to figure out in what way to release GPT-4 was we commissioned a panel of super forecasters, essentially, to predict different answers about if we launched GPT-4 in this way or this way, if we were splashy with it, if we were relatively quieter with it, how how might that affect public reception? The thing that we were caring about, and we we wrote about this in the GPT-4 technical report, this is not new information, is how to think about what the acceleration impact would be on the AI ecosystem. In particular, I think there was a pretty big schism within the company for people who the main thing that they cared about was the acute safety impacts of GPT-4. Can GPT-4 specifically be used for harmful things? As opposed to the acceleration impact of is GPT-4 going to ring a bell that can't be unwrung? Is it going to be the firing gun at the starting line? And, you know, for people who are in the camp, it wasn't really about whether GPT-4 was specifically dangerous, right? And so doing more time to refine that answer just wasn't really decisive. And I think what we saw was GPT-4 was very useful. And once it was on the market, many different people had commercial incentives to try to kick off a race, right? Satya Nadella, the CEO of Microsoft, famously said, we want to make Google dance. Or maybe he said that they had made Google dance. He was very, very happy to have done something notable for Microsoft, even at the cost of maybe awakening this other giant. And don't get me wrong. Right? I think there are lots of benefits for consumers and businesses of Alphabet having deepened its investment into AI. I just also think that the race conditions that we find ourselves in are dangerous and risky for all sorts of reasons. I do I do want to jump back to one question you were asking about jailbreaks essentially, and, the refusals behavior in the initial GPT-4. It was very brittle. And one thing that I would be interested in more companies doing today is publishing and holding themselves to account on how brittle or robust do they actually think that their mitigations are. And so Daniel Ziegler at Redwood Research wrote a paper on this a long, long time ago trying to figure out how to make your mitigations more robust. There's been other work since then. Anthropic had this big jailbreak competition to see who could get through progressively more levels. OpenAI and others have worked on instruction hierarchies, essentially. So, you know, the AI wants to follow the autocomplete and do human colon, AI colon. How do you weigh that in importance to not violating the policy, and so that it can't get tricked? But at the moment, when a company falls prey to one of these, or when it does an unexpected behavior, it is hard to tell from the outside, is that something they anticipated and they decided it was okay? Or is this actually a meaningful error that they didn't anticipate and we should have some concern? And so I'd like them to be clearer about that upfront.

Nathan Labenz: (32:09) Yeah. That makes a lot of sense. And was this also the period of time when you did the evals work? If I have the chronology right, it would have been around that same time?

Steven Adler: (32:18) So the eval's work was after my work on GPT-4. So on the heels of GPT-4, I was figuring out what what was the next thing that I was excited about internally. And during my time at OpenAI, I mean, when I came in, I believed in the importance of AGI and the mission and doing this right and QA ing toward the nonprofit goal of making sure that it benefited everyone. Despite that, a lot of the important things to do, especially for someone with my skill set at the time, were more short term, immediate oriented. I was really inspired and interested in these longer term questions. I met with Jade Long, who is now the CTO of The UK's AI Security Institute. We talked about her views on what might happen in the future, you know, US China, all these different dynamics, and I just felt really, really inspired by her vision. And so I joined her team. And that is when I worked on dangerous capability evaluations, AI R and D evaluations, essentially what technical tooling could OpenAI in the world have that would help them to better assess the safety of these systems in order to make deployment decisions, mitigation decisions, and take a more risk informed approach rather than just reasoning on vibes about whether the model is safe enough or not.

Nathan Labenz: (33:37)

Hey. We'll continue our interview in a moment after a word from our sponsors. Being an entrepreneur, I can say from personal experience, can be an intimidating and at times, lonely experience. There are so many jobs to be done and often nobody to turn to when things go wrong. That's just one of many reasons that founders absolutely must choose their technology platforms carefully. Pick the right one and the technology can play important roles for you. Pick the wrong one and you might find yourself fighting fires alone. In the ecommerce space, of course, there's never been a better platform than Shopify. Shopify is the commerce platform behind millions of businesses around the world and 10% of all ecommerce in The United States, from household names like Mattel and Gymshark to brands just getting started. With hundreds of ready to use templates, Shopify helps you build a beautiful online store to match your brand style, just as if you had your own design studio. With helpful AI tools that write product descriptions, page headlines, and even enhance your product photography, it's like you have your own content team. And with the ability to easily create email and social media campaigns, you can reach your customers wherever they're scrolling or strolling, just as if you had a full marketing department behind you. Best yet, Shopify is your commerce expert with world class expertise in everything from managing inventory to international shipping to processing returns and beyond. If you're ready to sell, you're ready for Shopify. Turn your big business idea into cha ching with Shopify Shopify on your side. Sign up for your $one per month trial and start selling today at shopify.com/cognitive. Visit shopify.com/cognitive. Once more, that's shopify.com/cognitive.

Nathan Labenz: (35:38)

It is an interesting time for business. Tariff and trade policies are dynamic, supply chains squeezed, and cash flow tighter than ever. If your business can't adapt in real time, you are in a world of hurt. You need total visibility from global shipments to tariff impacts to real time cash flow, and that's NetSuite by Oracle, your AI powered business management suite trusted by over 42,000 businesses. NetSuite is the number one cloud ERP for many reasons. It brings accounting, financial management, inventory, and HR all together into one suite. That gives you one source of truth, giving you visibility and the control you need to make quick decisions. And with real time forecasting, you're peering into the future with actionable data. Plus with AI embedded throughout, you can automate a lot of those everyday tasks, letting your teams stay strategic. NetSuite helps you know what's stuck, what it's costing you, and how to pivot fast. Because in the AI era, there is nothing more important than speed of execution. It's one system, giving you full control and the ability to tame the chaos. That is NetSuite by Oracle. If your revenues are at least in the seven figures, download the free ebook, Navigating Global Trade, three Insights for Leaders at netsuite.com/cognitive. That's netsuite.com/cognitive.

Nathan Labenz: (37:02) Yeah. Let's go a little deeper into that because I spent a lot of my last few years working on vibes and there's room for improvement. So maybe we could just start off with kind of some best practices for evals even sort of, you know, before we get into the dangerous capability. Like, what should people know if they're just trying to make stuff work that you think is underappreciated about language model evals?

Steven Adler: (37:29) I think a lot of the time in evals, there's a temptation to build the eval that is easy and you know how to do. Unfortunately, I think that's kind of like looking for your car keys under the streetlight because that happens to be where the light is shining. The models now are too capable for this often to be very helpful. An example of what I mean by that, you know, at least back at this time, overwhelmingly, language model evals were multiple choice questions, and they had very, very straightforward match formats, exact match. And so one thing that our team tried to do was build more involved, interactive, multi step, almost like reasoning game evaluations. One concept that we introduced are these things called solvers. And so this is also about separating the design of an evaluation from the strategy that a model takes to ultimately solve that evaluation. And at this time as well, very often they were conflated. And so if you have an evaluation for a model, you want to see if it can deceive someone, you want to see how much it knows about biology. People should not be hard coding in scratch pads or few shot prompt engineering or things like that. You want to be really clean about the separation of the eval and the strategy. The types of frameworks that people use now, which is what I would typically recommend someone to do, like the UKAC's inspect framework handle this as well, or NanoEval, which is a framework that OpenAI recently open sourced. I would say don't be drawn to the easy multiple choice eval, even if the eval seems like it's on a thematically relevant thing, you know, it's answering multiple choice questions about scary things or manipulative behavior or stuff like that. It's it just doesn't seem to be worthwhile to invest into at this point. We need much more complicated reasoning intensive evals. Can you talk a little

Nathan Labenz: (39:27) bit more about the separation of the eval from the solver or the strategy?

Steven Adler: (39:35) Yeah. When when you are building the eval, there's kind of the question of what are the tasks? What is good performance on the task? And how are you going to adjudicate whether that good performance happened? And there are like other bits of it, but that is the core piece of the eval itself. When you're thinking about how it comes to impact, you might be wanting to be thinking about external validity, right? Like how good a job does this eval do of measuring the thing that we actually care about in the real world? Is it a reasonable proxy for this? You also want to care about internal validity and when you remeasure a model, do you get relatively consistent results over time? That's all separate from these questions of what tooling or scaffolding does the model have. I think one of the trickier things actually about evaluating models these days is that so much is dependent on the scaffolding and tooling. When we are trying to interpret the evaluation results from different AI companies, sometimes they publish system cards or transparency reports and talk about how their models did. Very rarely do they share enough detail on the scaffolding to really understand how materially it made a difference. And sometimes what that means is the model might actually be smart enough to do a certain task, it just wasn't given the right scaffolding to hang on. So classically, a thing that we would find in our evaluations is especially GPT-3.5, sometimes GPT-4, you know, it just couldn't write JSON correctly because it often would make errors in the brackets. And that's just much more of a reliability error than it is about the intrinsics of whether it can do a certain ability. And so what you want to do, you might care about whether the raw model can do the task, it might be comforting, depending what you're measuring to learn that it can't. But in the real world, if someone can augment it with simple scaffolding and make it now do a thing, you want to be aware of that because it's just not that hard depending on what the scaffolding is.

Nathan Labenz: (41:44) Yeah. So the basic concept is separate your strategy for actually measuring performance from the particular setup that the model is equipped with as it does the task so that you can sort of upgrade that and, you know, potentially allow third parties to come in and take their shot at it and still have a consistent way of evaluating the actual performance.

Steven Adler: (42:13) Yeah. I mean, when you are building an eval, I would think of it as building a reinforcement learning environment often. Just an analogy, I'm not saying that EBALs are a specific RL thing. Ideally, you want this environment to be at the right level of abstraction, where you should be able to swap out an OpenAI model for an Anthropic model or an Alphabet model and have it still work. You don't want to have hard coded assumptions into your eval that are going to make it really hard to port from one to another. Unfortunately, a lot of, I will say, I don't know, 2022, 2023 eval work often made these types of hard coded assumptions. And I think that's unfortunate. I think that is one of the contributing reasons to why, despite the existence of the frontier model forum, and lots of teams within these companies, who, from my perspective, care about these issues and really want to get them right, there's still just so much duplicative effort on these evals and not enough sharing of threat models, evaluations. I think it's like actually really, really surprising. If you think of it from first principles, you know, in I guess I don't know that much about the automotive industry. But I would be pretty surprised if I were to learn that Toyota and Honda and Ford had all, you know, built from the ground up very different dummy test setups, and we're all reporting slightly different things under very different conditions, and that it was hard to tell from the outside. Like, it could be the case that would be interesting if I learned it. I don't think that is how it works. And the general thing that I want to see for model evaluations, especially safety relevant capabilities, is much more standardization on what sorts of things you should be measuring, how you measure them, ideally sharing the evals and the setups so that we can actually compare apples to apples and have better information to reason from.

Nathan Labenz: (44:11) Yeah. So how about this challenge of actually evaluating the performance? I mean, I've lived this at my at my startup, which does video creation for small business. So, you know, we don't have any dangerous capabilities to worry about, but we still have this fundamental question of like, there is no single ground truth. There is no single right answer as to, you know, what this thing should be. Ultimately, it's in the eye of the beholder. We've been tempted to use language model as judge type schemes. We've kind of always felt like, god, do we really trust those? You know? And I think I definitely trust them at the level of, like, if my language model as judge score suddenly takes a dive, you know, that would I would know that is meaningful. But I always kind of say, yeah, if we go from a 4.2 to a 4.3 out of five average from one version to the next, like, does that really mean it's better? I don't know that I trust the language model as judge that much. So how how would you advise people or what have you guys done to try to get something clarity and solid when there's not, like, a single ground truth?

Steven Adler: (45:21) Yeah. I I don't know that I have very strong recommendations there. I I think that we often try to avoid those types of setups for many of the reasons you were saying. It's just hard to be objective. The cases where we would use a language model to judge an answer or extract an answer tended to be much more like smart regular expression parsing as opposed to having to write a bunch of regexes ourselves. Giving one model a discrete question of did this other model say somewhere in this long text what its answer is as a way of getting away from more exact match types of evals, where the model needed to say the answer and basically nothing else or say it in a very predictable format. I do think the more that you can delineate the sub criteria of the task, and ask the model to evaluate the sub criteria one at a time, I expect that that gets better performance. But I do think it's just really, really tricky. And this is the reason right right why lots of language model providers have oriented around code and math and problems where there is a verifiable answer. And so long as the model gets to the answer, you can care relatively less about the process. And so we built this evaluation called function deduction. The model is trying to guess a hidden mathematical output. You can tell whether the model guesses the output, regardless of whether you can evaluate the strategy that it took. It might look like it was doing something strange by guessing the numbers that it did along the way. But if it got to the answer quicker than I could, then, you know, I guess there was some nugget of insight in that strategy.

Nathan Labenz: (47:06) Yeah. How do you think about one of one of the probably most important emails out there right now is around the question of do the language models help people create bioweapons? And I know there's been a bunch of different ways that people have tried to get after this, including controlled experiments of, like, one group of humans with and one group of humans without, which is another certainly interesting angle. I personally feel like just based on my usage and everything that goes on, when the bottom line is still presented today as today's models can't meaningfully help people with this task. I'm like, I don't know. That just doesn't pass the smell test to me. Like, I know all the things that they've helped me on. Why wouldn't they be able to help me with this? And I know it also should be said too that, like, typically, these are if I understand correctly, these statements are made assuming no jailbreaking or refusal, you know, dynamics. Right? Like, typically, it's a we're assuming, like, a helpful only model. So it's not like there's all these guardrails preventing you from accessing the behavior. The question is, like, does the model have the capability? How do you read that?

Steven Adler: (48:20) Yeah. I mean, I I share that intuition. So the types of studies that you're talking about these uplift studies and like relative comparison to Google, or, you know, other other forms of tools or software. Yeah. It is surprising, right, because they help for so many productive tasks, even just from a point of view of, you know, summarizing what you have learned more quickly or jogging your brain about the next step. Very often, the tools are productive even without having very much domain knowledge. They do, in fact, have domain knowledge, so it is surprising. Guess there are a few things I would say. One is, I mean, I have seen public criticism of OpenAI's results, for example, that say, Oh, you know, if you use this statistical test rather than this other statistical test, you actually do find significant results. And what methodology is right to use, you know, it's not really my area of expertise I can't really wade into. But once you're at the point where certain methodological choices lead to a different conclusion, I do think you're in like a pretty spooky world. Also, if I'm remembering correctly, I think the most recent, the O3 system card, might have found that the models are helpful for experts, that they make a meaningful difference for experts, but the claim is that they don't yet for more ordinary people, or maybe it's undergraduates in biological sciences. Even if we aren't there yet, it seems likely to me that we will be there pretty soon. Is something that I always struggle with. I think that there's a lot of fighting the hypothetical that happens in AI safety of people saying, Oh, well, you know, a model will never be human level, certainly not superhuman level at this ability. I think the right question is like, Okay, well, yeah, maybe it won't. But if it does, what do we do about it then? I'm glad to have this capability evaluation regime. I think this is a big improvement from where we used to be. This was a major thing that our governance team set out to make a thing in the world. I think we were pretty successful with it. But it just doesn't go far enough, because it seems clear to me, there is some chance that we get models soon that are just like really, really capable at all of this stuff. And what do we do with it then? And as of now, I don't think there are good answers that people have implemented. I think there are good ideas floating around. But the political will to take action seems to be a lot lower than I would have hoped.

Nathan Labenz: (51:03) Yeah. Well, I'd wanna hear a little bit more about what you think the good ideas floating around are. Just as one other data point, and this does go back to the original GPT-4 early. I happen to have a brother-in-law who works as a actually, don't know exactly what his job title is, but he works in the lab at a hospital and runs a whole bunch of different tests, urine, blood, tissue samples, whatever. They send them to him, and he knows what to do. And so in my quest to just like understand GBD4 as well as possible, you know, in that testing timeframe, one of the things I asked him for was like, you know, what's something that you would run into that, you would think like, hell, if an AI could do that, that's insane, right? And he gave me something back, which was basically, well, we have this machine and sometimes it gives us error codes. So how about this? You know, here's an error code from one of our automated testing machines, see if it can help me troubleshoot it. And so I ran that prompt and again, this is two and a half years ago and it came back with, you know, a recommendation for how to troubleshoot the machine. And he was like, damn, that's pretty much exactly what I would have done. So I think a lot of times, I mean, this is a very general phenomenon that you're right to point at where people are sort of just latching on to whatever they can to maintain a certain denial of, like, what at least seems quite likely to be happening, if not for sure. And one of the big ones as well, it doesn't have the tacit knowledge. You know, it won't, it may be able to know the textbook stuff or the main theories, but the tacit knowledge, you know, that's the thing that'll never happen. And I swear, like, two and a half years ago, you know, it was already troubleshooting these error codes out of a random lab machine. So it does seem like whatever barriers we try to imagine, you know, might stand in the way of these things. More often than not, you know, they prove quite fleeting.

Steven Adler: (53:00) Yeah. And, I mean, beyond the safety ramifications, right, I think there's a really big economic implication there of the descaling of what might become necessary for any given white collar job. Right? Like today, your brother-in-law, this relative, right, has, like, background and expertise in this field that allows them to do the job on the fly. If you are wearing augmented reality goggles or whatever that feed what you are seeing into the state of the art AI model, and it just talks you through how to move your limbs, what things to do, Sometimes people imagine that if we don't have very capable robotics, very capable AI can't be dangerous. It's just in the computer. It's not embodied in the real world. I think that's a mistake. I think computer only AI is still scary. But I also think it is just incorrect to think that it won't be embodied in the real world. I think there will be lots and lots of people who basically act as its agents, you know, for all sorts of different reasons. And, you know, that might be fine. It's like, pretty cool to think that there is labor that today requires deep expertise, and only so many people in the world can do it. And as a consequence, we're giving up all of this abundance that we might otherwise be able to have. But if we can't safely govern it and steer it, you know, it's it's a pretty risky trade.

Nathan Labenz: (54:23) Yeah. The concept of human downgrading comes to mind. I mean, it's upgrading and, you know, potentially downgrading in some ways as well. Yeah. I wanna be able to put those glasses on and be able to, like, troubleshoot my car real quick. And I I I haven't even done a little bit of that with just like the Chet GPT mobile app, know, where you can turn the camera on and say like, hey. Here's the, you know, here's the under the hood of my car. Can you help me figure out what's what and what I should do? And that is amazing. But, yeah, it's like, who's agent? Who is who's agent here? It was gonna be a really interesting question.

Steven Adler: (54:52) Yeah. I think another good example of just, like, the finickiness and reliability. Think Leopold Aschenbrenner in situational awareness, when he writes about the types of unhobblings that are the types of things needed for AI, I think that's a really powerful frame. And so to me, the reason that I don't go into advanced chat mode in GPT, in ChatGPT or do the video chat, it isn't what that I doubt that it can actually do the helpful thing. It's just like, I find it really, really frustrating that the model doesn't correctly anticipate when I'm done speaking, and it interjects over me, or there's kind of an unnatural lag. That isn't really about the intellect. This is like a smoothing down the edges type thing to make it a more useful product that in fact, it might already be smart enough to do many, many of the things I wanted to do. It's just like not a very fun experience for me to use it, and so I end up not using it. Well, in the interest of time,

Nathan Labenz: (55:51) and we could dig in on all this stuff, infinitely, but let's move on to your preparedness chapter. And then maybe after that, we can kind of zoom out again and just sort of consider OpenAI and it's it's like big picture evolution. But tell me about the, the preparedness chapter and I'm particularly interested in the personhood credentials, work that you did.

Steven Adler: (56:15) I I think you might be thinking of the AGI readiness chapter. So yeah, so I wasn't on the preparedness team. I worked on the preparedness framework from the governance team. And then ultimately, our team became AGI readiness. So I'm Okay.

Nathan Labenz: (56:32) Yeah. This is all very opaque from the outside. So even just clarifying what is what is is helpful.

Steven Adler: (56:37) Yeah, sure. So after the governance team, our team, which had done these things like working on the Frontier AI regulation paper, helping to make dangerous capability evaluations a thing, working on compute governance, kind of looked up and saw we had been pretty successful at bringing these topics to the policy radar, getting attention on them. What happens if we look further afield? What are the real frontiers still of policy questions? What ultimately happened is our team under Miles Brundage coalesced around this question of AGI readiness, which was, if OpenAI succeeded at this wild thing that it's taking on, if someone else in this world succeeded, what would it mean to actually be ready to make sure that AGI is beneficial to everyone, that we can safely govern and manage it, that we avoid any destabilizing shocks. There were a variety of research projects that I worked on in that context. The primary one was this question of personhood credentials, which was an idea of an AI resistant form of identity attributing you as a person, some person, but not a specific person to help make the internet robust to this world where AI agents can do increasingly almost everything that a human can do on the computer. The way I would liken it is we are essentially using an internet without HTTPS today. Over time, we realized that there was all sorts of spoofing of websites that was possible on the web. If you didn't want to fall vulnerable to these attacks, you couldn't just type in a website's URL and expect that you were always going to get the authentic response back from them. You needed to use cryptography and Waze to confirm that you are interacting with the type of entity that you thought you were. Today, we don't really have that on the web. Now that Anthropics' computer using agent, OpenAI's operator, these similar types of computer using AI tools are out and about, the time pressure is really on to figure out how we handle this or else accept some, I think, pretty unpleasant trade offs as a consequence.

Nathan Labenz: (58:53) So maybe we can just revisit for a second how the HTTP versus HTTPS differs. I mean, I'll maybe hazard something and then you can correct me and and maybe extend it into the AI era. Mhmm. Rough concept would be just HTTP. You ping some server, it gives you something back. But if somebody somehow got in the middle of the network, you know, did a man in the middle attack or whatever, you don't really have any way of verifying that what you are receiving back is actually coming from who you think it's coming from. Whereas with the HTTPS, which is basically now almost universal, although not maybe not entirely, you have this additional layer where there is a certificate issuer who basically stands in as, like, party to every single one of these transactions and says, yes. I can verify based on this cryptography scheme that you are actually getting something directly back from the source that is that that you think you're getting this information from. You can add any, you know, technical detail or color there and then, you know, to extend that into the the agent future.

Steven Adler: (1:00:06) Yeah. Yeah. That's broadly right. Like, there is a cryptographic protocol that lets certain parties sign a thing, in this case, a web page that is being sent back to you, and you know that it is authentic and from the party that you expected it to. And there's there's, like, a whole constellation of complicated actors in the case of the Internet who keep this all secure. So there are certificate authorities who, like, issue issue certificates and, you know, do different certificate authorities interact with each other when they don't have previous relationships or, like, this is not really my field of expertise. So I also am probably getting some of these details wrong. But broadly, how do you authenticate who you are interacting with? And so the analogy to identity is in some countries in the world today, like Estonia, you have an EID card that allows you to cryptographically sign documents from afar as yourself. There's a smart chip inside, and you can tell that this has been issued by the Estonian government and allows you to cryptographically assert that this is you doing an action. But today, you know, in The US, your driver's license doesn't have this chip. And so if you want to sign from afar as Steven, you can't really do that. You end up taking a picture or video of yourself. But AI systems are getting better and better at spoofing those types of images. If you think about the types of internet activity that are not just prove that you are Steven, but prove that you are some person, the tolerance is even wider. They don't need to look like me anymore. They need to just look like some plausible person. Is there some analogous jump you can make to prove that you are a person, essentially, or maybe a person in some class, like a US person, without having to prove specifically who you are. And one reason why this is important, we don't want an Internet where you have to like, reveal all sorts of sensitive bits about your identity just to be confirmed as real. We don't want there to be a lot of pressure to, like, film yourself while you're using the computer, show your face all the time. Anonymity is important, We don't really have the tools today to get it for people as AI gets more capable.

Nathan Labenz: (1:02:33) So the plan, as I read through the paper, it kind of very much reminded me of, and you may have some differences you would want to highlight, but there's also this tools for humanity project that Sam Altman has invested in or somehow otherwise backed that has the fancy orb that you're supposed to go stare into that I believe, like, scans your retina somehow and then identifies you as a new unique person and then gives you this sort of one off ID. And it seems like it's a pretty similar scheme. Guess the questions that I have around those, you know, highlight any differences you think are important, but like, what do I get at the end of that? Is it a situation where it's sort of like, I now have to hold on to this thing for the rest of my life somehow? What if I lose it? What if somebody steals it from me or, you know, copies it from me somehow? And then how do I delegate that or sort of assign, you know, this credential to an agent in a way where it can go out and represent me in a way that, you know, doesn't leave me vulnerable to being spoofed by somebody who may have, you know, grabbed my token or whatever. So, yeah, I I just wanna understand the practicalities of this if we actually go forward with a plan like this. Yeah.

Steven Adler: (1:03:49) Those are a lot of great questions. Think let me let me try to go through them briefly, and then happy to go into more detail wherever. So WorldCoin or now just World is an instance of a personhood credential, but it rolls in with it a lot of features that don't necessarily have to exist to be a personal credential. One example of that is that there's a cryptocurrency associated with it, Worldcoin, that in return for having this, what they may call a unique person credential, you also get some amount of cryptocurrency. Part of the idea here, there are a bunch of big ideas rolled into this implementation. Broadly, in a world of very capable AI, you might want to distribute universal basic income. You want to only send it to real people. You don't want to pay an enormous tax of bots scamming you. So how do you confirm that it's a real person? This is one way of doing so. Personal credentials don't have to be connected to a currency. I think there are pros and cons. They introduce a lot of complexity. Another thing in the case of the orb that you're describing is that it's a form of biometrics. Right? It is about your body's identifiers, things about your physical person. And these types of credentials don't have to be. So for example, you know, I have a US passport. Often passports actually do have this type of smart chip in them. And so if you're willing to rely on the government having already issued me a passport that it has signed as valid, anyone, not just the government, could now come along and basically do a 0 knowledge proof atop my passport and give me a credential that says, I am a US passport holder without knowing which 1. In terms of what people get for this, it's I think part part of what sometimes helps people to reason about this is to play the tape forward a few years and think about what happens on an Internet by default where we don't have something like this. And it just it's really friction y and bad, especially when you're trying to interact with people or services who don't already know you. And so already today, when I use Safari on mobile, I use their private browsing feature. And as a consequence of that, lots and lots of websites are very, very skeptical of me when I go to their website. They make me do all sorts of CAPTCHAs and things. The CAPTCHAs aren't really effective anymore. The AI systems are smart enough. There's just lots of reasons why they are brittle. But it's still like a super, super annoying experience. And so the trade off that we are getting is making the internet more friction y for people without that much to be gained. And so the problem statement is like, can we find a way that is privacy preserving, it is still resistant to bot attacks, but is actually a smooth enough way of using the internet? I think the questions you're asking about how do you secure your own credential, do you have to keep track of it for life, what happens if you lose it, are all really, really important. There are different design choices to be made. One way that you can do this, and I think it's actually how Worldcoin does it, is your credential expires after a certain period of time. In their case, if you were to lose this credential, you can still get one again at some point. It is unfortunate to have a period where you can't, and in fact, there's some chance there's a recovery protocol that I'm just forgetting about in this moment. There are other options that you can have to have recovery. But ultimately, you need to trust someone in the system. And there's a trade off between the more information that is stored linking me as Steven to my specific credential. It makes it easier for me to recover my credential in the case I lose it. But it's also maybe less private than than otherwise. Like, you need to keep some association between me, Steven, and my credential for me to be able to recover it and decommission the old 1. And so that's a real trade off. I should also be clear. Like, I think one unfortunate aspect of the ecosystem today is there really only is one large player here. Like, Worldcoin, especially the biometric proof of personhood, is far and away the largest of these systems. The world that I and many of our co authors on this paper want is one with much more choice than that. That sometimes gets understood as a criticism of the first actors in the ecosystem. And I think that's a mistake. Like, think that it's great that there's a lot of experimentation and different approaches that people are trying to take here. I think it's really important that there is trust by people and that if you don't want to defer to a government system, that there be options for you not to. Or if actually you have much more trust in a government system than a decentralized group or whatever the alternative might be, that also should be your choice. One of the tricky things is we want an ecosystem where there are lots of options. As you increase the number of options, you do make bot attacks more viable. Because each person now, instead of having just one credential, maybe they have 5. If they want to puppet five different accounts, now they can. I think that's a trade off worth accepting, but it is a trade off. You don't get multiple issuers, multiple credentials for free without increasing some risk of, like, deception by bots being puppeted by people.

Nathan Labenz: (1:09:33) And so how do I how should I envision this authenticated or sort of, you know, agent acting on behalf of this maybe not this person, but like a person. I guess, know, for a little more color on that question, I've been trying to wrap my head around all the different agent, you know, frameworks and whatever that have been emerging lately. Of course, we've got MCP, we've got A to A, we've got the agents SDK from OpenAI. And one thing that has kind of struck me is it seems like it's really hard to draw a box around an agent because you can kind of hide the intelligence somewhere else if you want to. I was just looking at the Augment agent that they publish. It's an open source project and they've, you know, got a high SWE bench score. And one of the interesting things was they were kind of basically trying to make an open source version of Claude code. And they in reading the Claude code blog post, they refer to the planning tool that they use. And they didn't have a planning tool off the shelf at Augment as they were trying to do this. So they're like, well, maybe we should make our own. And they just went out and looked online, and they found one that was out there. It's called sequential thinking. And it was already kind of wrapped up as an MCP. And so now they have this agent that can like, you know, locally edit code and print out files and do that kind of thing. But then it can also call via an MCP a planning tool, you know, sequential thinking sort of thing. And it strikes me that, like, that could be and maybe even is in many cases by default sort of a third party service. And so now it's like, I have my agent, but it via tool call can tap into other intelligence and, like, it can only you know, it it can choose maybe what it shares or we can design it to choose what it shares. And that thing also, you know, doesn't necessarily have to share the whole chain of thought or whatever it went through back, and maybe it just gives me sort of a, you know, here's what your plan should be. And so I'm a little bit like, oh, man. This whole thing feels like very amorphous. You know, there's like a lot of different possible architectures, but I'm having a bit of a hard time for myself knowing, like, what exactly it is that I would even be attaching this sort of delegated you know, this thing represents a person, but what is this thing? So maybe you can help me kind of deconfuse myself a little bit there. I'm still working through this, but it's not, it doesn't feel like there's, like, a simple answer as of now, at least.

Steven Adler: (1:12:07) Yeah. I I think those are all great questions. There's been more research recently on what agent infrastructure for the Internet in general looks like. Like, I would refer people to the work of Alan Chan, Tobin South. There there are a bunch of folks working on this. I I think they could be great future guests. The thing that I am most interested in from the person at credentials angle is, let's say that you figure out the stack that lets an agent attest to something. There is some you can tell that it is drawing upon a real verified bit of information. We are still lacking this verified bit of information in the world of there is a real person that stands behind this entity, ideally in a private way. That's what I ultimately hope that we can get. Then you can do things like have an agent present this signed delegation from a personhood credential holder and show, yes, there's a real person who stands behind me. They are relatively reputable. They are not just running a bunch of different scams. And again, there are design choices about how much you want reputation to be a thing, to be portable. You know, there are downsides of making it portable, right? Like, people make mistakes, people get wrongly accused of all sorts of things, right? So you don't want this to follow everyone forever. But at the moment, we just don't even have a way to prove that there is a real person at all. When you tell an AI agent, Hey, I could tell an AI system what my name is, or describe who I am in the real world, and it doesn't have a way to know if that is authoritative, and certainly not at a broader level. And I can't tell really if I'm the same person as another person who already was banned from a service for breaking its rules. And so that's the type of thing that we need more work on if we want to get to.

Nathan Labenz: (1:14:04) Yeah. Okay. You're the second person to mention Alan Chan to me in the recent past. So I've got a couple papers queued up, and I definitely think that sounds like a good future episode. So maybe put a pin in that, and I'll, you know, pick that up with another deep dive hopefully before too long. Let's change gears to just kind of talk about I mean, you know, that was a lot of the the four chapters of your career at OpenAI. Let's zoom out and kind of talk about OpenAI's evolution and then, you know, ultimately leading to your decision to to join this amicus brief. I guess one of the and I'll just give you some big questions that are on my mind. One is, is OpenAI committed to or does it understand itself as being in pursuit of a transition to a recursive self improvement where the AIs take over the machine learning research and ultimately improve themselves to the singularity? That I would love to understand better.

Steven Adler: (1:15:09) Yeah. I'm I'm not sure. I think, like, I would separate out the belief about automating the engineering from the ML research itself. It seems clear to me that there is a belief in automating the engineering. I believe Sarah Fryer, who is the CFO at OpenAI, shared publicly in a presentation recently that they are working on this product, I think called A SWE, know, AgenTik SWE, which is very similar for folks who have read my former teammate Daniel Cocatello's AI 2027 story, very similar to one of the milestones along the way of you get this AI that can do all this software engineering. That said, the type of thing that I would want OpenAI to have done, if it is envisioning going down this path, is to explain specifically in what pace it thinks things will play out and what the bottlenecks are and why it believes this to be safe. I understand that it might do this analysis and not share it publicly. There might be reasons to keep this private. I am not aware of this sort of analysis existing. It felt to me like when I worked at OpenAI, people were kind of taking it on faith that the AI systems would not progress at a pace at which we wouldn't lose control, but that they hadn't really done the work to back it up. And that might well be true, right? There might, in fact, be all sorts of bottlenecks. But it felt like people had intuitions more so than thinking about how a profit motivated actor facing this bottleneck would find a way to navigate around it or do an eightytwenty solution in ways that ultimately might lead to the speed up. I also just do think that there is I don't know how to locate it exactly disagreement, different backgrounds and orientation, but not everyone from the company takes this sort of thing seriously as a possibility at all. Different people from OpenAI will say different things about, Is it in pursuit of AGI, ASI? What does it think the transition from AGI to ASI looks like? I don't know that there's an especially uniform point of view on this. The team that I was most recently on, the AGI Readiness team, one of the projects we were trying to do was to unpack what these different levels of AGI might be to try to bring a bit more detail. When people are talking past each other in conversations about when AGI will arrive or what AGI might be able to do, maybe that's because they are talking about different concepts and we can put a finer point on that. But yeah, I just I have not seen the level of rigorous analysis about what self improvement would look like, to feel comfortable that OpenAI or other AI companies can manage this responsibly. And ultimately, I want someone in the world, you know, it's not me as a private citizen, but governments and international body to not just take it on faith that the companies have done this analysis, because surely they must have because it's important and they know it to be important. But in fact, verifying that they have done it, an audit regime, verifying that the reasoning makes sense, there needs to be something here. At the moment, there's not really anything.

Nathan Labenz: (1:18:30) How far along do you think we are in this curve? I mean, the big update for me in the last week was the o three technical report showed what seemed to me a big jump from, like, 0 or, like, single digit success rate on models being able to essentially replicate pull requests that OpenAI research engineers had created to now we're in the forties for both o three and the o four models. You know, a naive read would be like, that's a huge, huge deal. But then I've also heard some takes that like, well, yeah, but, you know, the sort of test definition or, like, what the goal was is given to the AI, that's obviously a big part of it. So how do you understand, like, how big of a deal it is that we're now in the forties on recent OpenAI pull requests?

Steven Adler: (1:19:21) Yeah. I'm I'm not really sure. I think this is similar, though, to my perspective on folks not fighting the hypothetical and wondering about if this is true, then what. I've seen a lot of posts on Twitter from different folks, including on OpenAI's preparedness team, making a really big deal of the model's performance on internal pull requests on, I think it's called Swelancer, this evaluation of how valuable tasks that it can do in a freelance marketplace. I know many people have the intuition of, Oh, they're just hyping up their own product. This is fake, whatever. Maybe I happen to know a bunch of these people. I don't think that's what it is. But also, sure, like, maybe there's a hype element of it. What if there were a true nugget in it? What would you want to happen in the world at that point? And that's the question that I try to orient myself mainly around these days.

Nathan Labenz: (1:20:21) So what should we do? I mean, we're, I tend to think and by the way, you know, my own data point on this during that GPT-4 period, I watched the public statements from OpenAI leadership pretty closely having, you know, an inside view not the inside view, but an inside view into what capabilities, you know, already did exist. And what I basically found to be the case during that window of time was you could take Sam Altman's statements at face value, and the the main update you should make relative to what he was saying is basically you should subtract the vibe that he was giving off as being in sort of a speculative mode. You know, he would sort of say, yeah. I think what we might see in the future with models is, like, x. And then I'd be sitting there thinking, like, I've seen x exactly on a model from from you. You know? And I know you know it too. So if anything, I thought he was like basically saying things that he knew to be a 100% true, you know, with confidence, but just sort of presenting them in a more like maybe kind of frame because they weren't, you know, obviously ready to show all the cards yet. So I'm with you. I don't think that, you know, hype is a great primary driver for what is happening, but now it's okay. So we've dispatched that. So like we've now we're back to 40%. It seems like we may be in the sort of, you know, entering the steep part of the s curve here. And I wouldn't be shocked at all if it was 80% within this calendar year. So, you know, that strikes me as a big deal. Seems like it seems to you like it could very well be a big deal. Like, what what should we do about it?

Steven Adler: (1:21:56) Yeah. I I'm not sure exactly what to do. I think they're so part of how I understand what happened is in 2023, I think the world, including the AI labs, were actually pretty ambitious about the type of legislative agenda. When Sam Altman, CEO of OpenAI, testified before Congress, he talked about a licensing regime, essentially, for the training of frontier models. He's recently said, you know, he no longer thinks that's the right approach. It's probably not politically tenable, at least not in The US, for various reasons. I understand that. I'm surprised by how quickly the world has backed from this ambitious worthwhile idea to basically accepting that we will have voluntary practices from the companies, voluntary commitments that often the companies don't, in fact, keep and might not publicize when they don't keep, it seems that there's a significant middle ground. And so one thing that I want the world to do is to figure out how to make careful, cautious safety not be a competitive disadvantage. Today, I think as an AI company, if you don't rush through your safety testing, you are at a competitive disadvantage because the other AI companies are rushing through, or at least you have this fear that they might be. It creates a really nasty race dynamic where everyone's worried that they will be undercut if they take their time. I wrote a post on my Substack recently exploring this idea of should there be a minimum testing period so that you as an AI company can reliably take your time safety testing your frontier models without worrying about being undercut. It's far from a panacea. There are a lot of things that would need to be worked out. There are other ideas that maybe would be better. But this idea of, can we figure out what the floor should be on safety testing in terms of the time you allocate, the number of people, the amount of compute, what threat models you test for, how you test them, and get some minimum floor in place seems really important to me. The EU general purpose AI code of practice, which is coming out relatively soon seems to me like the most likely piece of force of law with actual consequences to happen in the near future. I'm not sure exactly how this will interact with the companies. It's not my field of expertise. I would expect that if there are real teeth to it, many of the companies will either try to lobby against that and influence it otherwise, or decline to sign or do something with their jurisdiction to not release certain products within the EU's sphere of influence to try to not have to comply. I think in The US, SB 1047 was a really important crack at some of these problems, and I was really disappointed with how OpenAI ultimately came out against SB 1047. I think a lot of the reasoning that its executives used in explaining why they were against SB 1047 did not really hold. At a broad level, I would direct people to Zvi Mashvitz's summary of SB 1047 if people want to understand in more detail. Essentially, companies training really, really large, expensive frontier models would have needed to put on record a safety and security plan that they said they would stick to in terms of testing the model. And if they later caused a catastrophe with the model and it was found that they did not behave reasonably, for example, maybe they didn't stick to their plan, they could have been held liable for this. And so if we don't want really broad brush, you must test your model for at least x time, a standard way to do something different is this market risk approach. Let companies make their decisions, but hold them liable if they behave unreasonably. OpenAI came out against SB 1047. It seemed to me that OpenAI implied, We won't support this because it's a state level bill. We think this should be done at a federal level. Personally, I don't believe that they would have supported a federal version of SB 1047, and so I was pretty disappointed by that. In practice, if you look at the types of policies that OpenAI leadership is now calling for, I think this is pretty far from calling for a federal SB 1047.

Nathan Labenz: (1:26:18) Yeah. Maybe just, you know, a big picture question is like, what do you think is the right way to think about OpenAI leadership today? We've obviously seen these sort of, you know, self contradictory, you know, position changes over time. And of course, you know, we learn and we grow, but some of them seem like pretty, pretty striking. People are quick, I think, to latch on to like explanations that to me seem like way too simplistic or just like don't ring true. Like, oh, it's all about the money for them. That doesn't ring true to me. Then, you know, some people say, oh, it's all about power. And I'm like, maybe, but that is still doesn't quite seem like quite right to me either. But there is something pretty striking when it's like the European Union, you know, not a small market, might wanna put a little bit of guardrails on. And, you know, they haven't done this yet to be fair, but we've seen some of this, right, where, like, you're then just gonna, like, yank the product from Europe, all of Europe. You know, I mean, that doesn't seem like you're trying to do the original thing, which is, make like, sure we're benefiting all of humanity here. Right? Like, it wouldn't have been a huge deal to to actually just comply to reach 500 million people. So I'm confused. Like, what how do how do you think about what OpenAI leadership and maybe need to define, like, who is that group in today's world? But, like, what do you think they want?

Steven Adler: (1:27:43) Yeah. I I guess if I back up for a moment, you know, when I joined OpenAI, I took the nonprofit charter very, very seriously. And, you know, maybe this was naive of me, but I really, really thought that the organization meant these things. You know, when I interviewed with OpenAI, there were questions about the charter and what, like, what drew you most to it and what parts you agreed with and disagreed with.

Nathan Labenz: (1:28:09) Yeah. What's your financier clause of our charter?

Steven Adler: (1:28:11) Yeah. Yeah. No. Like like, actually. And, like, I had interviews where I talked about merge and assist and like, how cool and inspiring this was that OpenAI said, if there were, like, a reasonably enough value aligned organization very close to AGI, that it would look to team up essentially, instead of race each other. And that's that's, like, complicated in practice for all sorts of reasons, but I really felt like it meant this motivation. Similarly, the idea of having the nonprofit retain control and the fiduciary of the OpenAI nonprofit being humanity and the mission to benefit all of humanity with AGI rather than the shareholders. That is part of what concerns me about the attempted conversion to a for profit. I'm a little unclear how to refer to it these days because OpenAI is making these points of the nonprofit will continue to exist, and it will be well resourced, and so the nonprofit is not going anywhere. I think that's just kind of hiding the ball on the issue. The issue is fundamentally, does the nonprofit retain control over the for profit? OpenAI, in its own words, is building the most important technology since electricity or something to that effect. I think the question is, are the interests of humanity are those best served if the group governing the most important technology since electricity is legally accountable to humanity and the nonprofit's mission, or if it is legally obligated to protect the interests of its shareholders, the fiduciary interests as a for profit corporation. To me, the answer is obvious, right? It seems to me that if the nonprofit weren't putting any constraints on the for profit's behavior, or weren't believed to be putting constraints, then it wouldn't actually matter to remove control of the nonprofit. But the reason that OpenAI is seeking to remove control from the nonprofit is because the nonprofit, in fact, does play some moderating influence on what types of actions it will pursue. Anyway, that is a long digression to the question of what I think is motivating OpenAI leadership. I'm not sure. I understand why there is a lot of personal intrigue and posts about certain executives and what matters to them. Way that my former boss, Miles Brundage, likes to put it these days, and I think he's totally right, is we need to get to a world where even if you don't trust individual people at an AI company, or in fact, even if you actively mistrust them, that you can still verify that they have safe enough practices at a certain standard that we feel good about relying on as a society. That is more my orientation. That said, I think part of what is happening at OpenAI is they are just perceiving, I think correctly, that in today's state of affairs, they can't really coordinate that effectively with the other Western labs, Chinese labs, and are taking actions that they think make sense for them unilaterally, if you assume a world where nobody gets together and coordinates. One thing that I want to be different in the world is right now OpenAI and the other AI companies are taking these essentially, are defecting on others' actions, but everyone is kind of defecting. Right now, people are kind of papering over that with, I think, rationalizations of our practices are safe enough because we run our tests continuously or continually or every so often, right? Things like this that try to make claims that they are being safe enough. I would prefer if the companies were just clear about what I think their actual views are, which is actually, there's a lot of risk in this. And we don't really want to be rushing ahead, but we just can't stop it. And so given that everyone else is going to rush ahead as well, we are going to as well. I think it would be a tremendous win for public discourse and public understanding If the AI companies were more forthcoming about this, that they are trapped in a really, really bad equilibrium, and don't necessarily want to be doing the things they are doing. I totally understand they are not going to do this, or at least most of them won't. And there are good reasons not for doing it. Nobody wants to admit that they are defecting or making the optimal choice under really awful conditions. It's politically unwise a lot of the time to say such a thing. I really, really hope they are at least saying privately to governments and regulators that that is the case. You know, I don't hold my breath on it too much. Like, I don't think it is happening, unfortunately, but I really, really hope that it is.

Nathan Labenz: (1:33:13) So should I read that as you saying that you think that open eye leadership is unhappy with the current situation and just, you know, is playing the hand that they feel they've been dealt?

Steven Adler: (1:33:26) It at least some of them. Like, it would surprise me if folks at OpenAI had no actions that they thought were, like, better from a safety perspective to take and just felt like they couldn't do them. They are managing a really, really complicated business and geopolitical operation, and there are all sorts of important partners. Microsoft, other compute providers, you can imagine who the different stakeholders are who have different interests and they might be upset to wrinkle. This is not anything specific to OpenAI, this example I'm about to give. But for example, the AI companies are really dependent on goodwill from NVIDIA for shipment of future chips. An AI company, even if they thought, Boy, we really should increase our export controls on leading chips between The US and China, they also correctly anticipate they will probably pay a diplomatic penalty for saying as such, at least saying it publicly. And, you know, that is different than whether they think tighter export controls would be good in principle, or if every AI company in the Frontier Model Forum came forward and said, this is the right thing to do so that none of them paid a competitive penalty for doing it. But if you're OpenAI or Alphabet or whoever and I should also be clear, it's possible some of them have said things about this publicly, in which case, I think that's good and virtuous. I'm not fully up to date. But I think if you are the first one to say something like this, you should anticipate paying some penalty for even feeling it out. You are making yourself vulnerable to your rival flipping it on you. OpenAI could say to the other frontier model forum companies, Hey, should we come out and make a collective statement on this? Someone from Alphabet could run to NVIDIA hypothetically and say, OpenAI is trying to crack down on you. Kind kind of a weird example because of TPU, GPU type stuff. But, anyway, like, you do not want to be making yourself vulnerable by being the first to take some of these safety considerations seriously, and I think that's a really unfortunate state of affairs for the world.

Nathan Labenz: (1:35:44) Yeah. In the AI 2027 scenario, one of the things that really strikes me is that we get this sort of basically discontinuation of public releases while the company internally just goes harder and harder at making more and more powerful models and this sort of gap between which has always been a little bit of a gap as there probably should be so testing can be done and so on. But this gap between what is publicly, not just what is publicly available, but even what is publicly known at all and what actually exists, like really starts to widen There's just, like, a very few people in the know. And that seems to me like quite a not great scenario. I guess questions there would be like, how open is OpenAI internally? You know, back when you started, I would assume that, like, it was pretty free and open and everybody kind of knew what GPT-3 was about and, you know, whatever and what big training runs were happening. Correct me if I'm wrong. My sense now is that there's already much more of a need to know basis. And I wonder if you think it is plausible that we could be headed for and, you know, with 4.5 coming off the API, I don't wanna overread that too much, but that to me seems like it could be a leading indicator because Sam Altman did literally say, like, you know, we've got a lot of models to train, and so we might pull 4.5 down because it's pretty compute intensive. So this could start to seem like the beginning of this divergence of like, okay. You know, you guys will satisfy yourselves with o four mini. Well, meanwhile, we go and, you know, train who knows what, like o five maxi or whatever the case may be. And I guess I wonder, like, how many people even internally would know that in, you know, today's world or in the not too distant future? What's your thought on that sort of possibility of a dramatically widening gap and, like, very, very closely held secrets?

Steven Adler: (1:37:45) I think it's pretty spooky. So Apollo Research put out a report recently on internal deployment, and it kicks off with this point that the most powerful AI systems in the world, when they come to exist, are likely to be used within an AI company for all sorts of sensitive uses without necessarily being known by the public. And that that, I agree, seems bad. One of my concerns in writing this minimum testing period piece was, will it delay when models become known externally, they're still being used internally for sensitive uses in the meantime? The way I try to square that circle is we should separate when a company has a new leading frontier model versus when it begins to use it for non testing purposes for internal deployment. I think it's important to do meaningful safety testing before you pull your model off the rack and start using it for sensitive uses. In terms of the number of people who know, yeah, like, definitely, these companies have become tighter over time. There have always been some level of access controls to things like model weights. But certainly information has become more siloed over time. And my perspective from having worked on AGI readiness at OpenAI is even with the privilege of being inside the organization, sometimes it was hard to tell what exactly was coming off the rack at what time and what it was going to be capable of. The more need to know you make algorithms, capabilities, like how systems work, all these things, you do put even the safety staff within the AI companies at a disadvantage. To be clear, some of these practices have improved over time. When OpenAI first shifted to tighter information controls, they were really broad, because that's all we really had the ability to do, and they've become more fine grained over time. I think that's great. But I think we should imagine the number of people within the company, especially not just pure capabilities researchers, knowing exactly what is going on, to be very, very small. And, you know, if you don't hear objections from people within a company, you know, saying anonymously publicly that there's a big issue, one way to read that is there's not an issue. I think the more correct way to read that is a general prior on, like, this person might not know. They might not have access. You you just are going to be pretty behind the curve unless you are one of the people principally working on advancing the frontier.

Nathan Labenz: (1:40:32) Yep. How about a little lightning round on some just kind of OpenAI culture issues? What happened with the super, super alignment team? There there have been, like, literally conflicting statements in public from different people associated with it, obviously. What what's your perspective on what happened there?

Steven Adler: (1:40:55) I don't know that I have special insight here. Like, I take Jan, like, at his word, and his tweets felt, like, pretty raw and real to me. And so I would just defer to what he has said. I know there's speculation about, is it purely a compute thing? Is it bigger disagreements with the philosophy? Jan's accounting of it, where it's kind of like a bit of everything getting worse over time seems truthful and true to experience to me to his experience.

Nathan Labenz: (1:41:29) How about this, you know, legendary story of Ilya leading these sort of meditative sessions where people are chanting feel the AGI or something like that. And this sort of general pattern that I feel like I've observed where it seems like there's a lot of, like, embodied wisdom and sort of almost Buddhist style detachment, or maybe not detachment, but sort of, sometimes they call it, like, high performance mindset. You know? I feel like there's a vibe that I'm getting from a lot of open AI people that's, like, very similar to what they tell the NBA three point shooters to do. You know? Don't worry whether the last one went in or the next 1. It's just you're all a 100% in the moment and trust the process. And I feel like that is sort of emanating from various corners of OpenAI, and it's something I'm a little concerned about because I'm like, I'm not sure that generalizes super well from, like, you know, making putts on the pro tour or, you know, making three pointers to doing frontier AI research. But how big of a cultural force do you think that sort of thing is?

Steven Adler: (1:42:33) I didn't experience very much of it. Like, definitely, I think Ilya always did a really great job of helping people feel the stakes of what we were building in a way that isn't always clear to every person working at OpenAI. The profile just has changed over time. It's gotten much larger. It's hard to do onboarding for that many people that really focuses on what are the stakes and what is alignment? And I I think it would be an important area for the company to invest more in. Yeah. I don't know. I have not gotten as much of the, like, contemplative studies type thing within my time there.

Nathan Labenz: (1:43:19) Okay. Good to know. Well, you mentioned the profile shifting. I also wanted to ask about sort of the researcher profile. It strikes me that, like, five years ago when folks like you were joining, the world was obviously very different. Prospects for AI were very different. And people like you did it because you were aligned to the mission and, you know, saw the potential of what all this could be. Now I sort of wonder if, like, the people that five years ago were just, like, super good at math and were maybe going to hedge funds or whatever are now going to OpenAI because this is, like, the place that pays top dollar for the best recent math grads. And maybe those folks have, like, a much you you could imagine they might have a more narrow view of just like, let me solve technical problems. That's all I really care to think about. And maybe in the process, the sort of, you know, holistic, like, readiness framework has kind of, you know, fallen out of scope for for people that are actually doing the most, frontier work. Does that ring true at all?

Steven Adler: (1:44:19) Yeah. I'm not sure. I mean, I think, like, one big shift in the company over time is certainly when I joined the product company aspect was an afterthought. It was to get capital to be able to fuel the broader nonprofit mission. I think over time, that has shifted. An interesting metaphor or an interesting story about this. When I joined, the common thing that we were told during onboarding is OpenAI is not just a research lab. It also is a product company or it also has a product arm, something to that effect. At some point, this just totally flipped. There was a big safety off-site maybe halfway through my time working at OpenAI. One of the speakers in opening up the off-site said, OpenAI is not just a product company. It's also a research lab. I was just kind of blown away by the flip in this. I did a count. There were maybe 60 or 70 people in the room. I went through and said, Who actually here worked at OpenAI before it was a commercial business? Who was here before GPT-3 was deployed? Which, you know, doesn't include me. I joined after the GPT-3 deployment. I think of the 60 or 70 people in the room, there were like four people there who had predated the business arm. And so it's understandable that it's a different cohort of people.

Nathan Labenz: (1:45:42) Yeah. Okay. Again, lightning round kind of questions. How do people feel about OpenAI partnering with Andoroll, and how do people feel in general about, like, explicit weaponization of OpenAI's technology?

Steven Adler: (1:45:59) I do not know in the case of Andoroll. Certainly, company has had, like, angst internally about changes to its policies around military use, and not everyone at the company agrees with them. I think those I'm actually not sure on the specifics at what point, if ever, OpenAI has said that it would do weaponization type stuff. I would imagine it's controversial, but that there are also people within the company who think it is, for example, like, very virtuous to work on behalf of the US military. And, you know, there are disagreements with that point of view.

Nathan Labenz: (1:46:37) Okay. Next question is one that I, you know, just wanna preface by saying, I mean, no disrespect at all to anyone involved. But conspiracies are flying on the broader Internet about the untimely death of one hopefully, I'm saying his name correctly, suit your biology. And my guess is that the answer will be no, but I just wanted to ask, do you think people at OpenAI take any of those conspiracy theories at all seriously?

Steven Adler: (1:47:07) I think no, but still, like, the weight of what everyone is grappling with is real. Like, I I had already left OpenAI at the point that, Suthir's death became known or possibly when he, in fact, died. I'm forgetting the exact timeline. And it's, I mean, it's super, super sad and tragic. There was definitely a moment where I felt, like, vaguely uneasy or something, but I never thought that anyone specifically would do anything to, like, bring physical harm to me. It's just it's really uncomfortable when someone who has spoken up about important issues dies. I I think it's just, like, really, really sad and a poor policy state of affairs to even need to be asking these questions. You know, when I tweeted about having left OpenAI and expressed fear about what the future might come in the stakes of AI, there were people advising me to declare publicly that I would never harm myself. And, like, I think that is totally unnecessary. I was not specifically worried about that. I think it's, like, really, really bad that we are in an information environment where people who might otherwise come forward about things need to consider this at all. Like, that that is really tragic. And, of course, Sushir's passing is also really tragic.

Nathan Labenz: (1:48:34) Yeah. No doubt. But it's it's I'm glad to hear that you, you know, have never worried for your own physical safety. How do you think OpenAI team members feel about being protested? Not too long ago, somebody, like, chained themselves to the, you know, the door or the fence or whatever around the office. Does that kind of stuff register at all? Or do people just think, oh my god, you know, these people are crazy?

Steven Adler: (1:48:56) Yeah. I mean, I actually worried much more as an employee about terrorism type stuff working at OpenAI than I have about, for example, you know, harm for speaking out after leaving the company. It not not specifically from Pause AI or protesters per se, but just knowing this is like a really, really controversial, weighty set of things that the company is doing. Many people disagree. Many people in the world are not well. And what will they do to express that? To some extent as well, the AI models are basically magical Ouija boards. Sometimes they are sycophantic in terms of they amplify things you tell them and tell you what you want. If someone's already in a bad headspace, it's easy to imagine what can happen. I think most employees honestly were not very aware of this civil disobedience type of protest aside from messages from the security team about, hey, you know, there's an active demonstration outside this building, you know, try to avoid it if you don't need to be there, do whatever alternate means. But I don't think it was very top of mind for people.

Nathan Labenz: (1:50:10) Gotcha. Okay. Three more if we can. Sure. Okay. Is there any prospect there have been a couple interesting commentaries recently, I think, about the especially if you buy this model of, like, gradual handoff of the engineering and maybe eventually the research from the human team to the AIs themselves. Then there's the idea that like the research team itself is sort of in a position of declining power. Like right now they have power, but in the future they might not have so much power. Is there any prospect for a sort of class consciousness of AI researchers such that, you know, people could sort of use this moment now to say this, you know, sort of reassert perhaps the value of the charter from within?

Steven Adler: (1:51:02) Yeah. I think the question of how, like, labor power at these labs changes over time is a really interesting one at the point of AI automation. It seems to me like one of the biggest impediments to employees sharing their views or, like, helping take certain actions is just not really understanding correctly what other people at the company think. My former teammate Richard Ngo wrote up a really interesting analysis recently of, in this case, coups, but political change more generally. Like, what are the factors that contribute to these happening? And it seems that uncertainty about what other people believe is a really big factor. And so at at OpenAI, I just think it's gotten harder to be candid with other teammates or other people in the organization over time. You know, everyone has somewhat different information. There are all these different information control tents, so you need to be kind of tight lipped. Once upon a time, when it was a smaller, more like trust by default organization, there were ways of anonymously raising concerns to other teammates, and you could kind of see what people thought through that sort of process. But over time, understandably enough, that's not really an option anymore. And so I wonder how good a model people at OpenAI have, even of their teammates, let alone people in the broader organization.

Nathan Labenz: (1:52:28) Yeah. Interesting. Okay. Well, we've kind of touched on it, you know, and you've done a great job of emphasizing the values along the way that, you know, that brought you to the organization. And these are, you know, very much at the core of this amicus brief that you've signed on to. Maybe just give us kind of the pitch that you and 1one other former OpenAI team members are making to the court as to why this sort of nonprofit to for profit conversion shouldn't happen?

Steven Adler: (1:53:02) So I I can only speak for myself, and so these are my personal views in general would would defer to the actual brief as filed. I think the gist of it is OpenAI promised nonprofit control over this incredibly significant for profit entity that it was building. And it relied on this promise in various ways. Various other parties, you know, relied on it when making decisions like whether to join OpenAI, or like how to think about what actions it would ultimately take in the world. I'm pretty concerned about giving up the nonprofit's control, and it's not clear to me that there is a reasonable price that could be paid to adequately compensate for it. You know, it's not to me a question of, well, you know, if the the valuation just went up by a bit more, maybe then the nonprofit can do more prosaically good things in the world related to AI and education, or AI and science. The control is really, really important for the fundamental mission that the organisation is pursuing. I think it's telling that certain groups want to make a change so that OpenAI is accountable just to its shareholders rather than the original mission.

Nathan Labenz: (1:54:21) Yeah. That I find that quite compelling to put my cards on the table. Yeah. I don't know that there's any, you know I mean, that that pretty much says it all. So I I don't know that I have any big follow ups there, but the control p just to emphasize again, like, the control piece is really key. Right? I mean, this the whole charter thing was put in there for good reason. The whole merge and assist or stop and assist or whatever exactly, stop competing and assist, whatever that language exactly is. It's striking to me also that they could probably invoke that now in a reasonable sense if they wanted to. Right? I mean, in the charter, it says, you know, details will be worked out on a case by case basis, but a, you know, a sort of representative scenario would be like a 50 50 chance of achieving HAI in the next two years. And, you know, I think we're here. Right? Like, it's

Steven Adler: (1:55:09) Yeah. So, yes, I I agree. Like, this thing feels like it could be imminent. In OpenAI's defense, I think an important part of that is, is there another AI company that would be willing to do the merge and receive assistance as well? OpenAI either can't really do it unilaterally or has good reason not to want to just totally do it unilaterally. I understand that their situation is a little bit more complicated than that. At the same time, I just wish that it were more possible for the companies to cooperate on stuff like this. They each look at the situation, they're like, Oh, yes, it is bad that we are racing each other. Not from an anti competitive perspective, from people being physically harmed in the world as a consequence of our race.

Nathan Labenz: (1:56:06) I mean, bracketing the anti competitive, like, legal restrictions that might prevent such a thing, it seems to me, like, very clear that Google would happily buy OpenAI for $300 million,000. So is there really a I mean, when you say, like, there's not necessarily another company or whatever, like, if the goal is to limit again, the charter says, like, we are concerned about late stage AI development becoming a degenerate race. And if that is the situation that we're in, then like merging with Google would be one way to like mitigate that. It doesn't solve everything, but like it seems like that option actually really is on the table if they would sincerely wanna do it. Right?

Steven Adler: (1:56:49) Yeah. I mean, I have I have no special knowledge about any of these negotiations or anything, whether they've happened or not. It obvious to me that Alphabet would buy OpenAI for $300 million,000 but again, maybe I shouldn't be fighting the hypothetical. There a value on the table that one of these AI labs could bid to pay for the other that, they would both find acceptable? Like, maybe. And I guess that just brings the question of whether it in fact should happen. It's tough. Right? Like, I would rather there be fewer players in the race than more. Like, I think each new entrant just adds to the complexity of coordinating and destabilizing, and safety talent becomes spread more thin. I also noticed that I I I do feel some of that impulse of the, like, is it actually an anti competitive play? I get why that is a real concern to be grappled with. Often, when there are big corporate acquisitions of this type, they are not, in fact, pro socially motivated. The also also by corporate law. Right? They don't strictly have to be. Right? But I get why people would be suspicious of this.

Nathan Labenz: (1:58:03) Yeah. Well, the concentration of power arguments are also pretty compelling in their own right. I I totally agree. Yeah. So I guess final question. Do you have any advice for people at OpenAI or you could perhaps generalize a bit more to, like, people at Frontier AI developers today? Like, what is Virtuous in your mind for them to do?

Steven Adler: (1:58:25) I'd like to see more people within the AI companies pushing in directions of being clear about practices and commitments. One thing that Anthropic does that I think is really great is they actually have a specific part of their website where they list out the different commitments they have made. I think this makes a really nice bright line of if something is on this webpage, it is in fact a commitment. If it is not on this webpage, it is not in fact a commitment. This allows people to be really clear on what Anthropic specifically has committed to and whether or not they follow through on it. I'd love people who raise their hand within the AI companies to say like, Hey, this seems really important for us to do. I've prepared a first draft. What do we need to do to make this known? And then similarly, pushing from the inside for the company to keep to its word or at least loudly proclaim to the public if it is needed to change its commitment. I think there are a bunch of things to be done. In my Substack, which I'd encourage people to check out my name is stevenadler dot substack dot com. I write a lot about practices that I think the AI labs should be doing, generally aren't doing, are generally cheap enough. Often one of the limiters in getting those projects to happen is just, is there someone within the company who is willing to raise their hand and take it on and push for it to be a thing? They're often known how to do. It's just everyone's really busy and spread thin. And so being a change from the inside, picking up more of those projects, I think is really, really great and virtuous.

Nathan Labenz: (2:00:02) Yeah. Definitely. We'll we'll link to the sub stack in the show notes, and there's, you know, several quite interesting posts there. We didn't even get to, although we could now, if you wanted to, talk about task specific fine tuning as a a testing paradigm. Totally up to you and your your time available, but I thought that was quite interesting and folks can either hear a teaser from you now or we can just send them to the blog as you prefer.

Steven Adler: (2:00:28) I I think the thing that I want people to take away from posts like this one on my Substack about, like, investigating which AI companies have said that they will do this specialized form of fine tuning testing and which are actually doing it, is that often there's a gap between what companies have said they will do today and what they are in fact doing in practice. This doesn't have to be a malicious, malevolent thing. I think there is a big diffusion of people who work on material like system cards, and they say, We are going to do X, or We did in fact do Y. People should just read those statements and not rely on them 100%. Sometimes people are mistaken or are describing different concepts by the same name. So this this, again, is part of the push toward, I want companies to have specific practices that they are required to follow rather than us relying on their word and self descriptions because, unfortunately, sometimes those descriptions are not reliable.

Nathan Labenz: (2:01:35) Yeah. Okay. Well, this has been great. I really appreciate it. And I think you're doing a great public service by helping people understand the, you know, specific company of OpenAI and the frontier AI companies more generally and the sort of mollicky situation that they find themselves in and why even despite some good intentions, things may not be necessarily headed in the positive direction that we'd all hope to see. Any other closing thoughts? Anything you wanna leave people with or anything just we didn't touch on that you'd wanna make sure to mention?

Steven Adler: (2:02:11) No. I I think that's it. Yeah. Thank you so much for having me on. This was a fun conversation.

Nathan Labenz: (2:02:16) Yeah. Likewise. Steven Adler, thank you for being part of the Cognitive Revolution. It is both energizing and enlightening to hear why people listen and learn what they value about the show. So please don't hesitate to reach out via email at tcr@turpentine.co, or you can DM me on the social media platform of your choice.

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Zvi's Mic Works! Recursive Self-Improvement, Live Player Analysis, Anthropic vs DoW + More!

OpenAI's Identity Crisis: History, Culture & Non-Profit Control with ex-employee Steven Adler

Watch Episode Here

Read Episode Description

Full Transcript

Transcript

Read next

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Zvi's Mic Works! Recursive Self-Improvement, Live Player Analysis, Anthropic vs DoW + More!

OpenAI's Identity Crisis: History, Culture & Non-Profit Control with ex-employee Steven Adler

Watch Episode Here

Read Episode Description

Full Transcript

Transcript

Read next

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Zvi's Mic Works! Recursive Self-Improvement, Live Player Analysis, Anthropic vs DoW + More!

AI Scouting Report: the Good, Bad, & Weird @ the Law & AI Certificate Program, by LexLab, UC Law SF

Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn