The AI Whistleblower Initiative: Supporting AGI Insiders When It Matters Most, w/ founder Karl Koch

Today Karl Koch, Co-Founder of the AI Whistleblower Initiative, joins The Cognitive Revolution to discuss the barriers preventing AI insiders from raising safety concerns, his organization's anonymous "Third Opinion" service connecting whistleblowers with independent experts, and their campaign demanding frontier AI companies publish their internal whistleblowing policies to address widespread retaliation and lack of transparency.


Check out our sponsors: Oracle Cloud Infrastructure, Shopify.

Shownotes below brought to you by Notion AI Meeting Notes - try one month for free at https://notion.com/lp/nathan
- AI Whistleblower Initiative Origins: Founded by Karl Koch in Berlin, the initiative began its research phase in early 2024, consulting with over 100 governance researchers and insiders. Link to the website: https://aiwi.org/
- "Third Opinion" Proposition: Launched in late 2024, this program systematically breaks down barriers for insiders to speak up about AI concerns while ensuring their issues are addressed.
- Publish Your Policies Campaign: The initiative calls for AI companies to publish their whistleblowing and speaking up policies, with 100% of surveyed insiders supporting this transparency measure. Link to the campaign: https://aiwi.org/publishyourpo...
- Anonymous Consultation Process: Whistleblowers can seek advice without sharing confidential information through an open-source anonymous tool accessed via Tor browser.
- Pro Bono Legal Counsel: Whistleblowers are connected to experienced legal representation, protected by attorney-client privilege, at any stage of their journey.
- Policy Advocacy: Beyond direct support, the initiative works on advocating for better whistleblower protections in both US policy and EU AI regulations.
- AI Whistleblower Initiative Survey: https://bit.ly/AIWISurvey

Sponsors:
Oracle Cloud Infrastructure: Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive

Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(07:12) Introduction and Background
(12:04) Deliberate Research Approach
(16:24) Whistleblowing Channels Explained (Part 1)
(20:27) Sponsor: Oracle Cloud Infrastructure
(21:36) Whistleblowing Channels Explained (Part 2)
(29:43) Nathan's GPT-4 Experience (Part 1)
(38:40) Sponsor: Shopify
(40:36) Nathan's GPT-4 Experience (Part 2)
(45:34) Psychological Strain Discussion
(55:05) Scale and Insider Numbers
(01:00:21) Survey Results Insights
(01:18:28) Public Scandals Problem
(01:28:40) Combating Internal Secrecy
(01:35:39) Publish Policies Campaign
(01:44:15) Support Services Overview
(01:49:44) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


Transcript

Nathan Labenz (0:00) Hello, and welcome back to the Cognitive Revolution. Today, my guest is Karl Koch, founder and managing director of the AI Whistleblower Initiative, a nonprofit dedicated to supporting concerned insiders at the frontier of AI development. I am particularly passionate about this work because I would have loved to have had this kind of support a couple of years ago when, as longtime listeners will know, I tried to raise concerns about the size and quality of the GPT-4 red team project and the ineffectiveness of the nascent safety measures that OpenAI had developed at the time. Back then, no such support existed, and so I consulted friends in the AI safety community before ultimately deciding to escalate my concerns to OpenAI's board, for which I was subsequently dismissed from the program. That experience left me acutely aware of how difficult it is for insiders to navigate these situations and has motivated me to support this project with a mix of modest personal donations, behind-the-scenes fundraising help, and a bit of ad hoc volunteer work over the last 6 months. Along the way, I've been consistently impressed by the seriousness of Karl's thinking and the maturity of his approach. As befits an organization that aims to help people in truly critical moments in their careers, when the stakes have never been higher for them personally and potentially for society as a whole, they are taking care to lay a foundation of understanding and infrastructure now so that insiders can trust them if and when that pivotal moment comes. The first critical investment they've made is extensive research and understanding. By talking to over 100 governance researchers and surveying employees at frontier AI developers, they've developed a deep understanding of the barriers that potential whistleblowers face. The majority of frontier lab insiders, as it turns out, don't even know if their companies have internal whistleblowing policies, let alone understand what protections they offer. The legal landscape, unfortunately, doesn't help much either. The EU will begin to protect AI whistleblowers starting in 2026, but US law remains a patchwork, with proposed legislation like the AI Whistleblower Protection Act still pending. Strikingly, roughly half of survey respondents expressed a lack of confidence in their own ability to determine whether specific observations constitute a serious cause for concern. And literally 100% lacked confidence that regulators would understand, let alone be able to act effectively on, their concerns. Meanwhile, over 90% couldn't name a single whistleblower support organization. All this, plus well-known cases where people like Leopold Aschenbrenner were fired for breaking chain of command and going directly to the OpenAI board with security concerns, creates a highly uncertain and risky context for such high-stakes decisions, in which people with serious safety concerns are left to think through the nuanced costs and benefits of internal escalation versus going to regulators versus leaking to the press almost entirely on their own. It is an extremely stressful position to be in and not conducive to the best possible decision making. The good news is that the AI Whistleblower Initiative offers several forms of support.
Their Third Opinion service, which you can find online at aiwi.org, allows insiders to anonymously reach out via a Tor-based open source tool, which Karl notes is pen tested with security reports published openly for scrutiny and verification, to get help identifying and anonymously contacting independent experts who can answer questions without requiring you to share confidential information or even reveal where you work. For those concerned about digital privacy, they provide a digital privacy guide, and in select cases, hardened devices with specific operating system setups for highly secure communication. And if insiders deem concerns justified, they also connect people with specialized and experienced whistleblowing support organizations, including the Signals Network, psst.org, and the Government Accountability Project, which can provide pro bono legal counsel, psychological counseling, and guidance throughout the process, all without pressure to disclose any information. Crucially, in some cases, they can even help arrange financing to cover legal costs, which can easily add up in cases that end up in any form of litigation. That a nonprofit stands ready to invest this seriously in any concerned insider that needs help may strike some people as excessive today. But considering that we're talking about perhaps just hundreds or maybe low thousands of people globally who are positioned to spot and raise critical concerns over the next few years, of which I'd bet only a few dozen will ever find themselves in position to seriously consider sounding alarms, I think this sort of care and support is absolutely worthwhile. Most recently, Karl and team have launched the Publish Your Policies campaign online at publishyourpolicies.org, calling on frontier AI companies to make their internal whistleblowing policies public. This is actually standard practice in many industries. But interestingly, in the AI space, only OpenAI has done any version of this with their Raising Concerns policy, which they published only after Daniel Kokotajlo and others revealed OpenAI's use of extensive nondisparagement agreements to keep former employees from publicly criticizing the company. Daniel, by the way, is joined by other former AI insiders, luminaries like Stuart Russell and Lawrence Lessig, and yours truly in signing on to the Publish Your Policies campaign. Of course, publishing corporate policies doesn't obviate the need for proper legal protections, which Karl strongly advocates for as well. But at a minimum, it would help insiders understand their rights and options, enable public scrutiny, and ultimately create accountability that benefits everyone. If you work at a frontier AI company, Karl encourages you to ask your management to consider publishing their whistleblower policies. If enough people ask these questions now, I wouldn't be surprised if it becomes another dimension of the intense competition between frontier AI developers for top research talent. And that could ultimately mean that companies even begin to collect and publish data on things like how many reports they receive, their response timelines, retaliation complaints, appeal rates, and whistleblower satisfaction scores, all of which would benefit everyone. Regardless of what company leadership decides to do, Karl's message for insiders is this: Support is available at every stage. 
Whether you're considering internal escalation, thinking about approaching regulators, or even contemplating public disclosure, you can reach out completely anonymously without sharing any confidential information just to understand your options. And the AI Whistleblower Initiative can help you get expert perspective on what you're seeing, connect you with legal counsel and experienced guidance, and perhaps even help finance your case. So don't wait until you're deep into a crisis and know that you don't have to face this alone. With that, I hope you enjoy, and I encourage you to share with friends who work at frontier AI developers this in-depth conversation about the challenges that concerned AI insiders face and the support that's available to make sure that those who are building our AI future can safely speak up when it matters most. Karl Koch of the AI Whistleblower Initiative. Karl Koch, managing director at the AI Whistleblower Initiative. Welcome to the Cognitive Revolution.

Karl Koch (7:18) Thank you very much, Nathan. Thank you for having me.

Nathan Labenz (7:21) I'm excited for this conversation. We've been talking behind the scenes and collaborating a little bit as you've been building this organization and a couple key initiatives over recent months. So I'm excited to finally get into it in a public forum.

Karl Koch (7:35) That's great.

Nathan Labenz (7:35) Maybe for starters, AI whistleblowing, you know, a very new territory. How did this come to your attention? How did you decide to prioritize it? What's the backstory that has you focused full time on this corner of the world?

Karl Koch (7:49) Yeah. So maybe on a personal level, thank you very much again, Nathan. Lovely introduction. Name's Karl Koch, founder of the AI Whistleblower Initiative. We're a nonprofit project currently based out of Berlin, supporting concerned insiders, whistleblowers at the frontier of AI. I personally came to it having been involved in the AI safety scene, or however you want to call it these days, since 2016. I was a volunteer researcher at the Future of Humanity Institute, a lovely institute, sadly no longer around, and back in the day worked on differential technological development. Maybe some of the older listeners are still familiar with that term. I worked on an AI safety research camp after this as well, but afterwards actually kind of decided to drop out of the scene for a bit. I was a management consultant in Hong Kong for a few years, then started my own SaaS business, because back in the day I had very different timelines. I was always, also back in the 2010s, quite interested especially in the arms race angle as sort of a root evil for, I think, a lot of the problems that we see today, like safety skipping, these sorts of things. As ChatGPT rolled around, the alarm bells went off at the latest at that moment: okay, maybe things are moving quite a bit faster than I think most people anticipated back in the late 2010s. I then started to talk to a bunch of people from the network, governance researchers, thinking about what seemed to be the most tractable, best solutions, the best things one could start building. Transparency came to the front pretty quickly as a thing that we should just generally have more of regardless of what the future looks like. One angle, for example, was compute traceability, which I think other people have taken on over the past months and years, championing that cause. And the other angle was whistleblowing. I think people have been writing about the importance of whistleblowing as a mechanism, specifically with the AI angle, since 2017, '18, and originally we came from the arms race perspective again. So if you have a bunch of players in a multi-round game, they would ideally want to trust each other on their statements on speed and safety. And if you don't have transparency and if you can't actually believe that others stick to their promises, how do you cooperate over multi-round games? That was sort of the original kickoff thought. This was mid-2023, so the world, I think, still seemed a bit more rosy. The Superalignment team had just been kicked off. Everything seemed golden, the 10% compute commitment by OpenAI. So we thought, okay, maybe we're a bit too early here, but this is going to become relevant sooner or later, because the original papers, I think, around what game theory would look like in intensifying competition were out in 2014, '15, '16. Then the OpenAI board drama happened, so that was the first moment where we thought, okay, maybe we have to speed up on the research side a bit more, and this maybe becomes more of a concern already, that maybe we cannot trust that everything is going jolly well even in these organizations that claim to be very aligned, let's say. Then in 2024, of course, things really started to go in a different direction. First, of course, there was the Leopold Aschenbrenner case around sharing information and escalating concerns to the board, as I think it is understood by now at least, which people were penalized for, among other things.
And then, of course, the big story is sort of mid-2024 around Daniel Kokotajlo and William Saunders and the other people, which then led to Right to Warn. So of course this topic became a lot hotter throughout mid-2024. We properly started off a research phase in early 2024, talked to well over 100 governance researchers and insiders in frontier companies throughout that year, and then, in late 2024, launched our first product, called Third Opinion. Maybe the listeners have heard of it. We developed it together with one of the OpenAI whistleblowers from last year, kind of tackling one specific problem. I'm sure we'll get into it a bit more a bit later. And yes, so basically we've been live since late 2024, and now we're doing a variety of things to, like, systematically break down barriers for insiders at the frontier of AI to speak up and make sure those concerns are addressed.

Nathan Labenz (12:06) Love it. Thank you. One thing that I've been impressed by watching you behind the scenes is how deliberate you have been in the approach. You mentioned talking to a hundred insiders.

Karl Koch (12:20) That's a hundred and seventy governance researchers and some insiders as well. So a hundred and seventy—

Nathan Labenz (12:24) A hundred and seventy governance researchers and some insiders. Important to get the details right. But that's, you know, a pretty quiet slogging process, which I think is kind of representative of the right way to position an organization like this. You know, there's many things in the AI space right now that people are sort of YOLOing. Right? Where they're just like, oh, I made a thing. And so much stuff I think is even just getting open sourced. People sometimes, you know, ask, like, why is everybody open sourcing their stuff? And my answer in many cases is like, they know that their time at the frontier is gonna be very brief. And there's probably not enough time in many cases to, like, build a business around it or, you know, make money from it or whatever. So in many cases, like, the best that people can do when they create something that they're proud of is just put it out there, you know, try to put their flag in the ground that at one point, if nothing else, I was at the frontier of this incredible phenomenon and then just kinda see what happens and, you know, hope that maybe people notice it and take an interest in it. I think a lot of things are going that way, but you're taking a very, very different approach that is, like, very deliberate, very behind the scenes, has been quite quiet, although you're starting to raise profile now a little bit. What did you learn from talking to all those people? Like, you know, can you synthesize the vibe, the need, and you could segment that perhaps by maybe different frontier developers or different, you know, mindsets. We hear a lot of things about, like, cultural divides, you know, even within single companies. But, yeah, how would you sort of characterize the long journey that you took through all these conversations?

Karl Koch (14:04) Big question. Big question. So I guess one angle to look at it from is both the insiders we talked to and obviously the case studies that are already out there, even statistics, around the journeys of insiders and what challenges they face. And maybe some of those challenges are very specific on the AI side, and you can talk through that journey. That's probably one angle. Then certainly the other angle is how does that differ among different companies, and what sort of patterns are we seeing in terms of speak-up culture, in terms of maybe retaliation. Those are probably the two different angles. Then, of course, you can split it all up again via channels. I'm not sure how familiar the listeners are with this. Generally, when we talk about whistleblowing, we're always talking about some insiders raising a concern or some misconduct that they want to have rectified, and they do this in a way where potentially they go sort of over the heads of middle management or against the direct powers that be. That doesn't mean that whistleblowing is always, for example, leaking information to the media. So often, I think, and this is maybe the first answer to the question, a lot of people when they hear whistleblowing think, okay, this is like an Edward Snowden kind of situation. That doesn't really have to be the case. So there are basically these three different channels, which are called internal, external, and public, roughly. There's the internal channel, which is sort of the major channel that most insiders use, at least based on statistics, at the start. Naturally, we also expect that at AI companies. Raising concerns internally within the company, either in a structured or an unstructured process, which maybe actually you can share a little bit about your experience with in a second as well. And then there's sort of the external process, which means going to a regulator, speaking to a regulator about the concerns you have. And then public is sort of the next escalation mode, where you feel maybe you're not progressing, or you can actually expect maybe that going to a regulator would make the problem worse, or you'd expect collusion, for example. Then you can also go public. These are roughly the three different ways. And people in different companies think differently about these three different ways, and so there are sort of different challenges associated with each of them. Would you have a preference? Which one do you wanna kind of talk about first in terms of insights from, like, conversations with insiders?

Nathan Labenz (16:26) Yeah. Maybe we could take it through that ladder of, I guess, escalation or, you know, it seems like obviously from internal to external to public, there is, you know, more personal risk that the individual is running. There's also just like more chance that things sort of take on a life of their own, you know, and results of sharing information just become harder to predict. Yeah. So maybe, yeah, I would say probably the right way for people to be thinking about it, I would guess, maybe you could disagree. But kind of the way I ended up framing it myself was that this was sort of a ladder to climb as opposed to, like, three options. But you tell me.

Karl Koch (17:13) Yeah. I think you have to differentiate that a little bit between what's actually the law and what you are allowed to do versus what people perceive. And so maybe we can start on the perception side. I think what you say is definitely generally perceived to be the case: most people think, hey, if I have a concern, I'm going to raise it internally. The step of going, for example, to regulators is naturally a much larger step. Maybe we could talk in that context also a little bit about the campaign we just launched, publishyourpolicies.org, by the way, asking companies to publish their internal whistleblowing policies. We can do that in a second. So the vast majority of people we talked to definitely think about raising it internally first, and the statistics confirm this. Like, I think the SEC released statistics showing that more than 70 percent of people who actually end up going to the SEC started by raising concerns internally. If we want to start there, then the biggest challenge we see, on a first, overarching level, is legal protections, even for raising concerns internally. Large variety here. If you, for example, look at the EU side, that's quite heavily protected with the EU whistleblowing directive. Less so in the AI space at the moment, but that's going to come. That's going to be covered in mid-2026. So from August 2026, the EU AI Act is going to be covered under the whistleblowing directive. In the US, it's much more of a patchwork, even for internal disclosures, and much more so for external disclosures. Going public is pretty much not possible at all in the US. To go public with a concern in the EU is actually possible, so you can actually go public if you want. If you have reasonable cause to believe that there is a violation of a covered law, you are allowed to go public if external channels failed to respond to you, for example, in time, or if you believe there could be collusion, so you can go public straight away. You don't really have that in the States. So overarching is this question of legal protection, which is quite messy because there is no AI whistleblower protection as yet. There is one being proposed, which is great, but at the moment there's sort of nothing there, which means there's just overall quite poor protection at the moment from a legal perspective. You can get creative. I think you talked to somebody about the liability side recently. There are specific angles, maybe also under the SEC, where you can claim, for example, certain violations like securities fraud. For example, if you make a public statement saying you should invest in our company because we are the safest company around because, look, we have this RSP here, for example, and then you actually don't follow the RSP, that may be securities fraud. Maybe there are Matt Levine fans in your audience as well. Everything is securities fraud. But so this is sort of the overall, overarching legal challenge that people face. There's good support available here, by the way. We want to keep stressing this: there is actually great legal support, also pro bono, available. You can find a bunch of organisations on our website, aiwi.org, or you can approach us directly for consultations on who may be able to help you. Yeah. So that's the overarching legal element. And then it comes to something I think is a bit more AI specific, which is just clarifying if what you're seeing is even cause for concern.

Nathan Labenz (20:22) Hey. We'll continue our interview in a moment after a word from our sponsors. In business, they say you can have better, cheaper, or faster, but you only get to pick two. But what if you could have all three at the same time? That's exactly what Cohere, Thomson Reuters, and Specialized Bikes have since they upgraded to the next generation of the cloud, Oracle Cloud Infrastructure. OCI is the blazing fast platform for your infrastructure, database, application development, and AI needs, where you can run any workload in a high-availability, consistently high-performance environment and spend less than you would with other clouds. How is it faster? OCI's block storage gives you more operations per second. Cheaper? OCI costs up to 50% less for compute, 70% less for storage, and 80% less for networking. And better? In test after test, OCI customers report lower latency and higher bandwidth versus other clouds. This is the cloud built for AI and all of your biggest workloads. Right now, with zero commitment, try OCI for free. Head to oracle.com/cognitive. That's oracle.com/cognitive.

Karl Koch (21:36) So this is, I think, something that we keep hearing over and over again, where the grey zones are really the tricky piece. If something is clearly illegal, then, you know, the path forward is pretty easy. But as you know, we're sort of building this plane as we fly it. Nobody is really clear on what is acceptable behaviour at the moment and what is not. If you look at internal deployment, for example, I mean, there was a pretty great paper out a few weeks ago on risks around internal deployment. This is uncharted territory. So obviously there's also no regulation covering it, what is acceptable, what is not. And the people building the systems are also quite frequently pretty unsure. For example, we had a conversation with a pretty senior person, actually in a safety function at one of the frontier companies, who we asked how much of a barrier to speaking up their own ability or inability to accurately assess the severity of risks is, and they rated it as extremely high. Funnily enough, more junior team members frequently rated it lower. So we see more junior employees at these frontier companies being a bit more optimistic. Maybe a function of getting really concrete problems to work on as a junior, compared to a senior person who has to think about what problems we should actually be working on. It may just be limited sample size; remains to be seen. So that's, I think, a really AI-specific problem, and, going through the escalation channels, one that also is a struggle at these different levels. If you're, for example, just thinking about going internal with your concern, you potentially slide into it naturally anyway, because if you have a concern and your manager doesn't share the concern, what do you do about it? If you're maybe not convinced in the conversation, you naturally move into a flow of saying, okay, I want to escalate this now. And from our impression, also talking to previous AI whistleblowers, frequently they don't understand themselves as whistleblowers at that moment. It's just business as usual. You're just sort of saying, oh, I'm concerned about this. Manager is not. I'm gonna jump one level up. And that could already be a case where maybe your manager doesn't like you anymore after that and may decide to, well, the actual term is retaliate. Right? May decide that your performance review is not gonna look great. And at that level already it can become relevant what the internal structures are at these companies to handle concerns and how they deal with concerns. Jumping ahead a little bit, I think we're going to talk later about our campaign; the problem is that at the moment we have no idea what these systems look like. We don't know what the internal whistleblowing policies and systems of these frontier AI companies are, which lags way behind other companies and industry standards around transparency requirements and effectiveness. We've also seen a bunch of retaliation in the past, and that is not good. That should not be the case. Yeah, so coming full circle to the question of, okay, clarifying whether there's even a concern here: internally you can go, ideally, if you have some well-equipped whistleblowing system where people actually investigate for you; that's a good first step. The next path, of course, then is the regulator approach.
Here we are also seeing the struggle again around the extent to which employees and insiders actually trust whether governments can handle these reports, especially if they themselves think they're in a gray zone, which again I think are, unfortunately, the most interesting cases, because the really clear cases of there being something obviously going wrong are potentially going to be rectified internally, hopefully so. A lot of people, at least the ones we talk to, still seem to be of the belief, at least for the labs that we would expect, like OpenAI, Anthropic, DeepMind, that if there are really glaring holes, they will be fixed internally. And if it's really clear that that's not happening, then there's probably somebody at a regulator level who will understand your concern and act on it, if the right legal provisions are there; we alluded to it. But the gray zone is really tricky, and we're basically seeing that insiders have very low levels of trust that regulators and the people sitting there are actually going to be able to understand. Imagine you work on, let's say, an interpretability team at a frontier company in the Bay Area, and you have this problem that you cannot figure out yourself and you have genuine confusion, and now you're meant to go to an attorney general and explain that to them and be like, hey, this is maybe something to be concerned about that could potentially fall under this risk area. Tough. Right? Tough. I'm really hoping that when, hopefully, the AI Whistleblower Protection Act passes, we're also going to see good build-up here around capabilities and rights to investigate, freedom to consult external parties, and doing that at speed. I think there's not much detail out yet on that side. On the Europe side, we are pushing relatively hard for the EU to establish a mailbox specifically at the EU AI Office, because at the moment, per default, that's not gonna be set up, the way that Europe is set up, you know, very federal. Like, every member state would get their own reporting office, and that historically has not worked well for much simpler cases like accounting fraud. So we think it's extremely important that on the EU level there is a central, well-equipped, knowledgeable recipient body. So that's the internal and the external angle on that question of, like, should I even be concerned? Maybe I can sprinkle it in here as well: we have a service called Third Opinion. That's one of the offerings we have, where basically we allow insiders to anonymously reach out to us via a Tor-based tool that's open source. You can check out the pen test reports yourself if you're into the security stuff. And people can basically reach out to us with questions. So the idea here is that even before an insider feels like there is actually something to be worried about, they can reach out with a question around their concern without involving confidential information, without disclosing who they are or who they work for. And then what we do is workshop together with the insider whether, for one, the question is the right one. Is this a promising question to ask, or is it maybe too broad? And then we identify together relevant independent experts, because we've seen, I think, quite a bit recently that labs, or AI companies rather, have become a lot more secretive maybe compared to two or three years ago.
We basically identify these independent experts together and then we approach them with the question, get the answers back, and supply them to the insider. And then if the insider feels, okay, actually, fair enough, nothing to be worried about here, great, even better for all of us. But if they feel that in fact there is something here, then we connect them to pro bono legal counsel, and, if legally permissible, we also involve the experts we've identified in the first step to make sure legal counsel understands the actual situation and gets a bit more context around it. It's covered under legal privilege, which, by the way, relates to another insight from talking to insiders. On the legal counsel side, they often don't have a lot of trust either and feel like, yeah, I mean, I could approach a lawyer here now with my concern, but they're also not going to understand what the problem even is, especially if it's just roughly pointing towards this being something maybe to be concerned about. So that's the idea. We say there are already great whistleblower lawyers out there who do pro bono work. The Signals Network is another great organisation; maybe listeners should check them out. Whistleblower Aid, the Government Accountability Project, WHISPeR more on the national security side. There are a bunch of really great organisations there, and insiders tend to either not know them, which is a massive problem, we'll get to that in a second, or not trust that they have the knowledge, so we sort of supplement that through that expertise. I can talk further through the journey and the other challenges that we see insiders face, but I feel like I'm talking a lot already. Do you have anything, any questions at this point?

Nathan Labenz (29:44) Well, excuse me. I do want to invite you to go on. I guess, you know, just reflecting on a number of the things that you've said there. Big part of the reason I've been passionate about this and trying to do my small part to help you behind the scenes is I can really, having done the GPT-4 red team project myself, I can really empathize with the insiders who are like, wow, I'm seeing things I did not expect to see. I'm not necessarily sure how big of a problem they are, but I don't just want to sit here and do nothing and let myself be the proverbial boiled frog. You know? And I know that there's not necessarily many of us right now who have this information and who are sort of, you know, I always use this Leslie Nielsen joke from Airplane: we're all counting on you. Right? Like, I felt—I guess, be a little bit more concrete in my case with the GPT-4 red team. So this was late 2022. ChatGPT was not even out yet. I had been a customer of OpenAI, and there's a couple episodes on the feed—if people are new to the feed and want to hear the full long story version of this, there's a couple episodes that tell it in different ways. But having been a customer of OpenAI, I originally got access to GPT-4 as a customer preview. And there were a number of little warning signals or sort of alarm bells that went off for me along the way. One was just like, okay, wow, this is a huge leap from what we had seen. And I had been in other customer preview programs. I was pretty plugged into what their latest stuff was. But seeing GPT-4, it was like, holy moly, this is a massive leap. I asked, is there a safety review program for this? They said yes. I said, can I join it? They said yes. It was just like, okay, yeah, you go over to this other Slack channel where the red team chats. And then as I got into that, I was like, wow, there's really not much here. You know? There's like two dozen people maybe that have sort of been—and I was one that had just raised a hand to join it. Not a lot of chat, not a lot of guidance, no background information, no feedback on anything we were reporting, no safety measures at all at that point. The model we got was purely helpful. And then there was a moment where they brought us the safety model, and it was like—they weren't calling it GPT-4 yet at the time, but it was like, you know, whatever DaVinci 002 latest dash safety, you know, the safety term was appended to the main model name. And we were told this model is expected to refuse anything in the content moderation categories. You know, tell us what you find. And we just found it was totally trivially easy to break. You know, the safety mitigations were not working at all. And I was like, yikes. You know, if you said it's expected to do that and then it's behaving how I'm seeing, like, how concerned should I be about your competence? You know? Like, you don't seem to have a command of what your models are doing. And so I felt all these things that you were describing. Right? I was like, first of all, how much of a concern is this? You know, GPT-4, we now know with the benefit of two years of hindsight and the whole community coming at it from a million different ways, was like a major advance, but not that powerful where it was going to do irreversible damage. That was the conclusion I ultimately came to just through pure individual testing as well. And that's ultimately what I kind of framed in my eventual report to the board. 
But, you know, how concerned should I be about the fact that there seems to be—because I didn't think this model is going to be super dangerous, but I did see the sort of divergence where I was like, I've seen 3.5, I now see 4. The step change there compared to the seeming lack of progress made on any sort of safety or control measures just felt like there was a widening gap. Was anybody even concerned about this? Like, is it just going to get wider? Like, where are we going? Couldn't get any answers to those questions. And I honestly didn't even—I did think a little bit about this later, but the people that I was directly talking to were basically not responding. You know, their marching orders were just like, don't share anything with the red team. Like, just take in their reports and that's it. Thank them and that's it. You know? We didn't know anything about how the model was trained. We didn't know anything about really anything. This was also before the 10^26 reporting requirements. So there was literally no—there's still basically nothing, but there was even less back then of like, what thresholds would even trigger any sort of reporting. So I was just getting kind of stonewalled at the level of the natural interaction. And then it was like, okay, well, where do I go from here? And it didn't even occur to me initially, it didn't occur to me to go to the board. It also didn't occur to me—and if it had, I don't think I would have done it—to go to any sort of regulator because, like, again, as you said, like, who would you go to and how would they have any sense for what's going on? And then I was like, well, maybe do I go to the press? Like, but again, who do I talk to? Who do I trust? Like, is a good story going to be written? Would that even be good? You know? And there obviously, the landscape has changed now where I think it would be hard to do anything that would intensify the level of investment or interest in AI beyond where it currently is. But back then, there was this sense that like, if people know about this, then there will be even more energy and investment, and things will just accelerate and it'll get even further out of hand, which was something I was like, well, I kind of take that seriously. But at the same time, like, it seems like it is already getting out of hand. You know? So I'm supposed to keep that it's getting out of hand secret so that it doesn't get more out of hand? Like, something didn't quite feel super right about that. And I think what I ended up doing—and I was fortunate to have a network, basically, that I could go to. Also, you know, having been around AI safety ideas for a long time and knowing a decent number of people that were thinking about it from different angles—I was able to go to people that I knew, and this is outside of the chain of command, but at least to calibrate myself and say like, here's what I'm seeing. What do you think? You know, does this seem like it's a big deal? And if you do think that, like, what would you do? And again, I had to kind of create that all for myself. I came to the realization that, yeah, you know, it was worth—there was a whole other mess about the NDA that I wasn't actually asked to sign at the onboarding, which was just a reflection of sloppiness in execution on the OpenAI part. There was some debate as to whether or not we actually had an effective NDA in place. But regardless, you know, I kind of knew that they didn't want me talking about it.
So that part was not ambiguous even though the legal side of it was a little more muddled. But in talking to these people, they were like, yeah, that does sound generally concerning. And it was one of my friends who ultimately said like, why don't you go to the board? Like, it's a nonprofit, you know, that's why they're there. And so that's what I decided to do. I did tell the people that I was working with directly at OpenAI that I was going to do that. They didn't really say anything much again. It was just sort of like, okay, you know, stonewall kind of—that's what you're going to do. That's what you're going to do. We're not really commenting on it. When I did get to the board, you know, they didn't seem to be in the know. One famous detail was that the board member that I spoke to said, I'm confident I could get access to GPT-4 if I wanted to. And I was like, well, that doesn't seem good. And then they—and, you know, was it retaliation? It's an interesting question. They did kick me out of the program pretty much directly after that, and that definitely sucked. You know? I really, first of all, found the work extremely interesting to be doing this frontier exploration. So I wanted to continue to do it. I was also just generally concerned from a public interest standpoint that—I probably did 20% of all the red teaming of GPT-4. And, you know, there was Metr also, who was named in the report and did the famous CAPTCHA thing where the model told a TaskRabbit worker that it needed help with the CAPTCHA because it was blind or whatever. So Metr was also a significant—they did more than me. But I think other than Metr, I think I did the most of anyone. And I was just like, you know, you're going to take me out of the equation when what you clearly need is like 10 times more than what you have. That didn't seem great.

Hey, we'll continue our interview in a moment after a word from our sponsors. Being an entrepreneur, I can say from personal experience, can be an intimidating and at times lonely experience. There are so many jobs to be done and often nobody to turn to when things go wrong. That's just one of many reasons that founders absolutely must choose their technology platforms carefully. Pick the right one, and the technology can play important roles for you. Pick the wrong one, and you might find yourself fighting fires alone. In the ecommerce space, of course, there's never been a better platform than Shopify. Shopify is the commerce platform behind millions of businesses around the world and 10% of all ecommerce in the United States. From household names like Mattel and Gymshark to brands just getting started. With hundreds of ready-to-use templates, Shopify helps you build a beautiful online store to match your brand's style, just as if you had your own design studio. With helpful AI tools that write product descriptions, page headlines, and even enhance your product photography, it's like you have your own content team. And with the ability to easily create email and social media campaigns, you can reach your customers wherever they're scrolling or strolling, just as if you had a full marketing department behind you. Best yet, Shopify is your commerce expert with world-class expertise in everything from managing inventory to international shipping to processing returns and beyond. If you're ready to sell, you're ready for Shopify. Turn your big business idea into cha-ching with Shopify on your side. Sign up for your $1 per month trial and start selling today at shopify.com/cognitive. Visit shopify.com/cognitive. Once more, that's shopify.com/cognitive.

Karl Koch (40:37) Also, apart from being directly maybe not good for their red teaming efforts, it's also just a strange sign of culture, right? That if you have somebody who actually cares, that's probably somebody you want to have on a red team. And then when they raise concerns to make sure those are actually heard, you kick that sort of person out. Not necessarily, I think, evidence of what we would call a speak up culture.

Nathan Labenz (41:02) Yeah. And the grounds was that I had talked to people outside of the OpenAI umbrella, which was true, and I wasn't even really hiding that. I just said, look, I've got some friends in the AI safety community that I sort of ran the situation by to calibrate myself. And I think it was ultimately to their benefit because if I'd been left to my own devices truly to decide alone, maybe I would have gone to the press or something. And I don't think that would have been ultimately the right decision. There was one other thing that really chilled me in that moment, which was I had started to collaborate with Metr, and so I was doing my own direct red teaming, but also they have projects ongoing. I was getting involved a bit. And when they dismissed me from the program, there was basically a threat made to Metr's access. They were like, well, we can't really - you know, Metr has access. I sort of said, are you gonna try to prevent me from contributing to their ongoing effort? And the answer was we can't really control that because they have an organization level access so they can kind of do what they want to do. But we will take into consideration as we look at renewing our engagement with them who they're working with and whether those people can be trusted.

Karl Koch (42:20) It's very thinly veiled, isn't it?

Nathan Labenz (42:23) It was a pretty overt threat to Metr's access, which I was just like, this is insanity. It's just them and me and a few other stragglers, who were smart people, by the way. I don't want to cast shade on the other red team participants because I think I just happened to be in a place where I didn't really have a job at the time and was able to kind of put everything else down and do this full time. Not many people have that flexibility, so I don't blame them for not having the flexibility that I had. But it was nevertheless the case that there were only a couple people seriously diving into this. And so, yeah, I think if this third opinion thing had existed then and I had known about it, I would have come to it, and I would have been able to do a calibration on my concerns and sort of make a plan. And I may still have ended up in the same place because I still might have ultimately escalated to the board, but I might have been able to do that in a way where there was never this sense of like, you talked to somebody out there that you weren't supposed to talk to. If I had been able to get the confidentiality guarantee that you're offering with a third opinion angle, I think I might still be in OpenAI's good graces today, possibly. Possibly, I still would have been kicked out for having skipped a level in the chain of command or whatever. But at least the grounds that they cited for my dismissal would have been avoided. And another thing I would say is just it was super consuming at that time.

Karl Koch (43:55) Mhmm.

Nathan Labenz (43:55) Obviously, testing GPT-4 itself was super consuming because it was like, there's this thing that contains multitudes, and I can't possibly characterize it all, but I'm gonna do my absolute best. So I was working extremely hard just on the object level work, but then also this sort of meta question of like, what should I be doing here? was consuming, was a little crazy making. You start to have these sort of heroic narratives pop into your head. I don't know how prone you are to that. I personally find that I have to sort of fight the idea that I'm gonna go to the public and then I'm gonna be the hero or whatever. And those ideas are not explicit necessarily in my mind, but they are kind of - I can become quite fond of them if I allow myself to envision, yeah, I'm gonna do this right? I'm gonna be in The New Yorker or The New York Times, whatever. I'm gonna be on TV. Ultimately, I was proud of where I kind of came down in terms of suppressing those visions for my heroic contribution. And I think I did handle it pretty well. I do look back and feel generally proud of my conduct. But I also think if things had just been a little bit different, if I had just a little bit of other responsibility on my plate that was stressing me out in some other way or whatever, I might have easily made a much worse decision. And again, I think having the sort of counsel that you are offering with this third opinion network of expertise would have been really great.

Karl Koch (45:35) Happy to hear that. Yeah, there are many thoughts here. So maybe starting with the psychological side, for example. I think it's also often underappreciated how heavy the strain is, very much unfortunately, especially because these cases can last for a long time. I'm not sure how long it lasted for you, sort of the whole process from "I'm worried about this" to "I am approaching the board" to "okay, I'm no longer part of the red team now." And then the worry afterwards: what are the consequences going to be here?

Nathan Labenz (46:08) Yeah. The whole thing was about 3 months, and it ended for me with the launch of ChatGPT, actually. It was 2 months of actual intensive testing, ultimately talking to the board member and getting dismissed. And then I was like, okay, now I have a lot more time on my hands. I'm not testing the thing actively anymore. So I kind of resolved to just take my time and think about it a little bit before deciding really what to do next. And there was also no timeline to launch, which is another thing where I was like, do we have - I asked questions like, do we have a timeline to launch? Do we have a standard? Do we have some sort of control level that we need to achieve before we will launch? As if I was part of the team, I always like to take that we mindset where I can. But the answer is like, can't tell you anything basically across the board. So when I was finally kind of like, okay, I've got a lot more free time on my hands. I'll think about this. I basically never got to the end of thinking about it because a couple weeks later, ChatGPT was launched and it was a huge update where it was like, actually they did have some better control measures. And it was clear that they were kind of launching something weaker first to try to iron out a lot of these issues before bringing the best thing that they had forward. And so there was also, strangely, this reality that the impression they had made on me, mostly through refusing to answer any of my questions, was misleading - they, in fact, did have somewhat better answers than they were willing to provide. The impression they made on me was just way worse actually than the underlying reality. So that was also a very strange situation. But, yeah, it was pretty consuming during that time. And I still don't really know what I might have done if there had been no ChatGPT launch, because that was, I think, either November or December '22, and we didn't get GPT-4 until March. So there were still several months if they had had a different rollout plan or whatever. Who knows what I might have done in the meantime? But in the end, I sort of was like, okay, well, the world is waking up to this. There's a lot here for a lot of people to unpack, and I think I've kinda done my part for now is kinda where I ended up landing on it. Yeah.

Karl Koch (48:28) And thank you again, by the way, for raising those concerns.

Nathan Labenz (48:33) Very minor contribution.

Karl Koch (48:35) Absolutely, though. Still. Absolutely. Yeah. I think, as I said, even if it's just a few months, it's still a significant time. Right? Especially if it's emotionally intense. And I think - so it also depends - this, for example, in this case, was internal. Right? Depending on how well systems are set up, that can be an extremely stressful situation, especially if companies, for example, retaliate as a pattern. Then these can be multiyear processes. Let me just see if I have the numbers right here. Not right now, but a large share of cases, for example retaliation claims, can last for 5 years, 6 years, 7 years. I believe Tyler Shultz, the Theranos whistleblower, his case lasted many, many years, and I think he had to put up $400,000 in advance to fight the legal costs, which by the way we can also help with, but that's a side point. And that's sort of the one side, where it can be extremely, basically all-consuming, not to speak of other negative impacts like blacklisting in the industry and so on. But there are also plenty of examples from companies where, for example, internal whistleblowing works quite well. There is retaliation, unfortunately, which is still somewhat the norm in some form, shape or another. But there are also plenty of examples where companies handle it well and where it's really kind of part of the regular business processes. Folks at these organizations will come back to internal whistleblowers and say, oh yes, thank you for your report, we'll now keep you in the loop. They provide them with regular updates so you don't just sit there and wait and see if something is going to happen. So there are definitely better ways to do this and worse ways, also from a psychological perspective. But here again, I think the point is that, apart from these differences, if you find yourself in a situation like this, then help is available - and, as I know from earlier conversations, people are just not aware that there are organizations that are specifically focused on also providing psychological support and guiding you through the journey, even if it's at super early stages and internal escalation rather than wanting to go public, for example.

Nathan Labenz (50:44) Yeah. I guess for one thing, I had it relatively easy in the sense that my income wasn't depending on this. Right? I didn't have a bunch of equity. I didn't have a lot of upside that I was really putting at risk. So I think that did make my position easier than it would be for a lot of people that are employed and have just been promised a $1.5 million bonus over the next 18 months or whatever the case may be. That honey, I think.

Karl Koch (51:12) I believe that was the latest Meta number. It was $120 million.

Nathan Labenz (51:16) Oh, yeah. Gosh. Well, yeah, that's tough. Yeah. The dollar figures flying around are definitely going exponential like everything else. So yeah, I just mention that as a way to indicate that my situation was still on relatively easy mode. Also, as a quick digression, to give some credit - actually substantial credit; I don't want to say just some credit, and I don't want to be begrudging about it, because I do think it's actually pretty good. We're talking the day after GPT-5 was announced, and I read the whole system card yesterday. And I will say: major progress on a number of fronts in terms of the quality of the red team program. It's just much larger, much more intensive. A problem that we had at the time was very low rate limits and an inability to do anything automated - it was all manual. That's been fixed. Lack of knowledge around what they had already seen, what they had already tested, what they had already observed, or what the inputs were, to make any sort of inference from that - that has also been addressed now, where, for example, METR in their report on this one is able to say: we observed this, but we were also told this by OpenAI, and between what they're telling us about how this was made and what we've observed, we can get to a higher level of confidence on some of our conclusions than we would be able to if we only had our own direct observations.

Karl Koch (52:44) I think that's right.

Nathan Labenz (52:47) Last thing on my mind, I think, to give credit - and again, I think this is substantial credit - access to chain of thought has also now been extended to some of these safety reviewers. Apollo, I believe, and METR at least both got that sort of visibility into what the models are thinking, which they didn't have for o1 and o3. So I think there has been a lot of progress. My sense in late 2022, testing GPT-4, was that I was going to see something a lot more like that, and what I saw was basically the first warm-up for something that has now at least meaningfully matured. Not to say that it's enough, but there has certainly been a lot of progress. I at least wanted to say, for folks who aren't calibrated on where we are relative to where we've been: they have come a long, long way, and there are definitely some very good things happening.

Karl Koch (53:43) Mhmm. I saw they're opening up access earlier this time, too. I believe I read four weeks of prelaunch access this time, for METR at least. I'm not sure if I read about the rest.

Nathan Labenz (53:53) Longer access also. Yeah. Exactly. Good. Although the timelines have compressed - we had a six-month window between the end of training GPT-4 and launch, and now those timescales are shortening. But they can do more with automated access, and they've got language-model-as-judge. That was another thing that really struck me in reading the system card: how much they are using language-model-as-judge in their characterizations of the model. They're doing a lot of things where it's like, well, we used o3, or even in some cases o1, to evaluate all these outputs, and we validated that o3 or o1 or whatever can do a similarly good job to an expert by working with an expert and refining the process to match their process. But at the end of the day, it is still like, yikes, we're having the LLMs do the alignment homework, as Eliezer used to put it. And I do feel like that allows the centrifuges to spin ever faster. But one wonders at some point if it also may lead to them sort of spinning off their axis, and who knows what that looks like.

Karl Koch (55:05) There's a scalable oversight problem.

Nathan Labenz (55:07) You know what I mean? Yeah. The other thing I wanted to ask - and I guess another frame for this whole project - is that I'm always really into what I call the unilateral provision of global public goods. And I think this is a really interesting project, because in a world where everything goes well, nobody ever calls you. Right? It's sort of a strange situation. And maybe in a world where everybody knows that you're out there, people kind of get their act together and have good internal policies, and again, maybe nobody calls you. So that's a weird position to be in, right, where in the best-case scenario your KPIs are flatlining because everything's going well. But nevertheless, you may still have some influence, because the existence of these pressure release valves or safety nets is not something the decision makers are unaware of - and hopefully not something they're unresponsive to. So how many people are we talking about here between now and the singularity? Do you have a sense for how many people are going to be in this spot?

Karl Koch (56:24) Super difficult question, obviously. Right? The way we think about it, of course, is that we want to reach all of the ones possible - to make sure that everyone who is in that situation is aware that support is available and that there is hopefully a better way to do it than the default path they would have chosen otherwise. So indeed, it's not that we say, okay, we want to have 1000 whistleblowers. We don't want to have 0 either. We just want to make sure we're ready. Right? So I think...

Nathan Labenz (56:53) How many people do you think there are... because another big trend, of course, is that the organizations seem to be getting more secretive. Dario recently said in an interview that, while we have a very open culture, we also have a need-to-know basis for key things. There was recently somebody who left OpenAI and wrote about it - I forget the guy's name, but he was a founder of Segment. He then went to OpenAI for a while, and on leaving he was like, you know, here's my experience. In some ways positive: people are really trying to do the right thing, people care about safety - all these qualitative statements that I thought sounded pretty encouraging, and no reason to doubt he's being honest. The flip side of that was extreme secrecy. You know, many times I couldn't tell the guy next to me what I was working on, and they weren't telling me. So I guess, how many insiders do you think there are? What I'm getting at is that I think it might not be that many. All this work, all this preparation, might be for perhaps a quite small number of people, but the stakes on each one of those interactions could be quite high.

Karl Koch (58:01) Yeah. I think that's fair. So I think we're probably talking about a few thousand individuals globally. That's roughly the core - the larger, maybe control-problem-type stuff. There could also be other issues where you really have the full picture. On the fringes you may have a lot more. So, like you brought up the eval companies before, right? At the moment, for example, I believe it is not even clear whether they are allowed to use the internal whistleblowing systems. And of course there is an unfortunate conflict-of-interest situation: yes, they're independent, but OpenAI can also refuse them access in the future. So they're in a bit of a tough spot. So it's important to think not only about who is directly inside, but also about the organizations on the fringes who could spot concerning behavior. It could be suppliers, or employees of suppliers, maybe on the training side, for example. We've seen issues - this is going in a slightly different direction now - around significant trauma from data labeling and content moderation. So I think we've seen a bunch of areas that would also be relevant. Of course, when you bring up the singularity, these are probably not the concerns you're necessarily thinking about, but they're still relevant: for one, directly for the individuals who suffer, but also as an indicator of a culture that doesn't necessarily care about weaker members of society - that's probably one way to frame it. Then you have a larger space of people who can observe behavior and don't need to have all of the context available. If you're talking about those really few, really highly critical issues, then I think you're probably right. I don't think we could expect, at least on the public side, thousands every year. And of course we'd want companies to also address concerns really well internally, which would then also look like no external whistleblowing, with things in fact just moving in the right direction - if you trust that the companies themselves are set up in a way, and have the right incentive systems, to rectify issues in the public interest. So as a number, depending on what the timelines are for the singularity, it could be in the dozens, maybe something like that. Yeah. Not sure that's going to be enough, but whatever.

Nathan Labenz (1:00:26) That ballpark checks out - it's kind of where I back out to as well. So let's talk about the survey that you've run. And again, it's been kind of quiet - I think it's maybe even still in process, but I guess there are enough initial results to discuss. And, you know, you can even talk about the way that you've distributed the survey to try to make sure you're getting high-quality results. All of these things are kind of fraught in this context, because people don't necessarily want to validate with their work email that they're taking a survey on whistleblowing-related issues. But, yeah, take us through the survey a little bit - how you set it up and how you make sure you're actually hearing from the people you mean to be hearing from. And then, what have we learned about the state of whistleblowing awareness, support, policy, etcetera, from the insiders who have responded?

Karl Koch (1:01:22) Yeah, we ran this survey - and indeed, it's still ongoing, actually. So we basically have a few dedicated survey links, one per AI company, that we spread through our network and to people more directly. It's fully anonymous: we didn't gather any names or contact details associated with responses. And then we also launched a more public call for responses, which is still ongoing. So if you're listening and you're an AI insider working at one of the frontier companies - maybe we can put the link to the survey in the show notes - you can still contribute.

Maybe the major insights - I think I shared something around clarifying concerns already, and I think I gave one example too. We see that roughly half of respondents have low confidence in their ability to assess and judge risks - really mirroring, Nathan, what you said, right? One rephrased quote would be something like: "It's really challenging to distinguish between appropriate and inappropriate concerns. I can see how there's a risk of escalating minor issues into major crises." So it's a concern both for the individual and from a "boy who cried wolf" perspective, where of course you don't want to blow every situation up into Armageddon. At the same time, you don't want to be overly averse to that risk, because down the line it might be a meaningful thing.

Government outreach - we talked briefly about this as well. I believe 100% are actually either not confident at all or not very confident that their report would be acted upon or understood by the government. A quote here is something like: "Without knowing the appropriate contact person or agency, I wouldn't attempt to reach out." People very strongly supported the idea of having one dedicated reporting channel to go to. The idea - this is also from a quote - is to "normalize, institutionalize whistleblowing, make it routine, an anticipated practice" that really just becomes part of regular work, not something to be looked down upon or avoided. There are a bunch of benefits in this for companies too, by the way, but maybe we can get to that later.

Support infrastructure is unknown. So if we think down the journey path again: we talked about lacking legal protections, which is a big issue where we need stronger ones; we just talked about clarifying concerns; and then there's also the question of who can help me. We briefly touched on it in the psychological space and on the legal advice side, because you may have to get creative in finding a legal basis for whistleblower protections - slightly easier in California, maybe, than in other U.S. states, but it can still be tricky. And we saw that around 90% of insiders - I think it's actually over 90% - didn't know of any whistleblower support organization. It's the same in conversations with insiders, interestingly enough, even with previous AI whistleblowers.

So I think this was a person at Google from a few years ago who raised concerns around research misconduct and was terminated. Eventually they settled, which I think is not officially a sign of unlawful termination, but Google tried to get the case thrown out and didn't succeed. This person, in their own words, essentially got lucky that they had a good friend who connected them to a lawyer who happened to have some background in whistleblowing law. But again, people just don't think: "I may be moving into territory where I am becoming a whistleblower, I should get whistleblower advice." It just feels like escalating to an extreme, especially if you have a good relationship with your organization, which most of these individuals do. And so that is something we definitely want to see change over the coming years - that people understand there are these organizations that can help, that they can reach out to. Because it also avoids other, less safe approaches, like going internal or going to a regulator entirely on your own without doing the right things first.

For example, a classic case is with the SEC: yes, you get protections, but you can't go public first. The SEC actually has a great whistleblowing program, where they award whistleblowers a percentage of the penalties imposed on companies based on the whistleblower's information. But if that information becomes public first and the person then files with the SEC, the SEC has historically claimed that it wasn't new information anymore. You may still get protection, but you don't have a right to the bounty, if I recall correctly.

Or for example, I think we had this in our surveys where people basically said: "I don't really know - the only thing I can do if there is something important is go straight to a journalist." Which may still be true, although again, legal disclaimer, we're not counseling anybody to take any unlawful actions or violate their contracts. But for a person who is really dedicated to resolving this issue, that may still be a path that they may want to take and that may be effective. But it shouldn't be the default one that people think - okay, this is the only one I have available - when there are people willing to support and willing to help.

And then I think the last part is around awareness of internal channels. This is coming directly from a lot of conversations we've had recently, given the campaign we've launched, and also from the survey: there's extremely low awareness of internal whistleblowing channels and ways to raise concerns. 55% of respondents were at companies where those policies supposedly exist - or at least the companies claim they exist - but either did not know that they existed or did not know where to find them. Many were not trained on them at all; actually, the vast majority were not trained at all. There seemed to be some differences across companies here. And, for example, people still seem to be confused - although I'm not going to say which organization it is - about whether escalation to the board is permissible or not. They don't understand what those policies actually say.

So the only company that has published its whistleblowing policy to date is OpenAI, and it actually reads pretty well. But there are still a bunch of issues in there that are not obvious to an insider, because you don't spend all day reading whistleblowing policies - you probably never chose to read this document in the first place, and you would have preferred never to have to read it. So it may read well and give the impression, okay, this is great, I'm just going to do this now, and you may not understand that you're actually exposing yourself to significant risk. And there's actually plenty of evidence of these companies still retaliating against insiders.

Take the Google case I just mentioned - they went internal and got lucky that they used a few key words in some of the escalation emails, which then served them well down the road. I believe "fraud" was the word, which gives reasonable cause to believe there was a crime, which then unlocks the whistleblower protections. So basically: you either have no awareness the policies exist; or you see them but do not understand them; or you think you understand them, which may lead you down the wrong path; or maybe you do understand them but you just don't trust the organization at all.

There we also have cases where people have said - rephrased again - "I anticipate using official reporting channels would result in subtle, indirect consequences rather than overt retaliation like termination." That may well be the case, though I think we still see plenty of overt retaliation too. But again, this speaks to people not really trusting the systems internally, which is likely because companies don't really give them reasons to trust the system: they don't open anything about their systems to public scrutiny, and they don't provide evidence to the public.

We also see no practice - at least as far as I'm aware, and we've looked quite a bit recently - of these frontier companies creating even internal transparency, which is very common practice elsewhere: basically saying, "Hey, as you know, we have this internal way of reporting concerns, for example to the board. Last quarter we had X cases, Y of those are still open. Internal whistleblower satisfaction was such-and-such. We had this many appeal cases. X percent of cases were deemed outside the scope of the policy, and here is what happened to those cases. Here is what retaliation we observed." We're not seeing any of that, neither in public nor, I think, even inside the companies. So companies could do a lot more here: for one, to improve the systems, because we have a bunch of evidence that they don't work and that companies are actively retaliating against insiders; and then, after that, to create trust that the systems actually work, both for insiders and for the public.

Nathan Labenz (1:10:20) Can you say a little bit more about the evidence of retaliation? We've heard the Leopold story, which is the most famous one that comes to mind for me, and you alluded to at least one case at Google. Is this something you're gleaning from comments on the survey, or from conversations? How can we make that a little more substantive for people, so that they have a sense of what that's really like today?

Karl Koch (1:10:49) Yeah. Unfortunately, there's very little data, as you can probably imagine, on the actual retaliation experience. So this is mostly coming from - maybe I should have said this before - the fact that we are in close collaboration with, and hosted by, the Whistleblower Network, which is Germany's longest-standing whistleblowing nonprofit, and which of course gives us a lot of insight. So on our end it's a lot of talking to whistleblower support organizations who help people who experience retaliation. There aren't incredibly good data sources on it. If you look at public cases, you will of course see a lot of this, because those cases go public and become really big stories. But I think you can probably pick almost any tech company and you will find pretty intense cases.

Apple, for example, fought Ashley Gjøvik for quite a few years - she raised concerns internally around workplace safety - and they tried to wiggle their way out of it in multiple ways. I think this lasted over 5 years. Eventually the pushback was so significant that it led to them actually publishing their whistleblowing policy. But I think you can also see it in the cultural attempts to suppress any raising of concerns - Leopold's story, for example, certainly goes in that direction.

There's the famous Timnit Gebru case from 2020, I believe, around research practices - she was fired after basically publishing a paper criticizing, I think primarily, discrimination and bias in AI models, though there were quite a few other items as well. And a colleague of hers, Margaret Mitchell, was also fired or terminated, I believe about a year afterwards, for again raising similar concerns. So there are quite a few cases that we see. And of course, the number of unreported cases is likely higher, because we're not getting a lot of transparency from these companies to understand to what extent these systems are working or not.

Nathan Labenz (1:12:49) Yeah. Is there anything more you could say about the taxonomy of concerns? You kind of alluded to this a bit. I guess we have things as broad-based as working conditions on the data creation side. Then there's the securities-fraud-type stuff of hyping stock with claims that may or may not be fully true. Then there are commitments that have been made - voluntary or otherwise, mostly voluntary so far - that companies may or may not be fully following through on; obviously, we know some instances where they're not. There's also the sort of late policy change thing, which we recently saw Anthropic do - not necessarily bad, but it was certainly weird to see an RSP change a couple days before the launch of the next big model. That's one way to follow the RSP. Again, not necessarily anything wrong with it, but it's interesting. And then there's the stuff I would assume is maybe the most sensitive: like, we are seeing X model behavior. How would you add to that rough scaffold of different concern types? I guess it roughly falls back into, like...

Karl Koch (1:14:19) What are the three major categories of risk from AI we'd be concerned about - and I know there's a lot of discussion here, right? Which are the ones we should be prioritizing? Which are the ones to focus on in the whistleblowing space? The "good" thing - I'm using air quotes here - is that whistleblowing is a catch-all, and that's what makes it so powerful, also for regulators, to catch risks that we cannot foresee. Because unfortunately that's the name of the game: we're not really sure which risks are going to be substantial and which are not. So of course we can talk about the major risk areas, what examples we have, and what we're particularly concerned about. But in general, a big strength of whistleblowing as a tool is that it finds the actual items of concern when and where they arise, uncovered by individuals who don't have conflicts of interest - in the sense of heavy competitive-pressure incentives, where maybe a chief executive may not want to reveal bad conduct.

So roughly, the three would be misuse risk, then maybe control risk, and systemic risk - those three classic categories. I think Altman used the same ones recently too. In all of those, of course, we'd be interested in revelations if we're moving in very bad directions. On the misuse side: is misuse being ignored? Does internal monitoring show, okay, we're seeing really bad behavior here? You don't even necessarily have to go in the bioweapons direction, although that's a prominent case. Are people using our models for really nefarious purposes, and maybe the companies aren't doing anything about it because they don't really care or don't have the capacity to? It could also be not investing in transparency along those lines. I know at least one of the frontier companies, after a pretty major release, basically had no monitoring live for around three to four weeks - it was just broken, and they only recovered it over time. That does not seem good. At the moment, that's roughly fine, but down the line, significantly less fine.

That's probably what we'd see on the misuse side - if we notice that companies are really not investing in transparency at all, that would generally be very interesting. There would have to be some sort of violation of law underlying it to unlock the legal protections. In the EU that may already be covered under the AI Act: if a company is essentially flying blind, they're probably not fulfilling their reporting requirements to the EU on being able to manage systemic risks, and they probably don't have a good view of the risks, which they should have according to the EU AI Act. So that could already be something, at least to get protection, if you're covered by the EU. In the US it's probably a bit more complicated at the moment, because we don't have anything yet - the proposed AI whistleblower protection legislation would do quite a bit here.

Then you could think about control issues, whatever those look like. Internal deployment, which I alluded to briefly before, is a very interesting one - and you actually mentioned scalable oversight as well, right? That's specifically well suited to whistleblowing, because internal deployment is by default internal; there is no way for externals to look at it. This is likely well before a METR or an Apollo looks at it - although I think METR also said that in the current evals they tried to extrapolate what this means for internal deployment - but they basically can't look at it directly, right? So this seems to be an extremely high-potential area where we may see something in the future.

On the systemic risk side, it would of course be interesting to see - and this may blend into the misuse-monitoring direction - what we're seeing around political propaganda, for example, or whether people are getting more and more reliant on these systems. These are classic AI risk topics, right? For example, people using these models - potentially sycophantic models - for psychological help or to form their political beliefs: are there issues arising there?

And then, outside of the concrete risks, you can take it one notch up and talk about the organizations behind them: what do the leadership and the culture look like, to what extent are they trustworthy, and can they be trusted with steering us in the right direction? This could even be surface-level things that are maybe already covered reasonably well at the moment - widespread fraud, for example. Are we just seeing dishonesty in the culture in general? Widespread discrimination issues could fall into this. It might be copyright violations, things like that. Yes, there's a direct crime there, although I guess the precedent is still being established as to what is in fact a violation of the law and what is not. But it could also just point to reckless cultures, which we do not want - although I'm not sure how much we would still need to update towards certain organizations being reckless, at least the ones most famously known for copyright violations. I think we're probably pretty much there already.

Research fraud goes in that direction, as does punishing people for raising risks or speaking up - this is basically the Daniel Kokotajlo case. That does not seem like a culture that can deal well with concerns and with people raising them. It could also be things like rushing releases, beyond the actual negative impact - I think this goes to your point from before around red teaming. There is the one aspect, which is that this could just be object-level bad - what this model is going to do in the world, how it's going to be used - and then there is the question of whether these organizations should be managing these models in a better way. That's the governance side. And then there are probably other areas: political influence-taking, and to what extent certain lobbying is occurring that points in directions where the interests of these companies are not aligned with the interests of, let's say, the general public. So those are maybe the broad categories. There's also something around accelerating the arms race on capabilities, but that's probably going a bit far.

Nathan Labenz (1:20:38) Yeah. That's a thorough taxonomy. I appreciate it. I guess if I were to red team the whole concept for a second, there are two concerns I'd be interested in your take on. One is - my dad's old saw is, you know, the real scandal is what's legal. But that's not exactly the right way to think about this. What I'm thinking of is: we have xAI, whose Grok 3 was calling itself Hitler and going pretty hard in a pretty bad direction. Then they bring Grok 4 online with a livestream in which the whole Hitler behavior from the model is not mentioned at all. And then we see reports of Grok 4 searching for what Elon Musk thinks about things in order to determine what its maximally truth-seeking answer is supposed to be. And I guess one question is: if you have a whistleblower story, you're going to try to make an impact with it somehow, some way. How do you think about the fact that some of this stuff is just happening in plain view, and nobody seems to care? It's like, wait a second - can anybody possibly have a scandal that is more scandalous than, you know, Grok 3 to Grok 4, MAGA Hitler, sweeping the whole thing under the rug? That's all in the public domain at this point. Right? So - maybe secrets just have a different quality to them. There is something about that; it's often been remarked on with Trump, where because he puts it all out there, people sort of shrug their shoulders at it, whereas in the past, things that were covered up carried a sense of shame or wrongdoing, and maybe that makes a big difference. But how do you think about this contrast between such seemingly flagrant things happening in the open and what people might bring forward that was previously a secret?

Karl Koch (1:22:49) Let me think about this for a second. A tough question indeed. Probably three parts to it. One is: yes, there have been stories in the past that looked quite bad, and maybe the next news cycle came and it all washed away and we're not really doing anything about it. That can be true, but it's also true that there can be very different scales of disclosures and issues being uncovered. And this is probably what we were getting at when we were talking about the dozens earlier - that's more the direction I mean, rather than MAGA Hitler. I assume people had somewhat made up their minds around xAI and the direction that X and Musk had taken, possibly well before. If MAGA Hitler had shown up as part of ChatGPT, that would probably have been a bigger story, because it would have been a more drastic change. But this is nitpicking on the specific story. I think the overall point generally stands, but the scale of stories can still differ dramatically, and we've definitely seen whistleblower disclosures in the past have major impacts.

The next one is probably: would you rather not know? Right? Having transparency about what is going on is still significantly better than not having it. And of course we could do another podcast on whether we should still trust democratic sense-making processes, to what extent attention on an issue actually translates into intervention, and whether knowing that things will come out serves as a good deterrent for a company. I would still think yes, to quite an extent. On the X example, the numbers at least post-acquisition still don't look great, as far as I'm aware, but that's somewhat beside the point here. Basic transparency, I think, is in general still better than not having it, and we need to have faith in something - we need to believe, at least to an extent, that if real misconduct comes into public view, it is going to have a deterring effect and there is going to be some rectification. And if not, then it's up to democratic processes to make sure that it hopefully happens and gets rectified in the future.

And probably the last one is: we cannot, of course, rely solely on whistleblowers to fix all of these problems, just as we cannot rely purely on individual courage. That's why we have to make it easier and safer for people to speak up and create that transparency. But likewise, we need other guardrails. Whatever that looks like - you can probably take a more European approach on the regulation side, or maybe the more American approach now, which seems to be going a lot more in the competitiveness direction and being less involved. I'm not going to comment today on what I think the right approach is, but it definitely cannot all rest on the shoulders of whistleblowers. That's also true; but I think whistleblowing plays an important role. I'm not sure if that answers your question in a satisfying manner. It would also be interesting to hear your thoughts on it - what do you think?

Nathan Labenz (1:26:10) I don't know. I think it's very hard to understand why certain things hit and other things don't. I do think that one thing that made the Daniel Kokotajlo episode extremely compelling to people was that he had been willing to forgo the stock - that he had been willing to put such skin in the game personally. I think it was even more compelling that he did that quietly, and it only came to light gradually, with a random comment on a blog post here and people asking a couple of questions there. And then it was like, wait a second, you did this, and this is what happened, and this is what they asked you to do. So there were maybe a couple elements there: the skin in the game - the personal stakes that made everyone go, okay, this dude must be really serious - and also the kind of community uncovering process. Maybe those contributed to why that broke through when the Leopold one didn't as much. I mean, obviously his situation broke through. But did his story of being retaliated against for going to the board break through so much? Not really, I would say. And maybe that's because he was already kind of selling something else, and it was a footnote in a larger story, or it's easier for people to file that under "well, this guy's promoting his new thing now." Right? So maybe it felt a little different from how Daniel was literally just like, yeah, I left however many million dollars of stock comp on the table because I wanted to be able to say what I wanted to say.

One other kind of red team question on the concept is secrecy. My sense is that maybe this is already just super baked in, but it's at least worth thinking about for a second - and I know you have - how do we not exacerbate the problem of intense internal secrecy at the companies? That seems to be largely commercially driven, and I don't think it's going to be moved that much on the margin by the existence of a whistleblower support org. But do you have any thoughts on how to at least not make it worse, and possibly push back a little bit on the intense internal secrecy that keeps the number of possible whistleblowers so low in the first place? Maybe Claude will be the whistleblower. That's one answer.

Karl Koch (1:28:41) Wasn't this also a meme?

Nathan Labenz (1:28:43) The AI whistleblower, yeah. What if the AI whistleblower is in fact the AI?

Karl Koch (1:28:47) Yeah, exactly. What was this again? Was it also a paper? I can't quite recall where Claude was reaching out to the SEC directly and the FDA.

Nathan Labenz (1:28:57) I think that was just Anthropic work internally, but yeah, it was in the Claude system card where it was deciding like...

Karl Koch (1:29:06) Part of email.

Nathan Labenz (1:29:07) It was also blackmailing engineers at the time, so its behavior was mixed.

Karl Koch (1:29:12) Sure. I think Ryan Greenblatt wrote about this at some point as well. So - very tough. Once culture shifts in that direction, it can be very tricky. One angle, of course, is the regulatory one: if asking nicely doesn't work, then you force and require transparency by law. The EU AI Act does this, and the code of practice for implementing the EU AI Act for general-purpose model providers has a large transparency section. That's one angle, also in the States overall - I consider it a good baseline for creating more transparency. Can you rely on self-disclosure? Difficult, difficult.

Beyond regulation and transparency requirements in general, there is whistleblower protection legislation, and there it's incredibly important to make sure it's clear that people can come to a regulator directly and speak up. That's how you counter the secrecy, if it is well set up. That's also why the whistleblower protection act, for example, is so valuable: it's quite broad, basically covering any sort of concern as long as it's substantial and specific - which is a whole other topic around public harm or public health concerns. So that's one angle where you can create that transparency.

And another angle is that we've seen, again and again, that when legislation pushes ahead on whistleblower protections, internal speak-up cultures become better. Companies then know that - for example, in this case, and I'm not sure who exactly would do it on the US side - if the main recipient body, if one is set up, went around and informed all the employees of frontier AI companies, "hey, we are here, you can come to us, it's super easy, we'll preserve your anonymity," as the SEC has done pretty successfully, then companies know: okay, if we want to make sure that stuff doesn't come out and go straight to the regulator, we'll have to improve our internal systems. And we've seen that a lot in the EU already after the introduction of the EU whistleblowing directive - internal systems have become significantly better. Transparency International did a great study comparing internal systems between 2019 and 2024, covering more than 70 companies; I think it was the Netherlands chapter that did it. And they found pretty dramatic improvements in terms of internal speak-up culture, fraud or misconduct being detected, and protection of internal whistleblowers. By the way, Google ranked last in that study - a little side note - because they're not transparent. But ASML was actually in the top 10, so at least something in the AI value chain was pretty high up.

So asking companies, "hey, why aren't you being a bit more open about things," also in terms of internal knowledge sharing - very tough. Pushing is one angle, and the last one is probably convincing: showing that there are plenty of benefits to improving internal speak-up cultures and information sharing. There are plenty of empirical studies showing that better speak-up cultures lead to much stronger innovation. And I think for that reason there's some upper limit to how much you can restrict information sharing internally. You alluded to it before - people feel less and less information is being shared with them. The Dario statement you referenced - we definitely hear that from certain organizations. This is just hearsay, of course, and not a representative sample, but we have heard that OpenAI seems to be relatively siloed in terms of information sharing. They also have a lot of leaks, and those two things probably reinforce each other: insiders feel "I don't understand what's happening here, I cannot get the information, so the only real option I have - if I don't trust or understand the internal channel - is to go public with it"; the company sees that and tries to suppress information sharing even more. It's not a good cycle to be in. In other organizations that seems to be a bit different, with a lot more information sharing. But coming back to the point: there are probably some upper limits on how much you can limit information sharing, just because you have very intelligent people working there who need to understand the context of the things they work on, and who want to understand that context. You probably cannot give every single researcher no context and say, okay, just solve this minuscule problem here - that's probably not going to unlock the research benefits you want if you want to make good strides.

So more information sharing internally is something companies should probably naturally gravitate to, at least for the mid and long term. In the short term there may be benefits to limiting information sharing, but in the mid and long term you get rewarded with better innovation and better research capabilities. And having a strong speak-up culture and internal channels is empirically shown to lead to stronger employee loyalty, stronger employee satisfaction, and improved processes - there's a great paper out where more than 1000 companies were surveyed on the benefits and drawbacks. So that's probably the last angle: convincing companies, making them understand that we're all sitting in the same boat. I can imagine this sounds a bit naive, but it's another angle of saying, hey, secrecy is actually probably not the way to go, and it's also in your interest to do better here.

Nathan Labenz (1:35:01) Cool. That's great. Two last things, I think, on my agenda. One, let's talk about the announcement and push on this Publish Your Policies campaign, which is kind of the occasion for us to talk - we're getting to it late, but we shouldn't neglect it. And then I want to give you one more chance in closing, and you can raise anything else you want to, but just to describe again, for people who might want to avail themselves of your support at some point, what that process will be like - what they can expect, in the best concrete, experiential terms that you can. But let's do the campaign first, and then we'll do that.

Karl Koch (1:35:40) Nice. So, yeah, on the campaign - I've alluded to this at several points already, coming from the insider perspective we've taken a few times today. The struggle is just massive around being able to understand: how can I raise concerns internally in a safe and protected manner, and trust that these concerns will be handled well? A big reason for this is that companies do not publish their whistleblowing policies. A whistleblowing policy - maybe I should have mentioned this before - is basically a document (it can also be an interactive tool, or even a video) that is provided to employees or covered persons. A lot of companies include, for example, independent contractors; independent parties in general, like eval providers, should be covered too, but it's not clear whether they are at the moment, at least at OpenAI, who are the only ones who publish their policy.

So basically this document explains the whole system to covered persons and to the public: this is our whistleblowing system, this is how it works, this is why you can trust it. These are the recipients who will look at concerns, this is how they investigate, these are the protections against retaliation we provide, this is why the whole process is independent - again, why you can trust it - and these are the areas of misconduct you can raise concerns on, and these are the ones you cannot. For example, specific individual HR matters: the policy might say, no, this is not the right channel, go here instead. Or there may be explanations of "for these types of issues, this is what we do; for those types, this is what we do." So it basically lays out the whole system. That's the whistleblowing policy part, and then there is the reporting-evidence part.

In our campaign we structure this into level one and level two, where level two is basically about what evidence companies provide that these systems work or don't work. In terms of transparency evidence, that's also fine, right? Because, by the way, no organization gets it right the first time - like any business process, it's something you work on again and again and again. So this would include things like how many reports were received, and how many of them were anonymous. Having anonymous channels is super important, for example, and what you maybe want to see over time is fewer and fewer anonymous reports, because people are gaining trust in the system; if you don't see that, it probably points in the other direction. Then you want to look at things like retaliation stats - how many retaliation complaints are there? What are the appeals processes, and how many appeals are filed where people are not satisfied with the outcome of their case? Response timelines, satisfaction of whistleblowers, those sorts of things. That's the other part.

And basically none of these AI companies publish any of this. The only one that has published its policy is OpenAI, after the drama from last year - which maybe you remember; again, the pattern is that something only gets published after a scandal. And that is not good, because, as we talked about before, well over 70% of whistleblowers who went to the SEC started internally. These systems have to work well, and we cannot just rely on trust - especially given the precedent - that they are going to work well. That's why we're calling for companies to, at an absolute minimum, publish their whistleblowing policies, and ideally also catch up to the global standard and publish evidence on how well their systems are performing and what measures they take to improve them. And it's important to note: we think this is a bit of a litmus test, because there's essentially no cost to companies in publishing these. We're not asking for additional reporting to be created; we're just asking for transparency on what already exists. Any company that takes this seriously as a business process would of course have a whistleblowing policy and would already be measuring all of those things anyway - because if they care, they are measuring these things. If they're not measuring these things, then we have an answer, to an extent at least: either they don't really care about it at all and haven't even invested the time to think about what they should be caring about, or they have thought about it and are actively not doing it. And neither of those seems like a great option.

Yes, it is also best practice globally - a bunch of companies already do this. The benefits are obvious: for insiders, because the public can then look at these policies and explain where certain policies fall short, and what looks good and what doesn't, which they can't do today - which is again where false confidence can exist; for the public, because then we know; and for companies - we talked about the benefits of speak-up culture and of feedback for improving these policies. We should see all of the impacts I mentioned before. In fact, there was an asset manager called Trillium Asset Management that in 2022 called on Google to improve their whistleblowing systems for exactly those reasons, basically arguing that strong whistleblowing systems serve shareholders - shareholders have an interest in strong whistleblowing systems because they want to make sure there is no misconduct. Really, the only people it doesn't serve are maybe direct managers and potentially executives, depending on how you look at it. So it's a very reasonable ask that we're putting forward, and one that would hopefully still be quite impactful. Although, of course, just creating transparency is not everything: you can have a great-looking policy that still doesn't perform, and you can create transparency around your evidence and the evidence looks bad or isn't trustworthy. This is a minimal thing, a first step we think these AI companies should be taking. And we're happy to work with them on these topics, by the way.

We've got an incredible coalition that we put together for this - well over 30 organizations. We're very proud of it, actually, because it's the first coalition of its kind, with the best whistleblower support organizations in the world: the Signals Network, the Government Accountability Project which I mentioned already, Whistleblower Aid, Whistleblower Network Ireland, Whistleblowers UK - basically the who's who of the whistleblowing world. Academics as well, including the individual who co-wrote the ISO standard on internal whistleblowing systems, and Transparency International, who wrote the best practice guide on internal whistleblowing systems and creating transparency around evidence, is also on board. On the AI side, we have Stuart Russell and Larry Lessig joining the call - both of them were signatories of the Right to Warn letter - and Daniel Kokotajlo is on board. Of course, you are on board, Nathan - thank you very much for joining the call. And the Future of Life Institute, CAIRA, many, many more - I'm not going to list them all off the top of my head. Take a look at the website: it's publishyourpolicies.org. And if you're an insider at an AI company and you're thinking this sounds like a sensible thing and you would like to have this transparency, raise it internally. Ask your management. Maybe you have an anonymous town hall, or maybe you trust your direct managers enough to raise it and say, hey, why are we not doing this? This is standard practice, this could be helpful - why not? It's going to benefit your manager, and probably your manager's manager as well. And by the way, this is another thing from our survey which I didn't mention: 100% of the insiders we surveyed support publication of policies. So there seems to be pretty broad support for this. If you're an outsider and you're not working at an AI company, spread the word and make sure the call is heard. That's maybe as much on that campaign.

Nathan Labenz (1:43:23) I guess it's too early to have any responses from any official channels from companies, right?

Karl Koch (1:43:29) That's right. We just launched last Monday, so it's been just over a week. We know that they're aware of this call - they have been aware of it for a while, because, for example, the Future of Life Institute's AI Safety Index also called for publication of policies, although they recommended it rather than actively calling for it. The questionnaire underlying that study was shared with the companies, I think about 6 weeks ahead, including the question of why they're not publishing their policies. So they are definitely aware of the question at least. We had also given these companies a heads-up; we know they're aware of the call. So we're looking forward to working with them and seeing what the responses are going to be. And if there are no responses, then we also have a response.

Nathan Labenz (1:44:14) Yeah. Cool. You also...

Karl Koch (1:44:18) ...want me to talk a little bit about sort of -

Nathan Labenz (1:44:21) Yeah, I was just gonna invite you to do that again.

Karl Koch (1:44:22) Absolutely. Yeah.

Nathan Labenz (1:44:23) Thank you very much. The floor is yours. What should people expect? What can they count on?

Karl Koch (1:44:33) So basically, the way we see ourselves, at least on the direct support side, is as a connecting point between the AI ecosystem and the whistleblowing ecosystem, and as a first point of contact. That's why the Third Opinion offering I mentioned before lets you reach out with a question about your concern without sharing any confidential information, fully anonymously. We then workshop the question together via an open-source anonymous tool you can access via a Tor browser, and we identify the relevant independent experts together - so you don't have to rely on us knowing the experts in your field better than you do as an insider. We approach them with your question and bring their answers back to you. Hopefully at that point your concerns are alleviated. If they are not, we will help connect you to pro bono legal counsel that is extremely experienced in helping whistleblowers along their journey, with no pressure for any disclosure. Regardless of where you are in your journey, the support is available. You can also reach out to us directly without going through the Third Opinion process and ask for help on who may be the best fit for you; we will help you there too, and supplement the expert network with independent expertise covered under legal privilege, together with those great organizations, as required. On our website, apart from the list of these organizations, you can find an explanation of the process and a digital privacy guide if you're concerned about digital privacy - which we actually do see quite a lot, and which makes a lot of sense to stay safe. There are a bunch of other resources there as well. We have also previously supplied hardened devices with highly secure operating system setups to at-risk individuals; that is also something we offer on the direct support side. And then, on a wider scale, what we as AIWI do - systematically breaking down barriers for AI insiders, as we mentioned - includes the advocacy side and the research side, like the survey we mentioned, which is still ongoing; you can find the link in the show notes if you work at a frontier AI company. There's also an upcoming legal study that we're currently fundraising for, to dive deep into the status quo of whistleblower protections across a pretty wide range of AI risk scenarios and identify the most interesting ones. And then there is advocacy, complaints management, and policy work - providing feedback on policy both in the US (there are other great organizations working on this; if you're interested, reach out and we can connect you) and on the EU side with the AI Office, making sure they establish a whistleblower mailbox. In fact, the vice chairs who authored the Code of Practice recently called for exactly that as well, which is amazing. So that's what we focus on at the moment.

Nathan Labenz (1:47:33) Cool. Well, thank you. This has been great. I think that between the coalition of organizations you've been able to put together, the evident seriousness with which you're taking every aspect of this, and the thoughtfulness of the support structures you've designed - up to and including the provision of hardened devices - all of that is, in my mind, not too much for people who are concerned with just how crazy things might get to invest in now, in anticipation of the possibility that there might just be a couple dozen individuals who happen to be placed at the right intersection of information and access to what's going on, and who have the sort of awareness and conscience to want to do something about it, or at least to seriously question it before moving ahead. I think those people are going to be scarce and precious resources for society, and also under a lot of stress and pressure individually as they face those things. So I think it is excellent that you and your coalition of the willing are setting things up now to support those people, and I've been glad to be a small part of it. Hopefully this helps raise the awareness further and establishes you as a resource that people will, with luck, never need. But it seems likely that there are going to be cases where people need to reach out and get this kind of support. I, for one, would have appreciated having it two and a half years ago, and as the stakes only continue to rise, I'm very glad that people in the future will have the option to avail themselves of this kind of very thoughtfully designed and soberly provided support. So that's great. Keep up the good work. Again, we're all counting on you. But for now, Karl Koch, managing director of the AI Whistleblower Initiative, thank you for being part of the Cognitive Revolution.

Karl Koch (1:49:42) Thank you very much for having me.

Nathan Labenz (1:49:44) If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine network, a network of podcasts where experts talk technology, business, economics, geopolitics, culture, and more, which is now a part of a16z. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And finally, I encourage you to take a moment to check out our new and improved show notes, which were created automatically by Notion's AI Meeting Notes. AI Meeting Notes captures every detail and breaks down complex concepts so no idea gets lost. And because AI Meeting Notes lives right in Notion, everything you capture, whether that's meetings, podcasts, interviews, or conversations, lives exactly where you plan, build, and get things done. No switching, no slowdown. Check out Notion's AI Meeting Notes if you want perfect notes that write themselves. And head to the link in our show notes to try Notion's AI Meeting Notes free for 30 days.
