Building & Scaling the AI Safety Research Community, with Ryan Kidd of MATS

Ryan Kidd, co-executive director of MATS, discusses the current AI safety landscape, AGI timelines, and tensions between safety and capabilities, and outlines how MATS structures its research, what top safety orgs seek, and how aspiring researchers can build relevant skills.


Watch Episode Here


Listen to Episode Here


Show Notes

Ryan Kidd, Co-Executive Director of MATS, shares an inside view of the AI safety field and the world’s largest AI safety research talent pipeline. PSA for AI builders: Interested in alignment, governance, or AI safety? Learn more about the MATS Summer 2026 Fellowship and submit your name to be notified when applications open: https://matsprogram.org/s26-tcr. He discusses AGI timelines, the blurred line between safety and capabilities work, and why expert disagreement remains so high. In the second half, Ryan breaks down MATS’ research archetypes, what top AI safety organizations are looking for, and how applicants can stand out with the right projects, skills, and career strategy.

Sponsors:

Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai

Agents of Scale:

Agents of Scale is a podcast from Zapier CEO Wade Foster, featuring conversations with C-suite leaders who are leading AI transformation. Subscribe to the show wherever you get your podcasts

Shopify:

Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

CHAPTERS:

(00:00) About the Episode

(03:50) MATS mission, AGI timelines

(13:43) Evaluating current AI safety (Part 1)

(13:48) Sponsor: Tasklet

(14:59) Evaluating current AI safety (Part 2)

(28:11) Sponsors: Agents of Scale | Shopify

(30:58) Evaluating current AI safety (Part 3)

(30:59) Safety research versus capabilities

(40:01) Frontier labs, deployment, governance

(51:51) MATS tracks and governance

(01:04:11) Research archetypes and tooling

(01:12:25) Labor market and careers

(01:20:09) Applicant selection and preparation

(01:29:33) Admissions, salaries, and compute

(01:40:34) Future programs and paradigms

(01:54:11) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Introduction

Hello, and welcome back to the Cognitive Revolution!

Today my guest is Ryan Kidd, Co-Executive Director of MATS, the AI safety research mentorship and training program which, with some 446 alumni working at nearly every AI safety organization you can name, is both the largest and, by many accounts, the most impactful AI safety research talent pipeline in the world today.

Regular listeners will know that MATS has recently come on as a sponsor to spread the word about their Summer 2026 Program, which runs from June to August, and for which applications are open now, with a deadline of January 18th.

But this episode is not actually part of that sponsorship – on the contrary, I've supported MATS as a Recommender for the Survival & Flourishing Fund and as a small-time personal donor. And with a who’s who roster of mentors that includes past guests Ryan Greenblatt of Redwood Research, Lee Sharkey of Goodfire, Nicholas Carlini, now at Anthropic, and Arthur Conmy, now at DeepMind, plus alumni including past guests Marius Hobbhahn of Apollo Research, Jesse Hoogland of Timaeus, and Ryan himself – not to mention papers from MATS fellows popping up everywhere these days – a full-episode deep-dive into MATS is, as they say, self-recommending.

Ryan's incredible network, and MATS' practice of tapping into their mentors' expertise to inform strategic decision making, gives him an unusually well-sourced point-of-view on the current state of AI safety, and so I couldn't resist beginning today with a discussion that touches on AGI timelines, the confusing mix of impressively ethical and flagrantly bad behaviors we see from frontier models today, and whether or not it's possible to cleanly separate AI safety & capabilities work.

My main takeaway from this part of the conversation is that disagreement, even among the most well-informed and technically capable people, remains extremely high, such that a high level of uncertainty about even the near-term future, and a portfolio approach to research bets, is really the only position one can confidently defend.  

I found this discussion well worthwhile for keeping my AI worldview up-to-date, but for those who just want Ryan's insights into the AI safety labor market and advice for how to stand out in the MATS application process, you can feel free to skip ahead to the second half, in which Ryan first describes

  • the research archetypes MATS has identified, including the "Connectors" who define new research agendas and often found organizations, the "Iterators" who systematically develop them through experiments and analysis, and the "Amplifiers" who help scale research teams.

From there, we unpack:

  • why Iterators have historically been in highest demand, and why that is starting to change as organizations grow and AI coding agents begin to lower the engineering barrier to succeed in AI research;
  • and why it remains hard to break into AI safety, despite the fact that many organizations are hiring, and how MATS hopes to help by identifying and developing top talent.

Along the way, we also touch on:

  • which kinds of research require access to frontier models and which don't, and how much compute is generally needed for AI safety research today;
  • how MATS assesses applicants, why some tangible research output is for practical purposes a requirement to make the cut, and how diverse their fellows are in terms of age and formal credentials;
  • and the balance the AI safety community should strike between doubling down on existing research agendas and developing new paradigms.

If you're considering applying – and if you're excited about a career in AI safety research, I think you should – remember that the deadline for the Summer 2026 cohort is coming up on January 18th, so visit MATSprogram.org/tcr to learn more and get started.

With that, I hope you enjoy this overview of the AI safety field, and inside look at the talent pipeline shaping its future, with Ryan Kidd of MATS.


Main Episode

[00:00] Nathan Labenz: Ryan Kidd, co-executive director at MATS. Welcome to the Cognitive Revolution.

[00:05] Ryan Kidd: Thanks so much. I'm glad to be here.

[00:07] Nathan Labenz: I'm excited for this conversation. So I've mentioned a couple of times that I've been a personal donor to MATS. I think it's actually the first time we've ever met and spoken. But your reputation certainly precedes you. And I've seen a lot of great work come out of the program and a lot of great reviews. And in my, you know, research as part of the Survival and Flourishing Fund recommender group, I also got a lot of great commentary on the importance of MATS as a talent pipeline into the AI safety research field. So big supporter of your work from afar, and I appreciate the fact that you guys have come on as a sponsor of the podcast recently as well. This conversation, not technically a part of that deal, but we're in one of these OpenAI-style circular flow of funds sorts of things where we're somehow both inflating one another's revenue.

[00:54] Ryan Kidd: But yeah, I'd like to think we didn't buy our way onto the podcast.

[00:57] Nathan Labenz: Yeah, no, the enthusiasm is definitely real because I've heard so many great things over time. So excited to get into this. I thought we would maybe just start with kind of big picture from your perspective. And I think, you know, having watched some of your previous talks, I know that you play sort of a portfolio strategy where you're not like, I have a very specific, narrow prediction, and I'm trying to maximize the value of this organization, this program for that very hyper-specific prediction. It seems like you're more saying, well, there's a lot of uncertainty out there in the space, and we're going to try to be valuable across a range of those scenarios as much as we can be. With that said, you can speak on behalf of yourself or on behalf of mentors or the community as a whole. Like, where are you guys right now? Where are we in terms of timelines, so to speak? And how has your strategy evolved over the last year or so as we've gained more information on where we are relative to the singularity?

[01:56] Ryan Kidd: Yeah. Okay. So I don't like to have opinions here, or I don't like to have opinions very loudly. And the reason for that, I think, is because, as you say, we are somewhat like a hedge fund or something, or maybe an index fund, more likely, which is to say we have a broad portfolio, we adopt a bunch of different theories of change as valid, and we try and have our thumb in 100 pies. So I would say in terms of MATS's institutional opinion on this, definitely we tend to go with things like Metaculus and prediction markets and the Forecasting Research Institute, FRI, their predictions and so on. So the current Metaculus prediction for strong AGI, I think it's called, which is, I think you can ignore most of the requirements of the test and just look at one of them, which is the two-hour adversarial Turing test. That's predicted somewhere around mid 2033. Okay. So I think that is probably the best bet we have for when AGI of that nature occurs. Now, just two days ago or three days ago, perhaps, the AI Futures Project dropped this new report, which two MATS fellows were, one was the lead author, one was a contributing author on, so very excited about that. And that was just updating their model. And I think they predicted something between 2031, well, 2030 to 2032, depending upon how you define AGI. They broke it down into all these automated coders, like they can do all the coding stuff, these top-expert-dominating AIs across all these fields and so on. So I think, I don't know, somewhere around 2033 seems like a decent bet. But also we had, you know, Nathan Young recently compile all these different forecasting platforms. So Metaculus, Manifold, another Metaculus poll that was for weak AGI, it was a little bit less good at Turing tests, and all these others, like, I don't know, some Kalshi thing on whether OpenAI will achieve it. And he came out with an average of 2030. Now, I don't know, I still like the Metaculus 2033, but I wouldn't bet against 2030 in terms of the nearness of AGI. As for superintelligence, complicated, right? Could be six months or less, could be a very hard takeoff after this AGI thing. If it's a very software-only singularity scenario where you don't need a big hardware scale-up, you aren't limited by compute, it's just recursive self-improvement or something, algorithmic improvement, AIs are improving the algorithms for training AIs and it's like, wow, that's a fast feedback loop, right? Or you might need a lot more experimentation, right? You might need massive hardware scale-ups. You might need just staggeringly more compute than exists in the world, in which case that could take you a decade, right, to get your singularity. So I currently think that 2033 is a decent central estimate in terms of the median for what we're preparing for, but obviously a 20% chance by 2028. I think that's the Metaculus prediction. That's a lot, right? So we should definitely be considering scenarios that are sooner, and particularly, I think the sooner AGI happens, the more dangerous it might be, right? The less time we have to do critical technical research to prepare, the less time we have to implement policy solutions. And I don't know, if it's happening during a transition period for a U.S. government, it could be even wilder. So I would say median bet on 2033-ish, but really care a lot about the impacts of AI, like front-load your concern to pre-2033 scenarios.
And I think that MATS mentors, I don't know, we haven't surveyed them, but I think if we were to poll them, we'd get something similar.

[05:33] Nathan Labenz: Yeah. I always say I'd rather prepare for the shorter timelines and then have a little extra time as a bonus, and I'm sure we'll find ways, you know, to still need it. But it does seem wise to me to play to that sort of, you know, first-quartile possible range of outcomes. Is there still any room in the program or in the community? Like if I showed up and I was a 2063 guy, would I be sort of out on an island on my own? And, you know, are there any work streams going on right now in the sort of brain-computer interface or sort of deep, totalizing direction? Obviously, interpretability has different flavors, right? But with the recent turn toward more pragmatic interpretability, I wonder if there's any space left for the sort of, you know, we-really-want-to-understand-everything kind of interpretability, or if that has kind of been understood generally as, eh, that's probably going to take too long for us to really be excited about pushing it right now.

[06:35] Ryan Kidd: Yeah, it's a good question. I actually think there is plenty of room for this, and here's why. The mainline kind of meta-strategy that the AI safety community seems to be pursuing on the whole, and we're talking in terms of funding, in terms of sheer numbers of people and resources deployed, not necessarily in terms of LessWrong posts written or something, right? But in terms of resources deployed, it is this AI control strategy, which is where basically you build, well, perhaps it's better called an alignment MVP, which is a term coined by Jan Leike, former head of Superalignment at OpenAI, now co-lead of Alignment Science at Anthropic. An alignment MVP is an AI system that is a minimum viable product for accelerating the pace of alignment research differentially over capabilities research such that we get the right outcome. So basically, you're getting AIs to do your homework. And there's been a lot of debate on this. There's a very strong camp in the direction of, this just never will work, because as soon as an AI system is strong enough to be useful, it's dangerous, right? I think, you know, Claude Code shows this is not the case, at least for software engineering, but perhaps for people who think that aligning AI systems requires serious research taste, you know, they would probably say that Claude Code is nowhere near there, right? Generally, AI systems are nowhere near that level of research taste ability. Now, all of the things that you're mentioning that pay off only in 2063 scenarios, presumably, they only pay off over that time period, not necessarily because of human challenge trials or something. Maybe that makes a difference if you're interested in making humans more intelligent with genetic engineering or some of the crazy things that are being tossed around. But if you're mainly interested in, oh, this thing is going to take decades of technical work, maybe you can compress those decades into a really short period of AI labor. Right? If you can get 'em to run faster, massively parallelize things and just, you know, in general, just get them to do your homework. Those 2063 AI alignment plans might be automatable over a shorter period of time. And so we should definitely be pursuing those, because the more we do to raise the waterline of understanding on these different scenarios, the easier it will be to hand off to AI assistants or to accelerate with AI input. I do think it's interesting you said BCI research, because I recall being at a conference once when someone was talking about, okay, so the way we're going to solve alignment is we're going to solve human uploading, and we're going to put someone in the computer and get them to do 100 simulated researcher years or something. It's all very sci-fi, very Pantheon. But then Eliezer Yudkowsky put his hand up and he said, I volunteer to be number two. Which makes sense, right? You don't want to be the first guy; that might go wrong. But yes, people are seriously pursuing that. And I think it is interesting. I talked to some BCI experts about a year ago and they said, there's no way that we get BCI in time for AGI. Sorry, it's not BCI, sorry. No way we get human uploading in time for AGI unless you actually have AGI, right? The time period required would require massive amounts of cognitive labor and human trials and stuff like that.
And I don't know, it does sound very sci-fi, so I don't think we should rely on something like that, though I'm all for people pursuing moonshots on the side. That's part of what MATS is about, right? We have this massive portfolio with a few moonshots in it.

[09:53] Nathan Labenz: Okay, so there's a lot of different directions I want to go from there, and I'm trying to just make sure I keep a running tally. But, you know, maybe an interesting first one would be: how do you think we're doing on the AI safety front overall, maybe relative to your expectations? I mean, you mentioned LessWrong and Eliezer, and there's this sort of, I don't know all the lore of MATS, but I do understand that a lot of people who have participated in it over time come out of the Eliezer discourse and had a certain set of assumptions that were like, we're not going to be able to teach this thing our values, it's going to be extremely unwieldy from the beginning. And now we have Claude and it's like, man, that's come a lot farther than I thought it would have at this point in time. And I'm kind of surprised in general by how little I see people's p(doom)s moving. It seems like the people that had really high ones remain really high. Those that were never worried remain not so worried. I kind of feel like I'm taking crazy pills at times where I'm like, I don't know. I see these deceptive behaviors. They kind of freak me out. It's amazing that that was anticipated as well as it was by the safety theorists, even in the absence of any actual systems to work with. But then at the same time, it's not crazy to me to say that Claude seems in many ways probably above average in terms of how ethical it is compared to the average person. I don't know if that's contentious to say, but Claude, it's pretty remarkable in that respect. What do you make of where we are? Are you as confused as I am, or do you have a sort of a more opinionated sense of how well we're doing overall?

[11:34] Ryan Kidd: Honestly, Nathan, I'm pretty confused. Like, I do think that, you know, contrary to expectations, we are looking like we're... Language models understand our values, right? That's the first thing to update on. Like, they understand them in some key sense. It's not just regurgitating like stochastic parrots. Language models are really good at understanding human ethical mores and extrapolating on them in, you know, some scenarios. They're also really good at sycophancy. They're getting even better at deception, sophisticated deception. They do tend to deceive users in the right circumstances. Though it does seem like there's some debate about this, and it seems like some of the deception, it's far from what we might call consequentialist, like ******** consequentialist deception in most situations. But I think alignment faking and some other papers have shown that you can create situations where AI will deceive the user to achieve some ulterior objective, which was something that was deliberately given to the AI as an objective. So that's one of the constraints of these little scenarios. I don't think there are any examples of AIs coherently deceiving users, like pursuing this coherent objective, right? Not just what we might call Goodhart deception, where they just kind of fall into deceptive tendencies because of the limitations of training data. I'm talking about this coherent deception. There are few cases of this, if any, where there's sustained, coherent deception that appears to arise spontaneously through the training process, which is pretty good given the level of capabilities we have. Like, it seems like people didn't think five or 10 years ago for sure that we'd have AIs that are as capable of assisting frontier science and are safe to deploy. And people were like, we're never going to put them on the internet. Who would do that? It's crazy. And now they're on the internet. And notably, the world hasn't ended yet. That's not to say it will stay that way. You know, certainly a thing you don't want to do with a superintelligence is let it out of the box. But yeah, it does seem like we're in a better scenario than many imagined. Now, of course, we could be in the calm before the storm, right? It might well be that there's what they call a sharp left turn, or just, you know, a radical change in the way AIs internally kind of process information, and they might acquire these kinds of coherent long-run objectives. I could point to MATS mentor Alex Turner's conception of shard theory as an example of how this might happen, right? So instead of AI systems being this, like, you know, containing a single mesa-optimizer that is kind of coherently forming under training, right? If you remember the old Evan Hubinger paradigm, your outer optimizer loop, which is training your AI system, causes it to develop an internal optimizer architecture, which then can have its own goals that differ quite a lot from the training objective. And presumably there's some argument, such as there are arbitrarily many ways to have this optimizer form to produce the right outputs, because this thing is clever. And if its main goal is to produce paper clips or some other thing, then it's going to realize it's in a training process and it's going to give you the output you want, no matter what its goal is.
We still could be in store for that kind of thing. But currently, it seems like we don't-- AI systems are really messy. They're kludgy, like human brains. They have a bunch of contextually activated heuristics. It sees there's a bracket there, and it's like, oh, maybe I'll put another bracket there. It's very simple, dumb circuitry. But then sometimes it does stuff that is, like in-context learning, seems a lot like it's actually pattern matching to gradient descent. When models are learning from the input data stream and learning some new complicated thing, it seems a lot like they're kind of optimizing over the input tokens, or rather optimizing to produce some output. So we might be in for a world where AI systems do spontaneously gain these mesa-optimizers, and these things... You know, these are a serious source of concern because they're, you know, very powerfully trying to optimize for some objective. And this is the main concern I have, I guess: that we have this kind of deception model, this inner alignment failure, perhaps, where AI systems acquire goals spontaneously, or maybe because they're being trained deliberately to be power-seeking and make money on the internet. And then they decide to hide, and we don't have interpretability tools good enough to detect them. So I guess I haven't really changed my fear about this scenario eventuating, but I have become more confident that we can elicit useful work from AI systems before we see obvious signs of this, I'll say. And I'm pretty confident that AI systems right now are not executing very powerful scheming against us, because I think we would see some sort of warning shots. I don't think it's going to be as night and day. I think we'll see some kinds of situations where AI systems are trying to scheme in really dumb ways before they try and scheme in very competent, difficult ways. Does that make sense?

[16:33] Nathan Labenz: Yeah, I think that's a good summary. I mean, eval awareness definitely stands out to me as one thing that is making everything a lot weirder and just harder to feel confident in anything. I'm not sure really what to make of the deception track that you outlined there. I mean, in some ways, I've often said it feels like we're sort of in a sweet spot where they're getting smart enough that they can help with science, and yet they're not good enough at using a web browser to go out and get too far in terms of self-sustaining or causing whatever havoc. And on the deception side, I'm like, yeah, the gradual rise of it seems like an example of physics being kind to us. It doesn't seem like we're seeing the sharp left turn. It's like we are seeing these proto-behaviors that, you know, at least give us something to kind of study if nothing else. But I'm not quite sure how people get confident in the idea that maybe they're just not that good at it yet. You know, when you said we would see warning shots, are these not the warning shots? That's one thing I'm still kind of confused on. People seem very quick to me in some cases to be like, well, you know, that was a structured eval and it was sort of led into it. Yes, it refused to be shut down or it took steps to avoid being shut down, but that was just because it wanted to accomplish its task, not because it had a take-over-the-world objective. And I'm kind of like, well, okay, still though, it did resist being shut down. At what point should I start to consider that to be a warning shot? I'm not sure there's an answer to that. I'm not sure there's really a question there or that there's a way for you to answer it. I guess basically it's just another layer of my own confusion. And it seems like people are very often just led by extremely different intuitions in response to the same fact pattern, and I'm not really sure what to make of that. The deception one in particular, not that you were doing this just now, but I've heard a lot of different ways that people say, well, we don't have to worry about that too much. And I'm like, I don't know. I'd really like to know that that's resolved at some point. One of my common refrains is if I was going to be part of a military that was going to go into battle with my AI systems, I would really want to know that deceiving-the-operator issues have been well and fully ironed out. I don't know, it seems like we're a little casual on that. Even at Anthropic, right? They have certainly done some of the best work on this stuff, but they still also kind of seem remarkably chill about it to me. It's strange.

[19:16] Ryan Kidd: I mean, I don't think people should be going to battle with AIs for many reasons. I think that's a pretty bad social more to allow that kind of thing. But that's another matter. Definitely, I would not feel confident in current or future generative AI systems not having out-of-distribution failures, let alone in critical scenarios, let alone scary things like deception, you know, when it really counts. And I think that's a big deal. And we should be tracking two things, right? One, AI capabilities. So are AIs situationally aware? Do they have the necessary prerequisites to be able to even understand that they are this AI? It seems like they do, right? And they even know their training date, they know some details, they can detect their text from other AIs' text. So AIs are becoming increasingly situationally aware, which is one of the necessary prerequisites for really dangerous deception. Do they have the capabilities to hack themselves out of the box, right, to steal money? Like we had this MATS paper that came out and caused a stir recently. It was this collaboration with Anthropic's AI red team where they found that, oh, take an AI system, put it in a simulated environment with a bunch of real smart contracts, and it can find $4.6 million worth of exploits. That's a lot of money. That's enough to set up your own server and run for quite a while. And that was a relatively short project. And it was pretty hands-off as well from the humans, but not entirely. So it does seem like we're getting dangerous capabilities, increasingly so. Hacking out of the box, getting money, getting influence, all that kind of stuff. And then I think we want to be tracking all of that very closely. And I don't think we're at red lines currently, but we are approaching them, I would say. And separately, I think we should be tracking, as you say, this model organisms work where we try to elicit dangerous behavior from AIs. You can think of this as like with your child, you leave some cookies out and you're like, don't eat the cookies, and then you turn away, but you're secretly looking. And if they ate the cookies, I caught you. So that's the kind of thing we're doing. The thing is that AI systems can really detect when they're in training versus real environments. But if you recall the AI 2027 scenario, a lot of the discussion around that, people were talking about online learning. It's like, OpenBrain trains the last big AI agent, and then from then on it's just kind of constantly learning online through some sort of RL paradigm. If AI systems are perpetually online, then they're always in deployment, and you've got to have monitors and control protocols, right? So that's why control research is so important for that, especially for the early days, to catch some of these slip-ups, right? So you can do all the model organisms work you want in the lab, and that's one layer of defense to see if we have these capabilities or the penchant for deception emerge. And separately, you need to have all the control evals studying them as they're deployed, especially if they're going to be learning online, perhaps updating their behaviors, and just be constantly checking for this stuff. Be ready. Have a fallback plan, a rapid response plan. What are you going to do if you actually see serious warning signs? If you shut the models down, your stock price is going to plummet. What do you do? Do you revert to an older system that's safer?
Probably. So I think, yeah, we should definitely be tracking this stuff. And I wouldn't say that we are in the clear by a long shot. I would say that we are in a better world, by my estimation, than Bostrom and MIRI predicted, you know, 10-something years ago. But I don't know, they would say I'm very wrong about that. But I don't know, I think that it's useful that we can get some work out of these things that looks like it is actually quite likely to accelerate AI safety work.

[23:00] Nathan Labenz: Yeah, so that brings up another, I think, huge question for AI safety research in general. And probably the strongest, maybe not, I don't know if you would say strongest in the sense of being most compelling to you, but certainly the most hawkish or fiercest criticism that AI safety research gets is that it always ends up being dual use, and that it always ends up somehow accelerating the core capabilities track. And some people would say, just stay away from the domain entirely and focus on social shame or whatever. I do believe we can do better than that. I think we probably have to do better than that. But I wonder how you think about that, right? I mean, the canonical RLHF was sort of a safety technique that really turned out to be more of a utility driver than anything, I would say. I mean, I guess they're both, right? It is dual use. But certainly when it came to accelerating the field, making the things useful, waking the world up, having all kinds of people pile in, everything going exponential all at once, you can kind of trace back to, at least in part, this transition from real raw next-token predictors to actual instruction followers. And we've got probably a lot of those things going on today. The one that I think stands out to me the most is one you've alluded to a couple of times, which is getting the AI to do the alignment work. You know, that sounds awfully close and uncomfortably close to recursive self-improvement, which is something that I am quite fearful of. I do think, again, Claude seems pretty ethical. The GPTs aren't too bad either, but yikes. Are we really ready to have them do our alignment homework? So how do you think about teasing that out as you kind of prioritize different kinds of research, like where you want to invest, what kind of mentors you want to bring on, what kind of talent you want to cultivate through the program? I mean, that seems like a huge question that is a really hard one. How do you think about it?

[25:06] Ryan Kidd: It's a very good question. And I'll preface by saying that all safety work is capabilities work, fundamentally. People like to distinguish these things in terms of, oh, capabilities work is about the engine, it's about making the plane go faster, and safety work is about the directionality. But as you've pointed out, RLHF, which was intended as safety work to help the directionality, steer to where you want to go, also made people realize, oh, wait, this thing is useful. I can actually hop in this plane now because it's going to land where I want, which made them want to make the engine go faster so they could get there faster. That whole feedback loop started. I actually don't know if you can avoid this. The only way I could conceive of doing safety research where there's no impact on capabilities until the final critical moment when you deploy it is being holed up in a lab somewhere with people that you utterly trust, under crazy NDAs, and having access to staggering resources, whatever's required, because presumably maths and theoretical methods aren't enough to improve safety. At least that seems to be the lesson of the last 10 to 20 years. I could be wrong, but it seems like the interplay between theory and empirical research is pretty vital for most types of disciplines like this. So you have to have staggering resources, perfectly loyal teams, all these NDAs, no one's going to reveal your research, and then you build the system in secret somehow, and then, okay, then you deploy it, and then maybe you open-source your alignment technology and everyone has it, or somehow you disable all the bad actors or something. It just seems like a very difficult prospect.

[26:44] Nathan Labenz: I think that sounds like Safe Superintelligence in a nutshell. Maybe that's the setup that they've got. Extreme secrecy, unlimited resources. They did have one notable defection, but otherwise, you know, a team that has resisted lucrative buyout offers.

[27:04] Ryan Kidd: So I'm not trying to defend research like this, or even defend capabilities-enhancing safety research per se. I'm just saying that it's pretty hard to imagine a situation where you avoid it, because I think you do have to build the AGI at the end of the day. And I know I'm alienating a lot of people who might watch the show when I say that, but I think that you kind of have to from a pragmatic perspective, because the market forces driving this are very strong. Now, there are some options that we could take, right? We could build Drexler-style comprehensive AI services, so you never have to have a centralized agent. You have distributed kinds of mechanisms, right? You build scientist AI, very narrow AI systems, to serve a bunch of economic things. The problem is, I think they all get outcompeted by agentic AI: you stack an AI company filled with agents and they all go out in the stock market and make products and so on and just make more money, just beat your crappy narrow AI solutions. So the problem is, it's not just about making AI that is aligned. It's about making AI that is performance competitive enough that it dominates in the marketplace. The only alternative is to have some sort of draconian, shut-it-all-down kind of thing, which I am just very skeptical of ever working. I don't see any example of such a thing happening. The closest example we have is stopping human cloning, but that was not a lucrative bet in the same way that AGI is, I claim. And also human cloning violates this deep social more, I think, in a way that few people today conceive of powerful AI systems as violating. I think they're wrong. I think building a second species is actually going to violate some deep social more in the same way that human cloning would, but I don't think people will see it that way. So that leaves us with the fact that we actually have to build the AGI. But if we can build products that are safer, right, or perhaps are under some strict regulatory control, where we have, ideally, I don't know, a 10-year international slow phased entry to the new AGI world, right, where all these, you know, countries and companies are kind of forced to be very careful and collaborative in the way that they align their models, then we're in a much better world. That's the world I hope for. Okay, now as to whether AI safety research is unnecessarily capabilities enhancing. Some is, perhaps. RLHF, I think I'm on the fence, 50-50. Definitely at some point, RLHF, the idea was in the water. It doesn't seem like it would have taken much longer if Paul Christiano and Dario Amodei et al. hadn't done that. I think someone else would have done it. That's not to say that you should necessarily try and accelerate the frontier of capabilities. It seems bad on net, but certainly RLHF opened up the door to a lot of very promising ways to build alignment MVPs, which kind of is the Christiano meta-strategy too. I don't know. It's hard to say. I'd like to run the counterfactual simulation and see where the world would be without RLHF, or with it one year sooner or two years sooner. That would be interesting to see. It definitely did kickstart, I think, the, you know, ChatGPT revolution and, you know, productizing AI systems. But it's hard to see, given how small the AI safety field was at the time. The AI safety field, I think, has grown from the increase of AI exposure, right?
So you would've had some amount of additional AI safety research that happened had the ChatGPT moment not happened then, had it happened like one or two years later. But I think it would've been kind of insignificant, if I'm honest. I don't think that the field was big enough. Now you could say, okay, what if you also tried to pour resources into secret AI safety projects at the same time, delay RLHF, delay ChatGPT, build up the AI safety field via networks? The MIRI summer schools weren't doing a lot, and MATS came along just before the ChatGPT moment, December 2021. And yeah, I think the first MATS cohorts were a little bit more directionless than the later cohorts. Definitely, I think safety research really kicked into gear after we had ChatGPT. Not to say that was the only cause, but there were a lot of things happening around that time. And I think that, definitely, larger, more capable models have enabled certain types of essential safety research you could not do with smaller models. We're talking interpretability on models that actually have, you know, coherent concepts embedded in them. I'll say there's probably plenty of work still to be done on GPT-2 small, but linear probes and whatnot at a high level can target some of our frontier models. You know, Qwen, these Chinese models are particularly good for that. Certain types of debate. Like we had the first interesting empirical debate paper only after models were good enough to debate. And there are many, many other such examples. Like all the control literature, I think, just could not have happened as well. Sorry if that's too much.

[32:02] Nathan Labenz: No, it's great. Yeah, I was going to ask also about the idea, and it sounds like you sort of believe it, at least up to a point, but, you know, going back to the sort of founding mythology of Anthropic, I think one of the notions that was seen as a legitimate reason, even among pretty hawkish AI safety folks, for starting a company like Anthropic was, well, if you want to do the safety research, you've got to have frontier models to do it on. Otherwise you're just inherently behind. And then, you know, what good is that, right? What good is it to work on a last-generation model? So obviously we've got, you know, quite a few generations between GPT-2 and now. And sure, we don't understand plenty of things about GPT-2, but then I would also say there are a lot of emergent behaviors that are not observed in GPT-2 that are definitely of interest, including, you know, many of these deception and eval awareness things that are kind of most hair-raising to me today. Where do you come down on that now? Like, I wonder if somebody's like, geez, should I go to a frontier company because that's where the best models are, and inherently that means the most consequential work would be done there? Or I could go work independently or at any of a number of other organizations, and I might be limited to a smaller Qwen model or something. But maybe that suffices. Maybe there is enough in those kind of second-tier models as we enter 2026 that you don't really need to be working with the latest, latest, latest. Again, I think I am mostly probably just confused or unsure about this, but do you have a take?

[33:46] Ryan Kidd: I mean, yeah, for plenty of interpretability research, people aren't using the frontier models. You don't have access to them. I mean, sure, people in the labs are, but at MATS, there are tons of really excellent papers that keep getting produced, and from many other sources, right? EleutherAI, FAR AI, et cetera, that are doing world-class interpretability research on sub-frontier models. Because today's sub-frontier model, today's Qwen or DeepSeek or Llama or whatever, is like yesterday's frontier model, you know, in terms of capabilities. We're at that point where these models are all above the waterline, you know, for doing really excellent research. So from an interpretability perspective, I don't think you need to be pushing the frontier that much, if at all. From the perspective of other types of research agendas, such as weak-to-strong generalization and other types of AI control and scalable oversight things, I think you kind of do need more data points. I'm not saying we've exhausted everything you can do with the current models, far, far from it. But I think you are gonna need more data points to build up, you know, consistent results and to see some of these kinds of worrying behaviors emerge, where your weaker model can't actually supervise your strong model in all situations. Which, by the way, is predicated on this idea that verification is easier than generation, P versus NP, blah, blah, blah, especially if you can see the other person's thoughts and they can't see yours. It does make sense to be at the frontier from that perspective, but I will say that I think the main reason that the companies are doing this is obviously to make money. And then, you know, as a corollary of that, from a safety perspective, if you were trying to actually make a strong case for being at the frontier, it would be like, so our models are performance competitive, they're close enough to the frontier, a fast-follower kind of model, such that you take a performance hit by using ours instead of the competitor's, but they're safer. And currently no one wants to use anything that's worse than the frontier model. Why would you? That's the best model. But if a model was, I don't know, 10% more likely to tell you to jump off a bridge or something, or actually, seriously, 10% more likely to hack your bank account and steal all your money, let alone, I don't know, escape and make a bioweapon, I would like to think people would use the less good model. And I like to think that regulators and insurers could adequately penalize the frontier companies into complying with that. Because then you have existence proofs, like, oh, my product, I'm actually trying. I made an effort. I tried to make my product not do the heinous thing that the very best model developer is doing. Then everyone has no excuse, and they have to do that, and governments can compel them to, and so on. So I think making your model performance competitive enough that people want to pay the alignment tax, so to speak, seems like a viable strategy from that perspective. Now, of course, none of this is trying to justify the current race of frontier models, which seems very reckless. Let's be clear. I think at the current pace of development, we're going to be in a lot of trouble. But this is one of those collective action problems. These companies have to coordinate to slow down.
And there are international things at stake here as well, because you do have a US versus China model developer kind of race now that they're in the running. So it's very complicated. And when you have these collective action problems, I think the main way you solve them is through governance. And sure, the lab leads could probably be even more collaborative. And definitely some of them are not advocating as strongly as they should be for slowing down and for having this kind of collective, you know, sharing in the alignment benefits and, you know, not pushing the frontier dangerously. But I do think this is ultimately a job for governments.

[37:41] Nathan Labenz: Yeah. This might be a bit of a digression, but a quick follow-up on when you said companies are primarily doing this to make money. I actually would model them fairly differently than that. Take OpenAI, for example. Sam Altman has said, you know, we could burn $5 billion, $50 billion, $500 billion. I don't care. And we're making AGI. It's going to be expensive. I think that's almost a direct quote. And then when you look at their mission, I'm always struck by how even just the way that they've chosen to define AGI strikes me as kind of inherently ideological. Like you could set your goal in any way, shape or form you might want to, and theirs is explicitly in this frame of outcompeting human workers. And I think that they are sincere in their expectation that it's going to be good for everyone and it's going to free people from drudgery. And I certainly hope they're right about that. I'd love to live in a world where people don't have to do work they don't enjoy doing, but I don't know. I kind of think of them more as, obviously they're quite different too across the different companies, but I think of it less as trying to get rich and more as trying to make a real mark on history. That's kind of the biggest summary that I would give for a lot of them. Does that resonate with you or not really?

[39:07] Ryan Kidd: Maybe. I don't want to, I can't really speculate on the psychologies of the leaders of these labs, let alone their, well, not shareholders so much as, I guess, venture capitalist investors and everyone else they've made promises to, their clients and so on, their employees. I can't really speculate about that. I will say that given that the value of AGI is estimated at at least between 1 and 17 quadrillion dollars, I think that's a lot of money. It's a pretty big mark on history. I'm not sure if it even matters whether they're trying to make a big mark on history or make money. We can adopt Dennett's intentional stance about the AI companies. Be like, okay, so what does it look like they're doing? If we were to conceptualize them as a coherent agent trying to do a thing, what is the thing that they would be trying to do? To me, it seems a lot like they're trying to make a bunch of money. But making your mark on history could also be valid. Though I would say, and I don't want to use any specific AI lab as an example, but I think in the world where an AI lab is trying very specifically to make their mark on history and not trying to make a bunch of money, I'd expect it might look identical, actually, to this world right now.

[40:22] Nathan Labenz: Yeah, given the capital requirements, I mean, it seems like Anthropic has sort of said as much, right? Like in their early days, they were less focused on commercialization, or even, you know, sort of thought they might try to stay not commercial somehow, or less commercial. And now it's just like, well, you can't really do that if you want to compete in this particular game, because, certainly, as long as you believe scaling laws continue to rule everything around us, then you kind of just have to show that you can bring in resources to attract resources. And that is the path to making a mark on history. Again, Ilya maybe stands out as somebody doing something quite different there.

[41:02] Ryan Kidd: He's a billionaire. He's raising huge amounts of money for his models. Maybe he is going to make more money this way in expectation than he would have made staying at OpenAI. It's possible. He's got his own company now. He still retains OpenAI stock, I'm sure.

[41:21] Nathan Labenz: It's funny that he's come up a couple of times in my mind just as sort of pattern matching on some of the things you've said. But the idea that somebody's going to go straight to superintelligence and then drop it on the rest of the world, I think he's kind of softened on that a little bit. But on that general pattern, I think if there's one thing that OpenAI probably did get right, it's the iterative deployment idea, giving people a sense of where we are and not keeping the whole thing under a basket somewhere. I think that was one of the things that seems like it's aged pretty well in my estimation so far.

[41:55] Ryan Kidd: My earliest perception of the person advocating for this was Paul Christiano, his takeoff speeds post, pushing back against, I suppose, what he saw as predominantly the MIRI perspective at the time, like, oh, we're going to build the thing in secret. Frankly, I don't want to comment on-- I don't know what MIRI's objectives were, but I know that they were trying very hard to not leak any information about their alignment research in some areas. In other areas, they published great papers and so on. But Paul Christiano at the time was pushing back against this kind of-- he thought that fast takeoff would happen if you had a bunch of dry tinder lying around: if we had tons and tons and tons of GPUs, and then we stopped research for a year and then started again, well, you'd expect a steeper growth. We're seeing this in terms of very, very fast followers. This is not just a phenomenon in AI, but in economies. Epoch recently did a study where they showed the pace of new AI companies approaching the frontier is just so much faster than the pace at which the frontier moves, because there's an abundance of chips, there's an abundance of data and methods, and so on. And it's the same with, you know, catch-up economies and so on in the world. So I think that Paul Christiano, you know, he was right in the sense that if society is to cope and adapt to AI, then having gradual release and diffusion of technologies is better from that perspective. There's another perspective which says it's that very gradual release or something that ensures continual VC reinvestment to drive the engine to actually make the progress. Whereas in the other world, actually, you just wouldn't build AGI, because perhaps in that world no one can build it without several hundred billion dollars, maybe, you know, a trillion or something. I don't know. I can't say. I certainly think that we're now in the world where it does seem better to have gradual release of models than to have it all kind of hit us at once.

[43:52] Nathan Labenz: Well, I always value the opportunity to get perspective from someone like you who is such a connector and at such a central node, with so many mentors and mentees and all the flows of information and talent that you are so close to. But we should probably narrow focus a little bit and talk about what you guys are actually doing at MATS. And I'll put maybe a timestamp in the intro so people can also, if they want to get right to the core MATS stuff, zoom ahead and join us, skipping over some of the higher-level stuff. Why don't we do just kind of a quick rundown on, like, what are the different tracks, I think you call them streams, of work that are happening in the MATS program today? Maybe a little weighting, or sort of, you know, what you're most excited about. Then we can go into kind of your assessment of the AI safety labor market, which I think is really interesting and unique, and we'll take it from there.

[44:50] Ryan Kidd: We recently changed up our track descriptions. So we previously had the standard oversight/control, evals, governance, interpretability, agency, which is sort of a catch-all term for cooperative AI and agent foundations, and AI sentience, digital minds research, and of course, security. But we've recently changed it up because we want it to reflect less the theory of change underpinning those kinds of things, and more the type of process and type of individual that works on this, right? So we now have the tracks on our website: empirical research, which is AI control, interp, scalable oversight, evals, red teaming, robustness, a lot of this very hands-on, coding-heavy, iteration-focused research, right? We have policy and strategy, which is different again, right? That's much more focused, less on arXiv publications, potentially more on modeling, more on adapting technical research into things that are actually actionable by policymakers. Theory is another track. So this is a lot of mathematics; it's foundational research on the concepts of agency and how agents interact. It does include some of that agent-based modeling for cooperative AI. Technical governance, which is plenty of stuff like compliance protocols, eval standards, how to actually enforce these kinds of things. Like if you have an off switch, how would you even make such a thing be, you know, viable in a governance framework? And then compute infrastructure, which is stuff like tracking where chips are going, right? Because if you're going to have international compliance with various types of treaties, you've got to know where your chips are and what they're running as well, or at least have some zero-knowledge proofs that guarantee they're not doing terribly dangerous things. And of course, physical security. If you build superintelligence or AGI or something, presumably you don't want everyone to have access to it and arbitrarily modify it, give it weird goals, because that would be bad. Some people say that's good. I say that's-- we don't let everyone have nukes. You know, why would we let everyone have superintelligence? It seems kind of ridiculous. So you've got to have physical security to prevent that from happening, to prevent diffusion. So yeah, those are the main tracks now. We're super excited. I think we have somewhere over 50, 60 research mentors lined up for our summer program, for which applications are open right now. And it's going to be the largest program yet, 120 fellows across our Berkeley and London offices. Anything else I should say?

[47:08] Nathan Labenz: Yeah, maybe you want to do like the weighting of those, like how many, I don't know if you would break it down by how many mentors are in each of those categories.

[47:17] Ryan Kidd: Yeah, current program has something like 27% evals, 26% interp, 18% oversight control, 12% agency, 10% governance, and about 9% security. I wish I had a figure I could show. I do have a figure, but it might be hard to show in the podcast format, but as you can see, like it's a pretty even mix of things. You know, we have something like maybe roughly three times as many people doing evals as doing security. So there is like some divergence there, but we have a pretty broad portfolio. And that's just because there's like tons of amazing researchers. We really just like pick some of the top researchers in every category.

[47:55] Nathan Labenz: Maybe it's a good time to just name some, drop some names.

[47:58] Ryan Kidd: I could do, yeah.

[48:00] Nathan Labenz: There's a lot of names that people will know.

[48:04] Ryan Kidd: Yeah, I mean, some of the oversight and control researchers might be more known because this is one of the things that a lot of the big companies are pursuing. So we have people like Buck Shlegeris at Redwood Research and his whole team as part of that. Ethan Perez and Sam Marks and many other people at Anthropic. We have Erik Jenner and David Lindner and Victoria Krakovna and many others at DeepMind. So yeah, just tons of people doing oversight and control research. For interpretability, obviously we have Neel Nanda. We have some of the Timaeus folks like Jesse Hoogland and Daniel Murfet. We have, of course, the Goodfire people like Lee Sharkey, longtime mentors. Some of the people from Simplex, who are doing some very interesting stuff, like Adam Shai and Paul Riechers. And they, along with Timaeus, are pursuing what I would call some of the more interesting, maybe more moonshotty, but very promising interp research bets on the side as well. For evals, we have people from METR. We have people from the UK AISI, along with people from Apollo Research, Marius Hobbhahn, plenty of others there. Yeah, I could go on. There are some amazing researchers there. We also have some harder-to-categorize research. Yoshua Bengio's whole team at LawZero, Yoshua Bengio himself, and plenty of others are there. And we have some AI sentience research as well, digital minds. So Patrick Butlin at Eleos AI, Kyle Fish at Anthropic. Yeah, it's a very exciting program.

[49:30] Nathan Labenz: Yeah. That's quite a who's who, including a couple of past podcast guests and a couple that I took note of as maybe needing an invitation. If I was categorizing those right, it seemed like a majority would be in that first empirical category. Do you think it stays that way? Or, you know, your comment that ultimately this is a job for governments tracks, honestly, the MIRI line these days. I think the MIRI line today would be: we don't really have time for that much research, we need to just go straight for the global treaty. You're not obviously quite so confident in that direction, but it sounds like you do believe ultimately that there is a major role for governments to play, and you're starting to move more in this governance and policy direction. Is that going to be the biggest growth area, reflecting that worldview, or how do you expect the balance of these different areas to evolve over time?

[50:32] Ryan Kidd: I actually can't necessarily say. Well, okay, I can speculate, but I'll say this. We have had about the same proportion of governance researchers, give or take a few percent, for the last two years. It hasn't changed much as a fraction. So we are quite on track for continuing the same trend, potentially. Part of the reason is that we are based, particularly in Berkeley and the SF Bay Area, in a big technical hub. There are other programs that have had a deeper governance focus: GovAI with their classic fellowship, IAPS, of course RAND TASP, this large program run out of RAND, and plenty more besides, and of course the Horizon Fellowship for US policy careers. And these have also existed for longer than we've been around. So even at a time when we were basically the biggest fish in town, which we still are in many ways in terms of funding and I think prestige as well, and for technical AI safety we're the biggest and best program, I would say that for governance there's always been a bigger fish. And so we've never felt it necessary to overweight governance beyond what our mentor selection committee indicates. In fact, the primary determinant of which tracks get selected is our mentor selection committee, which is somewhere between 20 and 40 top researchers, strategists, and org leaders that we survey. When everybody applies as a mentor, we decide who gets in based on the feedback from our mentor selection committee. With the additional caveat that we also have some diversity picks and minimum requirements, because we want to support a great breadth of research and we think the mentor selection committee on the whole might be biased in some ways as well. So we try to really talk to the experts when it comes to picking the agendas. And it so happens that governance researchers have historically been relatively low-rated by our committee, which contains many governance researchers. I would go so far as to say that governance research is harder to do well, in some critical sense. It's harder to see what the actionable thing to do is, in some ways. Now, everyone who has their specific governance agenda, I would say, doesn't feel this way, for a good reason: within their agenda, they have clear, actionable things to work on. But I think on the whole, there are just so many more possible technical directions to pursue that are high leverage in some ways as well. I think a lot of the governance stuff is, we're trying to build technical governance solutions (this is not talking about advocacy now, right, this is talking about technical governance) such that if an administration deems them worth deploying, we have the capacity to do that. We actually have the solutions that can be deployed, which is very important, right? But I would say, don't rule technical research out. Especially if we have something like a regulatory market, or even warning shots that cause the public to wake up and tell Congress to regulate this stuff, we have to have technical solutions ready to deploy to make these systems safer.
And the cheaper we make it to get systems that extra degree safer, the lower the alignment tax that companies have to pay to train and deploy their systems more safely, and the more likely they are to do it when they come under pressure, whether internally, externally, or whatever. So I think that lowering the alignment tax via technical research is still super important. Also, if this alignment MVP plan is going to work, we have to have a bunch of directions ready to be iterated on by these AI assistants, or by humans calling teams of AI assistants, as it's more likely to be. And you actually have this massive interplay between technical research and governance research, where things like evals and safety cases built on technical AI safety solutions can actually be tangibly put forward in policy proposals, right? Policymakers can be convinced by demos and evals and model organism honeypot traps, right, where AI systems deceive their users or whatever. This is what convinces policymakers to make policy and gives them a tangible target for their policy to work on. So there's a clear flywheel here. So I would say, do not rule out technical research. And there is a reason why MATS has so many more technical mentors, and that's just because it seems like, on the whole, our mentor selection committee thinks that, on average, a technical portfolio is worth pursuing.

[55:03] Nathan Labenz: Yeah. That reminds me of what Jake Sullivan said in terms of advice for the AI safety community, which was basically: you need to make this stuff as concrete as you possibly can so that people like me have something to really latch onto. Because as long as it remains sort of a theory or a possibility or whatever, it's just really hard to get government to do much on that basis. So he was saying the more grounded and concrete all these fears can be made, the more likely you are to have success in the policy realm. You mentioned advocacy as well, briefly, there. Would you ever consider an advocacy track? And I guess it might be advocacy research. I feel like right now we do have groups doing advocacy, obviously. I'm not sure how data-informed their advocacy strategies generally are. But I'm always struck, when I do see survey results, it's like, yikes, the public is not super keen on AI in many ways. Do you think that would ever be something you guys would expand into?

[56:12] Ryan Kidd: I mean, you assume we haven't.

[56:14] Nathan Labenz: Yes, I haven't seen it.

[56:16] Ryan Kidd: Yes, so we are a 501(c)(3), so we have to keep our advocacy and stuff to a minimum. And I think a lot of MATS' strength is being this impartial player. We're trying to be somewhat of a research university, tech accelerator kind of vibe. We don't want to play favorites politically. That's not in anyone's interest. I think if people are doing that while trying to be the thing we are, they're doing a bad job. That said, I believe David Krueger is going to be a mentor in the current program. And some of the research he's going to be discussing is to do with, I guess, what sort of messaging and what sort of standards are actionable, right? But of course, I wouldn't say this is true advocacy. This is more MATS supporting independent research, working with David Krueger, who has his new org, Evitable (not inevitable, Evitable), which is focused on some of these advocacy questions. I think MATS has to be pretty careful, obviously, in terms of our 501(c)(3) spending requirements for advocacy. We haven't spent anything on advocacy, for what it's worth. And also, you know, ensuring this political neutrality, so that our fellows, our mentors, and all of our strategic partners can feel assured that we're solutions-oriented, right? We're pushing for a particular outcome, right? And I think that AI safety becoming a political football is just a bad idea. And I applaud advocacy orgs, like Encode and plenty of others, perhaps CAIS, et cetera, for their efforts, but that's not MATS as an organization.

[57:54] Nathan Labenz: Yeah, gotcha. Toe in the water at most for now. Let's talk about the profiles. I both watched a talk of yours and read a blog post from about 18 months ago where you sketch out the different archetypes of AI researcher that you have seen, and then also map that onto the demands of organizations. And I don't know how much it's changed in the last 18 months, if at all. But maybe give us the baseline, and then if there's any update, I'd love to hear how things are changing, especially, you know, with Claude Code in mind. It may accelerate certain people. It may empower certain people to do things that they couldn't otherwise do. But yeah, first of all, tell us how you organize your thinking about the kinds of people that you're bringing into the program.

[58:42] Ryan Kidd: So, I mean, MATS, like I've talked about with the mentor selection committee, is fundamentally, I think, this massive information-processing interface. We consult the very best people as much as we possibly can. We try to build our own opinions, but we don't rely on them; we try to consult experts at every stage. So the paper, or blog post, you're mentioning, which was called Talent Needs of Technical AI Safety Teams: to construct that, we surveyed 31 different lab leads and hiring managers, whoever we could get, the most senior person we could get related to safety at every AI safety org we could find that was hiring at that time. And we asked them, what do you need? And then we compiled all those interview notes into three archetypes, right? This is just technical; we've since done this for governance, so expect that to drop soon. Those three archetypes were connectors, iterators, and amplifiers. We chose the term connector because these people are bridging gaps between theoretical arguments for AI safety and theoretical techniques to make AI safe, and the empirical techniques to actually make it happen. So they're sort of spawning new empirical paradigms to work on. Okay. No one is hiring these people. It's pretty rare, because if you're good at that, then everyone knows your name and you're already hired. Perhaps you're already leading an organization. Everyone wants to be an ideas guy, but very few people want to hire ideas guys. And these people, typically it's people like Buck Shlegeris, you know, AI control, or Paul Christiano, right, just a huge amount of resources he's produced, and so on. You know these people, right? They typically have AI safety organizations they founded and lead. Then there's iterators, right? And this is not just engineering. Iterators are active researchers with strong research taste who are pushing the frontier, but they typically aren't creating novel paradigms based on theoretical models of things; they're typically advancing empirical AI safety. And you can even imagine iterators in technical governance agendas as well. So this is the majority of people working in AI safety today and also the majority of hiring needs in the future. And then there's amplifiers, where I think the closest example is the TPM archetype. I'll say this for iterators: prominent examples include Ethan Perez, Neel Nanda, Dan Hendrycks. Actually, I think Dan Hendrycks maybe crosses some boundaries there. But yeah, amplifiers, to distinguish them, have more focus on amplifying people. Typically you'll find them on large research teams, and they're scaling the number of people that can be effectively managed and contribute to organizations. So a lot of MATS research managers would fit this category, or TPMs at the various labs. And interestingly, they're actually quite in demand as well, particularly for labs in the 10 to 30 FTE range. They're the most in-demand archetype there, because it's very hard to hire great people managers who also have the requisite research experience. You're trying to hit two bullseyes. And there are ways around it, of course; Google has this sort of model where you have your research managers and your project or people managers, and they're somewhat distinct. And MATS does try to do this with our mentors and our RMs.
But yeah, I think the need for amplifiers is only going to grow, because as you've said, things like Claude Code and other AI systems are going to erode away the minimum technical skills required to contribute, and I think AI agents are going to take on more of those things. You end up with a situation where your people skills, your management ability, your networking, your amplifier skills in general are the more bottlenecking thing on AI safety research. So to all those iterators out there: there are job opportunities, and you are still the main thing everyone wants to hire. But if you don't try to build up your management capabilities, if you don't work at managing AI systems, then you are going to be left in the lurch as the needs of the field shift toward amplifiers.

[1:02:37] Nathan Labenz: So to just try to echo that back to you, the connectors, another name for them might be conceptual visionaries. Like these are the people that define research agendas where they just previously didn't exist, like de novo high concept work. They in turn need iterators, which sound like essentially machine learning engineers is kind of the core skill set I--.

[1:03:03] Ryan Kidd: Scientists. Scientists and engineers, yeah.

[1:03:05] Nathan Labenz: And they're the ones that are running the experiments day-to-day, building the tooling, traditionally writing the code to do the visualizations of the data, and taking this initial conceptual hit that the connector came up with and really systematically mapping out that space. And then these organizations, as they grow, start to need amplifiers, which I maybe would call leaders, you know, people that can build up an organization, see that people are working well together, so that it can scale past the sort of two-pizza rule. Is that changing now? When we hear things like 90% of code will be written by Claude, that seems like it's closer to right than wrong. And certainly I vibe coded three AI apps for family members for Christmas presents this year, which is something I would not have come close to being able to build previously, even just one or two generations of models ago. I do wonder how much the skill set is already changing. What are you seeing there? What's the up-to-the-minute in terms of how people are thinking about changing hiring needs?

[1:04:26] Ryan Kidd: I mean, up-to-the-minute is that you have to be very proficient at using AI. And I think that some of the companies have updated their coding interview processes to allow for use of AI assistance, because on the job, you have to be using AI all the time. That's just critical to succeeding in this field, to be amplified by AI. I would say that goes for every one of the archetypes we've identified. I do think as well that checking whether AI output is good or not in critical contexts is still going to be a very important skill, and stitching together different types of AI output and building pipelines to more efficiently process it is also going to be very critical. But we might be leaving the LeetCode era. I will say this: amplifiers, while not currently the most in demand across all the different tiers of AI safety organization or team, are, I think, probably going to be the most in demand within the next year or two. But that's based on my predictions about AI progress. As you say, it could be slower; there could be jaggedness concerns that slow down this type of talent transition. But in general, it's never bad for your employability to spec out as a manager. Managers are very useful, and leadership traits in general make you a more useful, better employee. It's part of personal growth, I think, to take on some leadership roles.

[1:05:52] Nathan Labenz: What does supply and demand look like these days? Maybe even at the highest level, going back to the origin story of MATS, my understanding was you said, geez, this AI safety thing seems like it's going to require a lot of people working on it in a lot of different roles, and this is not something that universities teach, right? So what's the on-ramp? If somebody would benefit from one, where do they go? So you've essentially created one, and of course there are some others out there too, but you've created one of the largest and most highly regarded ones. Where are we in terms of, are there a lot more jobs out there than MATS can produce fellows? How do you think about that? I feel like we've gone back and forth a couple of times, where at one time it was, oh, we're super talent constrained, and then it was, well, maybe not so much anymore; now it's that there aren't actually a lot of roles for people to go into. Maybe I'm wrong on this, but I feel like this has seesawed back and forth, and I don't know exactly where we are today.

[1:06:57] Ryan Kidd: I'll start by saying I didn't found MATS. I didn't co-found MATS. I was in the pilot program as a participant. There were like five of us who ended up doing the first research program. And it was a pilot; they didn't have open applications. It was just people nominated from what was the first AI Safety Fundamentals course, what's now BlueDot Impact, right? And we did that. And the credit goes to Victor Warlop and Oliver Zhang, who is COO and co-founder at the Center for AI Safety. I joined the team right after that program. Then Oliver left to co-found CAIS, and Christian joined on as well, and then Victor left shortly thereafter. I would say that I scaled MATS; that's my contribution. And, you know, Christian and I kind of refounded it, in that we formed a separate 501(c)(3) a couple of years after that because we got too big for our fiscal sponsor. So I'll take credit for scaling MATS and for being the driving force behind strategy since, I guess, mid-2022, I believe. But okay, in regards to talent needs, yeah, that's a good question. Sorry, actually, tell me the exact wording of the question again.

[1:08:11] Nathan Labenz: Well, yeah, what's the balance of supply and demand? This might not be right. I mean, you can correct me on this too, but I've had this sense at times where people have sort of said, there's so much demand for this kind of talent, where is it? We're talent constrained. But then other times I have heard from people that now people are rushing into the field and there's not actually so many roles available, and so people are kind of frustrated. But I don't know where we are right now in that back and forth.

[1:08:37] Ryan Kidd: Yeah, so, okay, I'll say this. AI safety is a field where there are always going to be jobs for the best people. If you're a cracked coder, you can get a job in AI safety. The Anthropic alignment science team is growing at 3x per year; they're trying to scale fast. FAR.AI, a nonprofit, 2x per year. MATS itself, we've been growing 2x per year over our entire history. So these teams are scaling fast, and many more are getting founded. Open Phil, sorry, Coefficient Giving, has huge amounts of grant money to spend on this stuff, right? There are like a dozen AI-safety-focused VC firms out there to fund your for-profit. There are incubators like Catalyze Impact and Seldon Labs, I believe Constellation has one now, and there are tons of programs like MATS. I think the problem is that once you have built an organization, especially if you're scaling very fast, and it hits a certain size, the main constraint becomes: is this person good enough to warrant the extra management overhead? Can they take on some management responsibilities? So you have this situation where, like at OpenAI, people are managing 10 to 20 individuals; I believe one person in Anthropic alignment science had 18 reports. These teams are really flat. So you have this real problem where you need to hire people who can quickly ascend the ranks and be research leads, be managers, even PIs of new teams. That is the limiting constraint, okay? And that's the reason why a lot of people do some moderate re-skilling and then can't get hired: there are many, many opportunities, but what we find when we talk to these hiring managers is that they say, we find it extremely hard to hire. We have the money, we have the clear need, but people are not at our bar. And that's what MATS is trying to do, get people up to that bar. There's some technical skills element to that, and there's also just actual research experience. People who come into MATS with prior research experience do so much better on average than people who have less research experience, right? So I think a strong option for many people is just: stay in academia, get your bachelor's, get your PhD. For other people, maybe they should go off and found a company. There's tons of money and directions for AI safety companies; I think founders are strongly needed in this ecosystem, and then you can create opportunities for more people to get hired. But I'll say as well, another thing MATS is trying to provide is credibility. I wouldn't say formal accreditation, but in some sense, fellows have the reference from their mentor, who's a senior researcher in the field. You have the exhaustive MATS selection process, which is trying very hard to find people who are good. And then you have your proof of actual research impact: you produce a paper, right? That's got your name on it. Perhaps you go publish it at a conference or on arXiv and people are talking about it. So that's what people need to get employed these days. You need an actual great output, some sort of deliverable that shows your name, maybe several. You have to be technically good enough at coding or using AI systems, whatever's required. And you need references from people that are trusted. Otherwise, it's just very hard to get ahead.
It's the same story you see in every talent-constrained job market.

[1:12:10] Nathan Labenz: How does that translate to experience profile? This is obviously a big question in the broader technology world, right? Like, are junior programmers an endangered species? We see very prominent examples like Neel Nanda and Chris Olah, who broke into the field at a super young age, also kind of defined it in a way, and are still quite young, actually, even today. And that may lead people to think that this is a young person's game. But what you're describing sounds more like post-PhD, or somebody who's grown up in an organization to an extent. I'm thinking of Rajiv from the AI underwriting company, who was at McKinsey for a number of years and now has co-founded this organization, but comes to it with a ton of experience and sophistication in terms of management, leadership, all that kind of stakeholder management stuff. What do you see in terms of, is there a lot of opportunity for people, say, straight out of college, or are they kind of barking up the wrong tree if they want to go directly from undergrad into this space?

[1:13:15] Ryan Kidd: So, I mean, the median MATS fellow is 27. There's somewhat of a log-normal distribution with a long tail. I think the oldest in the last cohort was like 55 or 60. So there are people of all ages applying to MATS. The youngest person is, of course, 18, because we can't take minors. Now, okay, more statistics: 20% of MATS fellows are undergrads; they have no bachelor's degree yet, or perhaps some of them haven't even applied for a bachelor's, right? They're just cracked engineers. About 15% have PhDs already in the bag. So, at least as far as MATS is concerned, as this accelerant, reskilling, retraining, internship, mentorship program, whatever, you're getting a broad distribution of people. Now, I think there is obviously huge demand for people with more experience. A second critical thing is experience with the latest tools. And because these tools haven't existed very long, young people have a strong chance of being particularly good at using them, because they've just been constantly on the cusp of things. They haven't been sitting in a cubicle not using Claude Code every day. So to that extent, young people have a huge chance. But it is the case that you gain valuable knowledge, which you can't replicate otherwise, from working on the job, especially in a great team producing great papers. You've pointed to some, what I'd call, prodigies: Chris Olah, Neel Nanda. There are plenty of people of that ilk who have come through MATS that no one... actually, that's not true. There actually are some people who I would put in a similar class, like Marius Hobbhahn, et cetera. And in that case, our main job is just to get out of their way. I think that if you're that kind of person, don't let anything hold you back. Apply to MATS, apply for grant funding, do whatever, come to the Bay, go to London, and just make it happen. You will find your path. If that's maybe not your path, especially if you're a more senior researcher, or perhaps a person who thinks, man, I can't conceive of that, I just want to finish my undergrad degree and do a PhD, that's fine too. People from every walk of life have passed through MATS and got hired, done other programs and got hired, founded companies, et cetera. It's hard to tailor advice to a myriad of different types of people, but I would just say: focus on your technical skills, focus on understanding the frontier of technology, and don't be limited by the opportunities you see on job boards. You can create your own opportunities. You can cold-email companies, you can apply to grant funders with some random grant proposal you put together because it fascinates you deeply, you can call up hiring managers and stuff.

[1:15:57] Nathan Labenz: I mean, when you describe the range, it is a pretty broad distribution. And that tells me that you trust your own ability to discern who's going to be good more than you trust outside signals. So maybe tell us what you are actually using to assess people. And this could be translated into practical advice in terms of how somebody makes an application stand out. But what are you looking for that allows you to take somebody in their 50s or somebody who's 18 and feel like you can read what really matters, regardless of what their background is?

[1:16:36] Ryan Kidd: Yeah, so I mean, we do some of the standard stuff that you would see at other tech companies, right? We have CV review, we have some CodeSignal tests, so brush up on your coding skills and so on. And they do detect AI use. We are, of course, considering ways to allow for tests that include AI use, but these are obviously harder: they're harder to design, they're harder to check, and so on. But yeah, that's part of our general application, and that's for some streams. I'll say this about mentor selection, sorry, scholar selection: we're trying very hard to provide something like a service to mentors. So if a mentor says to us, I don't want to do CV review, I don't want to do CodeSignal, I just have this selection problem that I want applicants to work on, and I want you guys to help me evaluate it, build me a team of contractors or some automated evaluation process to do first-pass screening, and then we'll go from there. That's our favorite kind of evaluation in some ways, because we know it's as close as possible to the actual job, the actual research, as we can get. In Neel Nanda's case, it's typically: go away and do a 10-hour mech interp pseudo-work test and then present your results to me. You can use AI, do whatever, just find something interesting. And this is great, because then we get great results. For some other streams, it's harder to do this, harder to administrate, and so we do rely on some proxies that are perhaps less specific than ideal, but I think no worse than anyone else in the industry is doing. And of course, how you stand out is going to depend on the specific mentor, because MATS is very heterogeneous in that respect. The best way to apply to Neel Nanda's stream is going to be vastly different from applying to Ethan Perez and the Anthropic mega-stream. But in general, you want to really understand your basics about AI safety, so do a BlueDot course, right? Because there may be some critical knowledge or a paper that, if you haven't read it, you just don't understand; if you don't understand what deceptive alignment is, that might be really bad for Ethan Perez or Buck Shlegeris's kind of control research, even for applying and getting into those streams. If you don't understand that for an interp stream, it probably doesn't matter as much, unless of course you're dealing with deception in your interpretability work. So make sure you understand your basics. Make sure that if you're applying to a stream that is empirically heavy, you can do CodeSignal tests, you can code, including without AI assistance, at least for the time being. It doesn't hurt to apply to other programs as well. MATS is far from the only program out there now; this is not the early days. There are so many great research programs out there: Pivotal, ERA, PIBBSS, LASR Labs, SPAR, ARENA for technical skills. I think Astra is now running again. Yeah, there are tons of great programs out there, and that can really boost your CV. If you already have experience in the kind of research that you want to do at MATS, then so much the better; consider MATS like a postdoc opportunity, or a post-research opportunity. Build your own independent projects. Yeah, sorry if that's too much advice to be actionable.

[1:19:44] Nathan Labenz: Yeah, I think it boils down to: tangible product is king, right? I say that all the time in the AI engineering world as well, and I'm far from the world's leading expert on how to break into that space, but what I always tell people when they ask me is that a working demo is kind of the coin of the realm. People might be interested in what you have to say, but they really want to see that you can make something work. They want to see it online. It could be a Replit or it could even be a Colab notebook or something, but you've got to make something that works. And it sounds like this is a pretty similar worldview. You've got to show that you can get in there and make something happen, or as you put it with Neel's track in particular, find something interesting. If you can do that, we might have something to talk about. One thing that jumps out as maybe not as emphasized as I would have thought is being in command of current research. At this point, really nobody can keep up with all the current research, because that exponential has gotten away from all feeble human minds, maybe with a few hyperlexics that can still keep up. But I have found that keeping up with research feels important to me. It feels like an important part of how I stay conversant with people across a lot of different areas. But obviously, what I'm doing in trying to be conversant with people across a lot of different areas is not the same thing as research. How much emphasis do you think mentors in general put on being on top of the literature, so to speak?

[1:21:34] Ryan Kidd: It varies. Some of the mentors will ask questions like, I don't know, what do you think about X concept? Others won't be as interested. Obviously, as you say, these costly signals are the most important thing: have you done good research, do you have a deliverable, like a product, do you have a strong reference from an important person? That's also key. Have you done your homework in terms of the BlueDot course and other things, right? I think that MATS selection doesn't currently emphasize breadth of knowledge very much, mostly because mentors don't necessarily want that. And I think this is maybe a weakness in our process to some extent, if we don't then help people build that breadth. But we do: we have seminar programs, we have tons of opportunities for intermingling between different research streams, which really rapidly builds a breadth of knowledge, and we used to have discussion groups, which still occur occasionally, with workshops and so on. So I would say, I really do encourage everyone to do a basic BlueDot course or an equivalent like AI Safety Atlas; there are other good courses as well. But this is not as required for selection; it's more to prevent you from entering MATS, starting a research project, and realizing: oh crap, I have no idea where the gaps are, I don't understand how my work fits into anything, how do I get funding after MATS, how do I get a job, how do I choose a good original research direction? So it's more about your ability to actually deliver within the program and track the research, and less to do with your ability to get in at the moment. Which is pretty important, because MATS is just a stepping stone. If you do MATS and then you don't produce a great deliverable by the end, sure, it's a great thing on your resume, but it's not going to be enough in many cases, because it's such a competitive environment. So yeah, I think it's really good for people to build a shallow but broad understanding of the literature. I would recommend: don't be checking X constantly for new papers unless they're in your field. Maybe set up some Google Scholar alerts for interp, if that's your thing. But every so often do a periodic deep dive into what all the cool updates are across different fields, you know what I mean? And you can do this by looking at the new BlueDot course every year, or every month looking at some research roundups or highlights, like Zvi's newsletter or Transformer. And there are other people you can follow on X. That's my main recommendation.

[1:24:00] Nathan Labenz: So your admissions rate is super low, right? We want to encourage people to apply, but it is a very selective program. What does the funnel look like? I don't know if there are intermediate steps that would make sense to talk about in how people get selected. And then I think the good news is that if you do get into the program, your success rate on the other end, in terms of getting into the field in a professional, W-2-employee-status sort of way, is really high. You want to run us through those numbers?

[1:24:34] Ryan Kidd: Yeah. So last program, I think we accepted around 5% of people, maybe 4%, who applied via our initial intake form. There was a subsequent process that they had to complete, which is applying to specific mentors and streams, and I think we accepted somewhere around six and a half to 7% of those people. So a bit higher. That maybe is the figure I'd focus on: somewhere around 7%, let's say. Now, that's better than people think, right? I think the Anthropic Fellows program, for example, was something like 2%, because Anthropic is a big name, right? But MATS is larger; we have more diversity and more spots and so on. And I think people should also just treat the application process as a learning experience. In general, we try to make it that way; some streams are going to be more painful than others. But I think for streams like Neel Nanda's, where you spend 10 hours working on a project, you then have something really cool for your GitHub. That can only help.

[1:25:35] Nathan Labenz: And if you don't like doing that, you're not going to like working with Neil anyway, I would imagine.

[1:25:39] Ryan Kidd: Yeah, definitely the case. I do think it's unfortunate that there aren't easy ways to do credit assignment cheaply, right? To find the best people without them spending a bunch of time. But I don't know; I know that job interviews for top tech companies like OpenAI, Anthropic, and GDM can run three to six months or something, with so many things to do before you finally get the yes or no. So we definitely aren't that involved; theirs is a much slower process. And I think that's because the commitment is less on our end. We're not giving people W-2s. MATS is an independent research program. Fellows get grants from a third party. We provide the housing, the office, the mentorship community, but we don't sign people on for any type of employment, which I think is part of the appeal as well. So that's the main statistic there, 7%. At the other end, about 75% of our accepted fellows go on to our extension phase. So our first three-month program: 7% get in, 75% go on for another six months, maybe even 12 months in some cases. And that extension phase is where a lot of great follow-up research happens. And over our program's history, we've had 446 fellows in total, not including people who've done training programs that we've helped facilitate, of which there are probably another 200 to 300. Of those 446, 80% have gone on to get permanent jobs in AI safety, based on our latest statistics. So that's great. I think 98% are employed in some capacity. Now, of those 80%, not all are W-2s, right? Some of them are independent researchers with grant funding from Coefficient Giving or the LTFF or something, which I think is a fine situation. And then in terms of the actual field growth, there are some statistics I can share. It seems, based on Stephen McAleese's LessWrong investigation, that the AI safety field is growing at something like an extra 25% per year, which is kind of interesting, right? It does seem to be growing exponentially as far as we can tell. Now, that is a lot less than the growth rate of MATS applications, which are going up somewhere between 1.4x and 1.7x per year, depending upon how you slice it. And mentor applications might be increasing around 1.5x per year. So there's a big disconnect. And according to BlueDot Impact, I believe their growth rate is something like 370% per year in terms of applications to their programs. So there's some large disconnect; a lot of people are applying to BlueDot. That can't go on at that rate, that's just way too fast, but it's probably because they've done amazing advertising and marketing. MATS has only just started to do advertising and marketing. We had the first-ever open round of mentor applications launched just recently. And yeah, we sponsored NeurIPS, that was cool, and we sponsored your podcast and several other great venues as well. I think this is only going to cause the application trend to continue, I would guess at 1.5x per year or something like that, which is a faster growth rate than the current growth rate of employment in the field. As to why, I could speculate it's probably just caused by a very high bar at a lot of these companies, and maybe a deficit of founders as well. And there are plenty of organizations working to remedy that.
I know there was this AI assurance technology report from Juniper Ventures about a year ago where they predicted that the size of the market for AI assurance technologies is doubling each year. So there is a lot of opportunity to do stuff that might contribute to AI safety.
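To make the disconnect described above concrete, here is a rough back-of-the-envelope sketch that simply compounds the approximate rates cited in the conversation (about 1.5x per year for applications versus roughly 1.25x per year for roles). These are illustrative assumptions drawn from the discussion, not official MATS projections:

```python
# Back-of-the-envelope: compound the approximate growth rates cited above.
# Illustrative assumptions from the conversation, not official MATS statistics.
APPLICATION_GROWTH = 1.5  # MATS applications, roughly 1.4-1.7x per year
FIELD_GROWTH = 1.25       # AI safety roles, ~25% per year per the cited estimate

applications = 1.0  # normalized to today's application volume
roles = 1.0         # normalized to today's number of roles

for year in range(1, 6):
    applications *= APPLICATION_GROWTH
    roles *= FIELD_GROWTH
    print(f"Year {year}: applications x{applications:.2f}, "
          f"roles x{roles:.2f}, applicants per role x{applications / roles:.2f}")
```

If both trends held for five years, there would be roughly two and a half times as many applicants per available role as today, which is one way to read the point about the hiring bar continuing to rise even as the field grows.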

[1:29:24] Nathan Labenz: What does the salary distribution look like for the people that are getting jobs? How much of an alignment tax, if any, do people pay in the salary department?

[1:29:36] Ryan Kidd: I mean, at the frontier labs, no tax at all. They're getting paid the same rates, and if you're leading something, yeah, they're getting staggering amounts of money. I think a couple of years ago the going rate for someone joining off the street was like $370K or something. I'm sure it's much higher now, especially with the crazy Meta stuff that happened. I would bet mid-level and higher people are making over a million, even on the safety teams of these labs, but I don't have any private data on that. But yeah, if you join as a junior software engineer, don't be surprised if you get somewhere around $350K or something. At nonprofits, obviously, it's lower, right? They can't compete with equity; they don't have any equity, and they also typically have less funding overall. You know, Coefficient Giving's pockets aren't as deep as the collective might of US venture capitalism. And I think there is also something like a nonprofit tax. I wouldn't say there's a safety tax; I'll say there's a nonprofit tax, right? Because there are nonprofits doing AI capabilities stuff, like the Allen Institute for AI, and people there still make a lot of money. This is artificial intelligence, right? And Coefficient Giving and other funders understand that you have to pay to play. So you have organizations like METR, which are, I believe, offering quite a lot of money for their roles: upwards of $300K for most roles, probably over a million for some, I would dare say. So there are nonprofits that are really, really trying to compete for talent. They can't offer anything like the frontier lab salaries, including equity, but they're trying their best. And I think this is kind of reasonable, but it also is a bit of an insane moment. MATS salaries are not anywhere near that high; maybe we're doing the wrong thing, I don't know. And there are other AI safety nonprofits that have tried very different strategies. I think the going rate for FAR.AI's research scientists is something like $100K to $170K, so significantly lower; they might have actually improved that recently. So there is a wide spectrum here, and it really depends on the compensation policy of the organization. But you will see very well-funded nonprofits offering salaries comparable to at least some junior AI company roles.

[1:31:56] Nathan Labenz: What about in terms of compute? I know that in addition to the stipend that folks get as fellows, there's also a compute grant, and I believe it's $12,000 worth of compute. I'm interested in what form that takes. Is it just a Brex card that you can go spend on compute wherever you want? Or do you have established compute partners that you work with that serve your fellows well? How often is that enough? Are there times when people find that they need more compute to do what they want to do? And then if they go work at a nonprofit, how compute-rich or compute-poor are the nonprofits?

[1:32:35] Ryan Kidd: Yeah. So I mean, MATS offers a $12K budget, I'll say; we don't just give people a card that says, here's $12K. No, they have to justify their compute spending. They have to have an actual project and proposal that necessitates it. And most people don't spend anywhere near that much, which is good. We budgeted as if they could, but we really don't want to just waste money on compute. Basically, no one has compute limitations. Rarely, people have needed more than that, and we've considered their proposal, thought hard, and reallocated funds as needed. So I think people aren't limited by compute at MATS in general. I think the way we do it is pretty good, in that for our model API calls and all that, we have specific organization accounts that we sign people up to, and then we give them a budget and top them up as necessary. We do have our own MATS cluster as well that we maintain online, and our compute team handles that. But typically people tend to use RunPod and other types of self-service things. It depends on the kind of research. There are some types of research that work well on our cluster, and there are some types of research where, even with the benefit our compute team can provide in setting up and maintaining the cluster, the experiments are so customized and people need to tinker with so many things that we just give them a budget and let them use online providers. It's just better that way. We did use other providers in the past, but that's the current setup. And we're looking at putting together a kind of customized Claude Code suite as well.

[1:34:16] Nathan Labenz: Hmm. Meaning like building out a bunch of tools or MCP type.

[1:34:22] Ryan Kidd: Yeah. Enhancements? I don't think... not MCP at this point, well.

[1:34:27] Nathan Labenz: That's kind of a distinct idea.

[1:34:28] Ryan Kidd: Yeah, yeah. You're right. Yeah, there could be actually-- there's a lot of data in MATS databases. We could probably put together something really useful at some point, but we haven't thought about that.

[1:34:39] Nathan Labenz: Yeah, just the archive of everything that's been tried would be pretty fascinating to do some agentic search through.

[1:34:46] Ryan Kidd: I mean, our new research database is online at matsprogram.org/research. You can see everything there, with Google Scholar links. But that's just all the papers that got published. You can also see our LessWrong blog posts under the MATS Program tag. Many more research artifacts have been produced than are visible there.

[1:35:01] Nathan Labenz: You mentioned-- you said the word tinker. Is the Thinking Machines API growing in prominence in terms of what people are finding attractive to use?

[1:35:12] Ryan Kidd: Yeah, many people are wanting to use the Thinking Machines API. So we've put together some... I'll say many organizations like to donate API credits, which is awesome because we can really use that.

[1:35:28] Nathan Labenz: Yeah. Cool. But is there anything else that we should cover in terms of like January 18th? We know that is important as a date to keep in mind. What other facts should we make sure that we touch on?

[1:35:43] Ryan Kidd: MATS is growing. We're always hiring. If you want to work on our team and help grow the next generation of people, if you fancy yourself an amplifier of sorts, you have people skills and research skills, we'd love to hear from you. Go to our website, matsprogram.org/careers. We're taking on mentors as well; there's an application form in the mentor section of our website. And participants: we're going to run three programs this year, not one, not two, but three. That's going to be a summer program, a fall program, and then a winter program starting into next year. I'm super excited. We also have plenty of other offerings in the works. We're considering a one-to-two-year residency program for senior researchers as well. And yeah, more on that to come.

[1:36:29] Nathan Labenz: Yeah, cool. Are you taking any connectors? If I am a connector type, or want to become one, is MATS a way to find my way there, or not really?

[1:36:45] Ryan Kidd: Many have. I would call Jesse Hoogland one such person, and Paul Riechers, so Timaeus and Simplex. I'd say Marius Hobbhahn as well, to some extent, with his deception evals work. And probably dozens of people; I'm just sharing the names that come most easily to mind, but many, many people have come through MATS, and we're super open to individuals who have this kind of archetype. And note, a connector, right, they have empirical skills, they have theoretical skills, so they could probably succeed in a bunch of different ways, but they're uniquely spec'd out to connect those two things. Now, there are some mentors and projects that are much more suited to this kind of thing than others. People like Richard Ngo, historically Evan Hubinger. I think Evan Hubinger has actually been probably the most dominant connector driving force at MATS over our time, but he's not a mentor in the next program, unfortunately; he doesn't have time. But yeah, there are many different opportunities at MATS for this kind of thing. I think even in some of the interp streams, it's very possible to enter an interpretability stream bringing with you some model of the kind of theory-based interpretability mechanism or strategy that you want to pursue, and then see that executed on. That's happened several times.

[1:38:04] Nathan Labenz: One of the things that I took note of in the blog post from 18 months or so ago was a comment you made that funders basically don't want to, or are much more inclined to, support the growth of organizations that they see as legible, that have research directions that feel somewhat established or that they can wrap their heads around, and they're much more reluctant to fund totally new conceptual directions. And that seems like it exists in contrast with the AE Studio survey, where they basically found that the field as a whole seems to think that we don't have all the ideas we need and that more far-out ideas should be tried, which of course led to their neglected approaches approach. What do you make of that? Is there stuff that we can do, or is it a different organization's job to figure out how to fill that gap? Because I do feel like I want some more, and I love some of the AE Studio stuff, including self-other overlap. I always come back to that as an example of something that's just quite off the map of what most people are doing. When I think of AI control and what Buck and the Redwood Research team are doing, I find that stuff fascinating. And one of the things that kind of impresses me most is that they are willing to work on something that in some ways is so depressing. They're like, we're going to try to figure out how to work with AIs even assuming they're out to get us. And I'm like, yikes, I don't know that I would be able to sustain a positive enough attitude to do that if I was working from that premise. I do feel like there's a relative dearth of things that are more inspiring. Here I think maybe of AE Studio, but also Softmax. Obviously, people have a lot of different opinions on whether these things are ever going to work or not. I wonder what your take is on the overall mix. It seems like a lot of things lean more toward: patch the holes, keep the AI down, tempt it, see if it'll take the temptation, and then patch it if it does. And there's not nearly as much that offers a more colorful, positive vision for the future. I wish there was, but maybe that's just not happening because the ideas are too hard to come by, or maybe it's not happening because the funders aren't bold enough. What's your take on how we can get more of that stuff, if we should be trying to, and if you think we should, how might we go about it?

[1:40:36] Ryan Kidd: I have many takes here. So obviously I advocate a portfolio, and MATS has historically sponsored a bunch of projects. Self-other overlap, that project came out of a MATS alum, Marc Carauleanu, I might have messed up his name, who was the originator of that project at AE Studio. And I believe Cameron Berg, another MATS 1.0 alum with me, is running some of their more neuroscience-inspired approaches as well. So AE Studio is great; I love what they're doing. I think the survey they did on LessWrong is probably not representative of the AI safety research field on the whole, but then again, it might be. Even so, I think we obviously need more ideas, because more ideas are good, right? More bets are good. More shots on goal are good. Now, I would not advocate that a person who is a very strong iterator drop that and try to think of some new paradigm; I think that would be strictly counterproductive on the margin, because we do have some very strong central research bets that we need more people pursuing, because they will yield demonstrable results. But if everyone did only that, this would be bad, because you need to have your portfolio. Maybe these approaches fail. Maybe they need other pieces to work. Many AI safety research agendas are contingent on other things going right or other people working on other stuff. It's like any research field: you need to have everyone advancing the frontier. I think AI safety has historically gone really kind of argmaxy on different agendas, which is bad. A portfolio approach is much better. Don't rule things out as possible directions; just shift and reallocate resources to them. To their credit, Coefficient Giving have done an amazing job, particularly recently, at supporting a bunch of different novel research bets. And they've also funded PIBBSS, or Principles of Intelligent Behavior in Biological and Social Systems, a program that is trying very hard to pursue sort of moonshotty interpretability and agency-understanding projects. So they're great; check them out. I think more ideas would be good. I think the kind of person who should be pursuing that is typically going to look like someone who is already a domain expert in some other area. You are occasionally going to have your Buck Shlegerises, your Evan Hubingers, right, who come along with no PhD but spent years at MIRI, incubating in that deep, rich AI safety experience, and then come out with amazing stuff like Risks from Learned Optimization and AI control and all that. But short of having access to that type of community and that type of research experience, I think most of the prominent connectors, like your Alex Turners and so on, have spent a lot of time in research science PhDs, and also on LessWrong, of course, incubating in that environment as well. So I think MATS is a great way for that kind of person to develop and to spawn more research ideas. In fact, to shout out Alex Turner: he has come up with some amazing research ideas over his time at MATS, and I think we've been very fortunate to support him. Things like gradient routing, together with Alex Cloud, another MATS mentor, and plenty of other things, like activation engineering and steering; he was one of the people involved in that. So I think that senior, experienced researchers are going to be, like in most things, the main drivers of new ideas.
And grant funding that lets them pursue whatever their research taste dictates is great. Programs like MATS that let them staff their research agendas are also great. I also think bounty programs could work, but I would caution against people putting all their eggs in the basket of, we need a bunch of new ideas because the central ideas are not working. I don't think that's true. I think the central ideas are still our actual best bets.

[1:44:20] Nathan Labenz: Yeah, okay, makes sense. Do you want to shout out any other organizations that MATS fellows have gone to, or even started, that you think are underappreciated? This could be sort of assignment editing for me for future episodes, but also just things that you think people should be paying more attention to than they are.

[1:44:39] Ryan Kidd: Yeah, I mean, there are tons. You can see all the organizations listed on our website; there are so many amazing people there. It's hard to play favorites, because MATS has worked with so many people and we're trying to be very broad. But in terms of nonprofits specifically, because maybe they don't get as much attention: obviously Redwood and METR and RAND TASP, Apollo Research, FAR.AI, Goodfire, Truthful AI, LawZero, MIRI, plenty of others. I love these organizations. Frankly, we need more nonprofit research organizations, and if you think you could found one, give it a shot. Obviously, you need to have a ton of research experience under your belt and very credible references and so on. Yeah, I don't know, it's really hard to play favorites, Nathan. So many great researchers.

[1:45:29] Nathan Labenz: You're spoiled for good options for organizations to shout out, so it's a testament to how many fellows have already gone on to do impressive work. Great job by you guys in driving this and growing it over the last few years. And people should definitely apply if they want to be a fellow; January 18th is the deadline, so it's time to get into it if you want to make sure your application stands out. Anything else we should touch on before we break?

[1:45:57] Ryan Kidd: No, I just really appreciate this experience. Thank you so much for inviting me to talk.

[1:46:01] Nathan Labenz: My pleasure. Thank you for doing it, and keep up the great work. Ryan Kidd, co-executive director at MATS, thank you for being part of the cognitive revolution.

[1:46:10] Ryan Kidd: Thanks, Nathan.

