Hello, and welcome back to the Cognitive Revolution!
The Cognitive Revolution is brought to you in part by Granola. If you're a regular listener, you've heard me describe the "Blind Spot Finder" Recipe I'm using to look back at recent calls and help me identify angles & issues I might be neglecting, but it's also worth talking about how Granola can help raise your team's level of execution by supporting follow-through on a day-to-day basis. This week, for example, I had several working sessions with teammates, and I committed to a number of things. In the past, there's a good chance I'd have forgotten at least a couple of the things I said I'd do, but with Granola, I can easily run a TODO Finder Recipe and get a comprehensive list of everything I owe my teammates. This is the sort of bread & butter use case that has driven Granola's growth and inspired investment from execution-obsessed CEOs, including past guests Guillermo Rauch of Vercel and Amjad Masad of Replit. See the link in our show notes to try my Blind Spot Finder Recipe and explore all the ways that Granola can make your raw meeting notes awesome.
Now, today my guest is Geoffrey Irving, a pioneering machine learning researcher who's co-authored seminal papers with a who's who of giants in the field, and who is now Chief Scientist at the UK AI Security Institute, which is, in all likelihood, the most situationally aware government entity in the world today.
With roughly 100 technical experts on staff, and a mandate that includes:
- threat modeling,
- pre-release frontier model evaluation for dangerous capabilities spanning biosecurity, cybersecurity, and loss of control,
- advising the government on strategies to reduce catastrophic risk,
- funding independent frontier research,
- and engaging in global diplomacy…
Geoffrey has one of the broadest portfolios and most commanding views of the AI landscape.
And while he's optimistic that, in the fullness of time, we can solve the major open problems in AI safety, today, without a hint of hype, he paints a genuinely alarming picture.
Our theoretical understanding of machine learning is nascent. Nobody, he argues, should be particularly confident in their mental models of how AI will go.
Models already outperform a majority of experts on a great many security-related tasks, and there's no good reason to expect their progress to stall.
RL is working well beyond strictly verifiable tasks, and jaggedness matters less when even the models' weak spots match or exceed the best humans.
The many increasingly sophisticated bad behaviors we've seen over the last 18 months are, broadly speaking, all different versions of reward hacking, a problem for which we lack both theoretical and practical solutions.
We likely won't get many 9s of reliability from current safety techniques, and there's some reason to expect they could all fail at the same time, for the same basic reasons.
It is getting harder to jailbreak models, but the AISI Red Team has never failed to do so. Eval awareness is an open and growing problem.
Voluntary cooperation between frontier model developers and the AISI is working well, but not everyone is participating.
The AISI is seeking to fund theoretical research in areas like information theory, complexity theory, and game theory that might produce stronger guarantees, but these fields, like most of the rest of the world, are just beginning to take AI seriously at all.
Geoffrey is an intellectual powerhouse, but I came away from this conversation just as impressed with the UK AISI as a whole. This is an organization staffed with top-notch talent that has its finger on the pulse of industry developments and is speaking very accurately and clearly about AI's trajectory and how many major questions remain unanswered, even as frontier model company CEOs tell us that they are less than 3 years from creating expert-level AI machine learning researchers.
With that, I hope you come away focused and motivated by this conversation about the AI state of play, with Geoffrey Irving, Chief Scientist at the UK AI Security Institute.
Watch now!
Thank you for being part of The Cognitive Revolution,
Nathan Labenz