Success without Dignity? Nathan finds Hope Amidst Chaos, from The Intelligence Horizon Podcast

Hello, and welcome back to the Cognitive Revolution!

Today I'm sharing a special cross-post from my recent appearance on The Intelligence Horizon podcast, with hosts Owen Zhang and Will Sanok Dufallo.

Owen and Will are about to graduate from Yale College, and as you'll hear, they've clearly spent much of their senior year thinking deeply about the current state of AI, where we're headed, and what it means for all of us. I was impressed not only by the quality of their questions, but also by their ability to challenge me with follow-ups that effectively steelmanned the most relevant counterarguments.

We start with the fact that, while AI timelines have compressed dramatically over the last five years, genuine experts continue to radically disagree on critical questions. Having established what I hope is appropriate epistemic humility, I then go on to call it how I see it.

In short, the singularity is near. Interpretability research shows that AIs are developing increasingly sophisticated world models, and with RL scaling now clearly working, AIs are no longer simply imitating humans – and likely won't be limited by what we know for much longer.

The potential upside is, of course, incredible. The insight I've gained just from using AI to navigate what humans have discovered about how cancer works, and how to treat it, has been invaluable – and the prospect that we might cure the majority of human diseases in the next decade is obviously extremely exciting.

That said, the risks are very real, and will remain serious for as long as we lack a solid understanding of how AIs work internally and why they do what they do.

My p(doom) remains somewhere in the 10-90% range. 

And yet, at the same time, I've become at least a bit more optimistic that we might actually build robustly good AIs. Scaling laws seem to imply that powerful AIs can only be created with massive resources; the three companies competing at the frontier today are at least reasonably responsible actors; and our best alignment techniques are working better than I expected. Given these fundamentals, it seems at least plausible that a defense-in-depth strategy – combining techniques like Goodfire's intentional design, Redwood's AI control work, improved cybersecurity through formal verification of software, and various forms of pandemic preparedness – could be enough to keep society on the rails.

We touch on a number of other topics as well, including the US-China rivalry. Especially in the context of the Department of War's recent attack on Anthropic, which has us looking more and more like China all the time, I explain why I would rather bet on figuring out a way to cooperate with our fellow humans than bet everything on AI researchers' ability to steer AI advances in a way that will ultimately work for humanity.

I appreciate Owen and Will for allowing me to cross-post this conversation, and I definitely encourage you to subscribe to The Intelligence Horizon. Their recent conversation with former OpenAI researcher Zoë Hitzig covered the evolving ways that people are using ChatGPT, variations on Universal Basic Income, AI governance models that emphasize a decision-making process over specific principles – and why she believes such structures will probably have to come from outside the frontier companies – and plenty more.

For now, I hope you enjoy my conversation with Owen Zhang and Will Sanok Dufallo, from The Intelligence Horizon.

Watch now!

Thank you for being part of The Cognitive Revolution,
Nathan Labenz
