Nested Learning: Ali Behrouz on the Quest for Continual Learning & Illusion of AI Architectures

Hello, and welcome back to the Cognitive Revolution!

Today I am excited to share a conversation with Ali Behrouz, grad student at Cornell, researcher at Google, and author of "Nested Learning."

This episode was recorded a few months back, and while I normally believe that AI content doesn't age well, this conversation with Ali is an exception. His work is some of the most inspired and potentially transformative I've seen in the quest for new machine learning architectures capable of genuine continual learning.

This, of course, is one of the most important capability advances on the horizon – arguably it's the main gap between today's models and a digital AGI that would be capable of joining and contributing to human teams just as humans do – and Ali is advancing the frontier with an approach that is biologically inspired and technically elegant.

His blockbuster paper, Nested Learning, which has been touted as a harbinger of a possible paradigm shift by no less than Jeff Dean, develops a simple strategy that allows models to rapidly adapt to their current context on an ongoing basis, while preserving core knowledge, by updating different parts of the system at different frequencies – much like humans manage memory on multiple timescales, from working to long-term memory.
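For readers who want something concrete, here is a toy sketch of the core scheduling idea: different parameter groups that update at different frequencies. It is not the Nested Learning implementation. The names (`Level`, `period`, `train`) are hypothetical, and averaging gradients over each chunk before a slow update is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One parameter group that updates at its own frequency (hypothetical toy)."""
    value: float
    period: int  # apply one update every `period` steps
    lr: float
    grad_sum: float = 0.0
    updates: int = 0

    def accumulate(self, grad: float, step: int) -> None:
        self.grad_sum += grad
        if step % self.period == 0:
            # Average the gradient over the chunk, then take a single step.
            self.value -= self.lr * self.grad_sum / self.period
            self.grad_sum = 0.0
            self.updates += 1


def train(stream, levels):
    """Fit pred = sum of all level values to a stream of targets."""
    for step, target in enumerate(stream, start=1):
        pred = sum(level.value for level in levels)
        # d/dv of (pred - target)**2 is the same for every level's value.
        grad = 2.0 * (pred - target)
        for level in levels:
            level.accumulate(grad, step)
    return levels


fast = Level(value=0.0, period=1, lr=0.1)  # updates every step
slow = Level(value=0.0, period=8, lr=0.1)  # updates once per 8-step chunk
train([1.0] * 64, [fast, slow])
print(fast.updates, slow.updates)  # 64 8
```

The only thing this toy illustrates is the schedule: the fast group reacts to every example, while the slow group sees a smoothed, chunk-averaged signal and changes eight times less often.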

And his latest work, "Language Models Need Sleep: Learning to Self Modify and Consolidate Memories" – which I actually heard about for the first time during this recording, and which has finally become fully public – takes inspiration from how humans consolidate memories and learn from dreams while sleeping, introducing an offline mode in which models transfer new knowledge from their high-frequency-update layers to their more slowly-evolving layers via distillation, and also learn new abstractions and connections between concepts by generating and training on synthetic data derived from their recent experiences.
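To make the "sleep" idea a little more tangible, here is a minimal sketch of offline consolidation by distillation, under heavy simplifying assumptions: the fast component is just a lookup table of recent observations, the slow component is a linear model `w * x + b`, and replaying the stored inputs stands in for the paper's self-generated synthetic data. The functions `observe` and `sleep` are hypothetical names, not the paper's API.

```python
from __future__ import annotations


def observe(fast_memory: dict[float, float], x: float, y: float) -> None:
    """Online 'wake' phase: the fast memory simply stores what it just saw."""
    fast_memory[x] = y


def sleep(fast_memory: dict[float, float], slow: dict[str, float],
          lr: float = 0.02, epochs: int = 2000) -> dict[str, float]:
    """Offline phase: distill the fast memory into the slow weights, then clear it."""
    inputs = list(fast_memory)  # hypothetical stand-in for self-generated training data
    for _ in range(epochs):
        for x in inputs:
            teacher = fast_memory[x]              # fast memory acts as the teacher
            student = slow["w"] * x + slow["b"]   # slow weights act as the student
            err = student - teacher               # distillation loss is err ** 2
            slow["w"] -= lr * 2.0 * err * x
            slow["b"] -= lr * 2.0 * err
    fast_memory.clear()  # free the fast memory once its content is consolidated
    return slow


fast = {}
for x in [0.0, 1.0, 2.0, 3.0]:
    observe(fast, x, 3.0 * x + 1.0)
slow = sleep(fast, {"w": 0.0, "b": 0.0})
print(round(slow["w"], 3), round(slow["b"], 3))  # 3.0 1.0
print(fast)  # {}
```

After the offline pass, the slow weights reproduce what the fast memory held, and the fast memory is empty and ready for new experience.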

In addition to the details of these architectures – which, like so many AI innovations, I find both extremely exciting and a bit scary – we also discuss:

How scaling for performance may shift from stacking more layers to nesting more frequency update rates;
How Ali understands all components of ML systems as forms of associative memory that compress a given context flow, why this leads him to call Deep Learning Architectures an Illusion, and how he's operationalized this conceptual insight by developing "expressive optimizers" that learn update rules and are capable of outperforming both Adam and Muon;
How the attention mechanism can be understood as an infinite-frequency-update module, and why Ali expects attention layers to remain fixtures of AI systems indefinitely;
The empirical results showing that Ali's new architectures compete effectively with Transformers on standard measures while also outperforming on hard tasks such as effectively recalling information from up to 10M tokens of context, and also learning to translate multiple previously unseen languages at the same time;
Why Ali sees continual learning as both an opportunity and a huge risk for privacy and alignment, how human-AI relationships might evolve, and why Ali is cautiously optimistic that models that evolve over time based on our interactions could both serve our individual needs more effectively and also lead to a more diverse and hopefully stable AI ecosystem overall.
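As a rough illustration of the associative-memory framing, the sketch below contrasts two ways of storing (key, value) pairs from a context. The first is a fixed-size matrix memory written with the delta rule, which compresses the stream into the same small matrix no matter how many pairs arrive. The second is softmax attention over a cache that keeps every pair verbatim and changes with every token. Reading the latter as the "infinite-frequency" end of the spectrum is my loose interpretation, not the paper's formalism, and all function names here are hypothetical.

```python
import math


def read_linear(memory, query):
    """Retrieve M @ query from a fixed-size matrix memory."""
    return [sum(m * q for m, q in zip(row, query)) for row in memory]


def delta_rule_write(memory, key, value, beta=1.0):
    """Compress one (key, value) pair into the matrix memory via the delta rule."""
    recalled = read_linear(memory, key)  # what the memory currently returns for key
    for i, row in enumerate(memory):
        error_i = value[i] - recalled[i]
        for j in range(len(row)):
            row[j] += beta * error_i * key[j]  # M += beta * (v - M k) k^T


def read_attention(keys, values, query, scale=20.0):
    """Softmax attention over a cache that keeps every (key, value) pair."""
    scores = [scale * sum(k * q for k, q in zip(key, query)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 5.0], [1.0, 1.0]]

memory = [[0.0] * 3 for _ in range(2)]  # 2 x 3 matrix: value dim x key dim
for k, v in zip(keys, values):
    delta_rule_write(memory, k, v)

print(read_linear(memory, keys[1]))  # [0.0, 5.0]
print(read_attention(keys, values, keys[1]))
```

With orthonormal keys both retrieve the stored value. The difference is cost: the matrix memory stays 2 x 3 however long the context gets (and must overwrite once keys stop being orthogonal), while the attention cache grows by one entry per token.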

The bottom line, for me, is that for all the debate and speculation about whether or not current architectures can scale to AGI and beyond, there's a very good chance that conceptual breakthroughs will render that question moot before we manage to answer it.

Transformers have changed the world, but they aren't the end of history, and as tough as it is to keep up with AI developments, anyone who wants to get a handle on where things go from here can't afford blind spots when it comes to new research directions like Ali's.

And now, without further ado, I hope you enjoy this deep dive preview of AI systems that learn, on an ongoing basis, in increasingly human-like ways, with the brilliant Ali Behrouz.

Watch now!

Thank you for being part of The Cognitive Revolution,
Nathan Labenz
