Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post

Hello, and welcome back to the Cognitive Revolution!

The presenting sponsor of today's episode is Granola, the AI notepad that helps you get the doing done.  Whether it's identifying TODO items after a call, turning a brainstorming session into a product spec, or looking back at multiple calls to identify cultural trends at your company, Granola takes your raw meeting notes and makes them awesome.

Right now, Granola is featuring AI "Recipes" from AI thought leaders, including several past guests of this show.  My own contribution is a "Blindspot Finder" Recipe that looks back at recent conversations and attempts to identify things I'm totally missing.  This was immediately useful in the context of contingency planning for my son's cancer treatment, and the more data Granola collects as I continue using it, the more valuable it becomes for suggesting AI topic areas that I really ought to explore.  See the link in our show notes to try my Blindspot Finder Recipe, and experience for yourself how Granola puts your meetings to work.

Today, I'm excited to share a special combined crossover episode featuring Olive Song, a Senior Researcher specializing in reinforcement learning and model evaluation at the Chinese AI company MiniMax, creators of the M series of models, the most recent of which, M2.5, currently tops the OpenRouter Usage Leaderboard.

To give you the most complete picture possible, we're combining two sources: first, a presentation Olive recently gave at the AI Engineer Conference in New York, where she had previously lived for six years, and second, an interview with Ksenia Se from her podcast, Inference by Turing Post.

Together, they provide an excellent overview of MiniMax's goals as a company, the capabilities they prioritize in their models, the techniques they're using to get there, and the day-to-day ups and downs of training frontier LLMs.

Highlights include:

  • How MiniMax's strategy of building both models and user-facing applications creates tight feedback loops that enable their cross-functional research and engineering teams to identify and address model weaknesses as quickly as possible;

  • An overview of how "interleaved thinking" – which allows the model to take an action, get feedback from the environment, and pause to "think" again before continuing – improves performance on long-horizon agentic tasks;

  • A description of the "perturbation pipeline" they use to systematically vary the model's training environment in order to encourage robust generalization;

  • Olive's perspective on the constant battle she and her teammates are waging against reward hacking;

  • A window into the tedious debugging that's sometimes required to diagnose training issues, and how they realized that they needed to run RL training at FP32 precision;

  • And how the team at MiniMax is using AI agents to keep up with the daily flood of AI news.

While Olive recognizes that MiniMax's models, like all open-source models today, can't quite match the performance of top American models, I think there is still a lot of value in the details she shares about their approach to reinforcement learning and how they structure their team and work.

And I always appreciate the opportunity to hear directly from Chinese AI researchers, who, just like their American counterparts, are figuring things out step by step as they go, even as major questions, including how to govern increasingly powerful open-source models, remain unanswered.

With that, I want to thank Swyx, the creator of the AI Engineer event series, which I absolutely recommend attending if you can, and Ksenia, the creator of Turing Post, which has what I find to be some of the very best topic selection of any AI newsletter, for allowing me to create and post this combined episode.  I hope you enjoy this window into the development of some of the best open-weight models in the world, with Olive Song of MiniMax.

Watch now!

Thank you for being part of The Cognitive Revolution,
Nathan Labenz
