Training an AI Scientist with Feedback from Reality, w- Liam Fedus & Ekin Dogus Cubuk (from a16z)

This special crosspost features a16z General Partner Anjney Midha in conversation with Periodic Labs co-founders Liam Fedus (OpenAI, ChatGPT) and Ekin Doğuş Çubuk (Google DeepMind), who recently secured a $300M investment. They discuss their ambitious mission to build automated physical laboratories that connect AI-generated hypotheses directly to real-world experiments, aiming to teach AI scientific intuition and accelerate discovery, with a north star of finding a high-temperature superconductor. The conversation delves into the human and organizational strategies for building such an "AI for Science" company, emphasizing curiosity, cross-disciplinary learning, and a culture that prioritizes mission alignment over traditional credentials. Listeners will gain insights into how Periodic Labs plans to commercialize their advancements and foster broader scientific contributions, driving a future where AI deeply understands the physical world.

Read the full transcript here: https://storage.aipodcast.ing/...

Sponsors:
Linear: Linear is the system for modern product development. Nearly every AI company you've heard of is using Linear to build products. Get 6 months of Linear Business for free at: https://linear.app/tcr

AGNTCY: AGNTCY is dropping code, specs, and services. Visit AGNTCY.org: https://agntcy.org/?utm_campai... Visit Outshift Internet of Agents https://outshift.cisco.com/the...

Claude: Claude is the AI collaborator that understands your entire workflow and thinks with you to tackle complex problems like coding and business strategy. Sign up and get 50% off your first 3 months of Claude Pro at https://claude.ai/tcr

Shopify: Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(03:55) Advancing Science With AI
(05:01) Founders' Origin Story
(07:48) Nature as Reward Function
(09:48) Training ChatGPT vs. Physics
(12:27) The Quantum Mechanics Lab
(16:31) Measuring Progress and Impact (Part 1)
(19:09) Sponsors: Linear | AGNTCY
(21:42) Measuring Progress and Impact (Part 2)
(21:42) Designing the Expert Team
(24:20) Scaling Laws and Distribution
(28:56) Superconductivity as North Star
(32:41) The Commercial Path (Part 1)
(32:46) Sponsors: Claude | Shopify
(36:48) The Commercial Path (Part 2)
(40:27) Uniting Diverse Teams
(47:15) Enterprise AI Strategy
(51:38) The Role of Mid-Training
(55:59) Collaborating with Academia
(01:00:27) Who Should Join Periodic
(01:02:29) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


Full Transcript

Nathan Labenz (00:00)

Hello, and welcome back to the Cognitive Revolution. Today, I'm pleased to share a special crosspost from the a16z podcast featuring a16z general partner, Anjney Midha, who also recently joined me on the Cognitive Revolution to discuss Sovereign AI. This time, he's in conversation with Liam Fedus, former VP of post-training research and co-creator of ChatGPT at OpenAI, and Ekin Dogus Cubuk, former head of material science and chemistry research at Google DeepMind. Together, they've co-founded Periodic Labs and just announced a $300 million seed investment led by Andreessen Horowitz.

Before diving in, a quick note. While Turpentine was recently acquired by a16z, my editorial independence remains unchanged, and I'm sharing this episode simply because I think it does offer a really valuable perspective on the future of AI-powered science.

Regular listeners will no doubt notice some overlap between this conversation and our recent episode with Radical AI. Both companies believe that there simply isn't enough high-quality experimental data in the existing scientific literature to train foundation models for physics and chemistry. And both have raised serious capital to build automated physical laboratories that are meant to connect AI-generated hypotheses directly to real-world experiments using feedback from physical reality as the reinforcement learning signal, with the goal of teaching AI models a form of scientific intuition and thereby accelerating scientific progress itself.

Of course, there are still many possible ways to focus such an ambitious project. And while Radical AI has recently announced a contract with the US Air Force to develop high-entropy alloys for use in hypersonic aviation, Periodic Labs has set the goal of discovering a high-temperature superconductor as their North Star, with the expectation that to get there, they'll need to achieve countless sub-goals along the way, including autonomous synthesis and autonomous characterization.

Importantly, while the science and macro strategies are similar, the conversations are actually quite different. Whereas I tend to explore the technical details in arguably tedious depth, Anjney focuses much more on the human and organizational dimensions of building an AI for science company. As you'll hear, because no human comes close to holding all of the scientific knowledge and intuition that Periodic Labs hopes to train into their AI systems, they prioritize people with intense curiosity and mission alignment, and they don't require advanced degrees. They take pride in their "no stupid questions" culture, and they host weekly teaching sessions in which ML researchers, physicists, and chemists can all learn from one another. And recognizing that even $300 million won't be enough to achieve their ultimate goals, and that even a wildly successful company is only one part of the broader scientific ecosystem, they have thoughtful plans to commercialize their progress in the form of an intelligence layer for advanced manufacturing companies while also starting a grant program even at this early stage meant to elicit key contributions from academia.

Overall, I love the vision and ambition on display here, and I admire the conviction with which a16z and others are backing it. And while this doesn't come up in the episode, I've long believed that long-term AI safety might best be achieved by creating domain-specific superintelligences, which would mean that the AIs that advance fundamental science don't need to have advanced theory of mind or persuasion skills. And in any case, as much fun as I'm having playing around with Sora 2, it does seem quite clear that a future of truly radical abundance requires AI systems that go beyond the digital world and iterate directly against nature's own ground truth.

With that, I hope you enjoy this conversation about building an AI research company meant to develop systems that autonomously explore and deeply understand the physical world, from the a16z podcast with host Anjney Midha and Liam Fedus and Ekin Dogus Cubuk, founders of Periodic Labs.

Liam Fedus (03:55)

Ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies, and we're saying, okay, if you care about advancing science, we need to have experiment in the loop.

Anjney Midha (04:09)

The applications of building an AI physicist, for lack of a better word, that can design the real world are so broad. You can apply them to advanced manufacturing. You can apply them to material science, to chemistry. Any process where there's R&D with the physical world required, it seems like will benefit from breakthroughs that Periodic is working on.

Liam Fedus (04:28)

For example, if we could find a 200 Kelvin superconductor, even before we make any product with it, to be able to see such quantum effects at such high temperatures, I think would be such an update to people's view of how they see the universe.

Anjney Midha (04:44)

So Liam, you were the co-creator of ChatGPT. Doge, you were running some of the physics teams at DeepMind. Let's talk about how you guys met, and what was the moment where you realized that you guys had to leave both of those labs to start Periodic?

Liam Fedus (05:01)

I believe we met eight years ago at Google Brain, flipping over a large tire.

Anjney Midha (05:06)

You gotta give us more on that story now.

Liam Fedus (05:10)

So Google Rails was one of the gyms at Google at the Google facilities, and I think that's where Doge and I met, and there was just this massive tire that a single person basically can't flip by themselves. And so Doge was trying to flip it, and he pulled me over. He's like, "Oh, I think the two of us could do it."

Anjney Midha (05:31)

And why were you trying to flip this tire?

Ekin Dogus Cubuk (05:38)

I tried doing it. I couldn't do it. And then I was like, who's the strongest person I can find? And it was either Barret or Liam. And I found Liam, and it worked. We just flipped it.

Anjney Midha (05:48)

And was that the moment where you guys both realized you had physics backgrounds? How did that happen? How did you go from flipping tires to flipping experiments?

Ekin Dogus Cubuk (06:00)

I don't know if Liam remembers this, but we would catch up over the years, and we would often end up talking about quantum mechanics or superconductivity. This was very common, but I never thought that we'd end up working on physics together. So Liam was working on LLMs and they were going really well. And I was not using LLMs, but I was noticing that LLMs are becoming more and more impactful in my work.

So one way it was becoming impactful is when I was trying to remember some things about chemistry or physics, I could just talk to the chatbot and actually learn a lot of stuff I forgot. Another way was, of course, coding. We were writing simulations, and the LLM was so helpful in writing these simulations for us. So then the question was, can we use LLMs more as a first-class citizen in the physics research?

Liam Fedus (06:44)

And I think leading up to this decision to leave, Doge and I were just connecting and talking about these different tech trees. We're looking at the improvements on language models, on reasoning. We're seeing what high-compute reinforcement learning can do. And on the material science side, we're seeing scaling laws within physics, within chemistry, both with respect to simulations, with respect to experiment, and it's like the same kind of principles at play in ML.

And I think to both of us and to a lot of people in the field, the goal of this technology is to accelerate science, accelerate physical R&D. Chatbots was a great milestone along the way, but we really want to see technology out in the world. And we felt like this was just the right place to begin. Physics is very verifiable. It's a great reward function, fairly fast iteration loop. You have simulators for large classes of physical systems. And we felt like in order to create this AI scientist, this is the beginning of this path. So we built that conviction and decided to found Periodic.

Anjney Midha (07:49)

Well, let's take a second to talk about what Periodic is and what does it do.

Ekin Dogus Cubuk (07:52)

So Periodic Labs is a frontier AI research lab that's trying to use LLMs to advance physics and chemistry. We feel like having experiment in the loop, tightly coupled with simulations and LLMs, is extremely important. So we're building up a lab that will generate high-throughput, high-quality data, and we will use LLMs and simulations in conjunction with the experiments to try to iterate. Science, by its nature, is an iterative endeavor, and we feel like LLMs, with all these tools that are available to humans, can do a great job in accelerating physical R&D.

Liam Fedus (08:33)

I'd say the objective is let's replace the reward function from math graders and code graders that we're using today. So math graders, to give an example, you have a prompt: "What is 2 plus 2?" You know the ground truth is 4. You can put a lot of optimization pressure against problems like that that are programmatically checkable.

And what we're doing by having the lab is we create a physical reward function. That becomes the basis on which we're optimizing against. And so if a simulator has some deficiencies or some issues, we always error correct because for us, the ground truth is the experiment. Nature is our RL environment in our setting.
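
The distinction Fedus draws, between a programmatically checkable grader and a physical reward function, can be sketched in a few lines. This is an illustrative toy with hypothetical function names and thresholds, not Periodic's or OpenAI's code:

```python
# Toy sketch of two reward functions for RL with verifiable rewards.
# Both map an agent's output to a scalar reward; only the source of
# ground truth differs (a string check vs. a physical measurement).

def math_grader(prompt: str, completion: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known answer."""
    # Naively take the last whitespace-separated token as the answer.
    answer = completion.strip().split()[-1].rstrip(".")
    return 1.0 if answer == ground_truth else 0.0

def physical_grader(predicted_tc_kelvin: float,
                    measured_tc_kelvin: float,
                    tolerance: float = 1.0) -> float:
    """Reward agreement between a predicted and an experimentally
    measured property (here, a superconducting transition temperature)."""
    return 1.0 if abs(predicted_tc_kelvin - measured_tc_kelvin) <= tolerance else 0.0
```

The point of the sketch is the shared interface: swapping the string comparison for a lab measurement changes the ground truth, not the optimization machinery around it.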

Anjney Midha (09:21)

Let's just take a second for folks who might not be familiar to explain what you guys mean by a lab that will verify RL in the real world. Can you talk a little bit about how experiments work? How are AI models trained today? And how are those different from how they're going to be trained and developed and post-trained and deployed at Periodic?

Liam Fedus (09:44)

It might be helpful to talk about how we created ChatGPT. So ChatGPT originally—the technology evolved very rapidly over the last few years. When we were first creating it, it was a very standard RLHF pipeline. So you have a pretrained model, and it's sort of like this raw substrate. And what you're trying to do is take this autocompletion model and turn it into something useful.

The way we did it at that point was we would have supervised data. So given some input, we would say this is a desired output. So if we're trying to get it to act as an assistant, we create some tuples like that. Then you run reinforcement learning, but now you're learning against a reward function that's trained against human preferences. So humans will say, "Well, given this input, I would prefer completion A to completion B," and you do that over and over again, and you can create a reward function that can then be optimized against. That is sort of the basis of how we created ChatGPT.

But then there's a huge gap between the original model and what we have today. And I think part of that is reasoning, but also part of that is just much better, more precise reward functions. So the reward functions that we were using originally couldn't determine whether you were mathematically correct or not. So early versions of ChatGPT were mathematically not particularly strong, and it sort of results from what you optimize against. The reward function basically encoded "be a friendly assistant, try to help people get to their thing," but it had no sense of "is this mathematically correct or not? Is this code valid or not?" And we made huge advances over the correctness of our reward functions.

But this is all digital. We're creating tasks based on the Internet, textbooks, papers, and this is great. This lays the foundation, but ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies, and we're saying, okay, if you care about advancing science, we need to have experiment in the loop, and that becomes our reward function for our agents.

So as Doge was saying, our agents are doing the same type of things you would use for coding or to help answer a query. But now instead of just giving tools like, "Here's Python. Here's a browser," now we have tools like quantum mechanics, so simulate different systems. But ultimately, we're going to a lab, and then that becomes the basis of what the system is optimizing against. That's the natural end state of these systems.
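
The preference-based reward model Fedus describes ("I would prefer completion A to completion B") is, at its core, a pairwise objective. A minimal sketch of the standard Bradley-Terry-style loss, illustrative rather than OpenAI's actual implementation:

```python
import math

def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-probability that the human-preferred completion wins.

    The model assumes P(A preferred over B) = sigmoid(r_A - r_B), so
    training pushes the reward model to score preferred completions higher.
    """
    margin = reward_preferred - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the reward model already ranks the pair correctly, the loss is small:
low = preference_loss(2.0, -1.0)
# If it ranks the pair the wrong way round, the loss is large:
high = preference_loss(-1.0, 2.0)
```

Repeated over many human comparisons, minimizing this loss yields the scalar reward function that RLHF then optimizes against.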

Anjney Midha (12:27)

People in AI often say "lab." Often what they're referring to is quite different from what you guys mean by lab. Doge, what's the difference?

Ekin Dogus Cubuk (12:35)

That's right. So as Liam mentioned, so far, the LLMs have gotten really good at logic and math. There's verifiable rewards. What is the next frontier in terms of inquiry after logic and math? I'd say it's physics.

And when you say physics, there are different energy scales. So there's astrophysics studying galaxies, there's fusion, nuclear physics, but then there's the energy scale of physics that's more relevant to our life, and that's the quantum mechanics, the Schrödinger equation. This is where biology happens, chemistry happens, materials happen.

So we felt like our first lab should be basically probing that quantum mechanical energy scale. And for us, that would be physics at the level of solid-state physics, material science, and chemistry. One of the more fundamental ways of making things around us is powder synthesis. So you take powders of existing materials, you mix them and you heat them up to a certain temperature and it becomes a new material.

So this is one of our labs. We're going to have a powder synthesis lab. And it turns out this is one of those methods where robots can do it, very cheap, simple methods. I don't know if you saw this coffee-making robot in the SF Airport. A robot at that level can mix powders and put it in a furnace. And that's a very rich field, so you can actually, using that method, discover new superconductors, magnets, all kinds of materials that are very important for technologies around us.

But at the core of it is just quantum mechanics, and we feel like teaching these LLMs to be foundation models for quantum mechanics will be the next frontier for LLMs.

Anjney Midha (14:07)

Why haven't the models that are currently out in the world and deployed been able to do this?

Ekin Dogus Cubuk (14:14)

Great question. I think, as you mentioned earlier, science is by its nature iterative. Even the smartest humans tried many times before they discovered the things they discovered. And I think maybe this is one of the confusing points about LLMs. LLMs can be very smart, but if they're not iterating on science, they won't discover science. To be honest, humans won't either. You put a human in a room without any chance to iterate on something, they won't discover anything important.

So we feel like the important thing to teach these LLMs is the method of scientific inquiry. You do simulations, you do theoretical calculations, you do experiments, you get results, and the results are probably incorrect or not what you want at first, but you iterate on it. And we feel like it hasn't been done yet. So this is what we want to do, but we feel like you have to do it with the real physics, not just the simulation. This is why we have our own lab where the LLM will have the opportunity to iterate on its understanding of quantum mechanics.

Fundamentally, machine learning models are good at what you train them to do.

Liam Fedus (15:17)

That's sort of the nature of it. And so if a model is acting badly, you're like, well, did you train it to do that task?

Building on Doge's point, there's sort of an epistemic uncertainty, this reducible uncertainty, that you aren't really reducing or collapsing unless you're actually running an experiment. So for instance, one of the engineers on our team was looking at reported values of a physical property in the literature, and they spanned many orders of magnitude. So if I train a system on that, these systems aren't magic. The best it can do is replicate that distribution, but it's really no closer to a deeper understanding of the universe, physics, chemistry.

Then another point is it's very uncommon to publish negative results. All of the results are basically positive, and a valid negative result is very valuable. A negative result could be discarded because, well, that was sloppy science. But there are valid negative results, and that's a learning signal. And this is something that our lab will produce as well.

So I think these three things: noisy data, no negative results, and you need the ability to act in order to actually do science, which is an iterative endeavor. Those are the core theses of why we need a lab.

Anjney Midha (16:31)

And what might be the core way to measure if Periodic is progressing against that goal in your guys' minds?

Ekin Dogus Cubuk (16:39)

One simple one is high-temperature superconductivity. What is the highest temperature superconductor we synthesized? Today, the best number for ambient pressure is 135 Kelvin or so. So we'll know very easily if we're doing well if we can go beyond that number. So that's pretty fundamental.

On the more applied side, there's processing of materials and its effect on properties. So we can just measure these properties directly. Let's say it's the ductility, the toughness, the strength of the material, and as we measure it, the LLM will get a very clear signal. It's hard to hack, unlike these other LLM training techniques; what you see in real life is really the signal that's going to the LLM.

Liam Fedus (17:20)

Yeah, effectively: can you design the world around you? You're like, "I need something with this property." Can this system discover and produce that? Both from a fundamental scientific discovery perspective, but also in industry. So someone's working in space or defense or semiconductors and, yeah, we're having these issues. We're trying to achieve this property of this material or this layer. Can the system accelerate the development of those technologies? So it's very grounded. That's how we'll know it's working.

Anjney Midha (17:55)

It feels like the applications of building an AI physicist, for lack of a better word, that can design the real world are so broad. You can apply them to advanced manufacturing. You can apply them to material science, to chemistry, to anything—any process where there's R&D with the physical world required, it seems like will benefit from breakthroughs that Periodic is working on. Why hasn't it been done before? And what is it about this moment in history that makes it the right time to attack this problem?

Liam Fedus (18:27)

Maybe one reason is that it's difficult. What makes it so difficult? I mean, I think part of it is the team. So in our view, this has been enabled by frontier technology in the last couple of years. And so Doge and I have been so focused on basically putting together this n-of-one team. This group of physicists, chemists, simulation experts, and some of the best machine learning researchers in the world has never been part of one concerted effort. And we feel in order to actually achieve this, you need all of these areas of expertise. You need these pillars to do this.

Nathan Labenz (19:04)

Hey, we'll continue our interview in a moment after a word from our sponsors.

[Sponsor reads for Linear and AGNTCY]

Anjney Midha (21:42)

So when you guys went about designing the team, after you left OpenAI and DeepMind, what was the primary heuristic that you used to guide yourself in figuring out who you wanted on the team?

Ekin Dogus Cubuk (21:53)

So in terms of expertise, we wanted to have LLM expertise covered, the experimental expertise and simulation. And for each of these, we wanted to have basically world-class talent. And of course, for each team, there's actually a lot of sub-teams, it's like a fractal. The expertise is very fractal-like.

So for the experimental side, we want to cover solid-state chemistry, solid-state physics, automation, and kind of the more facilities, the more operational aspects of experiments. On the simulation side, there's the more theoretical physics parts, there's the more coding aspects of simulations. And the LLM side, of course, there's mid-training, RL, infra. And for each of these, we try to get basically the best people who have innovated in these sub-pillars.

Liam Fedus (22:44)

The technology that we think is necessary to do it has really just emerged in the last couple of years, and this data isn't on a Reddit forum or something. You need to actually go produce experimental data, simulation data. It's siloed across all of these advanced industries. And many of them, while there's a desire, they may not have knowledge of some of the most recent techniques that's been driving this recent wave in AI.

Anjney Midha (23:09)

There was a moment in time when papers like the GPT-3 paper, for example, that proposed the idea of scaling laws. And then there was a follow-up paper from OpenAI that was called, I think, "Scaling Laws for Generative Modeling," that just showed that as long as you scaled up the amount of compute and data in the right combination, you could very predictably improve the performance of these models. And the theory was that if you just kept doing that ad infinitum, there would be a bunch of emergent capabilities. These models would be able to reason about all kinds of problems out of domain, out of distribution.

How would you square the circle with that school of thought? Naively, wouldn't the current pre-training and post-training pipelines at most of the frontier labs just eventually crack physics as well? Why is this idea of physical verification so necessary? And is that school of reasoning wrong?

Liam Fedus (24:20)

Excellent question. Scaling laws empirically seem to continue to hold, so that's not in question.

Ekin Dogus Cubuk (24:29)

But I think there's a question of what is this y-axis?

Liam Fedus (24:32)

And that test distribution is very different from what we're talking about. That test distribution, let's say you're pre-training on the Internet, might be a representative set from the Internet, and you'll have these sort of predictable scaling properties. But that's not going to capture the fact that you have a very different set of scaling properties with respect to different distributions.

To make this a little bit more concrete: let's say, hypothetically, we're training a coding model, and we have unit tests to provide some reward signal. So the model writes some PR. We check that the unit tests go from failing to passing, and we say, "This was successful. We're gonna reinforce these things." You might say you start optimizing this, and now the system is becoming ever more capable of writing code for its own development. And you have this acceleration, this kind of takeoff scenario.

Code is one of the most promising areas for this because there's abundant data online. You have this feedback loop where the system itself can begin to improve itself, and it's a very promising technique. And we're all seeing the benefits of advanced coding models, and it's accelerating quickly.

However, that model is not going to then cure cancer. The knowledge simply doesn't exist. It doesn't—you need to optimize against the distribution you care about. So that model, while it's gonna be a very valuable tool as a software engineer, it may help a cancer researcher do their analysis. It simply doesn't have the data, the knowledge, or the expertise iterating against that environment. And I think that's just sort of the fundamental belief we have.

Ekin Dogus Cubuk (26:21)

Actually, Liam and I worked on this a bit when we were looking at the scaling laws for vision models. And this also came up a lot in the CLIP paper from OpenAI. The in-domain generalization and the out-of-domain generalization are monotonically correlated, but it's not linear necessarily.

And so what that means is you can keep improving your model and it will improve as a power law in-domain. And for out-of-domain tasks—by which I mean the things that you're trying to do that's a bit different than what's in your training set—it will also improve as a power law, but the slope of the power law may not be good. So you might need to spend centuries before you get to the goal you want.

We saw this in practice. For example, we published a paper where we saw that as you increase the size of your training set, the IID performance, the in-domain performance, improves as a power law. Out-of-domain performance also improves as a power law, but depending on what the out-of-domain is, how far you are from the training distribution, that power law might have such a small slope that it's basically useless.

So this is one of the reasons we feel like the best way to make progress is to make your target as close to your in-domain training set as possible. And the best way of doing this is to basically iterate on changing your training set to be more like what you want to do.

The other answer is actually maybe even simpler. The experimental data we want actually doesn't exist. So for example, if you want to learn on the experimental data in literature for synthesis, turns out the formation enthalpy labels, which is the energy it takes to basically assemble the atoms in the shape you want, is so noisy that if you train a machine learning model on it, it's not predictive enough to predict the next one.

One of the reasons for this, as Liam mentioned, is that people don't usually publish negative results. And negative results are usually very context-dependent. So what's a negative result for someone might be positive if they do things differently. So not only is there this domain-shift problem, where what you're trying to do might be different from your training set, so the power law won't have the slope you need. But the other problem is that for some of these things we want to do, there's no data at all. For example, for superconductivity, there are a lot of datasets you can look at, but the noise floor on them is so high that training on them usually doesn't help.
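
Cubuk's point about slopes can be made concrete with a toy calculation. If test error falls as a power law, err(n) = A * n^(-alpha), then the data needed to hit a target error is n = (A / err_target)^(1/alpha). The exponents below are invented for illustration, not taken from any published fit:

```python
# Toy illustration of why a shallow out-of-domain power law is
# "basically useless": solve err(n) = A * n**(-alpha) for n.

def samples_needed(err_target: float, A: float, alpha: float) -> float:
    """Training-set size needed to reach err_target under a power law."""
    return (A / err_target) ** (1.0 / alpha)

# In-domain: a healthy exponent reaches 1% error with ~10^4 samples.
in_domain = samples_needed(0.01, A=1.0, alpha=0.5)       # -> 1e4
# Out-of-domain: a 10x shallower exponent needs ~10^40 samples,
# which is the "centuries before you get to the goal" regime.
out_of_domain = samples_needed(0.01, A=1.0, alpha=0.05)  # -> 1e40
```

Both curves are genuinely improving power laws; the difference is entirely in the exponent, which is why moving the target distribution closer to the training distribution matters more than scaling alone.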

Liam Fedus (28:41)

Doge, me, the entire team are deep believers in scaling up and scaling laws, but it's just: do a beeline for the thing you care about. And in our case, we care about advancing science, advancing physical R&D. That's the thesis.

Anjney Midha (28:57)

Is there a tension between being super Bitter Lesson-pilled and just throwing more compute at the problem and the domain-specific pipelines that the lab you guys just described will have to focus on? In the case of Periodic, I think you mentioned the first beelines you guys are making are towards superconductivity and magnetism. What is it about those domains that make them good candidates for the first pipelines that Periodic is working on? And are they just stops along the way to an AI physicist that generalizes across all kinds of domains, or is there a danger of them being essentially off-ramps that don't result in the AI scientific superintelligence that is the North Star for what you guys are doing?

Ekin Dogus Cubuk (29:45)

I feel like the high-temperature superconductivity goal is actually a goal that has so many sub-goals in it. It's a bit like when DeepMind and OpenAI started and said we're going to do AGI. But what they meant was they had to do so many things before they got to these cool results.

Like for us, if we want to get a high-temperature superconductor, we probably need to get good at autonomous synthesis, autonomous characterization. We need to get good at characterizing different aspects of the material, using the LLM to run the simulations correctly. So it's a North Star, and there's so many goals on the way that would be very impactful for the community. That's one reason.

Another reason is I feel like high-temperature superconductivity is such a fundamentally interesting question. For example, if we could find a 200 Kelvin superconductor, even before we make any product with it, that in itself says so much about the universe that we didn't know yet. To be able to see such quantum effects at such high temperatures, I think would be such an update to people's view of how they see the universe. So we feel like it'll be really impactful for humanity even before we make a product out of it. I think that's one of the reasons.

A technical reason also is that superconductivity is a phase transition, so it's pretty robust to some of these details that we cannot simulate yet. For example, when you make the material, the superconducting temperature is usually dominated more by the fundamental crystal structure than by defects or microstructure, whereas there are certain other materials properties where, even if the crystal has the property you want, there are so many other factors that you cannot simulate that will prevent you from seeing that property.

So superconductivity has this nice philosophical upside to it, it has this technical upside to it, and it really rallies everyone: there are people who studied physics for 40 years and are really excited about superconductivity, and there are people who've never studied physics but are very excited about it. It's quite rare to find a topic that unites the whole team.

Liam Fedus (31:42)

Like Doğuş said, in order to do this, there are so many foundational pieces to solve, and our tactic is: in order to actually get to this goal of AI scientists, you need to make contact with reality and do the full loop somewhere. If you say you're doing this in just very vague terms, you sort of just end up back on arXiv papers and textbooks. And so it's really important for us to do the loop, but then create this repeatable process.

How do you go from subdomain to subdomain? And there are really interesting questions about how well the ML systems generalize between these things. What is the generalization of a system from superconductivity data to magnetism data, for instance? And maybe that looks very different from its ability to generalize to fluid mechanics. And I think there are fundamental arguments to make there. But the goal is to create this repeatable system, prove it, and then just go through the different domains that way.

Nathan Labenz (32:42)

Hey, we'll continue our interview in a moment after a word from our sponsors.

[Sponsor reads for Anthropic/Claude and Shopify]

Anjney Midha (36:52)

I can see the argument for why cracking room-temperature superconductivity from an experimental basis is extraordinarily valuable for humanity, but you guys are building a startup. And to use an analogy for why you need to have a clear medium-term path along the way to a North Star that is both commercially viable and net positive to society: what we've seen with other frontier labs that are working on automating white-collar work or software knowledge work is that there's this North Star of an AI researcher. But along the way, there were a bunch of sub-goals, and a concrete application that opened up a ton of commercial value and benefits for users on the way to that AI researcher was the idea of AI programming.

Software engineering has become probably the first major domain that's caused people to really update their priors about how useful AI models are beyond consumer applications, and in terms of productivity, their impact has been extraordinary in just a few short months. So if the traditional frontier labs' North Star was an AI researcher, and the path along the way to get there was AI programming, what is that for Periodic?

Liam Fedus (38:16)

Basically, copilots for engineers and researchers in advanced industries. Maybe, just being in Silicon Valley, we really think about computer-oriented work. Everything is digital. Everything is bits. But there are so many industries—like space, defense, semiconductors—where they're dealing with iteration on materials, on physics, and that's part of their workflow. How are they designing these new technologies, these new devices? And in the absence of data and in the absence of good systems, they don't really have particularly good tools. That is our opportunity, and these are massive R&D budgets.

So while high-temp superconductivity is a great north star, we very much understand that technology and capital are intertwined. We're going to be able to maximally accelerate science if this is a wildly successful commercial entity. And to do so, we want to accelerate advanced manufacturing in all these different industries, become an intelligence layer for all these teams to accelerate their workflow and start reducing their iteration time, get them to better solutions more quickly, accelerate their researchers and their engineers.

Anjney Midha (39:33)

Let's click a little bit deeper on that in practice. A day in the life of a Periodic team member—about half the team are ML scientists with machine learning backgrounds, and the remaining half are physical scientists with physics or chemistry backgrounds. How do you start by uniting the cultures? How do you take somebody whose primary career so far has been experiments in wet labs, doing physics and chemistry, and give them an intuition for ML, and vice versa? Because you guys are both physicists who then had the career trajectory where you also had the chance to be at frontier AI labs and were part of training systems that are now considered landmark machine learning systems, like ChatGPT, like GNoME. But for others who might be coming from one domain, how do you get the team to build an intuition for the other?

Ekin Dogus Cubuk (40:30)

It's a great question. We feel like it's actually crucial for us to make sure these teams work very closely with each other. So one of the things we're seeing is the physicists and the chemists need to figure out how to teach the LLM how to reason about this, because I think the frontier AI labs have figured out how to train them on math and logic, but not yet on physics and chemistry.

So one thing we're seeing that's been really productive is the physicists and chemists are thinking about what are the steps we should include in the mid-training, in the RL training, that will teach the LLM how to reason correctly about quantum mechanics, how to reason correctly about these physical systems.

Another one is the LLM researchers are learning quite a bit about the physics, the simulation tools, the goals. So they've been working together really well. We have weekly teaching sessions where the LLM researchers teach how the RL loops work, how the data cleaning works, and then the physicists and chemists are teaching about different aspects of the science, the history of science—that's also very important.

We feel like this has been going really well. One way of looking at this is the things we have to teach the LLM to be able to discover, say, a superconductor, include being able to read the literature really well—read all the papers, the textbooks, find the relevant parts—and then being able to run simulations, theoretical calculations, and then take action, run experiments. We feel like this is quite similar to the physical R&D researchers in these companies. They have to read the literature, read maybe internal documents or external documents, and then run simulations, run theoretical calculations, and then actually attempt to do something experimentally, learn from that.

So we feel like all the progress we're making towards our internal superconductivity or physics goals actually is making our LLMs much better at serving our customers who are doing very similar workflows.

Liam Fedus (42:29)

I think just culture: no stupid questions. You can ask the dumbest physics question, the dumbest ML question. And we have a few faculty as part of our company, and they're actually excellent teachers. So these learning sessions have been really fantastic.

Another thing I noticed is computer scientists often think in terms of APIs. So scientists will say something, and they're always trying to map it. "Okay, what's the input? What's the output? What's the target? How do I map that back?" And it's always just this translation.

And I think we also have built up, as part of the team, people on these different edges. So if you have a simplex of pure ML/LLM, pure experimentalist, pure simulation, there are people who live inside that simplex as well. And they have been excellent bridges for translating between these different groups of people. So it's active learning to learn the other spaces, creating APIs, and then these bridge-connector people. I think Doğuş is an excellent example of that.

Anjney Midha (43:33)

Is it a requirement for somebody who wants to join Periodic to have an advanced degree in physics or chemistry?

Ekin Dogus Cubuk (43:41)

Absolutely not. One of the jokes we were making is—who was the NBA player who was saying, "I'm much closer to LeBron James than you are to me"? We were saying the opposite of that to candidates, because the amount that even our best physicist doesn't know about physics is much bigger than the amount that they know about physics.

So for this new candidate, even if they have no background in physics, how much they have to learn about what we're trying to do is actually not that different from how much the best physicist has to learn, because there's so much chemistry to learn, so much materials science to learn.

I think this is one of the interesting aspects of science today. In the past, in the 1800s, there were these physicists that could do so many different things at the frontier. Today, we've reached a point where our intellectual knowledge is so large that a leading thinker can usually only advance in one very specific field. And maybe this is actually holding us back because, say, to discover an amazing superconductor, as we keep going back to this example, you have to know so much about chemistry, physics, synthesis, characterization, and unfortunately, I don't think any human knows enough about all of these. So we have to collaborate.

So I think our team is kind of like a small example of this where we have, as Liam said, a lot of different points in that simplex. And for any person, they have so much to learn, but that's true for basically every other scientist.

So, for example, I supposedly come from the physics side of it, but I've been learning so much more physics because we now have people from different areas of chemistry in the team, different areas of physics. And I think it's true for LLM researchers as well. They come in, and there are aspects of LLM that they probably didn't know until they started working with other researchers on our team.

So I think it's a great learning experience. And it's like a small example of what we're trying to do with the LLM, because we're trying to teach this LLM all these different things that we're learning as researchers. It's a really fun experience.

Anjney Midha (45:32)

And what are you finding makes a great researcher at Periodic that's different from what might make a great researcher at OpenAI or Anthropic or DeepMind?

Liam Fedus (45:43)

I would say there's very high overlap, but probably one of the biggest determinants is: do you care about this mission? Is accelerating science—is that the big goal? And I think looking at the team right now, it's just an incredibly mission-driven set of folks who are like, "Yep, this is the North Star. Let's do that."

If someone really wants to improve some mega corp's products, yeah, you'd probably be better off at that mega corp iterating and improving their products. But if you care about scientific discovery, I think Periodic Labs is the best place to do that.

Anjney Midha (46:18)

How big is the team today?

Liam Fedus (46:19)

We're roughly 30.

Anjney Midha (46:21)

And as you think about taking a lot of the research that's going on at the company and deploying that out in the real world, the kinds of customers that we've talked about—space, defense, advanced manufacturing—these are mission-critical industries that are known for being essential to whatever part of the economy they're part of, but often, they're not the fastest to adopt new technology.

How do you think about deploying the kinds of frontier agents that we've talked about that are great at science, great at physics, in companies or organizations that might not be anywhere close to as sophisticated as you are in AI or ML? Do you have a working thesis for how to make sure that the arc of progress is not bottlenecked on deployment? It sounds like you have a fairly good thesis on how to unblock the arc of scientific progress on the research side. But when it comes to deployment, what might be a working theory that you guys are optimistic about that would help get the systems that Periodic is building out into the real world?

Liam Fedus (47:36)

Maybe one thing that we've noticed in our conversations with all these companies is they all are looking for their AI strategy. They understand that the technology is shifting really quickly, and they're looking at how they're doing their work, and it's not changing as quickly as they think it should be. Some industries also are losing key expertise in different fields, and they're losing these senior engineers, senior researchers, and they're like, "How do we preserve that?"

But one thesis is: thinking about these APIs, thinking about what the evaluations are and what the biggest bottlenecks are for these companies, looking at some of the problems they face, and mapping that to our systems. And we say, well, we think we can dramatically accelerate this.

And so it's not coming in and saying, "Hey, we're gonna transform your fab line on day one. We're gonna transform how you're doing everything. Forget everything." It's like, no, we're gonna solve a really critical problem, well-scoped, very clear evaluations. You co-draft that with them and just show them how powerful this technology can be when you optimize against the thing you care about.

So nothing particularly surprising here, but sort of like a land-and-expand type method as you might expect. But really looking for who are the biggest promoters within that company, what are the biggest problems, make sure you're solving a very real thing for them, and intersect that with where is our technical capability the highest.

Anjney Midha (49:08)

You were on a call this morning with one of the customers in your pipeline. We don't need to name who, but what were some of the things you heard as their most urgent problems that they'd like for Periodic to solve?

Ekin Dogus Cubuk (49:18)

One of them was simulations. They spend a lot of time training people on some of these simulations they need to use that are critical for their development. And being able to automate those simulations would be quite enabling. The design process and then some of the small things like matching the formats, being able to feed the simulation results into the design pipeline. All of these seem quite important, and then being able to treat the data together in the same place.

Liam Fedus (49:50)

I think there's a really fundamental question. A lot of these companies will rely on retrieval. That's sort of a super lightweight thing. Someone shows up with a neural net, and they're like, "Great, we'll just retrieve over all of your data, and then that's your solution."

However, as we've seen with things like ChatGPT and other things, when you pre-train on the data, when you actually encode the knowledge into the weights, it's not just a retrieval system. You have a richer, deeper understanding of the material. And I think this is a big fundamental challenge.

So for instance, for this customer, they can give privileges to their employees and have retrieval acting on behalf—the system acts as the user—and so you can match those same kind of privileges for access. But if you start doing pre-training or mid-training on different parts, it's like, well, if you pre-train on every piece of data, that might only be accessible to the CEO of that company. So then you have to figure out how do you bucket that knowledge and create different types of systems.

But I think right now, after talking with the user, they don't seem to have a great solution for distilling all of the knowledge into a single model or into a set of models. Going beyond retrieval to proper training.
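The access-control tension Liam describes can be sketched in a few lines: retrieval can filter by a user's privileges at query time, while anything baked into weights cannot. A toy illustration, where the documents, access-control lists, and scoring function are all invented for the example:

```python
def overlap_score(query, text):
    """Toy relevance score: shared-word count. A stand-in for a real retriever."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query, user, docs, k=2):
    # The ACL filter runs per-query, on behalf of the user. This is the
    # part that has no analogue once the whole corpus is in the weights.
    visible = [d for d in docs if user in d["acl"]]
    scored = sorted(((overlap_score(query, d["text"]), d["text"]) for d in visible),
                    reverse=True)
    return [text for score, text in scored if score > 0][:k]

# Hypothetical documents and access-control lists, invented for the example.
docs = [
    {"text": "quarterly synthesis results for sample 12", "acl": {"ceo", "lab"}},
    {"text": "acquisition strategy memo", "acl": {"ceo"}},
    {"text": "furnace maintenance schedule", "acl": {"ceo", "lab", "intern"}},
]

print(retrieve("synthesis results", user="lab", docs=docs))
print(retrieve("acquisition strategy", user="lab", docs=docs))  # memo never visible to "lab"
```

A real system would use an embedding-based retriever rather than word overlap; the point is only that the filter happens per query, whereas pre-training on the full corpus would require bucketing the data by privilege before training, as Liam notes.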

And then I think also the supervised training they're doing is really akin to the early days of ChatGPT, where it's input-output and you have a few examples. And we're transitioning them to this new way of thinking: no, high-compute reinforcement learning is really effective. This is how you should think about the strategies it's using. This is how you create effective tool use for those problems, and this is how you optimize it effectively.

Anjney Midha (51:42)

Could you describe for folks who may not be familiar with it what you mean by mid-training? Because people are familiar with pre-training. They're familiar with post-training. But in the Periodic context, what does mid-training mean?

Liam Fedus (51:52)

Yeah, sorry for the lingo. So I think this term came up years ago where it's like, well, we had pre-training. We had post-training. But sometimes you need to put in a little bit more knowledge.

So before search worked really well, there was an issue of freshness. We had pre-trained models, and they have a knowledge cutoff. So there's a scrape of the Internet at that point, but users want more real-time knowledge. So it's like, how do you get that in there? And enter mid-training.

Mid-training is basically you're taking new data, new knowledge that's not in the model, and you continue pre-training. And this differs from standard post-training where post-training typically is more reinforcement learning, supervised learning. And the mechanism or the goal of it is just put a lot of knowledge into the model that doesn't exist before. So that's mid-training in a nutshell.
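As a toy illustration of that definition, here is a minimal sketch where "pre-training" and "mid-training" are literally the same update rule run on different corpora at different times. A bigram counter stands in for a real model's weights, and the corpora are invented for the example:

```python
from collections import defaultdict

class TinyBigramLM:
    """Toy bigram 'language model'. The update rule is identical for
    pre-training and mid-training; only the data and the timing differ."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        for sentence in corpus:
            tokens = sentence.lower().split()
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def most_likely_next(self, word):
        followers = self.counts.get(word.lower())
        if not followers:
            return None
        return max(followers, key=followers.get)

lm = TinyBigramLM()

# "Pre-training": a generic corpus with a knowledge cutoff.
lm.train(["copper conducts electricity", "water boils at a high temperature"])
assert lm.most_likely_next("superconductors") is None  # not in the model yet

# "Mid-training": continue the same objective on new domain data,
# so the new knowledge ends up in the parameters (here, the counts).
lm.train(["superconductors expel magnetic fields",
          "superconductors expel flux below the critical temperature"])

print(lm.most_likely_next("superconductors"))  # -> expel
```

Real mid-training is continued next-token pre-training on a large corpus of new tokens rather than count updates, but the mechanics are the same shape: new data in, updated parameters out, with post-training (RL, supervised fine-tuning) happening afterward.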

Anjney Midha (52:43)

And in the Periodic context, does that mean essentially going and injecting a ton of custom data from an experimental implementation in a particular customer or particular industry? What are the atomic units of mid-training that will improve the capabilities of the models on problems that they're just terrible at today?

Liam Fedus (53:12)

It's all the knowledge. So you can have very low-level descriptions of physical objects, like crystal structures, for instance. You can also have higher-level semantic descriptions of, well, this is how I made material XYZ. And trying to get all this data into the model is really valuable. So it's simulation data, experimental data. None of this exists, and basically putting that knowledge into the model and making sure that these distributions are connected in some way.

And what I mean by that is, if you just sort of mix together distributions A, B, and C, there's no guarantee of generalization. What you hope to see from these systems is that the inclusion of one dataset improves performance on the other datasets. And so these are just machine learning problems to solve. But basically, just make it an expert in physics and chemistry where it was deficient before.

Anjney Midha (54:06)

You guys both know that I spent some time running evals on a bunch of these models at the Stanford Physics Lab earlier this year, and the results were that the models are terrible at scientific analysis.

Liam Fedus (54:17)

Because they weren't trained to do so.

Anjney Midha (54:20)

Because they weren't trained to do so. But on the other hand, many of the existing research teams working on the general models are investing in trying to make these better. Is there something about the way you're building Periodic that lets you draft off of all of that progress in the base models, or do you have to start everything from scratch and therefore not be composable with advancements happening in the mainline models today?

Ekin Dogus Cubuk (54:50)

We benefit from all the different advances. So one of them is the LLMs are getting better, and we definitely benefit from them because we take a pre-trained model, mid-train it, and then run high-compute RL.

Another one is the physical simulation tools are getting better. They're open-sourcing new ways of simulating, new ways of using machine learning to predict properties. So we get to basically utilize all of those. And it seems like machine learning has made such an impact in the physics and chemistry fields that we expect these improvements to continue.

Liam Fedus (55:22)

I think another thing is when we think about tools for agents, we think of: here's a browser, here's Python. But increasingly, people think about tools as other neural nets, as other agents. And so if you look at a lot of physics code, it's not particularly deep. This isn't competition programming; this is kind of hacky scripts. But you can rely on some of the best systems wherever they spike. So a neural net as a tool for these agents is something that immediately accelerates our work, so you don't have to replicate everything.

Anjney Midha (56:03)

There's a historical pattern that a lot of the fundamental research in the physical sciences—physics, chemistry, biology—has historically been done at university labs. Is there a role at all that the university ecosystem will play in Periodic's future, or do you think these are just completely divergent paths?

Ekin Dogus Cubuk (56:25)

Absolutely. So much of the simulation tooling we use has been developed in academia. Much of it is in Europe, for example, as are a lot of the novel synthesis methods. So we definitely benefit from a lot of this very deep technical progress.

Like, for example, all the physical simulation tools are these complicated Fortran codebases that we, on our team, don't really know how to develop very efficiently. But we feel like there's definitely a very deep connection between academia and industry labs.

So for example, recently, a lot of the large-scale simulations have been done in industry labs like Microsoft, DeepMind, and Meta, but a lot of those tools have been actually developed in academia and then passed on. So there's actually really nice synergy there.

Liam Fedus (57:14)

I'd add a few other things too. Like you found when you were evaluating models on their ability to do scientific analysis, they were deficient. This was probably not a direct goal for those teams training those models.

So I think academia and these collaborations can say, well, help us inform what are the important tasks. How do you do this analysis? What skills do we want to put in the model? A skill could be a full analysis, or a skill could be a smaller primitive as part of a larger analysis.

But also secondarily is how do you think? So one of the physicists was looking at the reasoning strategies of one of our models. He's like, "It's all wrong. It's all wrong." And we're like, "What do you mean?" He's like, "No, this should be thinking higher level. It should be thinking in terms of symmetries. This is the book that encodes the thinking strategies that will be more effective."

And of course, your reinforcement learning environment needs to reward those types of strategies. But given some of the most premier scientists are using these strategies, they're likely effective. And these are types of things where an industry-academic partnership can be so powerful because industry is simply blind to these types of analyses, these tools, as well as just this way of thinking.

Ekin Dogus Cubuk (58:32)

And there's a way of connecting that to the tooling question as well because language is very important. But then in the human brain, we also see other visual processing, geometric processing. So it's plausible that while these LLMs will keep getting better and better, they'll actually benefit from having geometric reasoning that's separate.

So today we can do that with equivariant graph neural networks, we can do it with diffusion models that are geometric tools by construction, and the LLM can call them. So then it can have both the language aspect, which is very good for a synthesis recipe, and the geometric aspect, which is very good for representing atoms and designing geometries in general.
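The dispatch pattern Doğuş describes can be sketched as a toy: a plain distance function stands in for the geometric model (an equivariant GNN or diffusion model), and a lookup table stands in for the LLM's language knowledge. All names and the recipe string are invented for illustration:

```python
import math

def geometry_tool(coords):
    """Stand-in for a geometric model: returns the minimum pairwise
    distance among 3D points (e.g. the closest interatomic distance)."""
    return min(math.dist(a, b)
               for i, a in enumerate(coords)
               for b in coords[i + 1:])

def language_tool(query):
    """Stand-in for the LLM's own text knowledge, e.g. a synthesis recipe."""
    recipes = {"YBCO": "mix oxides, calcine, anneal in oxygen"}  # toy entry
    return recipes.get(query, "unknown")

def agent(query, coords=None):
    # The LLM dispatches: language questions stay in the language model;
    # geometric questions are routed to the geometric tool.
    if coords is not None:
        return geometry_tool(coords)
    return language_tool(query)

print(agent("YBCO"))                                                      # language path
print(agent("closest pair", coords=[(0, 0, 0), (0, 0, 1.5), (3, 0, 0)]))  # -> 1.5
```

The design point is that the geometric sub-model gets called like any other tool, so the language model keeps recipes and reasoning in text while geometry is handled by a component that represents it natively.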

Anjney Midha (59:12)

So how are you thinking about deepening Periodic's ties with academic labs?

Ekin Dogus Cubuk (59:15)

This is very important for us. So we have two major initiatives in this direction.

One of them is we're starting an advisory board. This will be expertise spanning from superconductivity to solid-state chemistry to physics, and we want to make sure we're in touch with these long-term research directions. A lot of important government funding goes to these groups, and we want to have a tight coupling between what's important for them and us.

So this includes superconductivity expertise such as ZX Shen from Stanford on the experimental side, and Steve Kivelson on the theory side. We also have synthesis expertise on the advisory board from Mercouri Kanatzidis from Northwestern University, and Chris Wolverton on the high-throughput DFT side.

And our second initiative is going to be through a grant program. We really want to enable some of this amazing work going on in academia, and some of their work isn't a good fit for industry—it's best done in academia. So we want to accept grant proposals, and we want to enable and support the kind of work that's going to help the community, especially in relation to LLMs, agents in synthesis, materials discovery, physics modeling. So maybe after this show, you can include the link.

Anjney Midha (1:00:31)

Yeah, we'll include them in the show notes. So for people who might be interested in joining Periodic, what are you guys looking for?

Liam Fedus (1:00:37)

First off, someone deeply curious, someone who really wants to understand the machine learning and the science at a deeper level, who wants to make contact with reality, who wants to advance science. This has to be a driving thing, but also pragmatic. What we're trying to do is incredibly challenging, so we want someone who has a very careful process and is solution-oriented. They get to goals quickly.

And really, someone world-class along some dimension. We're looking across all these different pillars—machine learning, experimentalists, simulation—and people who can bring some sort of innovation on: how do you create a creative ML system? How do you bring new types of tools or new types of thinking to some of these state-of-the-art models? Someone who can advance simulations and make it more robust and more reliable with experiment.

Ekin Dogus Cubuk (1:01:38)

And maybe one more thing I'd add is Liam and I have been really looking for a sense of urgency in candidates because we want these technologies not in 10 years. We don't want these LLMs to start improving science in 10 years, but we want them ASAP. So if the candidate feels a sense of urgency for improving these physical systems, discovering these amazing materials, innovating on superconductivity, they would be a good fit.

Liam Fedus (1:02:01)

If you match all these, please reach out.

Anjney Midha (1:02:03)

Alright. Sounds like we gotta amp up the speed, the scale of stuff happening at Periodic, and we'll put the career links in the show notes. Thanks for coming, guys.

Nathan Labenz (1:02:33)

If you're finding value in the show, we'd appreciate it if you take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network.

The Cognitive Revolution is part of the Turpentine network, a network of podcasts where experts talk technology, business, economics, geopolitics, culture, and more, which is now a part of a16z. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing.

And finally, I encourage you to take a moment to check out our new and improved show notes, which were created automatically by Notion's AI Meeting Notes. AI Meeting Notes captures every detail and breaks down complex concepts so no idea gets lost. And because AI Meeting Notes lives right in Notion, everything you capture, whether that's meetings, podcasts, interviews, or conversations, lives exactly where you plan, build, and get things done. No switching, no slowdown. Check out Notion's AI meeting notes if you want perfect notes that write themselves. And head to the link in our show notes to try Notion's AI meeting notes free for 30 days.
