The Decade of May 15-22, 2025: Google's 50X AI Growth & Transformation with Logan Kilpatrick

Logan Kilpatrick from Google DeepMind returns for his fifth appearance to discuss Google’s transformation from "sleeping giant" to AI powerhouse, sharing insights from his year at the company as AI usage grew 50 times to 500 trillion tokens per month. He examines Google’s strengths, including superior compute infrastructure, frontier models like Gemini 2.5 Pro, viral products like NotebookLM, and the deepest AI research talent in the industry. The conversation covers whether leading AI companies will become more similar or different as easy opportunities disappear, why startups still have unique chances, and the potential impact of Google’s ultra-fast diffusion language models. Logan also shares practical advice for joining early access programs and getting noticed by industry insiders, including his personal email and an open invitation to reach out.

SPONSORS:
Oracle Cloud Infrastructure: Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive

The AGNTCY: The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at https://agntcy.org/?utmcampaig...

NetSuite by Oracle: NetSuite by Oracle is the AI-powered business management suite trusted by over 41,000 businesses, offering a unified platform for accounting, financial management, inventory, and HR. Gain total visibility and control to make quick decisions and automate everyday tasks—download the free ebook, Navigating Global Trade: Three Insights for Leaders, at https://netsuite.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(02:32) Introduction and Reunion
(02:49) AI News Avalanche
(04:15) Similar Development Trajectories
(08:19) Google's Cultural Transformation
(15:02) Incentives and DNA
(16:55) 500 Trillion Tokens (Part 1)
(17:01) Sponsors: Oracle Cloud Infrastructure | The AGNTCY
(19:01) 500 Trillion Tokens (Part 2)
(19:56) Future Model Convergence
(23:50) Startup Opportunities (Part 1)
(27:51) Sponsors: NetSuite by Oracle
(29:14) Startup Opportunities (Part 2)
(30:24) API Strategy Discussion
(34:29) Model Release Philosophy
(37:04) Personal Model Usage
(41:22) Long Context Capabilities
(48:28) Early Access Programs
(53:35) AI Productivity Value
(58:53) Agent Development Spectrum
(01:05:57) Agent Trends Discussion
(01:07:01) Diffusion Models
(01:09:48) Path to AGI
(01:13:22) Future of Work
(01:24:06) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


TRANSCRIPT

Introduction

Hello, and welcome back to the Cognitive Revolution!

Today's guest, Logan Kilpatrick, needs no introduction – this is his 5th appearance on the show, and his tireless work to support AI application developers, previously at OpenAI and for the last year & change at Google, is legendary. 

In this conversation, with the benefit of at least a little time to process, we're looking back and digesting the overwhelming volume of major new AI models & products that Google and others have recently released.

Logan describes his personal experience at Google as their AI usage has grown 50X, from 10 trillion tokens per month a year ago just after he started there, to 500 trillion tokens per month today – which, notably, is more than 50,000 tokens per month for every person on earth –

and he also shares his perspective on Google's incredible organizational transformation from "sleeping giant" to top-tier AI powerhouse, with assets headlined by the strongest overall compute infrastructure of any company, top-tier, Pareto-frontier models including Gemini 2.5 Pro, highly original & viral products like NotebookLM, game-changing applications in medicine and science that are starting to ship to trusted users, and what I have always considered, and still consider, to be the deepest bench of AI research talent and the most diversified, well-rounded research agenda of any company in the world.

He also offers thoughtful analysis on whether we'll see continued convergence among leading AI companies or more divergence as the "low-hanging fruit" gets picked, why he believes startups still have unprecedented opportunities despite big tech's advantages, the implications of Anthropic cutting off Windsurf for partnering with OpenAI, how the blinding speed of Google's diffusion language model could bring about another revolution in software, and why… despite all the AI capabilities advances he's seen and helped to popularize, he's still betting that humans will continue to matter and taking a relationship-centric approach to his work.

Speaking of relationships, perhaps the highest-alpha part of this episode is when I ask Logan for advice for those who want to break into early access programs or otherwise get help & attention from people in similar positions.  I won't spoil it here, but his response includes his personal email, and an invitation to reach out.  

As always, if you're finding value in the show, we'd appreciate it if you'd share it with friends or leave us a review. And we welcome your feedback via our website, cognitiverevolution.ai, or on your favorite social network.

Now, I hope you once again enjoy hearing from Logan Kilpatrick of Google DeepMind.



Main Episode

Nathan Labenz: Logan Kilpatrick, everybody knows who you are. Welcome back to The Cognitive Revolution.

Logan Kilpatrick: Thank you, Nathan. This is the world record for most appearances. I feel like you should just make me an independent, recurring segment on some regular cadence, because we get to do this a lot and it's wonderful to be back.

Nathan Labenz: It's only your calendar that would prevent that from happening, so be careful what you wish for. We were joking about titling this podcast The Decade of, The Week of May 15th to May 22nd, 2025. Holy moly, we've got an absolute avalanche. I've been saying for a long time, my grip on all AI news is slipping, and with this moment, I think it's officially slipped for everybody. We've come a long way since GPT-4, and a ton of stuff is happening. I want to run it down, but I also realize at this point, we can't even be comprehensive, so I also want to take some strategic opportunities to zoom out a little bit and just get your bigger picture perspective on some things. First one in that vein is, over the last however many months, we've seen several waves of leading AI companies launching very similar things in pretty short periods of time. This has happened with reasoning models. Most recently, it has happened again with the coding agents, and on a feature level, we're also seeing quite a bit of connecting into your Gmail, connecting into your Google Docs, different context retrieval type things. Some of that's obvious, but some of it is pretty core research driven, right? Getting the models to reason. How do you understand why the different leading companies seem to have such a similar development trajectory and also launch timelines?

Logan Kilpatrick: That's a great question. I think there are a couple of dimensions to this. One, on the research side, there are true innovations that when people go out and talk about them, it becomes clear in hindsight why something should be done. And I think maybe reasoning is that story. Obviously, DeepMind has been working on that reasoning stuff for a long time. I think it was some of the particular techniques; it just became clear that, "Let's make this level of order of magnitude of investment," and things ended up working out pretty well. Then you bake in all the other stuff that we had been trying that was independent and different, and you start to see some really interesting things. So, I think some of this is people like the path, and this is what's awesome about an ecosystem: other people light a path, you get to benefit from the path they've lit, and then you can bake in those innovations into what you're doing, plus benefit from the bets you were making that were independent of that. So, that's exciting on the research side. On the product side, there are a lot of people... AI is the most competitive ecosystem in the entire world right now, both at the model and product side. There is not a more competitive ecosystem with the amount of money, talent, intellectual capital, speed of execution, than there is in this AI ecosystem right now. So, there are just a lot of competitive people who are really good at what they do, and they don't want to be pushed behind by their competitors. There's this feeling that you have to stay on par with what everyone else is doing, and that actually ends up being... I feel that tension as somebody who builds products in the AI ecosystem, which is, there's this tension between doing what you think the long-term future is, versus not trying to look like you're behind in the present moment, and finding that balance point between the long-term bets that are distinct, that are actually going to give you a differentiated perspective over time, and what short term, we just need to have parity, is a tough challenge. And again, I feel this on the developer side as well, because we obviously provide compatibility layers for people who use other model providers to come and use Gemini, so there are always thoughts about how much do we invest in that to get feature parity while also developing next generation API capabilities and model capabilities? So, it's a tough trade-off. On the dates and timing, some of it is intentional. A lot of it actually just ends up being relatively happenstance that things launch around the same times. But it is always fun to see the conspiracies of X, Y, and Z companies just sitting on something and then, with a one hour notice, they just say, "All right. We're going to all put..." Anyone who's worked in any company that's larger than 20 people knows it's not actually possible to do that. The amount of operational overhead and burden to do something like that means companies are not that nimble. Even your favorite AI lab is not nimble enough to have that level of reaction.

Nathan Labenz: I do have to say, Google and DeepMind and your team specifically, you have been pretty nimble, right? One of the things I wanted to ask is what has the experience been like? What sort of culture... A year ago and definitely two years ago, the outside view of Google was sleeping giant, turned sclerotic, everybody was kind of managing their own little fiefdoms. I don't know to what degree that was really true. Now, the narrative is totally flipped. The giant is wide awake and really remarkably keeping pace with even much younger and smaller companies. One of the most interesting data points shared at the IO event was 500 trillion tokens per month now being processed across Google's services. We are now into the next month, so at the rate of that curve, it might be literally 2X more already. I don't know if you are watching the dials that closely to know, but how has that happened at Google culturally? What has shifted or what has your experience been internally, not just going through that curve, but rallying everybody to actually support all the different work that has gone into supporting that curve?

Logan Kilpatrick: One of the interesting parts about this story is that at its core, it's a people organizational story. The challenge is that's not selling front page New York Times articles or whatever, choose your favorite analogy. Historically, if you look at how Google was set up to do this work, the reality was it wasn't set up for this moment. Google obviously had many different teams doing AI, for many good reasons, as they were pursuing fundamentally different goals. Google Brain, for example, historically had a very wide breadth of truly different research. That's where many transformer models and other things came out of that varied breadth of research. At the same time, Google Research was doing more applied things in some cases, trying to upstream much of that into other parts of Google. Brain also did that sometimes. Then you had DeepMind, where Demis and the team had a strong opinion about how they thought we would get to AGI. They were pursuing a very specific research direction, and much of that is evident in what has come out of DeepMind over the last six or seven years. In a world where there wasn't a clear winner as far as what the technology, at least in the short term, could be to help scale us closer to systems that get closer to AGI, it made sense for Google to have those bets across the board, with very different structural organizations and approaches. But it became clear at a certain point that one of those things was working in the short term, and we should rally resources to get everyone on the same page. That happened around the middle of 2023, when Brain, part of Research, and DeepMind merged. That was the start of Google positioning itself for success with this technology for the next 10 years. The challenging part is that large human systems are extremely complex. There are many humans involved, and it's easy to forget the level of complexity and chaos in human systems, regardless of the desired outcome. The DeepMind team, Demis, and the leadership have done a great job of reinventing the culture to create an organization that brings two very different organizations together under a single roof, setting up the team structure so Google could be successful building these models and then going through the process. Large-scale training runs don't take a minute, so your iteration cycle is not super quick. The iteration cycle of releasing models and doing all the end-to-end work is not super quick. It took time to get the iteration cycle going. OpenAI and others, specifically OpenAI, had been doing that iteration cycle a bit more leading up to these moments than we had been doing at the time. As you do that iteration cycle, suddenly everyone wants AI. The 500 trillion tokens a month is a 50X increase over the previous year. At the same time that you're setting up the right organizational structure and doing the iteration loop to ensure you're training the world's best models, you also need to scale up hardware, which doesn't happen instantaneously either. We need TPUs to do research and for inference. The amount of TPUs you need isn't just sitting there idly waiting, so there's work and timelines involved. All things considered, given the constraints, we're in an incredibly good position. What excites me most is the slope of improvement across all those dimensions: How can we work better as a team and get everyone on the same page? How do we ensure that breadth of research upstreams back into the main Gemini models? 
How do we ensure we have the world's best infrastructure? How do we ensure we have that iteration cycle for releasing new models, learning hard lessons, and developing rigor around that? We're doing all those, which has been incredibly exciting to see. DeepMind also, the last comment I'll make, has transitioned, and we talked about this before, from being an organization that did foundational research to now building products. That's the last step of this organizational journey: How do we build the Gemini app? How do we think about what we do for developers? How does that influence how we train models and what that iteration cycle looks like? All of that has happened, the work has been done, and we're putting the pieces in the right places. Now, for the next three to five years, we get to see the outcome of making good and hard decisions to put things in the right place.

Nathan Labenz: If I had to summarize that and perhaps contrast, though I won't ask you to, there's another big AI research organization with a fragmented structure, tons of compute, and researchers pursuing many different directions for years. That, if anyone hasn't identified it, is Meta, and the contrast has been quite strong. One possible explanation is that DeepMind leadership saw they were getting close. This seemed like it might be a significant development, so it was worth going through the trouble of reorganizing. I'm sure Demis would rather do many things than reorganize an organization and redraw reporting lines. But if you're close and getting on that wartime footing, so to speak, not that I ever wanted to see an AI war, it's worth it. You haven't seen that same thing at Meta or a few other companies, nor have you seen the integration of the work. They do have Meta AI in their apps, but it's clearly not on the same level, and they also haven't done the reasoning aspect. I'm sure Meta researchers were at the same San Francisco parties, getting those same subtle hints that this seemed to be working, which led everyone else to say, "Okay, we better make sure we're on that train." And thus far, they haven't. For me, the takeaway was perhaps the importance of conviction in leadership to do whatever it takes and push hard on small hints that seem credible. That seems to matter a lot right now.

Logan Kilpatrick: The other piece I'll add is how I think the incentives matter a lot, especially for Google. Google has been, and Sundar has said this many times, an AI company since he took over in 2016. People joke that Google was sitting on the Transformer and didn't use it, but the Transformer was powering Google Search at multi-billion user scale in many different ways. The technology was being used, not in the same incarnation as current generative AI, but at that level of scale. Building models, deploying them, and building that infrastructure and iteration process has been in Google's DNA. It obviously needed to be reformulated a bit for the current team doing that across Google. The other piece of this is the incentives. If you look across Google's products, Google, organizationally and in terms of the future of its products, is highly incentivized to make great models. The great models we make, and this is an interesting thread we should discuss, are present across all our products. Whether you're writing in Docs, doing things in Sheets, in a Waymo, doing stuff on YouTube with video, or a Cloud Enterprise customer doing something, all those use cases benefit from this. It's not just an add-on; it's fundamental to the success of those products. So, there's an interesting angle here regarding how incentivized Google is to be successful. I think we are highly incentivized, and it's in the DNA of what the company has been doing for the last 10 years. If you don't have those two things, this moment would probably be a lot more painful than it has to be.

Nathan Labenz: That 50, or I should say 500 trillion tokens per month, is 50,000 tokens per month for every human being on Earth, which is a pretty crazy number.

Logan Kilpatrick: That is crazy.

Nathan Labenz: It's grown a lot faster than I expected. Obviously, there are other providers out there processing a lot of tokens too. So we're now getting into this regime. Of course, not everybody is using it, so we're starting to see significant inference numbers on a per capita basis. As you look ahead, do you think we'll continue to see this convergence where the leaders will mostly be doing things measurable on the same bar charts? Or do you think we will start to see more divergence, which could mean different form factors, significantly different strengths and weaknesses? Who knows what divergence might look like, but what's your expectation there?

Logan Kilpatrick: I would guess we see more divergence, to be honest. I was at a dinner a couple of nights ago talking to some founders, and many of them are clearly betting on model convergence. I think it depends on what level of abstraction you want regarding what will converge. My general sense is that the low-hanging fruit has been captured. So now, the question is what structural advantages do you have as a business to train LLMs? I think Google has a really important infrastructure advantage in the ecosystem and other things where those will shine through. Getting to this point was not unexpected. Now, getting to the next level will not be easy for many people, and that's where world-class teams and those making an order of magnitude bet on this will see the advantages. Intuitively, through that lens, it only gets harder from this point. AI innovation does not become easier after this point to make these models better. I would guess a lot of teams will start to... It will be interesting to see the size of the labs; maybe all labs will keep doing everything, but I think there will be opportunities to focus on a specific area. I'm just conjecturing here, but you could imagine Anthropic deciding, 'We actually just want to be the world's best coding model, and that's the only thing we care about, and that's what success looks like.' I don't think this will be true because they have a very broad mission that doesn't seem specifically code-focused. However, you could imagine companies doing some angle of this where they decide there's value in diverging from a super general path because they could build a great business doing that. Again, it's at odds with some of the broad missions these companies have, but I do think there's real value in that, and you can go deep and start to understand how to build a long-term business and company around some of those things. Maybe the big labs won't do that, but for others, there are obviously more people training foundation models than just the big labs. Many of those companies are going down the path of finding something they're really good at and building a differentiated perspective on how to solve this problem from a model or infrastructure perspective, whatever it is. I think that makes a lot of sense, honestly.

Nathan Labenz: That's interesting. I don't know. I should be at least somewhat deferential. You probably know better than I do, but I just look at how fast these core foundation models from the leaders are getting better, and I'm like, "I would not want to be a tier two foundation model trainer in today's world." Especially if it's only getting harder from here, and you're saying that from the DeepMind position, I think that sounds really hard from any other position. I guess my full... It won't be for you to sign onto this in this moment, I don't think, but transparently, my position is I just think the big tech companies are going to win anything and everything they want to win. And you're seeing this bleeding into the application layer as well, right? I'm interested to hear, I know you've been very focused on supporting developers directly via the AI Studio and the APIs. You also, I'm sure, are in regular dialogue with folks like Cursor and Windsurf and anybody who might use Gemini 2.5 Pro as a coding model. But now we've also seen all the big three frontier developers in this last wave. They've all put out a coding agent too, right? So how are they feeling and how are you talking to them about the fact that you want them to use the model, but you also now do have a competing product in market against them, right?

Logan Kilpatrick: A couple of things. One, I think the product we do have, you're referencing Jules, right?

Nathan Labenz: Yes.

Logan Kilpatrick: At least for us, Jules is definitely super early. I think I'm super excited. It's a great team inside Google that's working on it. But obviously, the level of adoption that some of these other AI coding products have is very much on a different level.

Nathan Labenz: 500 million ARR, they just said today.

Logan Kilpatrick: I saw that tweet, which is exciting for them. So I think there is... I was talking to someone last night, and the comment I made, which continues to be true, and I don't know if I've said it on another episode that you and I have talked about, is there's no better time in human history than right now to be building a startup. Truly, if you're building a startup to build language models and compete against all the big labs, you better be very well capitalized to do that because that's a very difficult problem. If you're building out the application layer, it's never been easier. The time to build software, the opportunity to explore new ideas, the pace at which AI enables you to potentially, this current AI moment enables you to scale monetization, build really retentive user products, and build a profitable business. All of these things, the barrier has never been lower than it is right now to do all those things, which is just as somebody who fundamentally believes in developers changing the world. I think that's the coolest opportunity ever. And yes, sure, the big tech companies will hopefully be successful as well and will sell infrastructure and do some things at the application layer, but the real opportunity is there are a million and one different problems to be solved. And some of these large billion user products solve things in a really general way. And the cool thing is you can really go deep for some specific user segment and solve their problem in a unique way. And the cost to do that from building a startup and writing the software to do it has never been lower. And the outside world, the startup world, there are 1,001 different AI tools that you get to leverage in order to get to that place. Larger companies don't, just because of the level of security, privacy, and enterprise requirements, they often don't use a lot of those tools. It's a different bespoke set of tools, and the pace of tooling innovation for large companies often happens a little bit slower than it does for the startup ecosystems. You have all of these entrenched speed advantages, and then you couple in the idea that everyone's going to have a bunch of agents building stuff for them in the future. I continue to be super, super excited for people even in the coding space at the application layer who are building stuff. There are just so many cool things to be built.

Nathan Labenz: The importance of speed, or the criticality of the advantage of speed for startups, I think has definitely never... it's always been true, I suppose, but it seems to have taken on an extreme importance now. Just not too long ago, I talked to Andrew Lee from Shortwave, who's building an email client on top of Gmail that's also, in effect, a Gmail competitor. And he said, after really soul-searching deeply, "We came to the conclusion that our only advantage is speed. We really have nothing else."

Logan Kilpatrick: And focus. Focus is the other piece. And I think this goes to big companies, and I feel this as well. There's lots of tension for me because the cool thing about Google is there are a million and one innovative things happening. The challenge is how do you actually balance that and take action based on the million and one innovative things happening? It's a real burden. And the nice thing for startups is you don't have a million and one innovative things happening. You can just go and do one thing. And I have a lot of envy for folks who have that because you just don't need to make a lot of decisions. You can just really focus on solving that problem at hand. And I think that the speed of execution, the ability to focus on just a single thing, is a blessing, and take advantage of that as much as possible.

Nathan Labenz: What do you make of this Windsurf news lately? The brief story is they agreed to a deal with OpenAI. They've been using Claude as their primary model, and then Anthropic cuts them off by virtue of their having agreed to this deal with OpenAI. On the face of it, that's all pretty reasonable. But if I am a coding agent company or whatever that's thinking, "What's my long-term prospect here?" Right now the speed advantage goes to the startups because the big tech companies have been so friendly, I guess, to the rest of the ecosystem as to put the models out before they've implemented them in their own products in many cases. But it's not too hard to imagine that flipping, right? If Google wanted to say, "Okay, Gemini 3 we're going to deploy in our own coding agent and Gmail and Docs, and then a few months later we'll put it in the API." That would definitely flip the speed advantage on its head. Do you think startup founders should be worried about that?

Logan Kilpatrick: Yeah, that's an interesting question. One, on the Anthropic piece, I do think that the byline of Anthropic wanting to invest in who they think will be long-term partners and getting compute to those customers, I think, actually, I sort of... As somebody who's spent a bunch of time thinking about how we get compute to the right teams that are building products, I have empathy for that argument. I think that could make sense.

Nathan Labenz: It's totally defensible from any number of angles. Simple business strategy, right? Don't support your competitors is, I think, totally defensible in many contexts, certainly in normal business contexts.

Logan Kilpatrick: Yeah. I think as far as our strategy, the great thing for builders is Google Cloud is the fifth-largest enterprise business in the entire world. The mandate of Cloud is to bring this infrastructure to the rest of the world, to bring Google's infrastructure to the rest of the world so that people can build world-class startups and not need to rebuild the level of infrastructure that Google's built in order to scale the internet to where it is today. So it's just such a core and foundational part of the business that I would find it hard to believe the strategy shifting from shipping across our own services but also shipping across cloud services. Interestingly, oftentimes today, it's actually even more extreme than the picture that you painted, which is oftentimes, the external developers have an even larger speed advantage from a model perspective. If you think about who is the customer of models inside of Google, it's teams that are building billion-user products, and the teams that are building billion-user or 150 million user products. I was just talking to someone about some of the features and products that exist inside of Google Workspace, and some of the ones that I've never even thought about have 150 million monthly active users, which is crazy. That user persona, if you go and talk to enterprise users of LMs, they don't move the LM as quickly. They don't switch models as quickly because even though they're inside of Google, there's still all of the normal constraints of building a large user product, which is you don't want the behavior to shift, you have to do a ton of evals, you have to do all these things, and all that requires time. When you have a small product, it's very easy to quickly switch models and you don't have to think about it that much. But for teams inside of Google, they do have to think about that, and that's the responsibility to the users that we have is to be really thoughtful about that. So again, I think the time horizon would shift so dramatically as far as getting these models out the door if the strategy became we have to force internal teams to use these and then deploy them before we give them to external. They'd be so far off from where we are today that, again, I have a hard time imagining that would make sense. Also, from the business perspective, it's important for us to give LMs to developers because it's a core part of the Google Cloud business, which is, again, a huge business for Google.

Nathan Labenz: I know you know Daniel Kokotajlo. I don't know how well necessarily, but you overlapped at OpenAI. I'm sure you're familiar with his AI 2027 scenario. One of the interesting things in that is that he projects over these next two years that the model developers are going to start widening the gap. I saw Roon not too long ago on Twitter. Somebody asked, "How far ahead is what you have internally versus what we see externally?" And he said, "You guys have no idea how good you have it. It's two months. You're on the bleeding edge just behind where we are internally." But Daniel's projection is that that will change, for multiple reasons, including labs wanting to use the models intensively for their own ML research automation dreams of recursive self-improvement and takeoff, and who knows what, right? He's got pretty aggressive scenarios in mind there. He thinks that gap is going to widen, that the developers are going to start to hold the best models back for themselves, the public will satisfice more often, and the really insane stuff will be held very closely and known to few people. It sounds like you don't buy that scenario, or at least don't see any signs of that happening at Google.

Logan Kilpatrick: I think there are two dimensions. First, it is hard to get signal on how good models are. Evals are such a difficult problem, so you often don't really have an intuition as to whether this thing could be the right model long term if you don't release it to the world. So, that is one of many pressures to get the model out the door. I also wouldn't discount the momentum, or quasi-momentum, war that happens. It is really important to project what that momentum looks like from an AI perspective because ultimately, if you talk to developers and people building companies, that is a very large influence on who they end up building on. I also think there are other threads to this, which is that switching is hard. There are so many layers that I have a hard time buying that we are not going to deliver models to the world in the same way we are doing it now. I think there are many levels of motivation and game theory that tell me that won't be true. It is also interesting to think about how you actually capture the most value from this technology from an economic perspective, and maybe it is not us. The cool thing about developers is that Google obviously has large distribution, but you get a really wide aperture of distribution across so many different things. Maybe the economic model looks slightly different, where the unit of intelligence is very... Assuming the models are much better and can do all this economically productive stuff, the unit of intelligence being a token and charging people on a per-million-token basis, I could buy that changes in the future and your economic model looks different than how developers do it today. But I still think fundamentally, there is a great business to be built giving that to other people, because how you are going to use this model looks very different than how other people are going to potentially use it, and you could build a great business doing that by releasing it to the world.

Nathan Labenz: The importance of feedback definitely is not to be missed, and that also connects back to the conversation about what is going on at Meta, which, for the record, you are not commenting on, but I am just tangentially mentioning. They are not getting nearly as much of that as the companies that currently have the best models in the market are getting. So, that is interesting. How much do you use other companies' models? Do you do your own different vibe checks across different providers? What is your model diet?

Logan Kilpatrick: I play around with a lot of stuff. I think it is interesting. It is fun to see how... Independent of my job, I am someone who loves technology and I love seeing cool AI products, so I spend a lot of time playing around with all the coding models. I think it is probably the thing that I experiment with the most. I just saw there is a lot of cool stuff happening in the audio space right now. We launched our native audio model at I/O, which was one of the threads, and it is available on NotebookLM and other platforms for developers. It has been really interesting to see that as an emergent space where people are building products and services, so I have been spending a lot of time playing around with the products and models that people have. ElevenLabs just launched a new model to do something similar, like native, really robust, natural-sounding audio. So it has been super cool. Across whatever dimension you want, there is a lot of fun stuff to play with. I still try ChatGPT occasionally to play around with and see what that experience has evolved into. It is fun to be someone who likes using this technology. It is interesting though, I am not one of those people who... I have many conversations with folks who talk about how they will send the same query to three different models and then examine the differences between those, doing it in three different tabs. These are people I have a lot of respect for, who are running companies and all this stuff, and I think that seems like a pretty cryptic way to be doing that type of experimentation, which has led me to believe there is probably some interesting product to build there, where people are really trying to understand the nuances and differences between models and engage with multiple answers. I think the multiple pieces of content concept is a really interesting thread to pull on. You can imagine that in the future and in product experiences. But I am not at that level where I have Claude, Grok, ChatGPT, and Gemini open at all times and I am asking my question in three or four places.

Nathan Labenz: I do not do that all the time by any means, but I do it on occasion. My general philosophy is always to try to be doing two things at once: one being whatever the object-level task is, and two being learning about AI's ability to help me with that object-level task. Contract review would be a great example. If I am going to take on an advisory agreement or whatever, they will send me the contract, and I will send it to at least three AIs. If none of them have an issue with it, I will just sign it without even reading it myself. Usually, they are pretty consistent, but it is also an interesting opportunity to see how they are presenting things a little differently. Claude is typically the shortest and the least formatted. It is hard to characterize. How would you characterize it? As for Gemini 2.5, you just launched a new version on stage at the AI Engineer World's Fair. I cannot claim any deep familiarity with the new one because it just came out yesterday, but the Gemini 2.5 Pro class of models, I think we are on the third date-stamped version, was one of those hair-raising moments for me because the command of the context window that it has was just so incredible. I dumped, and I have talked about this in a couple different episodes, a research codebase into it that had 400,000 to 500,000 tokens. No other provider, at least with the level of access that I have, even supports that length of context. To see the command that it had of it, I was literally going back and forth debugging problems in AI Studio. Don't tell anybody because they might cut me off for this behavior, but it is rewriting whole files for me, and then I am saying, "Oh, I got this bug. Please fix," and it rewrites another version of these long scripts for me with half a million tokens of context. That was a "wow, this is a significant step change" moment. I wonder what other step changes you would highlight that people might not be fully aware of, or more subtle vibes and tone differences that you think distinguish Gemini from other options.
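
For readers who want to try the whole-codebase-in-context workflow Nathan describes here, a minimal sketch follows. It assumes the google-genai Python SDK and a GEMINI_API_KEY environment variable; the repository path, file filter, and prompt are illustrative, not the exact setup used on the show.

```python
# Minimal sketch: concatenate a repo into one prompt and ask Gemini 2.5 Pro to
# debug against the full context. Assumes `pip install google-genai` and
# GEMINI_API_KEY set in the environment. Paths and prompts are illustrative.
import os
import pathlib
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def load_repo(root: str, suffixes=(".py", ".md", ".toml")) -> str:
    """Concatenate matching source files into one large string, tagged by path."""
    parts = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"\n--- FILE: {path} ---\n{path.read_text(errors='ignore')}")
    return "".join(parts)

codebase = load_repo("./my_research_repo")  # a few hundred thousand tokens fits in 2.5 Pro's window

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        codebase,
        "Here is the traceback I'm seeing:\n<traceback goes here>\n"
        "Identify the root cause and rewrite the affected file in full.",
    ],
)
print(response.text)
```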

Logan Kilpatrick: The whole model behavior piece has been really interesting to see. Folks have a strong reaction to default personalities, and I think we're very early in coming up with a rigorous point of view on how to make a default personality that works well. Through a certain lens, if you look at the requirements for building the Gemini model, the baseline Gemini model is used across so many different products, even inside of Google. So many different products have such a varied point of view, and the product and user they're building for makes it really difficult to come up with something. I know Claude and Anthropic have done a ton of stuff trying to make the model personality feel distinct, and they have a point of view of what that should look like. I think this goes back to the advantages for startups. Anthropic gets to do that just because the consumer product is relatively small compared to billion-user products. So it has been interesting to see us take a more middle-of-the-road approach, not try to have too much of a personality, but then also make sure that the model can have that if that's what the product people want to build. The best example of this is the Gemini app. The Gemini app probably wants to actually have a personality and do some of that stuff. I think the challenge becomes how you maintain that. I've seen that time and time again: as you change the models, the personality dramatically changes. If you're intimately conversing with these models, it feels like the person or the model you were talking to before is now gone and it's replaced by something else. I think that's actually a pretty jarring experience for today's model iteration process. I think there's some interesting stuff to happen to make that not the case. But what you described, I actually have a tweet queued up that I need to put out about long context capabilities because, far and away, 2.5 Pro, this current iteration, has a really large gap. I'll tweet this. I'm in my era of tweeting things live right now just because it's top of mind, but I'll put this out and then I'll send you in the chat this tweet that I just put out. Far and away, when you're talking about gaps in model performance, long context is actually clearly one of those right now. I'll put it in the chat for us. I just opened Twitter and there it was, eight seconds ago. This is showing OpenAI's MRCR, which is a long context eval that OpenAI built. You can see the delta between the models, and far and away, the latest version of 2.5 Pro is on the order of 20% better. This is eight needles, which is the hardest version. It's not just single needle context, which is retrieving one thing from the context window. Even the Gemini 1.5 Pro model from over a year ago was close to 100% accurate. That was basically a solved problem. The problem is exponential decay as soon as you start adding more. To see this level of progress from a model perspective, being able to process and find eight distinct items, is actually pretty remarkable, especially given there wasn't a whole lot of long context innovation that's happened in the last year. I think it's this combination, and I've had this conversation. I was just with Jack Rae yesterday, who leads our reasoning team. He's awesome. He leads our reasoning efforts and originally worked on long context. I've talked to him a lot about this fusion of long context and reasoning. It's reasoning that enables you to be able to use the full context window that's available.
So it's cool to see that happen finally.

Nathan Labenz: You can feel it. I mean, of course, benchmarks and practical use are not the same thing. I don't know what I would have said about Gemini 1.5 in terms of whether it felt like it had the depth of command. I did a few things where we put whole books through it, asked it to find relevant quotes, and it could do that pretty well. But this new thing, if people haven't tried it recently... One of the things that's challenging is that's so much context that it's hard in many cases for you to know if it's doing a good job.

Logan Kilpatrick: Video is one of the best ones, I think. I've been doing this, and it's easy to... The challenge is you have to perhaps already watch the video, but it's a fun experiment. Take a long video you've watched and then go and ask a bunch of questions. That use case tends to shine, which is really interesting. We've also, interestingly, because of how good the context window has gotten, actually seen a shift in what the distribution of the request size looks like. Historically, we've been like, "Why does no one really use long context that much?" We were definitely ahead of the technology and the curve from that perspective. Now, with 2.5 Pro, long context usage is dramatically higher than it's historically been, which has been awesome to see people coming around to and building on. I think it gets closer to this future, the whole long context versus RAG discussion. Historically, you could have brushed it off a little bit because the model wasn't really that good and no one was really doing it. But I think the future is going to look more like people putting more and more stuff into the context window. Of course, they'll still need RAG in cases, but it's awesome to see that corner turning from a long context perspective.

Nathan Labenz: "Turn your hyperparameters up" is one of my current mantras. Another good one for people who want to get a qualitative sense of the command that the new models have of long context: I have a simple Colab notebook for extracting emails from the Gmail API where I basically just use "from me" and "is sent" as my simple search to filter out all the crap I'm not engaging with and keep just the threads that I have sent a message to. Pull all of those. You can go back, depending on your volume of email, pretty far and get a pretty robust picture of who you are that still fits into a million tokens. Then you can start to get a sense for what the model understands of you from that million tokens, and it is pretty impressive. I can share that Colab notebook if anybody wants to mess around with it. I put it in that format so that you can do it without your data ever having to leave Google. It's just going from your Gmail to your own Google Drive via the Colab notebook. So it was the most secure way I could think to make it that I could share with you. I don't want to have your email. That's the last thing I need. You've mentioned a few times being with people, dinners, talking to founders. I guess two angles on that. One is, what are you looking for in those groups? Everybody who's building wants to be in the inner circle of early access programs, the trusted tester rosters, and all that stuff. How do people get into those programs? How do they get into the trusted tester sets? Also, can you enable the Veo 3 API for me please? And how much of your own keeping up happens through that network versus other channels?

Logan Kilpatrick: That's a good question. I think this looks different depending on what you're doing. For me, I'm not sure how well this will track across people, but for folks who are listening this far into the conversation, I assume you're an AI enthusiast. My conviction is really strong after the first five minutes. I love that. So send me an email, honestly. We have a super robust early access program. We love feedback from people building interesting things. So if you're building something interesting, email me. My email is L, the first letter of my name, and then Kilpatrick, my last name, @google.com. Send me an email. I would love to hear about what you're building. I would love to get you into the early access program. It's not some big secret. Some stuff is more secret, some stuff is less secret. We really just love to work closely with developers and get feedback and be as open and building with people as possible. So email us. As for the Veo 3 API, that's a work in progress. We don't have one available at the moment that we can onboard people to externally. We're setting up a bunch of stuff and also working on how we can make it so that the model... The order of magnitude of scale for the API product is just different relative to putting it into a consumer product with a high price point. It's a lot of different dimensions. So we're working on ways to make sure the model can actually work, and that we have infrastructure to support the model at the scale that we're going to see demand from an API perspective. It's incredible to see all the audio really bringing video to life. Historically, I'd been truthfully pretty skeptical of a lot of the video models. It's cool to see the video generated, but the practical use cases were hard for me because the amount of work it would take to do something meaningful with that video is pretty substantial. I'm curious for Waymark, I've used the product but I haven't used it recently, how much audio has been an important part of that story. I think it really brings the video to life for me now that it has audio, and the audio feels like it's actually native to what the video is meant to be saying.
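
For anyone who wants to try the Gmail experiment Nathan describes above, here is a rough, hypothetical sketch of what such a notebook might do. It assumes an authorized `creds` object with the Gmail read-only scope and a Gemini API key in the environment; the query, message limit, and model ID are illustrative, and multipart message parsing is skipped for brevity.

```python
# Rough sketch only: pull your own sent mail via the Gmail API and hand it to
# Gemini as one long context. OAuth setup and pagination are omitted.
# One way to get `creds` (assumed, not the notebook's actual flow):
# from google_auth_oauthlib.flow import InstalledAppFlow
# creds = InstalledAppFlow.from_client_secrets_file(
#     "client_secret.json", ["https://www.googleapis.com/auth/gmail.readonly"]
# ).run_local_server(port=0)
import base64
from googleapiclient.discovery import build
from google import genai

gmail = build("gmail", "v1", credentials=creds)

# "from:me is:sent" roughly mirrors the filter described above: only mail you actually sent.
msg_refs = gmail.users().messages().list(
    userId="me", q="from:me is:sent", maxResults=200
).execute().get("messages", [])

bodies = []
for ref in msg_refs:
    msg = gmail.users().messages().get(userId="me", id=ref["id"], format="full").execute()
    data = msg.get("payload", {}).get("body", {}).get("data")
    if data:  # simple single-part messages only; multipart parsing omitted for brevity
        bodies.append(base64.urlsafe_b64decode(data).decode("utf-8", errors="ignore"))

corpus = "\n\n---\n\n".join(bodies)  # keep an eye on total size vs. the ~1M-token window

client = genai.Client()  # picks up the Gemini API key from the environment
resp = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[corpus, "Based only on these sent emails, describe who I am, what I work on, and how I communicate."],
)
print(resp.text)
```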

Nathan Labenz: It's incredible, first of all. I've been thinking recently that we're quite lucky. I don't think anybody... it doesn't seem like anyone planned this, but there was a lot of hand-wringing about deepfakes and fake voices, cloned voices making calls, including a little bit from yours truly, around the election. That didn't really come to pass, I think mostly because the models weren't quite there yet. We're fortunate that it's landing early in a cycle where we hopefully, by the time the next election comes around, will actually have enough reps and people have built up cultural immunity to it. There will be more guardrails and whatever, such that hopefully we'll be able to deal with it. But it is getting to the point now where I genuinely don't always know if something is AI video or real video. For Waymark in particular, audio has been really important. We have traditionally, and still do, take the approach of just having a voiceover track. We mostly make TV commercials. We mostly partner with big media companies. YouTube ads are a natural part of that. These are all sound-on environments. So anytime you see a TV commercial, there's usually a person talking to you, and then there are visuals and there might be on-screen text and images and what have you. That's our usual approach. We use a mix of providers, but ElevenLabs has certainly been a very important provider for us, and their voice quality just continues to climb. With Veo3, it opens up a new dimension. In the past, we've mostly used images that the businesses have, and then the next step is, if we can bring those images to life by doing image-to-video with even Veo2, then that just makes the whole thing more dynamic. Quality is really important there. I would say Veo2 mostly hits the mark, but sometimes has some weird stuff. Honestly, it is pretty damn good. But with three, now it's, you could even rethink the form factor a little bit. You could imagine having the voiceover talk a bit, but then also flipping over to a clip and having that thing present in a different voice. So it definitely opens up the space of possibilities for us in terms of the sorts of stories we can try to tell. We're mostly telling small local business stories, but there are a lot of different ways to tell them, right? We've been relatively narrow in that space over time just because the technology could only do so much. This is the kind of thing that our creative team sees, and it's up to them to figure out what exactly would you want to make out of this new thing now that you can have all kinds of different voices showing up in a real context like that? Honestly, I think we're still wrapping our heads around it. Also somewhat limited by the fact that we're still just testing it in the actual top-tier Gemini app. My personal AI spend is up to about a thousand dollars a month, which is an interesting thing. I've been saying that for a while, but I hadn't actually gotten there, and now I'm pretty much there between OpenAI, Claude, Gemini all at the top level, plus 20 other things that I've accumulated. It's amazing to be spending a thousand dollars a month on AI subscriptions, but...

Nathan Labenz: Some of these things you have to have.

Logan Kilpatrick: Do you think you're getting that level of value out of them? I assume, given the position you're in, that some of them are duplicative because you want to test all the different stuff. But if you were to remove the duplicative ones and just had whatever the best was across a bunch of different categories, do you feel like you're getting that level of productivity boost relative to what you're spending today?

Nathan Labenz: No question. If I wasn't committed to testing everything and having the earliest point of view on things that I can get, I think I could get a very similar productivity boost for much less. But the productivity boost is still dramatically higher than what I'm paying.

Logan Kilpatrick: Yeah.

Nathan Labenz: No doubt. The acceleration of all sorts of different work. Last week I was traveling a bit and ended up coding two different apps on Replit with the agent doing almost everything for me. It's starting to feel like delegating work to other humans, much more so than a few years ago when we talked about prompt engineering and trying to... Originally, prompt engineering was setting things up so that a natural completion of what you provided would be what you wanted. Now, I literally don't think that much about the fact that this is even AI. It's more just, here are some product notes, and if it messes up, then I'm like, "What? Why did you mess up? Did I mislead you or something?" I have to think a little harder. But I'm really struck by how the communication to the AIs now is feeling much more natural, much more high level. The boost is tremendous.

Logan Kilpatrick: This is the evaluation that I think is one of the most exciting ones to me, which is relative to the amount of money you're spending, how much that... it's hard to measure that, I know. So it is somewhat theoretical, but that is what I think long term as we move away from all the regular academic benchmarks being saturated, etcetera, etcetera. What is the economic productivity created by some of these systems? You need a broad mandate to do something like that just because there are so many, it's infinite possibilities. But it is really interesting to think about that. I do think it's a cool north star to drive up the amount of value you can create in the world in a very positive way through a $20 subscription. The value you get today from a $20 subscription relative to what it's going to be in five years, I think is actually really materially different. It will be cool to see that play out.

Nathan Labenz: Another dynamic that will be interesting to watch is what, if any, stable equilibrium we ever arrive at. Right now, one reason there's so much surplus for me is that we're not yet in equilibrium. To a certain degree, I have superpowers that other people don't have. They could have them, but they don't because they're not aware of it or haven't developed the habits. A lot of it is honestly just remembering to use AI in the moment instead of doing it manually, for whatever version you might be considering. Late last year, during the holidays, there was an audio production project. I wasn't involved in the business side, but someone came to me and said, "I have an audio production project. This company has a big network and wants to do a lot of local radio ads. I thought of you; maybe you could do it." They said, "They usually pay a couple hundred dollars per location, per version of the ad, so it would be in the six figures, but we're wondering if you could do it for less." The discount we provided to this company, relative to what they usually spend, was probably 75%. Still, the revenue per hour I spent on it was probably $3,000 an hour. Not all of that came to me, by the way, but that sort of disequilibrium doesn't last forever. Many people will figure that out over time. I wonder, in that project in particular and in general, if I'm still being compensated based on assumptions that haven't fully accounted for the fact that productivity can, and in some places has, significantly jumped. I think this will happen in the future. I don't know.

Logan Kilpatrick: There will still be those edges in the future, which is interesting. I think the pace of innovation will increase. Going back to my previous comment, that doesn't mean it won't be more difficult, but I do think the pace of innovation will continue to increase. Because of that, there will be many discontinuities and opportunities. Being on the frontier is likely to be disproportionately rewarded because you are using all the tools, which is interesting to see play out. But there are so many edges and opportunities left as the frontier keeps moving forward that even if you're just starting today and think, "I'm not on the frontier," there are probably 50 things you could explore that would be very interesting.

Nathan Labenz: A forced transition. Speaking of things on the frontier and very interesting, let's talk about agents. Everyone is talking about agents in various ways. Here's a horseshoe theory of agents. I've found that the latest things, whether it's Claude, Jules, or any of these more agentic models that take multiple steps and do bigger mini-projects, feel much more like the original ChatGPT in that the mode of interacting with them is very turn-based. The turns are getting bigger, the output is getting bigger, and hopefully more valuable and accurate, allowing it to succeed. But on a one-off basis, you're still responsible as a human for figuring out: "Did it do what I wanted? Did I ask the right thing? Is this actually working for me at all? How do I proceed based on what it did?" You have to evaluate that on a step-by-step basis. In the middle is where I think people are actually getting scalable automation value, where they're not letting the AI choose its own adventure. They're not just saying, "Here are 50 tools and a goal, go." This can sometimes create magic moments, but often doesn't do what you want. In the middle, it's a much more structured paradigm, whether it's LangChain or similar, where you break things down into constituent parts. For example, you might have eight different prompts for eight different steps. There might be a couple of forks or double-back points, so we'll give the AI some discretion to choose its route, but it's a fairly structured system. From what I've seen, those seem to be the things where people are actually getting to the point where reliability is high enough that they no longer have to look at the output on a task-by-task basis.
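
To make the "middle of the spectrum" concrete, here is a minimal sketch of a structured workflow of the kind described above: a few fixed prompt steps with one explicit fork, rather than a free-form tool-using agent. The model name, prompts, and the radio-ad scenario are illustrative assumptions, not a specific production setup; it assumes the google-genai Python SDK and a `GEMINI_API_KEY` in the environment.

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

def llm(prompt: str) -> str:
    """One LLM call per workflow step; the model name is an illustrative assumption."""
    response = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)
    return response.text

def draft_ad(product_notes: str) -> str:
    # Step 1: draft copy from structured notes.
    draft = llm(f"Write a 30-second radio ad script from these notes:\n{product_notes}")
    # Step 2: check the draft against the notes; this is the fork point.
    verdict = llm(
        "Answer only PASS or FAIL: does this script stay faithful to the notes?\n"
        f"Notes:\n{product_notes}\n\nScript:\n{draft}"
    )
    # Step 3 (conditional): revise once if the check fails, then return.
    if "FAIL" in verdict.upper():
        draft = llm(f"Revise this script so it matches the notes exactly.\n\nScript:\n{draft}\n\nNotes:\n{product_notes}")
    return draft

print(draft_ad("Local bakery, weekend-only sourdough special, friendly tone."))
```

Because every step and fork is explicit, reliability can be measured per step, which is what makes it possible to stop reviewing every output by hand.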

Logan Kilpatrick: Yes.

Nathan Labenz: How would you coach people as they think about the spectrum from original chatbots, which are now familiar, to workflows, agents, and autonomous systems? How do you see that spectrum, and where should people be? I'm sure you have many thoughts.

Logan Kilpatrick: My take right now is that with reasoning, it's become very clear that a lot of the scaffolding will move into that layer. You'll send a request, and the scaffolding is provided to the model in the reasoning step. Today, it will have access to search, code execution, a code sandbox, tools, and function calling. Models in general are on a trajectory to become agents out of the box, which is very interesting. They'll have all these capabilities baked in. Of course, there will be limits to what a model does, because you don't build everything into it, but by default it will have access to do many things. You could imagine having other hosted tools, which then gets the data flywheel spinning for training the models to do that. You can imagine some of those trajectories you're describing, like the flows the model goes through and how it tries to solve problems, all ending up being upstream in the model. So I do think the models are on that path to be systems and agents out of the box. But the practical reality is there will still always be a need for scaffolding. I think it's a balance: make the current version of the product work, which likely needs scaffolding, but don't build that scaffolding as a one-way door where, once the model can do that thing natively, you'd have to fundamentally rewrite everything. Perhaps coding models are good enough that rewriting everything from scratch won't be difficult, and it will be fine. Historically, if you have a larger product, it ends up being very hard. This is a transition I've seen many companies and products in, a sort of LLM 2.0 transition moment. They built a lot of original tooling because models weren't good at many things. They had all this additional scaffolding, layers, and systems; it was a complex system to make LLMs work in production at scale. Now that models have become so good and can do many of these things natively, you can remove a lot of that complexity, which, depending on what you built, can be very difficult. I think the folks who built the scaffolding and complex systems did the right thing, because they wanted to make that product experience work. They probably benefited from being AI-native and powered by it from the beginning, hopefully gaining many customers and a lot of business. But you also need to ensure you can continue to adapt, because models will be able to do more and more, hopefully taking on more of that burden. It has been interesting to see that. I've had an increasing number of conversations with people going through that transition right now. Reasoning specifically is what has made this possible for many of them.
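
As a rough illustration of the "agents out of the box" direction Logan describes, here is a minimal sketch of attaching a built-in tool to a single request instead of building the scaffolding yourself. It assumes the google-genai Python SDK and current tool names, and the model choice and prompt are placeholders; exact SDK shapes evolve, so treat this as a sketch rather than a definitive recipe.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",  # illustrative model choice
    contents="Summarize the most recent Gemini model releases and what changed between them.",
    config=types.GenerateContentConfig(
        tools=[
            # Built-in web search; a code sandbox can be attached similarly via
            # types.Tool(code_execution=types.ToolCodeExecution()), though whether
            # built-in tools can be combined in one request may depend on the model.
            types.Tool(google_search=types.GoogleSearch()),
        ],
    ),
)
print(response.text)
```

The point of the sketch is that the tool use happens inside the model's own reasoning loop; the application code is just one call, which is the scaffolding moving upstream.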

Nathan Labenz: And long context obviously goes hand in hand with that.

Nathan Labenz: We've experienced that at Waymark, especially in image processing. I've told this story repeatedly, but briefly: how hard I had to work at one point to get even a minimal understanding of what a random user-uploaded image contained was crazy. Now, feed 100 images into Gemini Flash and it will tell you which ones to use. It's simple. What was once a heavily scaffolded workflow, necessary for reliability, is now basically just a prompt for us. That cycle seems like it will repeat. That sounds like what you're describing, and the recommendation is to make sure you're ready to rip out scaffolding and convert it to a prompt as that moment hits for whatever you're building.
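
For concreteness, here is a minimal sketch of the "scaffolding becomes a prompt" point: image triage as a single multimodal call rather than a bespoke pipeline. The file paths, model name, and prompt are illustrative assumptions; it assumes the google-genai Python SDK and a `GEMINI_API_KEY` in the environment.

```python
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Hypothetical user-uploaded images to triage in one call.
image_paths = ["uploads/img_01.jpg", "uploads/img_02.jpg", "uploads/img_03.jpg"]

parts = [
    types.Part.from_bytes(data=Path(p).read_bytes(), mime_type="image/jpeg")
    for p in image_paths
]
parts.append(types.Part.from_text(
    text="These are user-uploaded images for a local business ad. "
         "Rank them by usability and flag any that should be excluded, with one reason each."
))

response = client.models.generate_content(model="gemini-2.5-flash", contents=parts)
print(response.text)
```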

Logan Kilpatrick: I had a conversation with Josh Woodward, who runs the Gemini app and Google Labs. He said this played out for them in NotebookLM as well. Originally, creating NotebookLM audio overviews was a 14-step process with many handoffs and steps in the loop, mostly powered by Gemini. Today, it's a four-step process, dramatically reducing complexity because the models are so good at doing many of those things. They no longer need an entire bespoke system built around writing transcripts for audio overviews, which is cool. You feel that in the product experience too, where it has become much faster. There are many other benefits because you don't need 14 different independent LLM calls that all have to happen in sequence. It's been cool to see the product experience benefit in many ways from this simplicity as the models have gotten better.

Nathan Labenz: Are there any other interesting, hidden gems, underappreciated, or strong trends in the agent space? A2A is something I've been looking into, but honestly haven't been able to fully understand yet.

Logan Kilpatrick: I have a non-agent topic that's interesting, and I'm happy to discuss it. On the agent side, for A2A, the quick mental model is that there are parts of the agent-building ecosystem that MCP doesn't solve, and A2A is trying to address some of those. One example is the auth model and similar aspects. There are still parts of the story for putting agents into production at scale that need to be solved. It's an open question where MCP will go long term. Will it encompass many of those things, or will it leave space for other frameworks or standards to solve some of those problems? I don't think we know yet. I'm watching closely and interested to see what happens.

Nathan Labenz: What else is on your mind?

Logan Kilpatrick: Diffusion. Did you see the demo of the Gemini diffusion?

Nathan Labenz: I had it open here. I didn't get to it, but yes.

Logan Kilpatrick: Did you get to play around with it yet?

Nathan Labenz: I haven't used it. I'd love to have it on that list as well.

Logan Kilpatrick: I'll get you on the list.

Nathan Labenz: It looked unbelievable. First, it makes more intuitive sense to me than the autoregressive model. When I reflect on my own thinking, I feel what I'm doing is much more fuzzy and high-level first, then segmented into parts, and then I try to do those parts. Only at some level am I writing sentences token by token. That resonates far more than trying to sit down and write the whole thing linearly from the first token to the last, even with a reasoning model or a scratch pad to mess around in. Would it surprise me if, in two years, the diffusion paradigm has won because this coarse-to-fine structure turns out to be better? I don't think so. It's unbelievably fast.

Logan Kilpatrick: Unbelievably fast. I'm really excited. Even if the next token prediction paradigm continues, for people who want to build products with that level of speed, it's unclear at this point what the performance trade-off characteristics will be. Will the cost be the same? All those things. There are many open questions. But assuming you could build product experiences for similar costs and model quality as today, many interesting product experiences could be built if you have that level of speed. I think that's what could enable a personal generative UI experience to truly happen: if tokens can be generated that quickly, rendering on a screen in the blink of a human eye. That would be incredibly cool to see. I'm super excited, and I'll get you on the list for access to that. Even if it doesn't work out, it's a good reminder that we need to push in different directions, because other paradigms could work, and maybe it's not next token prediction. Many properties of these things, like editing, which is becoming more common for many use cases, seem well-suited for the diffusion model.
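
To make the contrast between the two decoding styles concrete, here is a toy, dependency-free sketch. The "model" is just a stand-in that fills in placeholder words; real autoregressive and diffusion language models are learned, so this only illustrates the control-flow difference (one token at a time versus unmasking many positions per refinement pass), which is where the speed advantage comes from.

```python
import random

VOCAB = ["fast", "models", "generate", "text", "in", "parallel", "steps"]

def autoregressive(length: int) -> list[str]:
    """Left-to-right: one token per step, each conditioned on everything before it."""
    seq = []
    for _ in range(length):
        seq.append(random.choice(VOCAB))  # stand-in for sampling from p(next | prefix)
    return seq

def diffusion_like(length: int, steps: int = 3) -> list[str]:
    """Coarse-to-fine: start fully masked, then unmask a chunk of positions per pass."""
    seq = ["[MASK]"] * length
    masked = list(range(length))
    random.shuffle(masked)
    per_step = max(1, length // steps)
    while masked:
        # Fill several positions at once; this parallelism is the speed story.
        for pos in masked[:per_step]:
            seq[pos] = random.choice(VOCAB)  # stand-in for the denoiser's prediction
        masked = masked[per_step:]
    return seq

print("autoregressive:", autoregressive(7))   # seven sequential "model calls"
print("diffusion-like:", diffusion_like(7))   # roughly three parallel refinement passes
```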

Nathan Labenz: I suspect in the end it could be quite a bit better for a lot of use cases. It just seems so natural. I always say the transformer is not the end of history. Obviously, there is an attention mechanism in a lot of these diffusion models too, so attention is still part of what we need. What do you think we're missing right now from AGI? Memory is one often cited candidate. What's on your list?

Logan Kilpatrick: Memory is definitely one of those. I think AGI is going to end up being much more of a product experience. If I have a hypothesis about how people are going to end up having the AGI moment, my assumption right now, and we'll see if this plays out, is that someone is going to release a model that ends up being really good, but it won't be a moment where everyone says, "We've clearly built whatever your definition of AGI is." Part of the problem is that everyone now has a different definition of AGI. It would be easy if we all had the same definition; we don't. So that moment of universal agreement isn't going to happen. But I do think it is going to be a product experience. There's more to this than the model piece, and I almost believe you could do a lot of these things today. Obviously there are certain constraints, but I think someone is going to weave together the right components at the product level with a model that's really smart. Maybe the delta in how smart the model needs to be, relative to today, for this experience to work is not that big. It could be that long context is 50% better and reasoning is 50% better, and then you somehow figure out a way for memory to work. The memory piece is actually a different engineering, neuroscience, and human psychology problem of how you surface the right things at the right time. I think someone is going to build that experience, and the feeling is going to be that this thing is AGI. Again, it's really a product experience enabled by a model; the model by itself isn't able to do all those things. It's what happens when you take the model, build everything around it, and do it in a really thoughtful way that people are going to call the AGI moment for a lot of folks. So that's my guess right now. The models are doing more and more of this stuff. You could imagine the models doing the memory piece themselves, with that getting trained into the model, but I think that's very far out there. In the short term, it's definitely going to be a product experience that gets us to AGI, which is not how the narrative runs. The AGI narrative is so model-driven right now, and I just don't think that's actually how people are going to feel and experience what ends up happening.

Nathan Labenz: Really good memory work is coming out of Google, as you might expect. We recently did an episode on the Titans architecture, and there's already been a follow-up to that. It's now up to 10 million tokens of memory, and it probably can go beyond that, but that's what they've demonstrated with pretty strong memory performance. So it could be coming sooner rather than later, but it is a distinct module, right? Obviously, our brains have many modules too. Okay, maybe a last question, then I'll let you hit on anything else you want. One of the striking moments from I/O was when Sergey was asked in a fireside chat what he thinks the future of the web is going to look like in five to 10 years. He almost spit out his coffee at that moment. "Future of the web?" he said. "I don't think we know what the future of the world is going to look like in five to 10 years." That's a striking reminder that even the people pushing the frontiers of this technology don't have a crystal ball and don't really know exactly what we're getting into. So I wonder what your expectation for the future of your life and your job is in the next two to five years. Are we going to get a drop-in Logan replacement? Is NotebookLM going to replace me? Do you think you're on the chopping block in the next two to five years for AI replacement as well? Or how do you see this shaping up?

Logan Kilpatrick: I was sitting front row at that fireside chat next to Koray, who's our CTO at DeepMind, and Emmanuel Tropa, who drives a bunch of our infrastructure work. It was fun to see their reactions to some of the conversation. I have such a fundamentally human-centric view of the world. Even today, as somebody who builds AI and thinks all the AI products are cool, I write everything I do personally. All the work that I do, every email that I write, every tweet that I write comes out of my own head. In probably 95% of cases, zero AI assistance is involved in that process. It's because I have conviction in my worldview, and it's because I have conviction in my tone. Maybe you can make a loose approximation of someone, but the reality is I want to be the entity that has agency over the things that come out around who I am. I think people will end up having this fundamental question for themselves: who do they want speaking on their behalf? Even if I have a digital twin that knows all the things and can make loose approximations, and I could say, "That seems reasonable; I could potentially see myself saying something like that," do I actually want that thing going and saying those things on my behalf? Probably not. That is a very foreign concept relative to what humans do today. Maybe the one notable exception is people who run companies. Imagine you have a large company, and some team or some person is representing Google, as an example. You might say, "Maybe I wouldn't have said it that way," or, "I wouldn't have phrased it that way." They are still representing Google as a whole, but they're an independent agent acting on its behalf. Unless you've had that experience, though, that's still fundamentally different from the human desire to have some other entity represent you. It's not clear to me that people are really going to want that experience. I personally don't want it right now. That's just my personal opinion, and that's the decision I'm making. But it'll be interesting to see where the balance lands in terms of how much people do that. This goes back to one of the random convictions I have on my personal website: the value of humanity. Take you as an example, Nathan. I think this podcast is exponentially more valuable in a world where AI can generate human-sounding things, analyze content, and put together research reports. The reality is that the next token prediction coming out of all of those systems isn't the next token prediction coming out of your brain. What I care about is the next token prediction, or the diffusion, if you want to use that example instead, coming out of your brain. I care about your perspective because you're another human, we have shared lived experiences, and we've done stuff together in person, all of that. There are places where you won't care about that, maybe because of the type of content, or certain dimensions where it won't matter as much. But I really do fundamentally believe humans are interested in what other humans have to say. When someone sends me content that was written or generated by AI, I just care a little bit less. I'm not really that interested. If I can tell they weren't willing to put in the craft and the time to do something, why would I be that interested in it?
Again, there are exceptions to this. Software is a great example. If someone builds great software, do I care whether a human wrote it? Not really. Maybe in some cases I'd appreciate it more if a human did it; maybe in some cases not. So it is interesting. There will be a spectrum. I'm not worried. I think the way people do work will shift in some capacity, and the value of having a differentiated perspective is going to be incredibly beneficial in a world where intelligence isn't the limiting factor in a lot of ways. All that's to say, I'm excited for another 500 to 600 podcast episodes from you over the next two to five years.

Nathan Labenz: Thank you. We'll see how many I can deliver for you. I think there's a lot to appreciate in your thoughts there. I'm with you on some portion of it. There's this whole notion that if people don't have jobs, they'll have no meaning; I'm not on that train. I definitely agree that one of the great things about AI could be that it allows people to put much more emphasis on making connections with each other. At the same time, I don't know. NotebookLM is getting awfully good, and it can handle any topic on demand. I notice it in myself: NotebookLM, not by a huge fraction yet, is starting to eat away at my listening to other podcasts. There are times when I want to know about something, and nobody has done a podcast on it yet, but NotebookLM will. It has that background knowledge. So even if it's a little worse in some ways, or maybe... And of course, the expressiveness of the voice and all that is getting pretty good too. It hits the mark-

Logan Kilpatrick: But I think-

Nathan Labenz: ...in some other ways that really matter.

Logan Kilpatrick: I think the fundamental... I was just listening to Sundar talk about this, and he gave a good example around search versus AI chat as product experiences. Two years ago everyone thought, "Now that ChatGPT has hundreds of millions of monthly active users, search is done," but that hasn't turned out to be true. Even counting other products, I'm sure there are more hundreds of millions of users out there, yet Google searches are growing; the number of queries on search is growing. The search business is still growing because, in some sense, these are fundamentally different problems being solved. That NotebookLM example you gave, "I want on-demand entertainment about this very specific topic," where maybe no one has created that type of content before, is a different use case. Maybe this doesn't fully track across all podcasts, but there are lots of podcasts I listen to where I just want to hear what this person has to say about something. That's what I care about, and I'm willing to listen to them talk about whatever. But if I'm trying to learn about a very specific topic that I know nothing about, the chance that one of my top five favorite podcasts has covered it is probably slim. So you need some other mechanism for that. Maybe that's not fully true, and I'm curious what your reaction is, but I think that is going to play out across a lot of other domains and dimensions. This thing ends up being net additive. It puts pressure in some capacity because there's a limited amount of time in the human day, but it doesn't end up being as disruptive as it would look on paper.

Nathan Labenz: I hope not. That time limit is a very hard constraint as it stands right now. I've recently started to increase my listening speed. It had been 2X as the default, and YouTube just raised the mobile max speed above 2X. I don't even know what the max is now, but you can go well beyond two. So-

Logan Kilpatrick: That's crazy.

Nathan Labenz: Now I can listen to things at 2.5X by default, which saves me another six minutes on a one-hour piece of content. This is how I'm trying to pack more in, but I don't think I can go too much farther down that path. At some point, in terms of competition for time and attention, you hit some fundamental limits. Maybe we get Neuralink working, and that's the next big unlock, where the bandwidth increases so dramatically that all bets are off. I do think there's a credible line of thought that upgrading human cognition in very deeply integrated ways is going to be necessary. That's essentially Elon's pitch for why he founded Neuralink: to be able to go along for the ride with AIs. Especially when you see these diffusion models. Already the autoregressive ones are fast; Gemini can write much faster than I can read. And the diffusion models are another order of magnitude faster. The speed of it all is just going to be another wild thing to contend with. Now you have these video models. I just saw one that generates video dynamically in real time, and you interact with it. I share a lot of the excitement and enthusiasm, but I also wonder if I can compete with all this stuff. It just seems like it's going to get really good at everything, be ubiquitous, and be so personalized to each individual user, knowing what they know and what they don't need to know. How do I compete as someone who's one-size-fits-all, with an audience that's not huge but certainly large enough that I can't customize the podcast for each person, when AI can do that?

Logan Kilpatrick: The point is that people want the Nathan experience. That's my fundamental bet and conviction: long term, even in a world where I could spin up a thousand podcasts that look similar to yours, people want your perspective. There's value in that, even if it's not the most optimized way to deliver content. That's my bet. We'll see if it ends up being true, but I have conviction in it. Hopefully, it will turn out right.

Nathan Labenz: I hope you're right. I know many people want the Logan experience, and I know we're over time, so I appreciate you sharing so much time and insight with us today. I look forward to doing it again in the not too distant future.

Logan Kilpatrick: This was great, Nathan. Thank you always for having me. It's fun to chat, and hopefully I'll see you in person again soon.

Nathan Labenz: Cool. Diffusion model, put me on your list.

Logan Kilpatrick: I will.

Nathan Labenz: And with that, Logan Kilpatrick, thank you for being part of the Cognitive Revolution.
