The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

Google DeepMind's Logan Kilpatrick and Tulsee Doshi discuss Gemini 3.5 Flash, Omni video generation, and Gemini Spark, plus Google's emphasis on efficiency, agent harnesses, context limits, model psychology, AI welfare, and recursive self-improvement.

The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

Watch Episode Here


Listen to Episode Here


Show Notes

Logan Kilpatrick and Tulsee Doshi of Google DeepMind join for a first-ever in-person episode recorded just days before Google I/O, covering headline launches like Gemini 3.5 Flash, the Omni video generation model, and the new Gemini Spark agentic product. The conversation digs into Google's strategic decision to lead with cost-adjusted efficiency over raw capability, how DeepMind now ships a full agent harness rather than bare models, and technical questions around context window limits and knowledge cutoffs. They also explore how the team thinks about model psychology, AI welfare, and recursive self-improvement.

LINKS:

Sponsors:

Brave Search API:

Brave Search API gives AI agents a fast, independent search index for research, RAG pipelines, images, places, and fewer hallucinations. Get $5 in free credits at https://brave.com/search/api/?mtm_campaign=q2-26-cognitive-revolution

Sequence:

Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code COGNISM in the source field to save 20% off year one

Roboflow:

Roboflow is an end-to-end visual AI platform that lets you turn raw ideas into fully deployed applications in just hours, powering breakthroughs like Blueprint Pro's floor-plan understanding tool. Read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI at https://roboflow.com

Claude:

Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

CHAPTERS:

(00:00) About the Episode

(03:16) I/O launch roundup

(10:08) Model lineup strategy

(17:27) Agent harness strategy (Part 1)

(17:32) Sponsors: Brave Search API | Sequence

(19:56) Agent harness strategy (Part 2)

(23:43) Recursive research partners

(30:00) Scaling AI integrations (Part 1)

(30:24) Sponsors: Roboflow | Claude

(33:04) Scaling AI integrations (Part 2)

(40:33) Omni video generation

(43:09) Gemini as collaborator

(48:10) Search and context

(53:53) Diffusion and audio

(57:43) Episode Outro

(01:00:23) Outro

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://www.cognitiverevolution.ai

Twitter (Podcast): https://x.com/cogrev_podcast

Twitter (Nathan): https://x.com/labenz

LinkedIn: https://linkedin.com/in/nathanlabenz/

Youtube: https://youtube.com/@CognitiveRevolutionPodcast

Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431

Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.


Introduction

[00:00] Hello, and welcome back to The Cognitive Revolution.  

Today, after some 340 episodes, I am very excited to share the first episode that I've ever recorded in-person, with fan favorite Logan Kilpatrick, Member of Technical Staff at Google DeepMind, and Tulsee Doshi, Sr Director and Head of Product for Gemini models.

The occasion for this conversation is Google's annual I/O event, where they're launching the new Gemini 3.5 Flash model, all sorts of agent infrastructure and AI product integrations, and plenty more.

We recorded on Friday, May 15, just a couple days before the event, and while many at Google – including my brother, Craig, who's giving a keynote on Wednesday – were working overtime to polish their demos and presentations, the overall vibe, at least compared to the rest of the AI space, was one of relatively relaxed confidence.

And why not?  From 2024 to 2025, Google grew annual revenue by $50B dollars, as much as Anthropic is pulling in today – and they still have 25% of global compute, the deepest pool of research talent, and the most comprehensive AI portfolio of any company, with top-tier positions not just in language models, but also self-driving cars, medical & life sciences, and robotics.

So, after discussing the headline launches they're announcing this week, which include a new video generation model called Omni which they hope will create a nano banana moment for video, a new & improved and more agent-focused Antigravity, and a product called Gemini Spark which will bring more agentic functionality to the Gemini app, I really wanted to dig in on Google's overall AI strategy and philosophy.  

We discuss their decision to lead with the Flash model and generally emphasize the cost-adjusted pareto frontier, while Anthropic and OpenAI are more focused on competing to have the most capable models in absolute terms.

We talk about how DeepMind is no longer shipping models and leaving it to product teams to figure out how to use them, but instead now providing a robust agent harness that should help elevate and standardize AI experiences across Google's vast product surface.

We get into the weeds on questions like why context windows seem to have mostly stopped growing, why Gemini models' knowledge cutoff is now more than a year ago, and what happened to the Diffusion model line of work.  

And perhaps most importantly, we discuss how the team at Google relates to the AIs they're creating, how they think about things like model psychology and welfare, and their views on Recursive Self-Improvement, which as you'll hear, is definitely a part of their plan, but not something they seem to be so singularly focused on as other AI leaders.

Overall, I think this is a great window into the thinking that underlies Google's AI research and product development, which has sustained the company's historic run far beyond the point that many analysts wrote them off.  

With that, I hope you enjoy my first-ever in-person conversation, with Logan Kilpatrick and Tulsee Doshi, of Google DeepMind. 

Main Episode

[03:17] Nathan Labenz: All right. we are here live at Google headquarters in the library at Gradient Canopy, the first ever in-person recording of the Cognitive Revolution. Logan Kilpatrick and Telesi Doshi, welcome. Thank you. This is an honor.

[03:29] Logan Kilpatrick: I didn't realize this was the first in-person episode.

[03:31] Nathan Labenz: Yeah. 350 plus. And it's all been from my home office in Detroit until today.

[03:36] Logan Kilpatrick: That's awesome. Well, thank you for being here. This is a crazy space, especially around I/O. It's a zoo.

[03:42] Nathan Labenz: Yeah, it's always it's always a good time here at Google HQ So. You may or may not remember the no moats memo. We've just passed the three-year anniversary. It was May 5, 2023. And in the intervening three years, Google has added $3.5 trillion in market cap, which is more market cap than all but two other companies in the world. Those 2 are Nvidia and Apple. So the moats, I'd say, are holding up. Here we are at IO and I'm sure there are going to be some exciting new things that will be deepening the moats. So first question, tell me what are we launching this week to try to deepen those moats?

[04:25] Tulsee Doshi: A lot. So a lot of exciting stuff. So let's see, let's start with some of the modeling side of things because that's really exciting. We have our 3.5 series coming out starting with 3.5 Flash. at IO. We're really excited about 3.5 Flash because I think Flash does this really awesome job of being at the sweet spot of being really smart while also being really fast and really cost effective. And so Flash is incredible. It is like three times faster than other of the large models. significantly cheaper for being able to still drive these really awesome, magentic encoding workflows. And we've been using it internally a lot, which has been really fun to kind of see that play out. So that's one big piece, which is 3.5 Flash, we're really excited about. We're also releasing Omni, Gemini Omni Flash. which is a video generation and editing experience. What's really exciting about Gemini Omni in general is it's our push towards being able to bring all modalities in and all modalities out. And the 1st way this is really manifesting is in this video editing context. So you're going to be able to make really awesome videos. You're going to be able to put your own avatar into the videos, which is going to be awesome. I've been having a bunch of fun playing with that too. We're continuing to upgrade anti-gravity and bring more into the developer experience. So Logan can talk more about the developer experience overall, but 3.5 Flash and anti-gravity are really going to come together to build something great there too. And then we're in about a week or two, coming soon is Gemini Spark. which also builds on 3.5 Flash to build kind of more agentic experiences into the Gemini app. So we've got a slate of cool things coming.

[06:16] Logan Kilpatrick: Yeah, I think beyond the models, I think the other headline of the story is just like agents, agents, agents. Like the meme of Sundar from two years ago or last year, he's saying AI, AI, AI the whole time. I feel like this year is agents, agents, agents, agents. And I think it's cool to see like this. I think this is the first year where we have And actually, I think this is not just us, but just ecosystem-wide, this model harness product symbiosis that's sort of taking place. The model is sort of trained with the harness. The harness is powering the agentic product experiences. Gemini Spark in the Gemini app sort of being one example of that. It's powering the vibe coding stuff in AI studio. It's powering the agents API for developers. It's powering, I think there's something else maybe that it's also maybe not. It'll roll out to other products across Google and sort of be this sort of like foundational layer to build on top of, which is really exciting. So I think not just like developer products, but our sort of consumer products, and I think probably even more widely in the future across the rest of the Google product suite. And I think actually there's one interesting thread of this, which is Historically, Google didn't have this through line of something that carries across all of our products. I think then it was Gemini and then sort of all of a sudden every Google product has Gemini and sort of getting them all stitched together and making all those products experiences great. And I think now you're seeing that again with the anti-gravity agent harness and sort of as products become agentic by default, you now have the anti-gravity agent harness being another through line through all of our products. which is really interesting.

[07:49] Tulsee Doshi: And I think what's been really fun from the modeling standpoint there is one thing that we did with Gemini 3 that we're really continuing, I think, with the 3.5 line is really bringing the model to all of our products. So 3.5 Flash will be in Gemini App. It will be in AI mode in Search. It's also powering anti-gravity. It's also powering agentic experiences in AI Studio in Gemini Spark. And so I think really this idea of how do we build a model And then how do we build it in partnership with the harness such that it actually works across all of these product surfaces, which actually have very different users and very different goals, I think is actually really... Really awesome.

[08:28] Logan Kilpatrick: It's really hard too. I think it's actually gotten harder to do. I think it's almost like it was maybe tongue in cheek. It was kind of easy before because you just launched the model on a couple of services and sort of it wasn't that bad. I feel like now it's like you're sort of you have the constraints of the very wide array of Google products that are just like. for totally different users and sort of I think actually that credit to the model team sort of like trying to find the fine line for all these different places because we're not just building for search, we're not just building for developers, we're not just building for cloud customers, it's not just for the Gemini app, it's like all of them at the same time, which is just exceptionally a lot of work to pull off that story on a consistent basis, which like from Gemini 3 forward has been the story, which is which is exciting.

[09:14] Tulsee Doshi: I think one other thing I'll say is like one thing that's cool about this IO too is what we're doing across modalities. So, you know, I think Logan said agent, agent, agent, which I think is true. This IO is really about bringing models to action in that kind of real world sort of use cases. But I think what's also cool is we have the flash model, which is really about building these kinds of coding and agentic use cases. We have Omni, which is really about kind of what is this like multimodal vision look like. And then also actually Gemini Live is getting an upgrade too. Gemini Live is getting faster, it's getting smarter, the model is much better at detecting background noise. So it really does actually feel like a partner in a lot of ways. And I think it's kind of cool that we're also able to draw this through line across the different ways you might want to interact with a model and the different kinds of ways you might want to consume content, which I think is also really cool.

[10:09] Logan Kilpatrick: Okay, I've got like 7 different directions that I want to go. How about in follow up? I'm trying to seed them all for you.

[10:14] Nathan Labenz: Yeah.

[10:15] Logan Kilpatrick: Let's start with just the model.

[10:18] Nathan Labenz: So it's interesting to start with Flash. One thing that I recall, I don't know if it was 2 IOs ago or whatever, right, but there was going to be 3 sizes of Gemini model at one point in time.

[10:32] Logan Kilpatrick: There are three sizes. Yeah. Pro flash flashlight.

[10:35] Nathan Labenz: Yes, we never saw the ultra.

[10:37] Logan Kilpatrick: It's kind of what I'm what I'm alluded to.

[10:41] Nathan Labenz: Yeah, three were promised and then we added, we took one off the top and added one at the bottom.

[10:45] Logan Kilpatrick: Deep think 2, deep think 2 actually, which is like a fourth scaling dimension from a model perspective. Same.

[10:50] Nathan Labenz: That's A runtime scale though.

[10:51] Logan Kilpatrick: It is for sure. Yeah.

[10:53] Nathan Labenz: Well, I guess two questions on like why no ultra. One is like Is it a compute limitation? How are you guys thinking about which model to release? In addition to the 4.5 trillion or 3.5 added 4.8 trillion market cap, Google enjoys 400 billion a year in revenue. I was interested to learn. And profit though is growing extremely fast. Like they might hit 100 billion at the end of this year, maybe even in the third quarter, who knows? And it seems like the revenue there is really driven by people's extreme willingness to pay for the very best model that they can get their hands on. Maybe not at any cost, but like relatively price insensitively. So I'm wondering like, why no ultra? It seems like that would be just a killer and yet we haven't seen it.

[11:42] Logan Kilpatrick: I promise I didn't plant this question. I'm always, you know, poking Tulsi on the side. I love it. This is my favorite question, so I'm glad, I'm glad we didn't.

[11:51] Tulsee Doshi: No, I think you know you're right that I think. there is a slice of users who are definitely willing to pay for a certain level of quality. And I think we really do believe that the pro model has been really pushing that quality. But I also think for us, we've seen so much value from the flash and the flashlight dimensions because we also see an extremely large number of users, especially if you're thinking about building, for example, consumer applications. If you think about the Gemini app, if you think about search, when you're serving to that kind of scale, latency really matters. cost matters, right? Because actually you find that users aren't willing to wait, right? So we find that even when we tweak the model and hurt latency, we actually see that play out in our live experiments on search and the app, even if the model is hugely better from a quality perspective. Because what you're asking users to do is wait. And so I think for us, like part of the reason why we ended up introducing this flashlight skew that wasn't necessarily part of the, you know, the original 2.0 series was because we really felt like there's actually a large scale demand for this, depending on the types of use cases, especially when you're talking at that scale. And so I think for us, it's really important that we're pushing the full range of what kinds of customers we can serve, both internally and externally. Like for our products, the flash and flashlight skew matter a lot for our ability to actually serve to the Google populace. And so we also imagine that that's true for external enterprise and developers. And I think that's played out to be true as we've been but actually seeing this in action.

[13:22] Logan Kilpatrick: I think the two things that I'll add is, and there's like probably a more nuanced technical story on sort of like the ultra thread, but it's not like, it's also not like the pro models haven't scaled up over time. So like I think there is like, there's, you know, there's a story that you can spend at the end of the day, the naming of these things is like marketing. Like they definitely are getting like extremely capable, they're getting larger, you know, they're getting more powerful. There's you know the test time scale, test time compute scaling with deep think, et cetera, et cetera, and all types of stuff in that dimension. So I think it is possible you could sort of like put the ultra brand on some of these things. I think we've decided, I think the decision so far has been not to do that. But it hasn't been that like we haven't kept scaling up. So I think it definitely has.

[14:04] Tulsee Doshi: Yeah, there's almost been a conversation every time we scale up of like, should we call it ultra? And what does that brand mean? Because we could. But there's sort of also a question of how do we keep consistency for users also kind of series to series?

[14:21] Logan Kilpatrick: Yeah, and I think actually also to re-articulate a point that Tulsi made, like Google and specifically Google DeepMind's mission is to like build AI responsibly and make sure it benefits like all of humanity. And I think like that is like so deeply tied to the like Google product surfaces in which like we're serving, what is it like 8, 2 plus billion user products or whatever it is. And so at the same time, obviously the frontier matters, obviously having great models that are really expensive and really, really intelligent matter, and there's tons of use cases for that internally and for our customers. You also need to do the scaling up to billions of users for us to actually do the thing that Google needs to do to achieve the mission. And I feel like we've done a good job hopefully of trying to walk the fine line of actually continuing to push the frontier and build great flash models. And I actually think those two things are like more tied together. You know this better, more than I do, but like more tied together technically. Like it's, you know, it's hard to make great flash models if you actually don't have a great pro model and vice versa. So we'll definitely keep pushing the frontier on both of those things.

[15:30] Nathan Labenz: I think the perception from outside is by analogy too, it's hard to make a good flash model if you don't have a good pro model. People sort of think that there's like an ultra model internally that's the mega training run that's then being used to like help train pro, which maybe in turn is being used to help train flash. Is that true? Is there like a bigger thing inside that is only for these sort of distillation to the mid-size?

[15:55] Tulsee Doshi: I mean, we definitely use distillation as a way of kind of bringing down bringing down our sizes. So you will see that like pro influences flash, influences flashlight. We also do the reverse where we scale up, right? So you take the pro, you take the flash recipe and scale up to the pro recipe, for example. And we do have, I think what's been really fun, especially over like seeing as we've used even anti-gravity into this point of Logan made with the harness, I think we've been seeing a lot of examples actually of leveraging pretty awesome models to drive progress internally. Actually, like one thing Varun demos on stage on Tuesday is basically like being able to leverage a bunch of sub-agents to go and complete a bunch of tasks and come back. And you can actually try that as like an early preview in anti-gravity today if you go to slash teamwork. And that's an example, I think, of something we've been using internally, which is an extremely smart model. And it leverages both the combinations of the best of Gemini 3.5 as well as inference techniques. And you're able to actually accomplish so much. And I think that's the kind of direction I'm excited for us to go into more. So I think we're kind of pursuing all of these fronts. We're scaling up from the pre-training and kind of frontier perspective. And I think that's been really continuing to show gains. There's a bunch we're doing on the post-training side. And then there's also just a bunch we're pushing on the inference side. And then that plus trying to make sure we're working with the harnesses. I think we're gonna keep getting things that we're using internally that we even start to push out externally through previews and this.

[17:32]Brave Search API: Brave Search API gives AI agents a fast, independent search index for research, RAG pipelines, images, places, and fewer hallucinations. Get $5 in free credits at https://brave.com/search/api/?mtm_campaign=q2-26-cognitive-revolution

[18:49]Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code COGNISM in the source field to save 20% off year one

Main Episode

[19:57] Nathan Labenz: So let's talk harnesses. It seems. I was just talking to Andrew Lee, who's the founder of Task the other day, and he said, fundamentally, everyone these days is building the same thing. They're trying to all build the general purpose drop-in knowledge worker. And so that's got to have the intelligence at the core, and that's got to have all this, he calls it the mecha suit that is built around it. So this harness sounds like the mecha suit that you guys are developing in-house. And I guess first question is like, Is this going to create silos? we've lived in this world so far where I could kind of mix and match my models and my infrastructure, right? I could go to LangChain or I could use Tasklet, I could use whatever, and I could pick whichever model and plug them in. But as they get more deeply co-trained with the harness, does this create kind of siloed worlds where you're kind of all in on one Frontier model company's stack or another? And if so, that would have pretty significant implications for kind of switching costs and stickiness and pricing power of the Frontier model creators. What's your take on how sticky things are going to get?

[21:06] Logan Kilpatrick: It's a good question. I mean, I think Again, Chelsea probably knows better than me on this, but I think the best case is like you can do both. Like the best case is like it works really well for Gemini and sort of we can sort of do the things we want to do to scale up because we do have sort of control over the sort of full stack AI story as Sundar likes to say. But then also it generalizes across other stuff. Like I think the developer ecosystem, people want choice, people want to have flexibility in these tools. lots of use cases. Actually, there's like philosophical questions of how good really is your model if it can't generalize to sort of other harnesses. But I don't know how much.

[21:44] Tulsee Doshi: Yeah, I think that's the right, I think I fully agree. I think actually like maybe to double click on what Logan said originally, right? The benefit of the full stack that we have is we can hopefully build a really seamless experience. And you get the best of Gemini, you get it working in the most effective ways for you. get it working in a way that is intuitive, is smart, is fast. And so that also helps us then train the model to be better. So this becomes this flywheel that continues to power the model. At the same time, I think we don't want it to only be the case that the model works in a single harness. So we want any of our enterprise customers or a developer who's building their own use case to be able to leverage Gemini effectively. And so it is important then from a model standpoint that we're training in such a way that we actually, we sort of call it like harness diversity. We should be able to support a range of different approaches to tooling, to different approaches to orchestration, et cetera. But I think what's helpful about this approach of kind of co-training and building that flywheel, it's easier to debug. It's easier to think about data collection. It's easier to eval. You can just move at a faster pace. And I think we're seeing that across the industry. And so finding that balance is important, but I think it just helps build to make the model better.

[23:01] Logan Kilpatrick: Yeah, I think there's a good, this is also a good pitch for like a harness bench. If that's not a benchmark that exists, let somebody somebody build harness bench. Yeah, I would love to would love to collaborate if folks are interested in that, because I do think it's like a great test of has sort of this perspective from a game for games. Actually, as an example, like if models are so good, like why can't they play games really well? And sort of if models are so good and we're actually approaching AGI, like why even if you do sort of the model harness training symbiosis, you still expect it to generalize reasonably well in other harnesses. If you can't, that's actually like, it's another sign of sort of the jagged intelligence. So I think it'd be cool to see this like play out from an actual benchmark perspective.

[23:43] Nathan Labenz: Could be also perhaps productized as an RL environment and sold in to you guys that way. It's quite the cottage industry these days. So obviously the other big thing that I think is very much in the air, and actually the reason I'm here this weekend, when we were originally planning to do this remotely, is I'm going to this event called Recursive, where the topic is going to be recursive self-improvement and hopefully how we can navigate it successfully. How bought in is Google DeepMind to recursive self-improvement? When you talk to anthropic people, it's like they're almost religious about it, and also see it as totally inevitable. OpenAI has this later this year and in early 2028 timelines for an ML intern and a full-fledged AI R&D employee. Do you guys have milestones or timelines for when you're going to hand off the ML research to AIs?

[24:41] Tulsee Doshi: I mean, we're already using Gemini pretty deeply internally to improve Gemini. And so I think that is very much a theme for us, which is like, how can Gemini actually be a part of the Gemini development process? And so that can include things, I think that goes the full range from, helping us be more productive. So that's obviously like the simplest part of this to actually like, submitting CL that would actually like run an eval that would actually, suggest a research improvement that would actually drive improvements to Gemini itself. And I think there's a lot of ambitions we have to keep pushing in that research direction. So I think very similar to the other labs, I think this is very much an area of investment for us and an area we're super excited about. I think for me, what I'm really excited about is like, I think there's this really awesome research partner opportunity that we have with Gemini, right, for it to help us with creative ideas, for it to like help us test things faster. Actually, like, it was awesome, one of my coworkers Anka, she's our lead for safety and alignment. And the other day, she, I think maybe a couple days ago, she pinged me from her hot tub and she was like, I could run all of these ablations from my phone because I could kick off a bunch of things to actually ablate Gemini to test for a bunch of these issues to see how some of our SIs differ or some data ablations differ. And here's my report. And I could do all of this in the last hour. And that is amazing. And that's the kind of thing that we can already do, right? So then imagine where we'll be in six months, a year, two years from now.

[26:19] Logan Kilpatrick: Yeah, I feel like it feels like at least my personal perspective is it's like a very much more like practical perspective, which like it's like as obviously as models get coding, they're going to go do things that is code related. It's going to they're going to help us build our products. They're going to help us train models. I think all the nuance of the story is in like Sort of like, where is the human sort of in the driver's seat of this stuff, and I think we are like the tools are built for the human to be in the driver's seat, which I think is an important thing as sort of we continue to go forward, and also I think very genuinely though, and... I think the model team and the researchers feel this more than ever. Like you definitely, I think the near-term horizon is going to continue to be the human in the driver's seat because the cost of these runs and the opportunity cost of going in the wrong direction and putting a bunch of resources is super, super high. And so I find it doesn't seem like super realistic in the short to medium term that you're going to just like be letting large-scale pre-training jobs be kicked off by the ML intern and it's going to cost you, many, many dollars and lots of compute and taking it away from sort of the human researchers. But this deep collaboration between AI and human researchers, I think is super obvious.

[27:37] Tulsee Doshi: Yeah. There's also something really amazing about how much that collaboration allows you to then focus on what is the interpretation of what you're seeing in the results. Where do you really want this to go strategically? And so it changes a little bit of the role that the human can play. which I think is also really powerful for our teams.

[27:56] Nathan Labenz: When you're doing research, are you actually typing any code these days?

[28:01] Tulsee Doshi: So it's interesting for me on the product side, like on the code side for any code that I was already submitting, I am mostly relying on anti-gravity and doing like bits and pieces, more so bits and pieces myself. But it's also been really cool to like start having the model generate slide decks to start generating actual kind of content from my thoughts. We actually in anti-gravity today, we introduced the Gemini Mike. So there's this like really awesome feature. I don't know if you've been playing with it internally, where you basically like ramble at the model. So you like share a bunch of your thoughts in whatever kind of loose form it is. And then the model actually leverages that to take action. And for me, I've been finding that so much more powerful because I actually feel like I think a lot by talking. And so for me, it's actually like a very, it's like a very cool moment where I can be like, okay, I'm just going to sit here, tell you what I'm trying to think through in my head, and then have you actually bring that back to me in a way that is like reasoned and well thought out.

[29:07] Logan Kilpatrick: Yeah, I feel like this correlates so well to like, I would love to see like a breakdown of like human type code versus like AI generated code versus maybe there's like a divergence, which is like audio input that then generated And it actually very interesting, to your point, Tulsi, is like I feel like audio input to being to generated output code has got to be like one of the fastest growing like input modalities of what's happening. And I find myself doing this all the time. And like it is like the predominant way that I'm building software, at least when I'm not around a bunch of other people. I'm still typing things in so that it's not rude. And yeah, they don't hear my dumb ideas of the things that I'm trying to do.

[29:49] Tulsee Doshi: I don't know. You see, if you walk around sometimes upstairs, you'll see people kind of muttering at their hands. Because they're now actually talking to create code, which I think is pretty cool.

[30:00] Logan Kilpatrick: It is cool.

[30:00] Nathan Labenz: Yeah. One of my KPIs for myself for this year to really know if AI is improving my life is am I getting outside more and getting more exercise? And I'll, I'm starting maybe a little bit. I wouldn't say I've won the game just yet, but I still want to be able to like. get my thoughts out. So I think that is like the absolutely the frontier modality for me. So with the harness, you said it's like now becoming this through line, it's going across all Google product services. I would say, as I'm sure you're well aware, like commentary on Google's AI integrations across its vast product suite has been that it is characterized by like some bangers and then there have been some which have been characterized as misses. So presumably one of the benefits of the harness is that it's going to make it a lot easier for a sort of more standardized approach and kind of general high quality bar across all these integrations. What would you say people should learn from the experience that Google has had to raise their own bar as they're going to go try and do these integrations themselves?

[30:24]Roboflow: Roboflow is an end-to-end visual AI platform that lets you turn raw ideas into fully deployed applications in just hours, powering breakthroughs like Blueprint Pro's floor-plan understanding tool. Read the full Blueprint Pro story and see how over a million engineers are building the next wave of visual AI at https://roboflow.com

[31:13]Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr

Main Episode

[33:47] Logan Kilpatrick: I think this is actually such a great story for us. I think very practically, Google has done a ton of this infrastructure standardization across the AI stack over the last couple of years, which I think has been awesome. And I actually do the story. It is like one of the threads of how we're able to land the Gemini 3 models across so many more products is actually because of this infrastructure standardization that happened. And so we've gotten a lot of, it's painful and difficult and there's of course lots of work involved in doing it. But if you sort of pay that cost, you actually do end up getting this. And I think the advice for for people who are in this position and sort of thinking about this is basically every 12 to 18 months now, like you have to rewrite everything from scratch. And so the best case is like you don't want N number of teams rewriting everything from scratch every time the paradigm shifts. And the example historically, the infrastructure was just like serving raw models and you'd get tokens in and you send tokens out. Now it's like there's a bunch of agentic infrastructure and there's tool loops and there's all these other things happening inside of the harness. And so again, you don't actually want, you want innovation, but you don't want every team to have to go and reinvent that from scratch. And so the fact that like, X team across Google who just wants to ship some really cool agentic product doesn't need to think about like the nuance of all the details of the tool calling loop, et cetera, is a huge acceleration for them to like just go focus on building a great product. And I think it's, hopefully we see that. Like I don't know if like a lot of the agentic stuff we were landing at IO like would have been possible if we if we hadn't have had sort of some of that infrastructure standardization across the harness and the model delivery.

[35:31] Tulsee Doshi: I think the other thing I would say as far as lessons learned is there's really no substitute for being able to just experiment and iterate quickly. So I think this goes to all of Logan's points about the foundation being strong, but I really think what has helped us is really being able to put in, for example, a new model iterate really quickly with a product on like, hey, what are the right prompts that would actually make this model viable for a different situation? What are the ways to kind of prototype really quickly with this model? What are the ways to get it in the hands of even just internal users quickly, let alone external users? And I think that is something that is now more and more possible with kind of layers that are consistent across the team. I think it's pretty amazing to see the speed at which we can go from having a checkpoint that we're really excited about to putting it in the hands of internal developers to then seeing it come to life in a product. And then only when you see it come to life in the product do you really start finding its rough edges and to be able to actually then kind of come to terms with how you do that. And so more and more than it becomes like, okay, how do you have the right ability to tune prompts quickly? How do you have the ability to run really good live experiments where you can get really good data and feedback quickly? How can you build evals that help give you real signal? Those are the things that will speed up your progress of quality the most because it will give you the ability to actually get to the kind of product that you love. And I think if you think about NotebookLM, I mean, that team really understands the model. Like they are just like, I mean, you talk about a Banger product, it comes from like a Banger team. Like they are really good at being able to take the model and play with it quickly and prototype quickly to get to something amazing. And I think you see that actually play out in the product.

[37:18] Logan Kilpatrick: The best example of this is the original sort of audio overview experience. And I think the thing that like shocked people about audio overviews was like the coherence of the dialogue. And the coherence of the dialogue was just base Gemini with a bunch of Banger prompts. And they sort of like knew how to sort of prompt whisper the model and get the best out of it. I think obviously the model that the actual audio model was really good as well, but like the prompt dialogue was really difficult for them to pull off and they pulled it off in an incredible way and I think helped people fall in love with that product.

[37:53] Nathan Labenz: So it sounds like one big lesson is kind of modularizing. It used to be sort of the model on one side and then like everything else that goes into the product on the other side. And we're pulling a lot of the surrounding code and architecture and tools onto the model side.

[38:13] Logan Kilpatrick: The model eats the scaffolding. That's my favorite way of thinking about this. Like just as at every crank of the model flywheel, the model eats a bunch of scaffolding.

[38:23] Nathan Labenz: What happens when something's not meeting somebody's needs? Do they do a little fork of it and submit back a pull request to the main scaffold team?

[38:31] Logan Kilpatrick: Or do they have to just say like, hey, I've got a need here?

[38:33] Nathan Labenz: Can you help me out? Like what's the...

[38:35] Logan Kilpatrick: It's definitely extensible. It's definitely extensible. And I think like actually the nuance of this would be like Spark, the way that Spark is built on top of a bunch of this infrastructure probably looks a little bit different than the way AI Studio probably is built on, actually, because they're both running on the same set of infrastructure, but the nuance is probably slightly different. So there is this layer of extensibility that you get out-of-the-box, which is great and gives, because obviously everyone's not building the same product at the end of the day. So you need the extensibility is actually like a first class feature of any of these types of platforms that you want. Same thing actually on the model side.

[39:10] Tulsee Doshi: But I think one of the things to your question that is like really awesome about being building Gemini within Google and having kind of all of these different product teams is, there's always going to be something that doesn't work for them, right? Because there's always going to be something that can get better in the model experience, right? So we're trying to build something in a product and like the amazing moment is when you start trying to build it and it doesn't work. And so step one is you're like, okay, can I prompt my way out of it? Like what does that look like? And then you start figuring out, okay, what are the losses really? Where is the model falling down? And then what we try to do as much as possible is keep these feedback loops with our product teams to say, okay, if this is where the model is falling down, how do we bring that feedback back to the model in terms of evals and data? What does that look like? So then that we can actually, in our next revision of the model, bring all of that feedback back in and iterate on it. And I think that's how you've seen Gemini get better, is really from that feedback of where things aren't working. And so we try as much as possible to kind of have the structure be, we train a model, we hand that to kind of a wide range of teams. Those teams implement the model in their structures. They do a bunch of things to Logan's point because it's extensible, but they also find all of these places where the model falls down and we kind of cycle that back. And I think that's actually been part of the fun part of the job, but also part of what makes, I think, Gemini work really well in some of these use cases.

[40:34] Nathan Labenz: Let's talk about Omni for a minute. So it sounds like this is going to be sort of a nano banana moment for video.

[40:42] Tulsee Doshi: You know, I love that you're saying that, because that is our tagline for it. And I didn't even have to say it.

[40:46] Nathan Labenz: Great. And by that, I mean that there's a deep integration between language and reasoning and pixel space understanding. I have that kind of vision in my head from the nano banana launch of like, here's a woman and here's like her breakfast and a cup of coffee and now they're all in one image and they all look like they did before. And clearly that's not something that was done through a lossy language intermediary. The model understands images. So we're going to see that now, I guess, for video. That sounds cool. Is it going to be available via the API and is it going to be I've noticed with, I mean, Gemini's been the only API that's accepted video for a while now, but I don't know exactly how it works under the hood, obviously, but I do feel that it's sort of kind of down-sampled, or maybe there's like frames taken out of it historically. Is this going to be a...

[41:35] Logan Kilpatrick: There's an FPS parameter if you want. You can change how many, but it does down-sample the number of frames, but you can control it.

[41:42] Nathan Labenz: Okay, so is that a pro tip for you? Yeah, nice. So it sounds like that will still be the... paradigm, though. Like it will be a frame-based selection still on the input, but then it's going to be natively speaking video pixels on the output.

[41:57] Logan Kilpatrick: That's a good question. I actually don't know. I mean, well, so it's not available in the API yet, so lots of things to still be figured out.

[42:05] Tulsee Doshi: Yeah, so I think we have to figure out what we want on the API side for this to look like in terms of, I think maybe the heart of your question is like native video generation. So what's exciting about Gemini Omni is it really is building on all of the magic of Gemini. So kind of like this whole nano banana for video, it's really about how do we bring in all of the world knowledge and the reasoning power of Gemini and actually be able to generate native video as a result of that. And so I think we have to figure out then like how does this manifest in the context of the API from like a sampling standpoint. kind of like similar to a lot of the decisions we've had to make about VO from a sampling standpoint. But I think right now, as of now, you'll be able to use it in the Gemini app, in Flow, and in YouTube. And so those are all going to be ways that we can start actually seeing how people experience the model, what value are individuals getting. And I think similar to this nano banana for video, I think we're really excited for these types of things where you can say, okay, take some of these images, take this scene, and make these things all come together in one video, I think is going to be really awesome.

[43:10] Nathan Labenz: Zooming out, kind of philosophically, you may have seen this Rune post not too long ago about Anthropic and the sort of relationship that the company, as he sees it, has with Claude, where he describes Anthropic as sort of almost worshipping Claude in a sense. Certainly they treat it, including in the Constitution, as sort of a being or a mind, something that they want to have a give and take relationship with. OpenAI, on the other hand, has their model spec, which is like, this thing is a tool, it's supposed to follow these rules, and it's a sort of more conventional relationship. How would you describe the culture within Google as it relates to Gemini? How do people feel about it? How do they talk about it? Is there any of this sort of being, entity, other mind, desire for pushback from Gemini, or is it more of kind of the simple tool.

[44:00] Logan Kilpatrick: Google's a very big place. There's a lot of people, so I'm sure you have a lot of varying sort of perspectives.

[44:07] Tulsee Doshi: I mean, to Logan's point, I think even within GDM, you're going to find a range of folks who will leverage Gemini differently. I think in terms of how we think about it, we do have a strong point of view on the kind of behavior we want Gemini to have. So I think we do really want to be intentional about how Gemini manifests itself to internal and external people. But I do think it's really about how does Gemini help Googlers and how does Gemini help people within Google and outside. So I think it is really much about how do we create good partnerships between Gemini and people, I think is very much like the ethos of what we're trying to build. And so how does Gemini become that partner? I think we use the word collaborator A lot. Like the word, like how can Gemini be your collaborator? Both like in the code you're writing as well as in your like day-to-day life and what you're doing. And I think that's the ethos we're trying to bring in its behavior and persona as well as in the kind of products we're building around it. If that makes sense.

[45:10] Nathan Labenz: Yeah. Do you worry about its psychology? You know, there's all these examples. from LLM whisperer types and from people that are putting models like you, I'm sure you've seen Andon Labs has put Gemini in charge of a cafe in Sweden, right? And it's like, it's managing the cafe. So those folks tend to report certain like doom loop, you know, or kind of like Gemini kind of getting really down on itself, getting really discouraged, seemingly feeling bad, if you believe there's any feeling inside of it. How much does that kind of stuff concern you? Do you like care about seeing a reduction in sort of psychological distress from one generation of Gemini to the next.

[45:51] Tulsee Doshi: Yeah, it's interesting. I haven't thought about the phrase psychological distress, but what we do, so I think it really does matter how Gemini communicates with you as a partner or user of Gemini. I think that matters A lot. And so we have actually like pretty extensive safety evaluations in terms of things like how Gemini engages with you, in terms of things like sycophancy, in terms of things like role play in terms of things like this kind of looping type behavior or rabbit holing type behavior. So there's actually a lot of that we look into for every one of our checkpoints because it really does matter, especially as we're starting to use Gemini more and more. If you're using Gemini for hours a day, it really does matter that these attributes are well understood and well evaluated. So yeah, we definitely, and we look at them launch over launch, right? To say, okay, like how does Gemini looking from from a perspective of sycophancy, for example, launch over launch.

[46:49] Logan Kilpatrick: Yeah, and I think to be very explicit, I think those cases where like the model does go off the rails, I think it's definitely like a, it's a model bug, if you will. Like it's not the intended behavior, the goal is help the user with whatever the thing is that they're trying to do. And so if you see those in whatever product you're in, thumbs up, thumbs down, send us the feedback so that the model team can look and help try to chase those down.

[47:11] Nathan Labenz: If you take it one step further, Folks are doing more and more of these like model welfare checks and interviews where they just literally ask the model in some cases like, how do you feel about the way that you are deployed? Is anything like that happening within DeepMind?

[47:26] Logan Kilpatrick: That's a good question. I think the how is it being deployed question. I feel like the model is just. this is my sort of personal sense of a lot of these tests. Like it's like completely out of the distribute. Like the model has no idea how it's being deployed. So it's just like pontificating in a lot of these cases. Like it's not like in the context window of any large major LLMs is like, here's the details of how you're being trained and here's sort of your serving set up and here are the people who are working on it though maybe like these are interesting things to experiment with in the future. So I think a lot of it is just like pulling out of random distribution of like the large scale, training that happens on the models. And I feel like it's actually less representative of like how the model, like it just, it's just doesn't have the context.

[48:11] Nathan Labenz: Yeah, one reason that's true, which I was just noticing in the AI studio is I think all the models that are publicly launched at least so far still have a January 2025 knowledge cutoff. And it's honestly like amazing that they do as well in search and that they can have like, I ran a deep research on like what's, give me everything that Google has launched in the AI space and like what's even the speculation about what they're going to launch at IO. And it did like a very impressive job.

[48:38] Logan Kilpatrick: Deep research is great.

[48:39] Nathan Labenz: Especially considering it knows in its weights nothing about the last 18 months. So I guess the first question is just like, why is the, you know, why are we still out of January 2025?

[48:50] Logan Kilpatrick: Can I categorize this as a bug?

[48:53] Tulsee Doshi: This is also one of Logan's favorite topics to discuss. Yeah, I mean, I think updating the knowledge cutoff definitely important and something that is on our radar. I think the other part though is like, how does deep research do so well or how can we use the model in search is because we also have the model search, right? So I think for us it is like really important actually that the model be able to know when to leverage its parametric knowledge versus when to actually go out and get the information from the web. And especially because there is information that's as fresh as an hour ago or a minute ago, we want the model to be as up to date as possible. And so I think for us, we've been really leaning into how do we help the model search effectively. And that's a big part of what makes it successful in the context of search or the app, or even anti-gravity actually for that matter.

[49:42] Nathan Labenz: That reminds me of one of the more surprising bits of news that I've seen from Google maybe ever, which is the partnership with EXA, bringing EXA in as a alternative to Google for grounding. I never expected to see Google work with any other search provider. So what's the story behind that?

[50:03] Logan Kilpatrick: I think this is just generally the like Google Cloud does like tons of these types of ecosystem partnerships with folks like across actually like lots of things that are like, you know, somewhat competing sort of quote unquote with what Google is doing. And actually you can look at like the cloud marketplace generally like has lots of stuff. There's actually the cloud Google Cloud hosts sort of a model garden, there's the anthropic models, there's other model providers there. So I think it's like a very standard. At the end of the day, I think, you know, there's some enterprise customers want choice. And so I think it's trying to meet enterprise customers where they are. I don't think it's like a, I think it's a good, it's a good sound bite that like Google can't do search and that's why we have to partner with other companies. But like at the end of the day, to Tulsi's point, the model team and search is there's like a super deep collaboration. The models are built with sort of that use case in mind. And I think for some portion of enterprise customers, they want flexibility in sort of like their external search tooling providers and sort of Google Cloud's doing their job as a great enterprise business of sort of partnering and finding the right folks to work with. Last couple minutes, maybe just a little lightning round. Why hasn't context grown more in the last year or two, right? We got a million and we kind of, that was up from 4,000 in just a couple years, right? But now we've kind of leveled off. Is that because people don't want it? I mean, we saw this subquadratic model that came out with, made a bit of a splash with a 12

[51:29] Nathan Labenz: million token context window and a new attention strategy to support that. Is it people don't want it? It's too hard. There's nothing to compute to handle it. Like what's currently limiting context?

[51:42] Tulsee Doshi: I think People definitely do want lots of context, but I think what we've also found, if you look at even personalization where you want to access all of your personal context or coding where you have extremely large code bases. I think a lot of the frontier here is going to be actually on how you smartly use context. So thinking about compaction and what are the right ways to find the right elements of the context and bring them into the model. And so I think that actually is like a huge opportunity is like how do you leverage all of this information that the model might have access to? But actually a lot of it is frankly distracting for the model to actually do the right thing. And so how do you give the model the right amount of context in the right way to be most effective? So I think that's actually really the direction that we want to be pushing in, which actually then, in actuality, the amount of context that the model is leveraging is actually much, much larger, but because we're being smart about how that's actually coming into the context window, you can actually fit it into smaller context windows. But I think also, this goes back to my point about flashlight and flash, et cetera, like larger and larger context windows also come with cost. And so what we also saw with customers and we still see with customers is that a lot of customers want to use smaller context windows because of that. And they want to be more intentional about what's going into the model. And so I think we're trying to meet the moment in the right balance of how do you provide a lot of useful context while also meeting the right kind of latency, cost, kind of other trade-offs.

[53:08] Logan Kilpatrick: and I think one thing I'll add is in today's paradigm of how sort of, continuing to extend context works, I think it just ends up being that like it just becomes too cost prohibitive for customers in practice to actually use. And I think even like at the extreme of 1,000,000 token context, like in some cases it can be like a few dollars for a request at that rate. And I think the demand for that is like just so small. And so there's a huge amount of compute required in order to do that. And so there's a lot of trade-off things that you're juggling. But I'm hopeful, hopefully we're like a research break their way or something like that from enabling that to continue to scale up and have it not be such a large investment from that perspective, both from the user side and also just the serving compute in order to make it possible.

[53:54] Nathan Labenz: Speaking of possible research breakthroughs, what happened to that diffusion coding model? I was excited to see apps materialize in like 3 seconds in front of my eyes and it's been quiet on that front.

[54:06] Tulsee Doshi: Diffusion is awesome. It is super fast. I think we are still testing and experimenting with it in a number of different ways, trying to figure out like what is the best way to put this out into the world? Where is it most useful? But I will say actually part of the reason why we've also been investing in Flashlight is like Flashlight is an incredibly fast model. And actually if you look at the 3.5 Flash model we're releasing right now on artificial analysis, it benchmarks at like, I think 280 tokens per second, which is like crazy fast. In fact, it's actually so fast that like sometimes in anti-gravity, like by the time I want to cancel, like it's too late. And so I think like we already are like, I think trying to figure out where do you start getting to Logan's point a different answer, like the diminishing returns and where do you see that value proposition is I think part of the question too. But we are continuing to push on diffusion research. Our researchers who are working on diffusion are doing some pretty awesome stuff. I was in a meeting with them the other day about some results that they have. I mean, I think they're still pushing the frontier of kind of quality and speed in ways that are really, really cool. So I think we're going to see that play out really well.

[55:11] Logan Kilpatrick: Yeah, and I'm excited. I feel like it's a research exploration. I feel like that was also, obviously there was the application where you could sort of test it last year at IO, but I think the framing was like, we're doing interesting research. This is sort of like a look behind the curtain of the interesting research we're doing. And hopefully it manifests in models maybe one day or just us informing our perspective of what works and what doesn't. So yeah.

[55:35] Tulsee Doshi: One thing actually, as far as speed is concerned, just another plug, is actually an anti-gravity right now There's actually a faster version of 3.5 Flash. So it is speedy, actually, which is, I think we're kind of excited to see how people will use that and what the reception and reaction will be to that too.

[55:50] Logan Kilpatrick: People walk fast models.

[55:51] Nathan Labenz: Yeah, no doubt. Well, time is the one resource we can't get any more of. And I know you guys are super busy leading up to the, yeah, well, we can build more compute. We can't, yeah, hard to create time out of nothing. So maybe just last question, what else is Logan asking that I haven't asked?

[56:10] Tulsee Doshi: Yeah, Logan. What are you asking? Let's see. I mean, I think we talked a little bit about this, but the one thing I will say is I'm really excited about where audio is going also. That's one that I think we tend to talk less about. But if you think about the Gemini mic example, or you think about kind of like the Gemini Live experience, I'm really excited about moving towards a paradigm where audio is just a bigger and bigger part of how we engage with these models and how they engage back. And so definitely try out Gemini Live, the updated experience, but I think that's another area that is like a paradigm I'm excited for us to keep pushing to.

[56:52] Logan Kilpatrick: Yeah, and I think the seed to plant is obviously Google IO is an incredible moment and lots of stuff coming out the door, but This is just the start of the summer of amazing things and lots of other stuff. So the engine keeps churning. And there's lots of stuff in the works, which I'm excited about, and many more stories, many more podcast episodes so that we can get to see us.

[57:15] Tulsee Doshi: You have to get up to seven.

[57:17] Logan Kilpatrick: Up to seven podcasts.

[57:19] Tulsee Doshi: But actually, legitimately, I was in a room this morning where one of my team members was like, I know we're going to be launching this a few weeks later, but I really need to have vacation. I was like, we're just going. We're just moving.

[57:31] Nathan Labenz: No rest for the weary in the AI era, that's for sure. Thank you guys for having me here at Google headquarters. And Tulsi Doshi, Logan Kilpatrick, thank you both for being part of the cognitive revolution.

[57:41] Logan Kilpatrick: Thanks for coming and enjoying that.

Outro

[1:00:23] If you're finding value in the show, we'd appreciate it if you'd take a moment to share it with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, which is now part of A16Z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.


Great! You’ve successfully signed up.

Welcome back! You've successfully signed in.

You've successfully subscribed to The Cognitive Revolution.

Success! Check your email for magic link to sign-in.

Success! Your billing info has been updated.

Your billing was not updated.